[
https://issues.apache.org/jira/browse/ORC-250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16320868#comment-16320868
]
ASF GitHub Bot commented on ORC-250:
------------------------------------
GitHub user moresandeep opened a pull request:
https://github.com/apache/orc/pull/208
ORC-250 - Create sha256 mask
Masking strategy that replaces String, Varchar, Char and Binary values
with their SHA-256 hash.
**For String type:**
A string of any length will be converted to a SHA256 hash of length 64.
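As a rough illustration of the String case, the sketch below hashes the UTF-8 bytes of a value and renders the digest as 64 lowercase hex characters. The class and method names (`Sha256StringMaskSketch`, `sha256Hex`) are hypothetical and are not taken from the pull request; only the JDK `MessageDigest` API is assumed.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch, not the code from the pull request.
final class Sha256StringMaskSketch {

    // Returns the SHA-256 digest of the UTF-8 bytes of value as 64 hex characters.
    static String sha256Hex(String value) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to 0..255 so negative bytes render as two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sha256Hex("abc"));
    }
}
```

Whatever the input length, the result is always 64 characters, because a SHA-256 digest is 32 bytes and each byte becomes two hex digits.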
**For Varchar type:**
For Varchar type, the max-length property will be honored, i.e.
if max-length is less than the SHA256 hash length (64) then the hash
will be truncated to max-length. If max-length is greater than 64 then
the output keeps the full sha256 length, which is 64.
**For Char type:**
For Char type, the length of the mask will always be equal to the specified
max-length.
If the given length (max-length) is less than the SHA256 hash length (64)
the mask will be truncated.
If the given length (max-length) is greater than the SHA256 hash length (64)
then the mask will be padded with blank spaces.
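The Varchar and Char length rules can be sketched as two small helpers that take an already computed 64-character hex hash. This is a minimal sketch under the reading that Varchar only truncates while Char both truncates and pads; the names `Sha256LengthFitSketch`, `fitVarchar` and `fitChar` are hypothetical and do not come from the pull request.

```java
// Hypothetical sketch of the length handling, not the code from the pull request.
final class Sha256LengthFitSketch {

    // Varchar: truncate to maxLength when the hash is longer, otherwise keep it as is.
    static String fitVarchar(String hashHex, int maxLength) {
        return hashHex.length() > maxLength ? hashHex.substring(0, maxLength) : hashHex;
    }

    // Char: the result always has exactly maxLength characters.
    static String fitChar(String hashHex, int maxLength) {
        if (hashHex.length() >= maxLength) {
            return hashHex.substring(0, maxLength);
        }
        StringBuilder padded = new StringBuilder(maxLength).append(hashHex);
        while (padded.length() < maxLength) {
            padded.append(' '); // pad on the right with blank spaces
        }
        return padded.toString();
    }

    public static void main(String[] args) {
        // SHA-256 of "abc" as hex, used here only as sample input.
        String hash = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad";
        System.out.println("[" + fitVarchar(hash, 10) + "]");
        System.out.println("[" + fitVarchar(hash, 100) + "]");
        System.out.println("[" + fitChar(hash, 70) + "]");
    }
}
```

With a max-length of 10 both helpers return the first 10 hex characters; with a max-length above 64 the Varchar result stays at 64 characters while the Char result is padded up to max-length.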
**For Binary type:**
A Binary value of any length will be converted to a SHA256 hash of length 64.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/moresandeep/orc ORC-250_SHA-256_Mask
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/orc/pull/208.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #208
----
commit 86f3d5cfe7b0463faa2ef96db8fdb23c26430166
Author: Sandeep More <more@...>
Date: 2018-01-10T19:01:07Z
ORC-250 - Create sha256 mask
----
> Create sha256 mask
> ------------------
>
> Key: ORC-250
> URL: https://issues.apache.org/jira/browse/ORC-250
> Project: ORC
> Issue Type: Sub-task
> Reporter: Owen O'Malley
> Assignee: Sandeep More
>
> We should also create a DataMask that does sha256 of the data:
> * strings should be sha256 of the utf-8 representation of the string
> represented as hex digits
> * binary should be sha256 of the binary in binary
> * integer types should be sha256 of the little endian representation of the
> number in little endian cut down to the right size (1,2,4, or 8 bytes)
> * floating point types should be sha256 of the binary representation as
> either 4 (float) or 8 (double) bytes
> * timestamps and dates should convert like integers
> * decimal should convert like 128 bit numbers with the result cut to the
> matching number of bytes
> It isn't clear what we should do in the very small data types:
> * boolean
> * byte
> * short
> I'd lean toward either making them null or passing them through unchanged.
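The integer rule in the issue description is not covered by this pull request, and its wording leaves room for interpretation. One possible reading, sketched below, is to hash the little-endian bytes of the number and then keep only as many leading digest bytes as the type holds, read back as little endian. The class and method names are hypothetical, and the choice of the leading digest bytes is an assumption.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch of one reading of the issue text, not ORC code.
final class Sha256IntegerMaskSketch {

    // Hash the 8 little-endian bytes of value, keep the first 8 digest bytes.
    static long maskLong(long value) {
        byte[] input = ByteBuffer.allocate(Long.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN).putLong(value).array();
        return ByteBuffer.wrap(sha256(input)).order(ByteOrder.LITTLE_ENDIAN).getLong();
    }

    // Hash the 4 little-endian bytes of value, keep the first 4 digest bytes.
    static int maskInt(int value) {
        byte[] input = ByteBuffer.allocate(Integer.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN).putInt(value).array();
        return ByteBuffer.wrap(sha256(input)).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    // Plain SHA-256 over raw bytes; the digest is always 32 bytes long.
    static byte[] sha256(byte[] input) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(input);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(maskLong(42L));
        System.out.println(maskInt(42));
    }
}
```

The mask is deterministic, so equal inputs still compare equal after masking, which is usually the property wanted from a hash-based mask.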