[ 
https://issues.apache.org/jira/browse/ORC-250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16320868#comment-16320868
 ] 

ASF GitHub Bot commented on ORC-250:
------------------------------------

GitHub user moresandeep opened a pull request:

    https://github.com/apache/orc/pull/208

    ORC-250 - Create sha256 mask

    Masking strategy that masks String, Varchar, Char and Binary types
    as a SHA256 hash.
    
    **For String type:**
    Strings of any length are converted to a SHA256 hash of length 64.
    
    **For Varchar type:**
    The max-length property is honored: if max-length is less than the
    SHA256 hash length (64), the hash is truncated to max-length.
    If max-length is greater than 64, the output keeps the SHA256 hash
    length, which is 64.
    
    **For Char type:**
    The length of the mask is always equal to the specified max-length.
    If max-length is less than the SHA256 hash length (64),
    the mask is truncated.
    If max-length is greater than the SHA256 hash length (64),
    the mask is padded with blank spaces.
    
    **For Binary type:**
    Binary values of any length are converted to a SHA256 hash of length 64.
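
The String, Varchar and Char rules above can be sketched as follows. This is a minimal illustration, not the code from the pull request: the class name `Sha256MaskSketch` and its methods are hypothetical, and lowercase hex output is an assumption (the description only states a length of 64).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper for illustration; not the class added by the pull request. */
final class Sha256MaskSketch {
  /** Length of a SHA-256 digest written as hex digits. */
  static final int HASH_LENGTH = 64;

  /** String rule: hex SHA-256 of the UTF-8 bytes, always 64 characters (lowercase assumed). */
  static String maskString(String value) {
    try {
      MessageDigest md = MessageDigest.getInstance("SHA-256");
      byte[] digest = md.digest(value.getBytes(StandardCharsets.UTF_8));
      StringBuilder hex = new StringBuilder(HASH_LENGTH);
      for (byte b : digest) {
        hex.append(Character.forDigit((b >> 4) & 0xF, 16));
        hex.append(Character.forDigit(b & 0xF, 16));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      // SHA-256 is required on every Java platform, so this should not happen.
      throw new IllegalStateException(e);
    }
  }

  /** Varchar rule: truncate to maxLength when it is below 64, otherwise keep all 64 characters. */
  static String maskVarchar(String value, int maxLength) {
    String hash = maskString(value);
    return maxLength < HASH_LENGTH ? hash.substring(0, maxLength) : hash;
  }

  /** Char rule: the result is always exactly maxLength long (truncated or space padded). */
  static String maskChar(String value, int maxLength) {
    String hash = maskString(value);
    if (maxLength <= HASH_LENGTH) {
      return hash.substring(0, maxLength);
    }
    StringBuilder padded = new StringBuilder(hash);
    while (padded.length() < maxLength) {
      padded.append(' ');
    }
    return padded.toString();
  }
}
```

Under these assumptions, `maskVarchar("abc", 10)` keeps only the first 10 hex digits, while `maskChar("abc", 70)` returns the 64 hex digits followed by 6 spaces.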

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/moresandeep/orc ORC-250_SHA-256_Mask

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/orc/pull/208.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #208
    
----
commit 86f3d5cfe7b0463faa2ef96db8fdb23c26430166
Author: Sandeep More <more@...>
Date:   2018-01-10T19:01:07Z

    ORC-250 - Create sha256 mask

----


> Create sha256 mask
> ------------------
>
>                 Key: ORC-250
>                 URL: https://issues.apache.org/jira/browse/ORC-250
>             Project: ORC
>          Issue Type: Sub-task
>            Reporter: Owen O'Malley
>            Assignee: Sandeep More
>
> We should also create a DataMask that does sha256 of the data:
> * strings should be sha256 of the utf-8 representation of the string 
> represented as hex digits
> * binary should be sha256 of the binary in binary
> * integer types should be sha256 of the little endian representation of the 
> number in little endian cut down to the right size (1,2,4, or 8 bytes)
> * floating point types should be sha256 of the binary representation as 
> either 4 (float) or 8 (double) bytes
> * timestamps and dates should convert like integers
> * decimal should convert like 128 bit numbers with the result cut to the 
> matching number of bytes
> It isn't clear what we should do in the very small data types:
> * boolean
> * byte
> * short
> I'd lean toward either making them null or passing them through unchanged.
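
One possible reading of the integer rule in the issue above, sketched in Java. This is an interpretation, not an agreed design: the class `Sha256IntegerMaskSketch` and the method `maskInteger` are hypothetical, and "cut down to the right size" is assumed to mean keeping the first 1, 2, 4, or 8 bytes of the digest and reading them back as a little-endian signed value.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper for illustration; not part of ORC. */
final class Sha256IntegerMaskSketch {
  /**
   * Hashes the low `width` bytes of `value` in little-endian order and
   * returns the first `width` digest bytes as a little-endian signed integer.
   */
  static long maskInteger(long value, int width) {
    if (width != 1 && width != 2 && width != 4 && width != 8) {
      throw new IllegalArgumentException("width must be 1, 2, 4, or 8");
    }
    // Little-endian bytes of the value; only the first `width` bytes are hashed.
    byte[] le = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putLong(value).array();
    byte[] digest;
    try {
      MessageDigest md = MessageDigest.getInstance("SHA-256");
      md.update(le, 0, width);
      digest = md.digest();
    } catch (NoSuchAlgorithmException e) {
      // SHA-256 is required on every Java platform, so this should not happen.
      throw new IllegalStateException(e);
    }
    // Rebuild a number from the first `width` digest bytes, least significant byte first.
    long result = 0;
    for (int i = width - 1; i >= 0; i--) {
      result = (result << 8) | (digest[i] & 0xFF);
    }
    // Sign-extend so the result fits the original type's range.
    int shift = 64 - 8 * width;
    return (result << shift) >> shift;
  }
}
```

With this reading, equal inputs always map to equal masked values, and a 2-byte input always yields a value in the `short` range. Timestamps and dates could reuse the same method on their underlying integer value.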



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
