[ https://issues.apache.org/jira/browse/NIFI-5147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16607896#comment-16607896 ]

ASF GitHub Bot commented on NIFI-5147:
--------------------------------------

Github user alopresto commented on the issue:

    https://github.com/apache/nifi/pull/2980
  
    Thanks for discovering this, @thenatog. This is an excellent catch. 
    
    I've added behavior to catch this, better documentation, and unit tests. However, I added them on the branch that includes [PR 2983](https://github.com/apache/nifi/pull/2983). Let's mark this PR as closed and just review the other one, as it is more complete and addresses this issue. 
    
    ```
    2018-09-07 21:21:19,784 WARN [Timer-Driven Process Thread-6] o.a.n.security.util.crypto.HashService The charset provided was UTF-16, but Java will insert a Big Endian BOM in the decoded message before hashing, so switching to UTF-16BE
    2018-09-07 21:21:19,797 INFO [Timer-Driven Process Thread-9] o.a.n.processors.standard.LogAttribute LogAttribute[id=b15f3209-344d-10a6-4a7b-454530bb72fc] logging for flow file StandardFlowFileRecord[uuid=a4a223fb-aa11-43b9-93a3-d7675c44593c,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1536378604366-1, container=default, section=1], offset=56, length=4],offset=0,name=33467912436349,size=4]
    --------------------[SUCCESS] --------------------
    Standard FlowFile Attributes
    Key: 'entryDate'
        Value: 'Fri Sep 07 21:21:19 PDT 2018'
    Key: 'lineageStartDate'
        Value: 'Fri Sep 07 21:21:19 PDT 2018'
    Key: 'fileSize'
        Value: '4'
    FlowFile Attribute Map Content
    Key: 'filename'
        Value: '33467912436349'
    Key: 'path'
        Value: './'
    Key: 'test_attribute'
        Value: 'hehe'
    Key: 'test_attribute_md5_utf16le'
        Value: '2db0ecc27f7abd29ba95412feb3b5e07'
    Key: 'uuid'
        Value: 'a4a223fb-aa11-43b9-93a3-d7675c44593c'
    --------------------[SUCCESS] --------------------
    hehe
    2018-09-07 21:21:19,799 INFO [Timer-Driven Process Thread-9] o.a.n.processors.standard.LogAttribute LogAttribute[id=b15f3209-344d-10a6-4a7b-454530bb72fc] logging for flow file StandardFlowFileRecord[uuid=b7459e40-500b-488d-a0dc-3e09ebc6b86e,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1536378604366-1, container=default, section=1], offset=56, length=4],offset=0,name=33467912436349,size=4]
    --------------------[SUCCESS] --------------------
    Standard FlowFile Attributes
    Key: 'entryDate'
        Value: 'Fri Sep 07 21:21:19 PDT 2018'
    Key: 'lineageStartDate'
        Value: 'Fri Sep 07 21:21:19 PDT 2018'
    Key: 'fileSize'
        Value: '4'
    FlowFile Attribute Map Content
    Key: 'filename'
        Value: '33467912436349'
    Key: 'path'
        Value: './'
    Key: 'test_attribute'
        Value: 'hehe'
    Key: 'test_attribute_md5_utf16'
        Value: 'b0ed26b524e0b0606551d78e42b5b7bc'
    Key: 'uuid'
        Value: 'b7459e40-500b-488d-a0dc-3e09ebc6b86e'
    --------------------[SUCCESS] --------------------
    hehe
    2018-09-07 21:21:19,801 INFO [Timer-Driven Process Thread-9] o.a.n.processors.standard.LogAttribute LogAttribute[id=b15f3209-344d-10a6-4a7b-454530bb72fc] logging for flow file StandardFlowFileRecord[uuid=25c5d1b1-faa4-418d-911c-5c0cea399b83,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1536378604366-1, container=default, section=1], offset=56, length=4],offset=0,name=33467912436349,size=4]
    --------------------[SUCCESS] --------------------
    Standard FlowFile Attributes
    Key: 'entryDate'
        Value: 'Fri Sep 07 21:21:19 PDT 2018'
    Key: 'lineageStartDate'
        Value: 'Fri Sep 07 21:21:19 PDT 2018'
    Key: 'fileSize'
        Value: '4'
    FlowFile Attribute Map Content
    Key: 'filename'
        Value: '33467912436349'
    Key: 'path'
        Value: './'
    Key: 'test_attribute'
        Value: 'hehe'
    Key: 'test_attribute_md5_utf16be'
        Value: 'b0ed26b524e0b0606551d78e42b5b7bc'
    Key: 'uuid'
        Value: '25c5d1b1-faa4-418d-911c-5c0cea399b83'
    --------------------[SUCCESS] --------------------
    hehe
    ```
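
The WARN line reflects documented JDK behavior: when encoding, Java's `UTF-16` charset uses big-endian byte order and writes a big-endian byte-order mark (`FE FF`), so a digest over `"hehe".getBytes(UTF_16)` would include two extra bytes. The identical `test_attribute_md5_utf16` and `test_attribute_md5_utf16be` values in the log are consistent with the charset being swapped before hashing. The following is a minimal sketch of that normalization, not the `HashService` code from the PR; the class `Utf16BomDemo` and its helpers are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Utf16BomDemo {

    // Hypothetical helper: replace UTF-16 with UTF-16BE so the BOM is not hashed.
    static Charset normalizeForHashing(Charset charset) {
        return StandardCharsets.UTF_16.equals(charset) ? StandardCharsets.UTF_16BE : charset;
    }

    // MD5 over the value's bytes in the (normalized) charset, as lowercase hex.
    static String md5Hex(String value, Charset charset) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(value.getBytes(normalizeForHashing(charset)));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required MessageDigest algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] withBom = "hehe".getBytes(StandardCharsets.UTF_16);
        byte[] bigEndian = "hehe".getBytes(StandardCharsets.UTF_16BE);
        // The UTF-16 encoding is the UTF-16BE bytes prefixed with FE FF.
        System.out.printf("UTF-16: %d bytes, UTF-16BE: %d bytes%n", withBom.length, bigEndian.length);
        // After normalization both charsets produce the same digest.
        System.out.println(md5Hex("hehe", StandardCharsets.UTF_16)
                .equals(md5Hex("hehe", StandardCharsets.UTF_16BE)));
    }
}
```

Running `main` prints `UTF-16: 10 bytes, UTF-16BE: 8 bytes` followed by `true`. Choosing big-endian (rather than little-endian) as the replacement keeps the byte order Java would have used anyway, minus the BOM.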


> Improve HashAttribute processor
> -------------------------------
>
>                 Key: NIFI-5147
>                 URL: https://issues.apache.org/jira/browse/NIFI-5147
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Extensions
>    Affects Versions: 1.6.0
>            Reporter: Andy LoPresto
>            Assignee: Andy LoPresto
>            Priority: Major
>              Labels: hash, security
>             Fix For: 1.8.0
>
>
> The {{HashAttribute}} processor currently has surprising behavior. Unless 
> already familiar with the processor, a user would expect {{HashAttribute}} to 
> generate a hash value over one or more attributes. Instead, the processor as 
> it is implemented "groups" incoming flowfiles based on regular expressions 
> that match attribute values, and then generates a (non-configurable) MD5 hash 
> over the concatenation of the matching attribute keys and values. 
> In addition:
> * the processor throws an error and routes to failure any incoming flowfile 
> that does not have all of the attributes specified in the processor
> * MD5 is widely deprecated
> * no other hash algorithms are available
> I am unaware of community use of this processor, but I do not want to break 
> backward compatibility. I propose the following steps:
> * Implement a new {{CalculateAttributeHash}} processor (awkward name, but 
> the existing processor already has the desired name)
> ** This processor will perform the "standard" use case -- identify an 
> attribute, calculate the specified hash over the value, and write it to an 
> output attribute
> ** This processor will have a required property descriptor providing a 
> dropdown menu of valid hash algorithms
> ** This processor will accept arbitrary dynamic properties identifying the 
> attributes to be hashed as a key, and the resulting attribute name as a value
> ** Example: I want to generate a SHA-512 hash on the attribute {{username}}, 
> and a flowfile enters the processor with {{username}} value {{alopresto}}. I 
> configure {{algorithm}} with {{SHA-512}} and add a dynamic property 
> {{username}} -- {{username_SHA512}}. The resulting flowfile will have 
> attribute {{username_SHA512}} with value 
> {{739b4f6722fb5de20125751c7a1a358b2a7eb8f07e530e4bf18561fbff93234908aa9d2577770c876bca9ede5ba784d5ce6081dbbdfe5ddd446678f223b8d632}}
> * Improve the documentation of {{HashAttribute}} to explain its goal/expected 
> use case (?)
> * Link from the {{HashAttribute}} documentation to the new processor for standard use cases
> * Remove the error-level alert when an incoming flowfile does not contain all 
> expected attributes; I propose changing the severity to INFO while still 
> routing to failure
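
The proposed {{CalculateAttributeHash}} behavior (one selectable algorithm plus dynamic properties that map a source attribute name to an output attribute name) can be sketched without the NiFi framework. This is a minimal sketch under stated assumptions, not the processor that shipped: attributes are a plain `Map<String, String>`, values are hashed as UTF-8 bytes, absent source attributes are simply skipped, and `AttributeHashSketch`, `HashAlgorithm`, and `calculate` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

public class AttributeHashSketch {

    // Hypothetical stand-in for the proposed dropdown of hash algorithms.
    enum HashAlgorithm {
        MD5("MD5"), SHA_256("SHA-256"), SHA_512("SHA-512");

        final String jcaName; // standard JCA MessageDigest name

        HashAlgorithm(String jcaName) {
            this.jcaName = jcaName;
        }
    }

    // Hashes the UTF-8 bytes of value (an assumption) and returns lowercase hex.
    static String hashHex(String value, HashAlgorithm algorithm) {
        try {
            byte[] digest = MessageDigest.getInstance(algorithm.jcaName)
                    .digest(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // dynamicProperties maps source attribute -> output attribute,
    // e.g. "username" -> "username_SHA512". Returns a copy of the attributes
    // with the hashes added; missing source attributes are skipped.
    static Map<String, String> calculate(Map<String, String> attributes,
                                         Map<String, String> dynamicProperties,
                                         HashAlgorithm algorithm) {
        Map<String, String> result = new HashMap<>(attributes);
        for (Map.Entry<String, String> property : dynamicProperties.entrySet()) {
            String value = attributes.get(property.getKey());
            if (value != null) {
                result.put(property.getValue(), hashHex(value, algorithm));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> out = calculate(
                Map.of("username", "alopresto"),
                Map.of("username", "username_SHA512"),
                HashAlgorithm.SHA_512);
        System.out.println(out.get("username_SHA512"));
    }
}
```

Running `main` prints the 128-hex-character SHA-512 digest of `alopresto` under the UTF-8 assumption. Keeping the source-to-output mapping in dynamic properties means one processor instance can hash any number of attributes with the same algorithm.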



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
