[
https://issues.apache.org/jira/browse/NIFI-5147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16607896#comment-16607896
]
ASF GitHub Bot commented on NIFI-5147:
--------------------------------------
Github user alopresto commented on the issue:
https://github.com/apache/nifi/pull/2980
Thanks for discovering this, @thenatog. This is an excellent catch.
I've added behavior to catch this, better documentation, and unit tests.
However, I added them on the branch that includes [PR
2983](https://github.com/apache/nifi/pull/2983). Let's mark this PR as closed
and just review the other one, as it is more complete and addresses this issue.
```
2018-09-07 21:21:19,784 WARN [Timer-Driven Process Thread-6]
o.a.n.security.util.crypto.HashService The charset provided was UTF-16, but
Java will insert a Big Endian BOM in the decoded message before hashing, so
switching to UTF-16BE
2018-09-07 21:21:19,797 INFO [Timer-Driven Process Thread-9]
o.a.n.processors.standard.LogAttribute
LogAttribute[id=b15f3209-344d-10a6-4a7b-454530bb72fc] logging for flow file
StandardFlowFileRecord[uuid=a4a223fb-aa11-43b9-93a3-d7675c44593c,claim=StandardContentClaim
[resourceClaim=StandardResourceClaim[id=1536378604366-1, container=default,
section=1], offset=56, length=4],offset=0,name=33467912436349,size=4]
--------------------[SUCCESS] --------------------
Standard FlowFile Attributes
Key: 'entryDate'
Value: 'Fri Sep 07 21:21:19 PDT 2018'
Key: 'lineageStartDate'
Value: 'Fri Sep 07 21:21:19 PDT 2018'
Key: 'fileSize'
Value: '4'
FlowFile Attribute Map Content
Key: 'filename'
Value: '33467912436349'
Key: 'path'
Value: './'
Key: 'test_attribute'
Value: 'hehe'
Key: 'test_attribute_md5_utf16le'
Value: '2db0ecc27f7abd29ba95412feb3b5e07'
Key: 'uuid'
Value: 'a4a223fb-aa11-43b9-93a3-d7675c44593c'
--------------------[SUCCESS] --------------------
hehe
2018-09-07 21:21:19,799 INFO [Timer-Driven Process Thread-9]
o.a.n.processors.standard.LogAttribute
LogAttribute[id=b15f3209-344d-10a6-4a7b-454530bb72fc] logging for flow file
StandardFlowFileRecord[uuid=b7459e40-500b-488d-a0dc-3e09ebc6b86e,claim=StandardContentClaim
[resourceClaim=StandardResourceClaim[id=1536378604366-1, container=default,
section=1], offset=56, length=4],offset=0,name=33467912436349,size=4]
--------------------[SUCCESS] --------------------
Standard FlowFile Attributes
Key: 'entryDate'
Value: 'Fri Sep 07 21:21:19 PDT 2018'
Key: 'lineageStartDate'
Value: 'Fri Sep 07 21:21:19 PDT 2018'
Key: 'fileSize'
Value: '4'
FlowFile Attribute Map Content
Key: 'filename'
Value: '33467912436349'
Key: 'path'
Value: './'
Key: 'test_attribute'
Value: 'hehe'
Key: 'test_attribute_md5_utf16'
Value: 'b0ed26b524e0b0606551d78e42b5b7bc'
Key: 'uuid'
Value: 'b7459e40-500b-488d-a0dc-3e09ebc6b86e'
--------------------[SUCCESS] --------------------
hehe
2018-09-07 21:21:19,801 INFO [Timer-Driven Process Thread-9]
o.a.n.processors.standard.LogAttribute
LogAttribute[id=b15f3209-344d-10a6-4a7b-454530bb72fc] logging for flow file
StandardFlowFileRecord[uuid=25c5d1b1-faa4-418d-911c-5c0cea399b83,claim=StandardContentClaim
[resourceClaim=StandardResourceClaim[id=1536378604366-1, container=default,
section=1], offset=56, length=4],offset=0,name=33467912436349,size=4]
--------------------[SUCCESS] --------------------
Standard FlowFile Attributes
Key: 'entryDate'
Value: 'Fri Sep 07 21:21:19 PDT 2018'
Key: 'lineageStartDate'
Value: 'Fri Sep 07 21:21:19 PDT 2018'
Key: 'fileSize'
Value: '4'
FlowFile Attribute Map Content
Key: 'filename'
Value: '33467912436349'
Key: 'path'
Value: './'
Key: 'test_attribute'
Value: 'hehe'
Key: 'test_attribute_md5_utf16be'
Value: 'b0ed26b524e0b0606551d78e42b5b7bc'
Key: 'uuid'
Value: '25c5d1b1-faa4-418d-911c-5c0cea399b83'
--------------------[SUCCESS] --------------------
hehe
```
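For readers who want to see the UTF-16 behavior outside of NiFi, here is a minimal standalone sketch. It is not the `HashService` code from the PR; the class name `Utf16HashDemo` and the helpers `md5Hex` and `normalize` are hypothetical and written only for illustration. It relies on the fact that Java's `UTF-16` charset prepends a big-endian byte order mark (`FE FF`) when encoding, so hashing the raw `UTF-16` bytes gives a different digest than hashing `UTF-16BE` bytes unless the charset is switched first, as the WARN line describes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical demo class; not part of NiFi.
final class Utf16HashDemo {

    /** Returns the lowercase hex encoding of the MD5 digest of the given bytes. */
    static String md5Hex(byte[] input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(input)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /**
     * Assumed normalization, modeled on the WARN message: treat UTF-16 as
     * UTF-16BE so that no BOM ends up in the hashed bytes.
     */
    static Charset normalize(Charset charset) {
        return StandardCharsets.UTF_16.equals(charset) ? StandardCharsets.UTF_16BE : charset;
    }

    public static void main(String[] args) {
        String value = "hehe"; // same value as test_attribute in the log
        Charset[] charsets = {
            StandardCharsets.UTF_16LE, StandardCharsets.UTF_16, StandardCharsets.UTF_16BE
        };
        for (Charset cs : charsets) {
            byte[] raw = value.getBytes(cs);                   // UTF-16 adds a 2-byte BOM here
            byte[] normalized = value.getBytes(normalize(cs)); // BOM-free for UTF-16
            System.out.printf("%-8s raw=%2d bytes  raw md5=%s  normalized md5=%s%n",
                    cs.name(), raw.length, md5Hex(raw), md5Hex(normalized));
        }
    }
}
```

With the normalization applied, the `UTF-16` and `UTF-16BE` digests agree, which matches the identical `test_attribute_md5_utf16` and `test_attribute_md5_utf16be` values in the log, while `UTF-16LE` differs because the byte order differs.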
> Improve HashAttribute processor
> -------------------------------
>
> Key: NIFI-5147
> URL: https://issues.apache.org/jira/browse/NIFI-5147
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Extensions
> Affects Versions: 1.6.0
> Reporter: Andy LoPresto
> Assignee: Andy LoPresto
> Priority: Major
> Labels: hash, security
> Fix For: 1.8.0
>
>
> The {{HashAttribute}} processor currently has surprising behavior. Barring
> familiarity with the processor, a user would expect {{HashAttribute}} to
> generate a hash value over one or more attributes. Instead, the processor as
> implemented sorts incoming flowfiles into groups based on regular
> expressions which match attribute values, and then generates a
> (non-configurable) MD5 hash over the concatenation of the matching attribute
> keys and values.
> In addition:
> * the processor throws an error and routes to failure any incoming flowfile
> which does not have all attributes specified in the processor
> * the use of MD5 is widely deprecated
> * no other hash algorithms are available
> I am unaware of community use of this processor, but I do not want to break
> backward compatibility. I propose the following steps:
> * Implement a new {{CalculateAttributeHash}} processor (an awkward name, but
> the existing {{HashAttribute}} processor already has the desired name)
> ** This processor will perform the "standard" use case -- identify an
> attribute, calculate the specified hash over the value, and write it to an
> output attribute
> ** This processor will have a required property descriptor allowing a
> dropdown menu of valid hash algorithms
> ** This processor will accept arbitrary dynamic properties identifying the
> attributes to be hashed as a key, and the resulting attribute name as a value
> ** Example: I want to generate a SHA-512 hash on the attribute {{username}},
> and a flowfile enters the processor with {{username}} value {{alopresto}}. I
> configure {{algorithm}} with {{SHA-512}} and add a dynamic property
> {{username}} -- {{username_SHA512}}. The resulting flowfile will have
> attribute {{username_SHA512}} with value
> {{739b4f6722fb5de20125751c7a1a358b2a7eb8f07e530e4bf18561fbff93234908aa9d2577770c876bca9ede5ba784d5ce6081dbbdfe5ddd446678f223b8d632}}
> * Improve the documentation of the existing {{HashAttribute}} processor to
> explain the goal/expected use case (?)
> * Link in processor documentation to new processor for standard use cases
> * Remove the error alert when an incoming flowfile does not contain all
> expected attributes. I propose changing the severity to INFO and still
> routing to failure
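
The "standard" use case proposed in the issue above (pick an algorithm, map each source attribute to an output attribute through a dynamic property, write the hex digest) can be sketched in plain Java as follows. This is only an illustration under stated assumptions, not the processor implementation: the class `AttributeHashSketch` and its methods are hypothetical, the attribute value is assumed to be encoded as UTF-8 before hashing, and source attributes missing from the flowfile are simply skipped, which the issue does not specify.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; does not use the NiFi processor API.
final class AttributeHashSketch {

    /** Hashes the UTF-8 bytes of value (assumed encoding) and returns lowercase hex. */
    static String hashHex(String algorithm, String value) {
        try {
            byte[] digest = MessageDigest.getInstance(algorithm)
                    .digest(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unsupported algorithm: " + algorithm, e);
        }
    }

    /**
     * Returns a copy of the attributes with one hash attribute added per
     * dynamic property (key = source attribute, value = output attribute).
     * Assumption: source attributes that are absent are skipped.
     */
    static Map<String, String> addHashes(Map<String, String> attributes,
                                         String algorithm,
                                         Map<String, String> dynamicProperties) {
        Map<String, String> result = new LinkedHashMap<>(attributes);
        for (Map.Entry<String, String> property : dynamicProperties.entrySet()) {
            String sourceValue = attributes.get(property.getKey());
            if (sourceValue != null) {
                result.put(property.getValue(), hashHex(algorithm, sourceValue));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> attributes = new LinkedHashMap<>();
        attributes.put("username", "alopresto");

        Map<String, String> dynamicProperties = new LinkedHashMap<>();
        dynamicProperties.put("username", "username_SHA512");

        // Prints the original attribute plus the new username_SHA512 attribute.
        addHashes(attributes, "SHA-512", dynamicProperties)
                .forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Returning a new map rather than mutating the input mirrors the idea that the processor writes additional attributes and leaves the existing ones untouched.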