Github user patricker commented on the issue:
https://github.com/apache/nifi/pull/883
Matt, 'ascii' doesn't quite mean what it appears to :smile: .
If you convert an array of random bytes to UTF8 and then back to bytes you
will find you have considerably more bytes than you started with. This is
because in order to represent certain bytes as actual characters UTF8 has to
insert extra marker bytes; then when you convert back to bytes these extra
bytes come back too. I've seen the conversion of binary data -> UTF8 -> binary
data grow by 40%.
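A minimal sketch of that round trip, for illustration only. The class and method names are hypothetical and not taken from the PR. In Java, one mechanism behind the growth is that `new String(bytes, UTF_8)` replaces every malformed byte sequence with U+FFFD, which encodes back to three bytes; the exact growth depends on the input.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;

// Hypothetical demo class, not part of the processor.
class Utf8RoundTrip {
    // Decode arbitrary bytes as UTF-8, then encode the resulting string again.
    static byte[] roundTrip(byte[] raw) {
        String decoded = new String(raw, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = new byte[1000];
        new Random(42).nextBytes(raw); // stand-in for arbitrary binary data
        byte[] back = roundTrip(raw);
        System.out.println(raw.length + " bytes in, " + back.length + " bytes out");
    }
}
```

A lone `0xFF` byte is never valid UTF-8, so it is the smallest input that shows the effect: one byte goes in and three come out.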
The key thing to remember for this processor is that the data coming in
only looks like text because it is contained in an attribute; it's not actually
text, it's raw bytes. In this processor 'ascii' means that each character
represents a single byte; calling string.getBytes("ASCII") is just a handy
shortcut in Java to get this functionality.
I can rename it to 'onecharperbyte' if that makes more sense.
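To make the "one character per byte" idea concrete, here is a hedged sketch. It uses ISO-8859-1 as an assumption, not as what the processor does: that charset maps each of the 256 byte values to exactly one character and back. Java's US-ASCII charset only covers 0-127, and `getBytes` substitutes `?` for anything above that, so the choice matters if attribute values can contain such characters. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of the processor.
class OneCharPerByte {
    // ISO-8859-1 (an assumption for this sketch): one char per byte, lossless both ways.
    static byte[] roundTrip(byte[] raw) {
        String s = new String(raw, StandardCharsets.ISO_8859_1);
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    // US-ASCII encoding, for comparison with the getBytes("ASCII") shortcut.
    static byte[] asciiBytes(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i; // every possible byte value once
        }
        System.out.println("lossless: " + Arrays.equals(all, roundTrip(all)));
        System.out.println("ascii of U+00FF: " + asciiBytes("\u00FF")[0]);
    }
}
```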