[
https://issues.apache.org/jira/browse/NIFI-2591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435116#comment-15435116
]
ASF GitHub Bot commented on NIFI-2591:
--------------------------------------
Github user patricker commented on the issue:
https://github.com/apache/nifi/pull/883
Matt, 'ascii' doesn't quite mean what it appears to :smile: .
If you convert an array of random bytes to UTF8 and then back to bytes you
will find you have considerably more bytes than you started with. This is
because in order to represent certain bytes as actual characters UTF8 has to
insert extra marker bytes; then when you convert back to bytes these extra
bytes come back too. I've seen the conversion of binary data -> UTF8 -> binary
data grow by 40%.
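The growth described above can be reproduced with a small sketch. The class and method names are hypothetical and are not taken from the PR. In Java specifically, bytes that are not valid UTF-8 are decoded to the replacement character U+FFFD, which encodes back as three bytes, so the round trip is lossy as well as larger.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Random;

public class Utf8RoundTrip {
    // Hypothetical helper: decode arbitrary bytes as UTF-8 text,
    // then encode that text back to bytes.
    static byte[] roundTripUtf8(byte[] raw) {
        String asText = new String(raw, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = new byte[1000];
        new Random(42).nextBytes(raw); // arbitrary binary payload

        byte[] back = roundTripUtf8(raw);
        System.out.println("before: " + raw.length + " bytes, after: " + back.length + " bytes");
        System.out.println("identical: " + Arrays.equals(raw, back));
    }
}
```

Input that happens to be plain 7-bit text survives unchanged, which is why the problem only shows up with real binary data.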
The key thing to remember for this processor is that the data coming in
only looks like text because it is contained in an attribute. It's not actually
text; it's raw bytes. In this processor 'ascii' means that each character
represents a single byte; calling string.getBytes("ASCII") is just a handy
shortcut in Java to get this functionality.
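A minimal sketch of the one-character-per-byte idea follows. It rests on assumptions: the helper names are hypothetical, and it uses ISO-8859-1 rather than the "ASCII" charset named above. Java's US-ASCII charset covers only 0 to 127 and encodes any character above that as '?', while ISO-8859-1 maps characters 0 to 255 one-to-one onto bytes, so it may be worth checking which behaviour the incoming attribute values need.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class OneCharPerByte {
    // Hypothetical helper: widen each byte to one char in the range 0..255,
    // the way binary data might be carried inside a string attribute.
    static String bytesToAttribute(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Hypothetical helper: narrow each char back to exactly one byte.
    static byte[] attributeToBytes(String attr) {
        return attr.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Every possible byte value, 0x00 through 0xFF.
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }

        String attr = bytesToAttribute(all);
        byte[] back = attributeToBytes(attr);
        System.out.println("chars: " + attr.length() + ", bytes: " + back.length);
        System.out.println("lossless: " + Arrays.equals(all, back));

        // With US-ASCII, a char above 127 is replaced by '?' when encoding.
        byte[] ascii = "\u00FF".getBytes(StandardCharsets.US_ASCII);
        System.out.println("US-ASCII of U+00FF: " + (char) ascii[0]);
    }
}
```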
I can rename it to 'onecharperbyte' if that makes more sense.
> PutSQL has no handling for Binary data types
> --------------------------------------------
>
> Key: NIFI-2591
> URL: https://issues.apache.org/jira/browse/NIFI-2591
> Project: Apache NiFi
> Issue Type: Bug
> Components: Core Framework
> Reporter: Peter Wicks
>
> PutSQL does not call out binary types for any special treatment, so they end
> up being routed through stmt.setObject.
> The problem is that upstream processors have formatted the binary data as a
> string and the JDBC driver doesn't know what to do with a string going into a
> binary field.
> Investigation into the AvroToJSON processor shows that when users load data
> exported from a source system as Avro binary, Avro encodes the binary data
> as ASCII (one byte per character).