[ https://issues.apache.org/jira/browse/NIFI-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15852125#comment-15852125 ]

ASF subversion and git services commented on NIFI-3055:
-------------------------------------------------------

Commit 376af83a3dcfa5361be0859b54d91d30c685494e in nifi's branch 
refs/heads/master from [~jskora]
[ https://git-wip-us.apache.org/repos/asf?p=nifi.git;h=376af83 ]

NIFI-3055 StandardRecordWriter Can Throw UTFDataFormatException
* Updated StandardRecordWriter, even though it is now deprecated, to consider
the encoding behavior of java.io.DataOutputStream.writeUTF() and truncate
string values such that their UTF representation will not be longer than
DataOutputStream's 64K UTF format limit.
* Updated the new SchemaRecordWriter class to similarly truncate long Strings 
that will be written as UTF.
* Added tests to confirm handling of large UTF strings and various edge
conditions of UTF string handling.

Signed-off-by: Mike Moser <[email protected]>

This closes #1469.
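To illustrate the truncation approach described in the commit message, here is a minimal Java sketch. It is not the code from the commit. The class name Utf8Truncation, the helper truncateForWriteUtf, and the choice to avoid cutting a surrogate pair in half are assumptions made for illustration. The sketch counts bytes using the modified UTF-8 rules that writeUTF() follows and keeps the longest prefix that fits in 65535 bytes. It then round-trips the result through writeUTF()/readUTF().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class Utf8Truncation {

    // writeUTF() stores the encoded length in an unsigned 16-bit prefix.
    static final int MAX_UTF_BYTES = 65535;

    // Bytes one char occupies in modified UTF-8, as written by writeUTF().
    static int encodedSize(char c) {
        if (c >= 0x0001 && c <= 0x007F) return 1;
        if (c <= 0x07FF) return 2;   // also NUL (U+0000), which is written as 2 bytes
        return 3;                    // includes each half of a surrogate pair
    }

    // Hypothetical helper: longest prefix of s whose encoding fits in maxBytes,
    // backing off one char if the cut would leave a dangling high surrogate.
    static String truncateForWriteUtf(String s, int maxBytes) {
        int total = 0;
        for (int i = 0; i < s.length(); i++) {
            int size = encodedSize(s.charAt(i));
            if (total + size > maxBytes) {
                int end = (i > 0 && Character.isHighSurrogate(s.charAt(i - 1))) ? i - 1 : i;
                return s.substring(0, end);
            }
            total += size;
        }
        return s;                    // already fits
    }

    // Writes the truncated value with writeUTF() and reads it back with readUTF().
    static String roundTrip(String s) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(truncateForWriteUtf(s, MAX_UTF_BYTES));
            out.flush();
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String details = "\u3042".repeat(30000);          // 90000 encoded bytes
        System.out.println(roundTrip(details).length());   // 21845 (21845 * 3 = 65535 bytes)
        System.out.println(truncateForWriteUtf("ab\uD83D\uDE00", 5)); // ab
    }
}
```

The method scans forward char by char instead of encoding the whole string first, so it stops as soon as the budget is exceeded. String.repeat requires Java 11 or later.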


> StandardRecordWriter can throw UTFDataFormatException
> -----------------------------------------------------
>
>                 Key: NIFI-3055
>                 URL: https://issues.apache.org/jira/browse/NIFI-3055
>             Project: Apache NiFi
>          Issue Type: Bug
>    Affects Versions: 1.0.0, 0.7.1
>            Reporter: Brandon DeVries
>            Assignee: Joe Skora
>
> StandardRecordWriter.writeRecord()[1] uses DataOutputStream.writeUTF()[2]
> without checking the length of the value to be written.  If the encoded
> (modified UTF-8) length of the value is greater than 65535 (2^16 - 1) bytes,
> you get a UTFDataFormatException "encoded string too long..."[3].
> Ultimately, this can result in an IllegalStateException[4], -bringing a halt
> to the data flow- causing PersistentProvenanceRepository "Unable to merge
> <prov_journal> with other Journal Files due to..." WARNings.
> Several of the field values being written in this way are pre-defined, and
> thus not likely an issue.  However, the "details" field can be populated by a
> processor, and can be of an arbitrary length.  -Additionally, if the details
> field is indexed (which I didn't investigate, but I'm sure is easy enough to
> determine), then the length might be subject to the Lucene limit discussed in
> NIFI-2787-.
> [1] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-provenance-repository-bundle/nifi-persistent-provenance-repository/src/main/java/org/apache/nifi/provenance/StandardRecordWriter.java#L163-L173
> [2] http://docs.oracle.com/javase/7/docs/api/java/io/DataOutputStream.html#writeUTF%28java.lang.String%29
> [3] http://stackoverflow.com/questions/22741556/dataoutputstream-purpose-of-the-encoded-string-too-long-restriction
> [4] https://github.com/apache/nifi/blob/5fd4a55791da27fdba577636ac985a294618328a/nifi-nar-bundles/nifi-provenance-repository-bundle/nifi-persistent-provenance-repository/src/main/java/org/apache/nifi/provenance/PersistentProvenanceRepository.java#L754-L755
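The limit in the report applies to the encoded byte count, not the character count. A "details" value well under 65535 characters can therefore still fail. The following minimal Java sketch shows this. Its class and method names (WriteUtfLimitDemo, modifiedUtf8Length, writeUtfAccepts) are hypothetical and do not come from NiFi. It computes the modified UTF-8 length the way writeUTF() does and records whether writeUTF() accepts or rejects the value.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.io.UncheckedIOException;

public class WriteUtfLimitDemo {

    // Bytes that writeUTF() emits for the string body (excluding its 2-byte length
    // prefix), following the modified UTF-8 rules in the java.io.DataOutput docs.
    static long modifiedUtf8Length(String s) {
        long bytes = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                bytes += 1;          // ASCII except NUL
            } else if (c <= 0x07FF) {
                bytes += 2;          // NUL and U+0080..U+07FF
            } else {
                bytes += 3;          // U+0800..U+FFFF, including each surrogate half
            }
        }
        return bytes;
    }

    // Returns true if writeUTF() accepts s, false if it throws UTFDataFormatException.
    static boolean writeUtfAccepts(String s) {
        DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream());
        try {
            out.writeUTF(s);
            return true;
        } catch (UTFDataFormatException e) {
            return false;            // the message reports the too-long encoded size
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "a".repeat(65535),       // 65535 chars, 65535 bytes: exactly at the limit
            "a".repeat(65536),       // 65536 chars, 65536 bytes: one byte over
            "\u3042".repeat(30000)   // 30000 chars, but 3 bytes each = 90000 bytes
        };
        for (String s : samples) {
            System.out.println(s.length() + " chars, " + modifiedUtf8Length(s)
                    + " bytes, accepted=" + writeUtfAccepts(s));
        }
        // 65535 chars, 65535 bytes, accepted=true
        // 65536 chars, 65536 bytes, accepted=false
        // 30000 chars, 90000 bytes, accepted=false
    }
}
```

The exact wording of the exception message differs between JDK versions, so the sketch checks only for the exception type.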



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
