[
https://issues.apache.org/jira/browse/NIFI-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15856609#comment-15856609
]
ASF GitHub Bot commented on NIFI-3055:
--------------------------------------
Github user mosermw commented on the issue:
https://github.com/apache/nifi/pull/1475
New getCharsInUTF8Limit() function looks good to me. Sorry about missing
the license concern on the last PR.
PR executes as expected. The WARN log message says "Truncating repository
record value for field 'Attribute Value'!" when an attribute value exceeds
65535 bytes. Unfortunately, the actual name of the attribute is not available
unless you go several levels up the stack. Is this good enough or do you want
to throw an exception higher up the stack to make that available?
+1 looks good to me. I will let another reviewer review and merge, to get more eyes
on this.
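
For readers following along, here is a minimal sketch of the kind of check
discussed above: counting how many leading characters of a value fit in the
65535-byte limit of DataOutputStream.writeUTF(). This is not the PR's
getCharsInUTF8Limit() implementation; the class and method names below are
hypothetical and exist only for illustration. It assumes the modified UTF-8
widths that writeUTF() documents (1 byte for U+0001..U+007F, 2 bytes for
U+0000 and U+0080..U+07FF, 3 bytes otherwise, with each surrogate encoded
separately).

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical helper, NOT the code from the PR under review.
final class WriteUtfLimitSketch {
    // writeUTF() stores the encoded length in an unsigned 16-bit prefix.
    static final int MAX_WRITE_UTF_BYTES = 65535;

    // Bytes that writeUTF() emits for a single char (modified UTF-8).
    static int encodedWidth(char c) {
        if (c >= 0x0001 && c <= 0x007F) {
            return 1;
        }
        if (c <= 0x07FF) {
            return 2; // also covers U+0000, which writeUTF() encodes in 2 bytes
        }
        return 3; // includes surrogates, which are encoded one by one
    }

    // Number of leading chars whose encoding fits within maxBytes.
    // A surrogate pair is kept or dropped as a unit so it is never split.
    static int charsWithinLimit(String value, int maxBytes) {
        int bytes = 0;
        int i = 0;
        while (i < value.length()) {
            char c = value.charAt(i);
            int width = encodedWidth(c);
            int step = 1;
            if (Character.isHighSurrogate(c)
                    && i + 1 < value.length()
                    && Character.isLowSurrogate(value.charAt(i + 1))) {
                width += encodedWidth(value.charAt(i + 1));
                step = 2;
            }
            if (bytes + width > maxBytes) {
                break;
            }
            bytes += width;
            i += step;
        }
        return i;
    }

    static String truncateForWriteUtf(String value) {
        return value.substring(0, charsWithinLimit(value, MAX_WRITE_UTF_BYTES));
    }

    // Test/demo convenience: a string of 'count' copies of 'c'.
    static String repeat(char c, int count) {
        StringBuilder sb = new StringBuilder(count);
        for (int i = 0; i < count; i++) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String big = repeat('\u00e9', 40000); // 2 bytes per char, 80000 bytes total
        String safe = truncateForWriteUtf(big);
        System.out.println(safe.length()); // 32767 chars (65534 bytes)
        try (DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream())) {
            out.writeUTF(safe); // fits, so no UTFDataFormatException
        }
    }
}
```

A caller that wants the attribute name in the log message would need to do
the truncation (or at least the length check) at a level where the name is
still in scope, which is the trade-off raised above.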
> StandardRecordWriter can throw UTFDataFormatException
> -----------------------------------------------------
>
> Key: NIFI-3055
> URL: https://issues.apache.org/jira/browse/NIFI-3055
> Project: Apache NiFi
> Issue Type: Bug
> Affects Versions: 1.0.0, 0.7.1
> Reporter: Brandon DeVries
> Assignee: Joe Skora
> Priority: Blocker
> Fix For: 0.8.0, 1.2.0
>
>
> StandardRecordWriter.writeRecord()[1] uses DataOutputStream.writeUTF()[2]
> without checking the length of the value to be written. If the encoded
> length exceeds 65535 bytes (2^16 - 1), you get a UTFDataFormatException
> "encoded string too long..."[3]. Ultimately, this can result in an
> IllegalStateException[4], -bringing a halt to the data flow- causing
> PersistentProvenanceRepository "Unable to merge <prov_journal> with other
> Journal Files due to..." WARNings.
> Several of the field values being written in this way are pre-defined, and
> thus not likely an issue. However, the "details" field can be populated by a
> processor, and can be of an arbitrary length. -Additionally, if the detail
> field is indexed (which I didn't investigate, but I'm sure is easy enough to
> determine), then the length might be subject to the Lucene limit discussed in
> NIFI-2787-.
> [1]
> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-provenance-repository-bundle/nifi-persistent-provenance-repository/src/main/java/org/apache/nifi/provenance/StandardRecordWriter.java#L163-L173
> [2]
> http://docs.oracle.com/javase/7/docs/api/java/io/DataOutputStream.html#writeUTF%28java.lang.String%29
> [3]
> http://stackoverflow.com/questions/22741556/dataoutputstream-purpose-of-the-encoded-string-too-long-restriction
> [4]
> https://github.com/apache/nifi/blob/5fd4a55791da27fdba577636ac985a294618328a/nifi-nar-bundles/nifi-provenance-repository-bundle/nifi-persistent-provenance-repository/src/main/java/org/apache/nifi/provenance/PersistentProvenanceRepository.java#L754-L755
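
The failure described in the issue can be reproduced outside NiFi with the
JDK alone. The sketch below is an illustration, not NiFi code; the class and
method names are hypothetical. It writes an ASCII string of a given length
(one encoded byte per character) through DataOutputStream.writeUTF() and
reports whether a UTFDataFormatException was thrown.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.io.UncheckedIOException;

// Hypothetical stand-alone reproduction, not part of NiFi.
final class WriteUtfOverflowDemo {
    // Returns true if writeUTF() rejects an ASCII string of the given length.
    static boolean writeUtfFails(int asciiLength) {
        StringBuilder sb = new StringBuilder(asciiLength);
        for (int i = 0; i < asciiLength; i++) {
            sb.append('a'); // each 'a' encodes to exactly one byte
        }
        try (DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream())) {
            out.writeUTF(sb.toString());
            return false;
        } catch (UTFDataFormatException e) {
            return true; // "encoded string too long"
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeUtfFails(65535)); // false: exactly at the limit
        System.out.println(writeUtfFails(65536)); // true: one byte over the limit
    }
}
```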
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)