[
https://issues.apache.org/jira/browse/NIFI-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864854#comment-15864854
]
Mark Payne commented on NIFI-3055:
----------------------------------
[~mosermw] - sorry, I think I didn't explain my suggestion well re: avoiding
throwing Exceptions. You are right - we absolutely should not calculate the
byte length each time in order to avoid thrown the Exception - that would be
very bad! :) I was simply suggesting that we check String.length() because it's
already known. If the length is > the max number of bytes then we also know
that the length of the resultant byte[] will be too long (if it's smaller than
the max number of bytes then we still don't know so we'd have to move ahead
like we are now). So we could just pre-emptively throw the Exception. But it's
a very minor point, because the only time when you'd get into a situation where
something like this really matters is when the process is already in a really
bad state - and then it's probably too late to worry about performance really.
So I'm OK not messing with this. Was just a thought that popped into my head
during the initial review.
Thanks!
-Mark
> StandardRecordWriter can throw UTFDataFormatException
> -----------------------------------------------------
>
> Key: NIFI-3055
> URL: https://issues.apache.org/jira/browse/NIFI-3055
> Project: Apache NiFi
> Issue Type: Bug
> Affects Versions: 1.0.0, 0.7.1
> Reporter: Brandon DeVries
> Assignee: Joe Skora
> Priority: Blocker
> Fix For: 0.8.0, 1.2.0
>
>
> StandardRecordWriter.writeRecord()\[1] uses DataOutputStream.writeUTF()\[2]
> without checking the length of the value to be written. If this length is
> greater than 65535 (2^16 - 1), you get a UTFDataFormatException "encoded
> string too long..."\[3]. Ultimately, this can result in an
> IllegalStateException\[4], -bringing a halt to the data flow- causing
> PersistentProvenanceRepository "Unable to merge <prov_journal> with other
> Journal Files due to..." WARNings.
> Several of the field values being written in this way are pre-defined, and
> thus not likely an issue. However, the "details" field can be populated by a
> processor, and can be of an arbitrary length. -Additionally, if the detail
> filed is indexed (which I didn't investigate, but I'm sure is easy enough to
> determine), then the length might be subject to the Lucene limit discussed in
> NIFI-2787-.
> \[1]
> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-provenance-repository-bundle/nifi-persistent-provenance-repository/src/main/java/org/apache/nifi/provenance/StandardRecordWriter.java#L163-L173
> \[2]
> http://docs.oracle.com/javase/7/docs/api/java/io/DataOutputStream.html#writeUTF%28java.lang.String%29
> \[3]
> http://stackoverflow.com/questions/22741556/dataoutputstream-purpose-of-the-encoded-string-too-long-restriction
> \[4]
> https://github.com/apache/nifi/blob/5fd4a55791da27fdba577636ac985a294618328a/nifi-nar-bundles/nifi-provenance-repository-bundle/nifi-persistent-provenance-repository/src/main/java/org/apache/nifi/provenance/PersistentProvenanceRepository.java#L754-L755
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)