[ https://issues.apache.org/jira/browse/NIFI-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15850708#comment-15850708 ]

ASF GitHub Bot commented on NIFI-3055:
--------------------------------------

Github user mosermw commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1469#discussion_r99244109
  
    --- Diff: nifi-commons/nifi-schema-utils/src/main/java/org/apache/nifi/repository/schema/SchemaRecordWriter.java ---
    @@ -136,4 +144,44 @@ private void writeFieldValue(final RecordField field, final Object value, final
                     break;
             }
         }
    +
    +    private void writeUTFLimited(final DataOutputStream out, final String utfString) throws IOException {
    +        try {
    +            out.writeUTF(utfString);
    +        } catch (UTFDataFormatException e) {
    +            final String truncated = utfString.substring(0, getCharsInUTFLength(utfString, MAX_ALLOWED_UTF_LENGTH));
    +            logger.warn("Truncating UTF value!  Attempted to write string with char length {} and UTF length greater than "
    +                            + "supported maximum allowed ({}), truncating to char length {}.",
    +                    utfString.length(), MAX_ALLOWED_UTF_LENGTH, truncated.length());
    --- End diff --
    
    Can we mention provenance in this message, such as "Truncating provenance 
record value"?  Does this message potentially mix char length and byte length, 
such as "Attempted to write string with char length 40000 and UTF length 
greater than supported maximum allowed (65535), truncating to char length 
39000."?  Perhaps a simpler message such as "Attempted to store string with 
length 40000, truncating to 39000."
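For reference on the char-length vs. byte-length distinction raised above: `DataOutputStream.writeUTF` encodes in modified UTF-8, where each char costs 1 to 3 bytes, so a string of 40,000 chars can exceed the 65,535-byte limit while a truncation target must be computed in bytes, not chars. A minimal sketch of such a byte-budget counter (the class and `charsInUtfLength` are my own illustrative analogs of the PR's `getCharsInUTFLength`, not the actual NiFi code):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class UtfTruncateSketch {
    static final int MAX_ALLOWED_UTF_LENGTH = 65535;

    // Hypothetical analog of getCharsInUTFLength: number of leading chars
    // whose modified-UTF-8 encoding fits within maxBytes.
    static int charsInUtfLength(final String s, final int maxBytes) {
        int bytes = 0;
        for (int i = 0; i < s.length(); i++) {
            final char c = s.charAt(i);
            // Modified UTF-8 cost: 1 byte for \u0001-\u007F, 2 bytes for
            // \u0000 and \u0080-\u07FF, 3 bytes otherwise (surrogates
            // are encoded individually, so chars suffice here).
            bytes += (c >= 0x0001 && c <= 0x007F) ? 1 : (c <= 0x07FF ? 2 : 3);
            if (bytes > maxBytes) {
                return i;
            }
        }
        return s.length();
    }

    public static void main(final String[] args) throws IOException {
        // 40,000 CJK chars cost 3 bytes each: 120,000 encoded bytes.
        final String big = new String(new char[40000]).replace('\0', '\u4e2d');
        final int fit = charsInUtfLength(big, MAX_ALLOWED_UTF_LENGTH);
        System.out.println(fit); // 21845 three-byte chars fit in 65535 bytes
        try (DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream())) {
            out.writeUTF(big.substring(0, fit)); // succeeds after truncation
        }
    }
}
```

This illustrates why a log message quoting both the char length (40,000) and the byte limit (65,535) can read as mixing units.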


> StandardRecordWriter can throw UTFDataFormatException
> -----------------------------------------------------
>
>                 Key: NIFI-3055
>                 URL: https://issues.apache.org/jira/browse/NIFI-3055
>             Project: Apache NiFi
>          Issue Type: Bug
>    Affects Versions: 1.0.0, 0.7.1
>            Reporter: Brandon DeVries
>            Assignee: Joe Skora
>
> StandardRecordWriter.writeRecord()\[1] uses DataOutputStream.writeUTF()\[2] 
> without checking the length of the value to be written.  If this length is 
> greater than 65535 (2^16 - 1), you get a UTFDataFormatException "encoded 
> string too long..."\[3].  Ultimately, this can result in an 
> IllegalStateException\[4], -bringing a halt to the data flow- causing 
> PersistentProvenanceRepository "Unable to merge <prov_journal> with other 
> Journal Files due to..." WARNings.
> Several of the field values being written in this way are pre-defined, and 
> thus not likely an issue.  However, the "details" field can be populated by a 
> processor, and can be of an arbitrary length.  -Additionally, if the details 
> field is indexed (which I didn't investigate, but I'm sure is easy enough to 
> determine), then the length might be subject to the Lucene limit discussed in 
> NIFI-2787-.
> \[1] 
> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-provenance-repository-bundle/nifi-persistent-provenance-repository/src/main/java/org/apache/nifi/provenance/StandardRecordWriter.java#L163-L173
> \[2] 
> http://docs.oracle.com/javase/7/docs/api/java/io/DataOutputStream.html#writeUTF%28java.lang.String%29
> \[3] 
> http://stackoverflow.com/questions/22741556/dataoutputstream-purpose-of-the-encoded-string-too-long-restriction
> \[4] 
> https://github.com/apache/nifi/blob/5fd4a55791da27fdba577636ac985a294618328a/nifi-nar-bundles/nifi-provenance-repository-bundle/nifi-persistent-provenance-repository/src/main/java/org/apache/nifi/provenance/PersistentProvenanceRepository.java#L754-L755
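The failure described in the issue is easy to reproduce outside NiFi; a minimal demo (class name is mine) of `DataOutputStream.writeUTF` rejecting a value whose modified-UTF-8 encoding exceeds 65,535 bytes:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;

public class WriteUtfLimitDemo {
    public static void main(final String[] args) throws IOException {
        // 70,000 ASCII chars encode to 70,000 bytes, over the 65,535 limit.
        final String oversized = new String(new char[70000]).replace('\0', 'a');
        try (DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream())) {
            out.writeUTF(oversized);
            System.out.println("written");
        } catch (UTFDataFormatException e) {
            // The JDK reports "encoded string too long ..." here.
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```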



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
