[ https://issues.apache.org/jira/browse/NIFI-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15856609#comment-15856609 ]

ASF GitHub Bot commented on NIFI-3055:
--------------------------------------

Github user mosermw commented on the issue:

    https://github.com/apache/nifi/pull/1475
  
    New getCharsInUTF8Limit() function looks good to me. Sorry about missing
    the license concern on the last PR.
    
    PR executes as expected. The WARN log message says "Truncating repository
    record value for field 'Attribute Value'!" when an attribute value exceeds
    65535 bytes. Unfortunately, the actual name of the attribute is not
    available unless you go several levels up the stack. Is this good enough,
    or do you want to throw an exception higher up the stack to make that
    available?
    
    +1 looks good to me. I will let someone else review and merge, to get more
    eyes on this.
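A minimal Java sketch of the truncation approach described in the comment, assuming the limit is the 65535-byte cap of DataOutputStream.writeUTF(). writeUTF() encodes in modified UTF-8, where U+0000 takes 2 bytes and each half of a surrogate pair takes 3, so the helper counts bytes that way and never splits a pair. The class name and the helper charsInModifiedUtf8Limit() are hypothetical; this is not the PR's getCharsInUTF8Limit() implementation.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class Utf8LimitSketch {
    // writeUTF() stores the encoded length in 2 bytes, so 65535 bytes is the cap
    static final int MAX_UTF_BYTES = 65535;

    // Hypothetical helper: number of leading chars of s whose modified UTF-8
    // encoding fits in maxBytes, never splitting a surrogate pair
    static int charsInModifiedUtf8Limit(String s, int maxBytes) {
        int bytes = 0;
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            int width;
            int charsToTake = 1;
            if (c >= 0x0001 && c <= 0x007F) {
                width = 1;              // ASCII except NUL
            } else if (c <= 0x07FF) {
                width = 2;              // NUL and U+0080..U+07FF
            } else {
                width = 3;              // all other chars, including lone surrogates
            }
            if (Character.isHighSurrogate(c) && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                width = 6;              // a pair is encoded as two 3-byte halves
                charsToTake = 2;
            }
            if (bytes + width > maxBytes) {
                break;                  // the next code point would overflow the limit
            }
            bytes += width;
            i += charsToTake;
        }
        return i;
    }

    // Truncate if necessary, then write with writeUTF(); returns total bytes written
    static int writeTruncated(String value) {
        int keep = charsInModifiedUtf8Limit(value, MAX_UTF_BYTES);
        if (keep < value.length()) {
            // The PR logs a WARN here; stderr stands in for the logger in this sketch
            System.err.println("Truncating value from " + value.length() + " to " + keep + " chars");
            value = value.substring(0, keep);
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(value);        // 2-byte length prefix + encoded bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    static String repeat(String unit, int times) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < times; i++) {
            sb.append(unit);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(writeTruncated(repeat("a", 70000)));
        System.out.println(charsInModifiedUtf8Limit(repeat("\u00e9", 40000), MAX_UTF_BYTES));
    }
}
```

For 70000 ASCII characters the sketch keeps 65535 of them and writes 65537 bytes (the 2-byte length prefix plus the data). For 40000 copies of "é" (2 bytes each) it keeps only 32767 characters, which is why the limit has to be computed in encoded bytes rather than characters.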


> StandardRecordWriter can throw UTFDataFormatException
> -----------------------------------------------------
>
>                 Key: NIFI-3055
>                 URL: https://issues.apache.org/jira/browse/NIFI-3055
>             Project: Apache NiFi
>          Issue Type: Bug
>    Affects Versions: 1.0.0, 0.7.1
>            Reporter: Brandon DeVries
>            Assignee: Joe Skora
>            Priority: Blocker
>             Fix For: 0.8.0, 1.2.0
>
>
> StandardRecordWriter.writeRecord()[1] uses DataOutputStream.writeUTF()[2]
> without checking the length of the value to be written. If the value's
> encoded (modified UTF-8) length is greater than 65535 bytes (2^16 - 1), you
> get a UTFDataFormatException "encoded string too long..."[3]. Ultimately,
> this can result in an IllegalStateException[4], -bringing a halt to the data
> flow- causing PersistentProvenanceRepository "Unable to merge <prov_journal>
> with other Journal Files due to..." WARNings.
> Several of the field values written this way are pre-defined, and thus not
> likely an issue. However, the "details" field can be populated by a
> processor, and can be of arbitrary length. -Additionally, if the details
> field is indexed (which I didn't investigate, but I'm sure is easy enough to
> determine), then the length might be subject to the Lucene limit discussed in
> NIFI-2787-.
> [1] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-provenance-repository-bundle/nifi-persistent-provenance-repository/src/main/java/org/apache/nifi/provenance/StandardRecordWriter.java#L163-L173
> [2] http://docs.oracle.com/javase/7/docs/api/java/io/DataOutputStream.html#writeUTF%28java.lang.String%29
> [3] http://stackoverflow.com/questions/22741556/dataoutputstream-purpose-of-the-encoded-string-too-long-restriction
> [4] https://github.com/apache/nifi/blob/5fd4a55791da27fdba577636ac985a294618328a/nifi-nar-bundles/nifi-provenance-repository-bundle/nifi-persistent-provenance-repository/src/main/java/org/apache/nifi/provenance/PersistentProvenanceRepository.java#L754-L755
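The failure mode described in the issue can be reproduced with a short, self-contained Java program that uses only the JDK. writeUTF() rejects any string whose modified UTF-8 encoding exceeds 65535 bytes, so a guard based on String.length() alone is not sufficient: non-ASCII characters take 2 or 3 bytes each. The class and method names below are hypothetical and are not taken from the NiFi code base.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.io.UncheckedIOException;

public class WriteUtfLimitDemo {
    // Byte length of the modified UTF-8 encoding that writeUTF() would produce
    static long modifiedUtf8Length(String s) {
        long len = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                len += 1;               // ASCII except NUL
            } else if (c <= 0x07FF) {
                len += 2;               // NUL and U+0080..U+07FF
            } else {
                len += 3;               // everything else, per UTF-16 char
            }
        }
        return len;
    }

    // True if writeUTF() rejects the value with UTFDataFormatException
    static boolean writeUtfFails(String value) {
        try (DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream())) {
            out.writeUTF(value);
            return false;
        } catch (UTFDataFormatException e) {
            return true;                // "encoded string too long"
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
    }

    static String repeat(char c, int times) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < times; i++) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String ascii = repeat('x', 65536);
        String accented = repeat('\u00e9', 32768);
        System.out.println(ascii.length() + " chars, " + modifiedUtf8Length(ascii)
                + " bytes, fails=" + writeUtfFails(ascii));
        System.out.println(accented.length() + " chars, " + modifiedUtf8Length(accented)
                + " bytes, fails=" + writeUtfFails(accented));
    }
}
```

The second case is only 32768 characters long but encodes to 65536 bytes, so writeUTF() still throws; a check against the encoded length (or a truncation step like the one discussed in the PR) is what avoids the exception.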



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
