Github user joewitt commented on the pull request:

    https://github.com/apache/incubator-nifi/pull/69#issuecomment-116186741
  
    Builds cleanly including the contribution check for licenses and check 
style.
    
    ConvertCSVToAvro
    - Line 258 should use parameters passed in via object[] rather than string 
concatenation.
    
    - What happens if you have processed an entire input and produced no 
successful results due to a DatasetRecordExceptions?  I think it would forward 
an empty flow file since nothing would have been written to it.  Is that what 
you want?  Such a case seems like a FAILURE to me.
    
    - Consider removing the incompatible relationship and simply adding an 
attribute to the success flow file which lists the reasons or add an attribute 
per reason for difficulty converting the original.  If you will keep the 
incompatible relationship then I’d recommend sending it a clone of the 
original item and then adding the relationship.  That would make 
troubleshooting or analysis of that object much easier.  In any case be very 
careful about how large these attributes could become.  If you’re storing a 
series of stack traces this could become unwieldy and memory inefficient really 
fast.
    
    ConvertJSONToAvro
    - Line 160 should use parameters passed in via object[] rather than string 
concatenation.
    
    - Same comments as Convert CSV to Avro.
    
    FailureTracker
    - This class is a cool idea.  My concern with it is that it appears to 
store all the raw reasons which could be quite significant in large files with 
*many* thousands of lines.  This could create issues with the heap, cause 
excessive GC, etc.  Consider refactoring this class to both immediately store 
the summarized representation and to bound the total number of reasons it will 
store.  It can just ignore additional errors as they arrive.  This will allow 
it to behave well in terms of memory consumption even in the presence of 
massive failures as everything will be bounded.
    
    Thanks
    Joe


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

Reply via email to