Github user joewitt commented on the pull request:
https://github.com/apache/incubator-nifi/pull/69#issuecomment-116186741
Builds cleanly including the contribution check for licenses and check
style.
ConvertCSVToAvro
- Line 258 should use parameters passed in via object[] rather than string
concatenation.
- What happens if you have processed an entire input and produced no
successful results due to a DatasetRecordExceptions? I think it would forward
an empty flow file since nothing would have been written to it. Is that what
you want? Such a case seems like a FAILURE to me.
- Consider removing the incompatible relationship and simply adding an
attribute to the success flow file which lists the reasons or add an attribute
per reason for difficulty converting the original. If you will keep the
incompatible relationship then Iâd recommend sending it a clone of the
original item and then adding the relationship. That would make
troubleshooting or analysis of that object much easier. In any case be very
careful about how large these attributes could become. If youâre storing a
series of stack traces this could become unwieldy and memory inefficient really
fast.
ConvertJSONToAvro
- Line 160 should use parameters passed in via object[] rather than string
concatenation.
- Same comments as Convert CSV to Avro.
FailureTracker
- This class is a cool idea. My concern with it is that it appears to
store all the raw reasons which could be quite significant in large files with
*many* thousands of lines. This could create issues with the heap, cause
excessive GC, etc. Consider refactoring this class to both immediately store
the summarized representation and to bound the total number of reasons it will
store. It can just ignore additional errors as they arrive. This will allow
it to behave well in terms of memory consumption even in the presence of
massive failures as everything will be bounded.
Thanks
Joe
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---