Github user jvwing commented on the issue:
https://github.com/apache/nifi/pull/929
Thanks for those improvements, @jdye64, I especially like the updated usage
doc. Two things on the latest code:
1. Did you try an .xls file? There is a problem when the flowfile
attribute is added in the catch block on line ~195. The NiFi framework throws
an exception of its own, because we can't call `session.putAttribute` inside an
InputStreamCallback for the same flowfile:
> java.lang.IllegalStateException: StandardFlowFileRecord[uuid=be192381-9475-4c6d-a6ca-43735e5df271,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1489713904793-1, container=default, section=1], offset=0, length=26112],offset=0,name=./conf/test-xls.xls,size=26112] already in use for an active callback or InputStream created by ProcessSession.read(FlowFile) has not been closed

Something similar happens with the `session.putAttribute` on line ~209. As a
result of these exceptions, the session is rolled back and the flowfile is
returned to the input queue. I think we can still throw an exception from
inside the callback, though, so catching and rethrowing with a different error
message should work out.
2. In the failure case, we're routing the flowfile to both 'failure' and
'original'. I didn't realize it earlier, but I now believe this to be unusual
in NiFi. Most processors treat failure as an exclusive route, and 'original'
as part of the successful happy path. SplitAvro, SplitJson, SplitText, and
UnpackContent were some examples I looked at. I doubt that's written in stone.
What do you think?
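
To make both points concrete, here is a minimal, self-contained sketch of the pattern. It is not NiFi code: `MiniSession`, `ReadCallback`, `parse`, `handle`, and the `error.message` attribute name are all hypothetical stand-ins that only mimic the "already in use" guard described above. The callback rethrows with a clearer message, the attribute is written only after the read has finished, and 'failure' is an exclusive route.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustration only: MiniSession is a hypothetical stand-in, not the NiFi API. */
final class DeferredAttributeSketch {

    /** Stand-in for a read callback such as InputStreamCallback. */
    interface ReadCallback {
        void process(byte[] content) throws Exception;
    }

    /** Stand-in session that refuses attribute changes while a read is active. */
    static final class MiniSession {
        final Map<String, String> attributes = new HashMap<>();
        String route;             // relationship the flowfile ended up on
        private boolean reading;  // true while a read callback is running

        void read(byte[] content, ReadCallback callback) {
            reading = true;
            try {
                callback.process(content);
            } catch (Exception e) {
                // Wrap whatever the callback threw, keeping it as the cause.
                throw new RuntimeException("read callback failed", e);
            } finally {
                reading = false;
            }
        }

        void putAttribute(String key, String value) {
            if (reading) {
                throw new IllegalStateException("already in use for an active callback");
            }
            attributes.put(key, value);
        }

        void transfer(String relationship) {
            route = relationship;
        }
    }

    /** Hypothetical parser: rejects empty content to force the failure path. */
    static void parse(byte[] content) throws Exception {
        if (content.length == 0) {
            throw new Exception("no workbook data");
        }
    }

    static MiniSession handle(byte[] content) {
        MiniSession session = new MiniSession();
        try {
            session.read(content, bytes -> {
                try {
                    parse(bytes);
                } catch (Exception e) {
                    // Inside the callback: rethrow with a clearer message, no putAttribute.
                    throw new Exception("Failed to parse spreadsheet: " + e.getMessage(), e);
                }
            });
            session.transfer("original");  // happy path only
        } catch (RuntimeException e) {
            // The read has finished, so changing attributes is allowed again.
            session.putAttribute("error.message", e.getCause().getMessage());
            session.transfer("failure");   // exclusive failure route
        }
        return session;
    }

    public static void main(String[] args) {
        MiniSession failed = handle(new byte[0]);
        System.out.println(failed.route + " " + failed.attributes);
    }
}
```

With empty content, `handle` leaves the flowfile on 'failure' only, carrying the rethrown message as an attribute. With non-empty content it goes to 'original' and no error attribute is set.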
I made a [sample code
fork](https://github.com/jvwing/nifi/commit/2ccf5dec2dcd707c5963716dfb3fbf7813c460ea)
with a unit test for .xls and a suggested approach to solving both the
IllegalStateExceptions and the failure routing. I did not get the logging to
cooperate the way I think it should, but we're not too far off.