[
https://issues.apache.org/jira/browse/NIFI-994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994316#comment-14994316
]
Bryan Bende commented on NIFI-994:
----------------------------------
I've been testing this processor for the past two days and overall it is
awesome!
I created one scenario that I have reproduced a couple of times where it seems
like the processor re-reads some lines from the last rolled file that it has
already read. I added some logging to the processor to see what was going on in
recoverRolledFiles() and here is what prints out when I see the problem:
{code}
2015-11-06 14:08:56,882 INFO [Timer-Driven Process Thread-10] o.a.nifi.processors.standard.TailFile TailFile[id=6b24b195-9fc6-4783-957f-13f891236de0] RECOVERED ROLLED FILES WITH STATE TIMESTAMP OF 1446836931000
2015-11-06 14:08:56,882 INFO [Timer-Driven Process Thread-10] o.a.nifi.processors.standard.TailFile TailFile[id=6b24b195-9fc6-4783-957f-13f891236de0] RECOVERED ROLLED FILE solr.log.1 WITH LAST MODIFIED TIME OF 1446836931000
2015-11-06 14:08:56,882 INFO [Timer-Driven Process Thread-10] o.a.nifi.processors.standard.TailFile TailFile[id=6b24b195-9fc6-4783-957f-13f891236de0] RECOVERED - firstFile LENGTH IS 262621 AND state.getPosition() IS 260201
2015-11-06 14:08:56,883 INFO [Timer-Driven Process Thread-10] o.a.nifi.processors.standard.TailFile TailFile[id=6b24b195-9fc6-4783-957f-13f891236de0] RECOVERED - EXPECTED RECOVERY CHECKSUM IS 3912972977 AND CHECKSUM RESULT IS 1100203812
{code}
I had TailFile stopped when solr.log rolled and started it shortly afterward.
On restart it picks up solr.log.1 correctly and determines that new data was
written to it since the last run, because the file length is greater than
state.getPosition(). It then calculates the checksum, which does not match the
expected checksum. I can't figure out why the checksums differ, but because
they do, the processor leaves that file in the list to be processed in full.
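To make the decision described above concrete, here is a minimal sketch of the
check, not the actual TailFile code. It assumes the checksum is a CRC32 over
the first state.getPosition() bytes of the rolled file; the class and method
names (RolledFileCheck, checksumOfPrefix, resumeOffset) are hypothetical, and
file contents are passed as byte arrays to keep the example self-contained.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical helper illustrating the recovery decision; not NiFi's API.
final class RolledFileCheck {

    // CRC32 over the first `length` bytes of `data` (assumed checksum type).
    static long checksumOfPrefix(byte[] data, int length) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, length);
        return crc.getValue();
    }

    // Returns the offset at which reading of the rolled file should resume:
    // the saved position if the already-read prefix still matches the
    // expected checksum, otherwise 0 (the file is processed in full).
    static long resumeOffset(byte[] rolledFile, int savedPosition, long expectedChecksum) {
        if (rolledFile.length < savedPosition) {
            return 0L; // file is shorter than what was read; cannot be the same file
        }
        long actual = checksumOfPrefix(rolledFile, savedPosition);
        return actual == expectedChecksum ? savedPosition : 0L;
    }

    public static void main(String[] args) {
        byte[] seen = "line 1\nline 2\n".getBytes(StandardCharsets.UTF_8);
        long expected = checksumOfPrefix(seen, seen.length);

        // Same file with one more line appended before it rolled.
        byte[] rolled = "line 1\nline 2\nline 3\n".getBytes(StandardCharsets.UTF_8);
        System.out.println("resume at " + resumeOffset(rolled, seen.length, expected));
    }
}
```

Under these assumptions, a mismatch like the one in the log output above takes
the "return 0" branch, which is why lines that were already read get emitted
again.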
> Processor to tail files
> -----------------------
>
> Key: NIFI-994
> URL: https://issues.apache.org/jira/browse/NIFI-994
> Project: Apache NiFi
> Issue Type: New Feature
> Affects Versions: 0.4.0
> Reporter: Joseph Percivall
> Assignee: Mark Payne
> Fix For: 0.4.0
>
> Attachments: 0001-NIFI-994-Initial-import-of-TailFile.patch,
> 0002-NIFI-994-Ensure-that-processor-is-not-valid-due-to-t.patch
>
>
> It's a very common data ingest situation to want to input text into the
> system by "tailing" a file, most commonly log files. Currently we don't have
> an easy way to do this.
> A simple processor to tail a file would benefit many users. There would need
> to be an option to not just tail a file but pick up where the processor left
> off if it is interrupted.