[ 
https://issues.apache.org/jira/browse/NIFI-994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994316#comment-14994316
 ] 

Bryan Bende commented on NIFI-994:
----------------------------------

I've been testing this processor for the past two days and overall it is 
awesome! 

I created one scenario that I have reproduced a couple of times where it seems 
like the processor re-reads some lines from the last rolled file that it has 
already read. I added some logging to the processor to see what was going on in 
recoverRolledFiles() and here is what prints out when I see the problem:

{code}
2015-11-06 14:08:56,882 INFO [Timer-Driven Process Thread-10] o.a.nifi.processors.standard.TailFile TailFile[id=6b24b195-9fc6-4783-957f-13f891236de0] RECOVERED ROLLED FILES WITH STATE TIMESTAMP OF 1446836931000
2015-11-06 14:08:56,882 INFO [Timer-Driven Process Thread-10] o.a.nifi.processors.standard.TailFile TailFile[id=6b24b195-9fc6-4783-957f-13f891236de0] RECOVERED ROLLED FILE solr.log.1 WITH LAST MODIFIED TIME OF 1446836931000
2015-11-06 14:08:56,882 INFO [Timer-Driven Process Thread-10] o.a.nifi.processors.standard.TailFile TailFile[id=6b24b195-9fc6-4783-957f-13f891236de0] RECOVERED - firstFile LENGTH IS 262621 AND state.getPosition() IS 260201
2015-11-06 14:08:56,883 INFO [Timer-Driven Process Thread-10] o.a.nifi.processors.standard.TailFile TailFile[id=6b24b195-9fc6-4783-957f-13f891236de0] RECOVERED - EXPECTED RECOVERY CHECKSUM IS 3912972977 AND CHECKSUM RESULT IS 1100203812
{code}

I had TailFile stopped when solr.log rolled and started it shortly afterwards. 
It picks up solr.log.1 correctly and determines that new data was written to 
it since the last run, because the file length is > state.getPosition(). It 
then calculates the checksum, which does not match the expected checksum. I 
can't figure out why the checksums don't match, but because they don't, the 
processor leaves that file in the list to be processed in full. 
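
For anyone following along, the check described above can be sketched roughly 
as follows. This is only an illustration, not the actual TailFile code: the 
class and method names are hypothetical, and it assumes the stored checksum is 
a CRC32 over the first state.getPosition() bytes of the file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;

// Hypothetical helper, not part of NiFi: illustrates the recovery check only.
public class RolledFileChecksum {

    /** Returns the CRC32 of the first {@code length} bytes of {@code file}. */
    static long checksumOfPrefix(Path file, long length) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buffer = new byte[8192];
        long remaining = length;
        try (InputStream in = Files.newInputStream(file)) {
            while (remaining > 0) {
                int toRead = (int) Math.min(buffer.length, remaining);
                int n = in.read(buffer, 0, toRead);
                if (n < 0) {
                    break; // file is shorter than the requested prefix
                }
                crc.update(buffer, 0, n);
                remaining -= n;
            }
        }
        return crc.getValue();
    }

    /**
     * True if the rolled file is at least {@code position} bytes long and its
     * first {@code position} bytes hash to {@code expectedChecksum}, i.e. the
     * prefix was already consumed and only the remainder needs to be read.
     */
    static boolean prefixAlreadyRead(Path rolledFile, long position,
                                     long expectedChecksum) throws IOException {
        return Files.size(rolledFile) >= position
                && checksumOfPrefix(rolledFile, position) == expectedChecksum;
    }
}
```

Under these assumptions, a match means reading can resume at the saved 
position, and a mismatch means the whole file is treated as unread, which is 
the behavior I am seeing with solr.log.1.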

> Processor to tail files
> -----------------------
>
>                 Key: NIFI-994
>                 URL: https://issues.apache.org/jira/browse/NIFI-994
>             Project: Apache NiFi
>          Issue Type: New Feature
>    Affects Versions: 0.4.0
>            Reporter: Joseph Percivall
>            Assignee: Mark Payne
>             Fix For: 0.4.0
>
>         Attachments: 0001-NIFI-994-Initial-import-of-TailFile.patch, 
> 0002-NIFI-994-Ensure-that-processor-is-not-valid-due-to-t.patch
>
>
> It's a very common data ingest situation to want to input text into the 
> system by "tailing" a file, most commonly log files. Currently we don't have 
> an easy way to do this. 
> A simple processor to tail a file would benefit many users. There would need 
> to be an option to not just tail a file but pick up where the processor left 
> off if it is interrupted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
