[ 
https://issues.apache.org/jira/browse/NIFI-1484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15140392#comment-15140392
 ] 

Joseph Witt commented on NIFI-1484:
-----------------------------------

Full clean build with contrib check is solid.  Nice unit testing!  Code review 
follows the plan and shows the transition from the previous held-listing model 
to the current two-timestamp technique.

Ran some interesting operational cases to ensure it detects new data (and 
quickly).  Validated that clearing the state now causes it to begin capturing 
data again.

However, while validating across restarts I found some wonky behavior.  It 
seemed to grab data again (like the most recent thing it pulled).  And then I 
found this:

{quote}
01:15:56 EST ERROR 4b4d8023-8a39-417e-a85e-530a9e9b6520
ListFile[id=4b4d8023-8a39-417e-a85e-530a9e9b6520]
ListFile[id=4b4d8023-8a39-417e-a85e-530a9e9b6520] failed to process session due
to java.lang.NullPointerException: java.lang.NullPointerException
01:15:56 EST WARNING 4b4d8023-8a39-417e-a85e-530a9e9b6520
ListFile[id=4b4d8023-8a39-417e-a85e-530a9e9b6520] Processor Administratively
Yielded for 1 sec due to processing failure
01:15:57 EST ERROR 4b4d8023-8a39-417e-a85e-530a9e9b6520
ListFile[id=4b4d8023-8a39-417e-a85e-530a9e9b6520]
ListFile[id=4b4d8023-8a39-417e-a85e-530a9e9b6520] failed to process due to
java.lang.NullPointerException; rolling back session:
java.lang.NullPointerException
01:15:57 EST ERROR 4b4d8023-8a39-417e-a85e-530a9e9b6520
ListFile[id=4b4d8023-8a39-417e-a85e-530a9e9b6520]
ListFile[id=4b4d8023-8a39-417e-a85e-530a9e9b6520] failed to process session due
to java.lang.NullPointerException: java.lang.NullPointerException
01:15:57 EST WARNING 4b4d8023-8a39-417e-a85e-530a9e9b6520
ListFile[id=4b4d8023-8a39-417e-a85e-530a9e9b6520] Processor Administratively
Yielded for 1 sec due to processing failure
01:15:59 EST ERROR 4b4d8023-8a39-417e-a85e-530a9e9b6520
ListFile[id=4b4d8023-8a39-417e-a85e-530a9e9b6520]
ListFile[id=4b4d8023-8a39-417e-a85e-530a9e9b6520] failed to process session due
to java.lang.NullPointerException: java.lang.NullPointerException
01:15:59 EST WARNING 4b4d8023-8a39-417e-a85e-530a9e9b6520
ListFile[id=4b4d8023-8a39-417e-a85e-530a9e9b6520] Processor Administratively
Yielded for 1 sec due to processing failure
01:16:00 EST ERROR 4b4d8023-8a39-417e-a85e-530a9e9b6520
ListFile[id=4b4d8023-8a39-417e-a85e-530a9e9b6520]
ListFile[id=4b4d8023-8a39-417e-a85e-530a9e9b6520] failed to process due to
java.lang.NullPointerException; rolling back session:
java.lang.NullPointerException
01:16:00 EST ERROR 4b4d8023-8a39-417e-a85e-530a9e9b6520
ListFile[id=4b4d8023-8a39-417e-a85e-530a9e9b6520]
ListFile[id=4b4d8023-8a39-417e-a85e-530a9e9b6520] failed to process session due
to java.lang.NullPointerException: java.lang.NullPointerException
01:16:00 EST WARNING 4b4d8023-8a39-417e-a85e-530a9e9b6520
ListFile[id=4b4d8023-8a39-417e-a85e-530a9e9b6520] Processor Administratively
Yielded for 1 sec due to processing failure
{quote}

> ListFile holds unbounded list of files with matching time stamps
> ----------------------------------------------------------------
>
>                 Key: NIFI-1484
>                 URL: https://issues.apache.org/jira/browse/NIFI-1484
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core UI, Extensions
>    Affects Versions: 0.4.0, 0.5.0
>            Reporter: Joseph Witt
>             Fix For: 0.5.0
>
>
> ListFile appears to hold an unbounded set of filenames that match the last 
> timestamp.  While this is understandable to handle the edge case of new data 
> arriving at the same time it presents two problems.  First we hold all of 
> this information in state management which could put considerable pressure on 
> both the local and remote stores but we also have it in memory before we 
> persist it.
> Also, the entire state listing appears to show up in the UI without 
> pagination or any limit on number of entries.  This seems like a problem for 
> the client-side as well.  The server side should probably restrict this.
> Finally, it seems like the need for saving filenames seen at a given 
> timestamp is only necessary if we're assuming the listing we do is 'as-of' 
> RIGHT NOW.  What if instead we did the listing based on a last modified time 
> of 'RIGHTNOW'-1 millisecond or something like that?  Then we should not have 
> to worry at all about keeping a listing of names for the timestamp.
> The reason I think this is important is that it is not at all uncommon for a 
> directory with large quantities of files to have data at the same time due to 
> a copy operation not preserving original file attributes.
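
The 'RIGHTNOW'-1 millisecond idea from the description could be sketched 
roughly as below.  This is only an illustration, not the ListFile code: the 
class LaggedListing, its Result type, and the lagMillis parameter are all 
hypothetical names.  It assumes that once a millisecond has passed, no file 
will later show up carrying that last modified time, so a single timestamp is 
enough state and no per-timestamp list of filenames has to be kept.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not NiFi code: lists files using a lagged cutoff.
class LaggedListing {

    // Outcome of one listing pass: names to emit and the timestamp to persist.
    static final class Result {
        final List<String> names;
        final long newLastListed;

        Result(List<String> names, long newLastListed) {
            this.names = names;
            this.newLastListed = newLastListed;
        }
    }

    // Selects files modified after lastListed but no later than now - lagMillis.
    // Files newer than the cutoff are left for a later pass.
    static Result list(Map<String, Long> lastModifiedByName,
                       long lastListed, long now, long lagMillis) {
        long cutoff = now - lagMillis;
        List<String> names = new ArrayList<>();
        long newest = lastListed;
        // TreeMap copy only to make the output order deterministic.
        for (Map.Entry<String, Long> e : new TreeMap<>(lastModifiedByName).entrySet()) {
            long modified = e.getValue();
            if (modified > lastListed && modified <= cutoff) {
                names.add(e.getKey());
                newest = Math.max(newest, modified);
            }
        }
        return new Result(names, newest);
    }

    public static void main(String[] args) {
        Map<String, Long> files = new TreeMap<>();
        files.put("a.txt", 1000L);
        files.put("b.txt", 1000L); // same timestamp as a.txt, e.g. after a copy
        files.put("c.txt", 2000L); // modified "right now" during the first pass

        Result first = list(files, 0L, 2000L, 1L);
        System.out.println(first.names + " " + first.newLastListed);   // [a.txt, b.txt] 1000

        Result second = list(files, first.newLastListed, 3000L, 1L);
        System.out.println(second.names + " " + second.newLastListed); // [c.txt] 2000
    }
}
```

Note that only one long value is carried between passes here, regardless of 
how many files share a timestamp.  Whether a 1 millisecond lag is safe in 
practice depends on the filesystem's timestamp granularity, which this sketch 
does not address.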



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
