Hello Koji,

Thanks for NIFI-4069 (not NIFI-4096 =))

I tested your PR in several ways, using the build from a0f2834 on branch
nifi-4069.

Test 1:
1. set Target System Timestamp Precision: Auto Detect
2. start ListFile
3. run the script: for i in {1..10000}; do touch ./test_$i; done

Result: no missing files


Test 2:
1. set Target System Timestamp Precision: Milliseconds
2. start ListFile
3. run the script: for i in {1..10000}; do touch ./test_$i; done

Result: there are missing files


Test 3 and 4 (100k files):
1. set Target System Timestamp Precision: Auto Detect
2. start ListFile
3. run the script: for i in {1..100000}; do touch ./test_$i; done

Result: 68 and 40 files missing, respectively
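
To see how densely the generated files are packed in time, the modification times can be inspected directly on disk. This is only a rough sketch, assuming bash and GNU find (for -printf); it does not reproduce ListFile's logic, and the file count of 1000 and the temporary directory are arbitrary choices for illustration:

```shell
#!/usr/bin/env bash
# Sketch: count how many distinct modification times a burst of files has,
# once truncated to whole seconds and once at full precision.
# Assumes GNU find; tmpdir and the count of 1000 are illustrative only.
tmpdir=$(mktemp -d)
for i in {1..1000}; do touch "$tmpdir/test_$i"; done

# Total number of files created
total=$(find "$tmpdir" -type f | wc -l)

# %T@ prints mtime as seconds since the epoch with a fractional part;
# cutting at the dot keeps only the whole seconds
distinct_seconds=$(find "$tmpdir" -type f -printf '%T@\n' | cut -d. -f1 | sort -u | wc -l)

# The same values without truncation
distinct_full=$(find "$tmpdir" -type f -printf '%T@\n' | sort -u | wc -l)

echo "files: $total"
echo "distinct mtimes (seconds): $distinct_seconds"
echo "distinct mtimes (full precision): $distinct_full"

rm -rf "$tmpdir"
```

If the second-precision count is far smaller than the number of files, many files share one timestamp at that precision.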


In all tests, listing.timestamp and processed.timestamp still do not have
milliseconds.



Summary:
1. It is now much better than it was. Thanks, Koji, for the good job!
2. I still do not see milliseconds, although my ext4 file system shows the
modify date in nanoseconds
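
For reference, the modification time can be checked at the filesystem level roughly like this. A minimal sketch, assuming GNU coreutils stat and date (as shipped with Ubuntu); sample.txt and the temporary directory are placeholder names:

```shell
#!/usr/bin/env bash
# Sketch: print a file's modification time at different precisions.
# Assumes GNU coreutils; sample.txt is a placeholder name.
tmpdir=$(mktemp -d)
touch "$tmpdir/sample.txt"

# %y: human-readable mtime including the fractional seconds
mtime_human=$(stat -c '%y' "$tmpdir/sample.txt")

# %Y: mtime as whole seconds since the epoch
mtime_epoch=$(stat -c '%Y' "$tmpdir/sample.txt")

# date -r reads the file's mtime; %s%3N formats it as epoch milliseconds,
# the same 13-digit shape as the timestamp values discussed above
mtime_ms=$(date -r "$tmpdir/sample.txt" +%s%3N)

echo "human-readable: $mtime_human"
echo "epoch seconds:  $mtime_epoch"
echo "epoch millis:   $mtime_ms"

rm -rf "$tmpdir"
```

If the last three digits of the epoch-millisecond value are always 000, the filesystem (or the tool reading it) is only reporting second precision.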


Koji Kawamura-2 wrote
> Hi Roman and all,
> 
> As I investigated further on ListFile processor, I found those are two
> different issues.
> Also I found another JIRA related to ListFile. Currently there seem to
> be three issues:
> 
> 1. ListFile can miss files on filesystems that do not provide
> timestamps in milliseconds precision (NIFI-4096)
> 2. ListFile can miss files having the same timestamp as the
> previously processed latest timestamp (NIFI-3332)
> 3. ListFile cannot pick up files whose timestamp is older than the
> previously processed latest timestamp (NIFI-2383)
> 
> # NIFI-4096
> I created JIRA NIFI-4096 to address issue #1 above, by adding
> deterministic logic to detect target filesystem timestamp precision.
> With NIFI-4096, ListFile can list all 10,000 files created by the
> command you shared before without missing anything:
> 
> ```
> for i in {1..10000}; do touch ./test_$i; done
> ```
> 
> The PR is ready for review. I would appreciate it if you could test the
> fix with your use case.
> 
> Additionally, I refactored variable names in AbstractListProcessor to
> explain purpose and timestamp unit better. I hope it makes the code
> more readable and maintainable.
> 
> # NIFI-3332
> I'm thinking about adding a processor property to specify whether to
> track the listed filenames along with the latest processed timestamp.
> Although it will be less efficient, it'd be good for some use cases.
> 
> # NIFI-2383
> This is the most difficult case to handle correctly with only a timestamp.
> We need a different processor which can use the watch API.
> 
> Any comment would be appreciated.
> 
> Thanks,
> Koji
> 
> On Tue, Jun 6, 2017 at 9:18 PM, Koji Kawamura <ijokarumawak@> wrote:
>> Hi Roman,
>>
>> I think NIFI-3332 is probably related as I can see timestamps in logs
>> don't have milliseconds.
>>
>> I've been considering how we can support all corner cases with minimal
>> state to persist, and make it work even if the filesystem only
>> provides the last modified timestamp in seconds precision.
>> I am changing code and testing locally, but am not ready to send a PR yet,
>> and I am not fully confident about how to fix it.
>>
>> Any suggestion or insight would be appreciated to make these ListXXXX
>> processors better.
>>
>> Thanks,
>> Koji
>>
>> On Tue, Jun 6, 2017 at 8:54 PM, Roman <ramon9869@> wrote:
>>> Hi there,
>>>
>>> While digging into this issue, I found an open issue in JIRA, NIFI-3332
>>> <https://issues.apache.org/jira/browse/NIFI-3332>. Can it be related
>>> to my situation with the missing milliseconds?
>>>
>>> Thanks
>>> Roman
>>>
>>>
>>> Koji Kawamura-2 wrote
>>>> Hello Roman,
>>>>
>>>> It seems the resolution of last modified timestamp depends on the file
>>>> system implementation.
>>>> https://stackoverflow.com/questions/3805201/how-to-get-ubuntu-file-timestamp-in-millisecond
>>>>
>>>> I reproduced the same behavior on OS X, which uses HFS that has the
>>>> same limitation of resolution in seconds.
>>>> https://stackoverflow.com/questions/18403588/how-to-return-millisecond-information-for-file-access-on-mac-os-x-in-java
>>>>
>>>> Which file system are you using on your Ubuntu? If it is ext3, then
>>>> changing it to ext4 may address the issue.
>>>>
>>>> Thanks,
>>>> Koji
>>>>
>>>> On Thu, Jun 1, 2017 at 1:25 AM, Roman <ramon9869@> wrote:
>>>>> Hi there, I need help.
>>>>>
>>>>> We are preparing a high-load project and tested these processors. We
>>>>> always see the listing.timestamp and processed.timestamp keys without
>>>>> milliseconds (xxxxxxxxxx000). As a result, if several files are
>>>>> generated within one second, not all files will be listed.
>>>>>
>>>>>
>>>>> Test:
>>>>> 1. start processor ListFile/ListSFTP
>>>>> 2. generate 10000 zero-size files. My command:
>>>>>    for i in {1..10000}; do touch ./test_$i; done
>>>>> 3. see processor stats: out 3952 (0 bytes)
>>>>>
>>>>>
>>>>> Am I doing something wrong? Or is it a bug in NiFi/Java/etc.?
>>>>>
>>>>> Environment
>>>>>
>>>>> Ubuntu 14.04.5 LTS, x64, ext4 file system
>>>>> Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
>>>>> Nifi 1.2.0 From 3a605af, Tagged nifi-1.2.0-RC2
>>>>>
>>>>>
>>>>> Thanks
>>>>> Roman
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://apache-nifi-developer-list.39713.n7.nabble.com/processors-ListFile-ListSFTP-do-not-store-milliseconds-in-timestamp-tp16037.html
>>>>> Sent from the Apache NiFi Developer List mailing list archive at
>>>>> Nabble.com.
>>>





--
View this message in context: 
http://apache-nifi-developer-list.39713.n7.nabble.com/processors-ListFile-ListSFTP-do-not-store-milliseconds-in-timestamp-tp16037p16221.html
Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.