Thanks Joe, I agree with you on the idea of making ListXXX as reliable as possible. Once that's done, I'm also interested in providing a different mechanism based on watch APIs to cover use cases that ListXXX can't handle with timestamps alone.

Roman, thanks for testing the change. The results of Test 1 and 2 are expected.
Test 3 ... this might have been affected by the issue reported in NIFI-3332
(files with the same timestamp as the latest one processed in the previous cycle).
I'll take a look to see if there's anything we can do.

> 2. Still do not see milliseconds, however my ext4 file system show modify
> date in nanoseconds

Roman, would you try creating a simple Java program to see whether the issue
resides in the NiFi codebase or in native code for your environment?
There is a similar issue reported on Stack Overflow:
https://stackoverflow.com/questions/24804618/get-file-mtime-with-millisecond-resolution-from-java

If the simple program can return the timestamp in milliseconds, we should fix
something in NiFi.

I really appreciate your feedback! Thanks!
Koji

On Tue, Jun 20, 2017 at 9:17 PM, Roman <[email protected]> wrote:
> Hello Koji,
>
> Thanks for NIFI-4069 (not NIFI-4096 =))
>
> I tested your PR in several ways on version: From a0f2834 on branch
> nifi-4069
>
> Test 1:
> 1. set Target System Timestamp Precision: Auto Detect
> 2. start ListFile
> 3. start script for i in {1..10000}; do touch ./test_$i; done
>
> Result: no miss files
>
>
> Test 2:
> 1. set Target System Timestamp Precision: Milliseconds
> 2. start ListFile
> 3. start script for i in {1..10000}; do touch ./test_$i; done
>
> Result: there are missing files
>
>
> Test 3 and 4 (100k files):
> 1. set Target System Timestamp Precision: Auto Detect
> 2. start ListFile
> 3. start script for i in {1..100000}; do touch ./test_$i; done
>
> Result: missing 68 and 40 files
>
>
> In all tests listing.timestamp and processed.timestamp still not have
> milliseconds
>
>
> Summary:
> 1. Now much better than it was. Thanks Koji for good job!
> 2. Still do not see milliseconds, however my ext4 file system show modify
> date in nanoseconds
>
>
> Koji Kawamura-2 wrote
>> Hi Roman and all,
>>
>> As I investigated further on ListFile processor, I found those are two
>> different issues.
>> Also I found another JIRA related to ListFile. Currently there seem to
>> be three issues:
>>
>> 1. ListFile can miss files with filesystems those do not provide
>> timestamps in milliseconds precision (NIFI-4096)
>> 2. ListFile can miss files having the same timestamp same as the
>> previously processed latest timestamp (NIFI-3332)
>> 3. ListFile can not pickup files whose timestamp is older than the
>> previously processed latest timestamp (NIFI-2383)
>>
>> # NIFI-4096
>> I created JIRA NIFI-4096 to address issue#1 above, by adding
>> deterministic logic to detect target filesystem timestamp precision.
>> With NIFI-4096, ListFile can list whole 10,000 files created by the
>> command you shared before without missing anything:
>>
>> ```
>> for i in {1..10000}; do touch ./test_$i; done
>> ```
>>
>> The PR is ready for review. I appreciate if you can test the fix with
>> your use case.
>>
>> Additionally, I refactored variable names in AbstractListProcessor to
>> explain purpose and timestamp unit better. I hope it makes the code
>> more readable and maintainable.
>>
>> # NIFI-3332
>> I'm thinking about adding a processor property to specify whether
>> track the listed filenames with the latest processed timestamp.
>> Although it will be less efficient, it'd be good for some use cases.
>>
>> # NIFI-2383
>> This is the most difficult case to handle right with only timestamp.
>> We need different processor which can use watch API..
>>
>> Any comment would be appreciated.
>>
>> Thanks,
>> Koji
>>
>> On Tue, Jun 6, 2017 at 9:18 PM, Koji Kawamura <ijokarumawak@> wrote:
>>> Hi Roman,
>>>
>>> I think NIFI-3332 is probably related as I can see timestamps in logs
>>> don't have milliseconds.
>>>
>>> I've been considering how we can support all corner cases with minimal
>>> state to persist, and make it works even if the filesystem only
>>> provide last modified timestamp in seconds precision.
>>> Changing code and testing locally, but not ready for send a PR yet,
>>> and I am not fully confident on how to fix.
>>>
>>> Any suggestion or insight would be appreciated to make these ListXXXX
>>> processor better.
>>>
>>> Thanks,
>>> Koji
>>>
>>> On Tue, Jun 6, 2017 at 8:54 PM, Roman <ramon9869@> wrote:
>>>> Hi there,
>>>>
>>>> During digging into this issue, I found open issue in jira NIFI-3332
>>>> <https://issues.apache.org/jira/browse/NIFI-3332> . Can it be
>>>> related to my
>>>> situation with missed milliseconds?
>>>>
>>>> Thanks
>>>> Roman
>>>>
>>>>
>>>> Koji Kawamura-2 wrote
>>>>> Hello Roman,
>>>>>
>>>>> It seems the resolution of last modified timestamp depends on the file
>>>>> system implementation.
>>>>> https://stackoverflow.com/questions/3805201/how-to-get-ubuntu-file-timestamp-in-millisecond
>>>>>
>>>>> I reproduced the same behavior on OS X, which uses HFS that has the
>>>>> same limitation of resolution in seconds.
>>>>> https://stackoverflow.com/questions/18403588/how-to-return-millisecond-information-for-file-access-on-mac-os-x-in-java
>>>>>
>>>>> Which file system are you using on your Ubuntu? If it is ext3, then
>>>>> changing it to ext4 may address the issue.
>>>>>
>>>>> Thanks,
>>>>> Koji
>>>>>
>>>>> On Thu, Jun 1, 2017 at 1:25 AM, Roman <ramon9869@> wrote:
>>>>>> Hi there, i need help.
>>>>>>
>>>>>> We prepare high load project and tested this processors. All time see
>>>>>> listing.timestamp and processed.timestamp keys without milliseconds
>>>>>> (xxxxxxxxxx000). In this way, if generate several files in one second,
>>>>>> not
>>>>>> all files will be listened.
>>>>>>
>>>>>>
>>>>>> Test:
>>>>>> 1. start processor ListFile/ListSFTP
>>>>>> 2. generate 10000 zero size files. my command: for i in {1..10000};
>>>>>> do
>>>>>> touch ./test_$i; done
>>>>>> 3. see processor stats: out 3952 (0 bytes)
>>>>>>
>>>>>>
>>>>>> I'm somewhere wrong? Or is it a bug nifi/java/etc?
>>>>>>
>>>>>> Environment
>>>>>>
>>>>>> Ubuntu 14.04.5 LTS, x64, ext4 file system
>>>>>> Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
>>>>>> Nifi 1.2.0 From 3a605af, Tagged nifi-1.2.0-RC2
>>>>>>
>>>>>>
>>>>>> Thanks
>>>>>> Roman
>>>>>>
>>>>>>
>>>>>> --
>>>>>> View this message in context:
>>>>>> http://apache-nifi-developer-list.39713.n7.nabble.com/processors-ListFile-ListSFTP-do-not-store-milliseconds-in-timestamp-tp16037.html
>>>>>> Sent from the Apache NiFi Developer List mailing list archive at
>>>>>> Nabble.com.
>>>>
>>>>
>>>> --
>>>> View this message in context:
>>>> http://apache-nifi-developer-list.39713.n7.nabble.com/processors-ListFile-ListSFTP-do-not-store-milliseconds-in-timestamp-tp16037p16118.html
>>>> Sent from the Apache NiFi Developer List mailing list archive at
>>>> Nabble.com.
>
>
> --
> View this message in context:
> http://apache-nifi-developer-list.39713.n7.nabble.com/processors-ListFile-ListSFTP-do-not-store-milliseconds-in-timestamp-tp16037p16221.html
> Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.
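
For reference, the "simple Java program" requested at the top of this thread could look roughly like the sketch below. It is only an illustration, not code from NiFi: the class name MtimeCheck and the helper hasSubSecondPart are made up for this example. It reads the last-modified time of one file through both the java.io and java.nio APIs and reports whether the value carries a sub-second part. Which API preserves sub-second precision may depend on the JDK version and platform, so no expected output is given.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.FileTime;

// Hypothetical helper class for this thread; not part of NiFi.
class MtimeCheck {

    // True if an epoch-millisecond timestamp carries a sub-second component.
    static boolean hasSubSecondPart(long epochMillis) {
        return epochMillis % 1000 != 0;
    }

    public static void main(String[] args) throws IOException {
        // Use the file given on the command line, or a fresh temp file.
        Path path = args.length > 0
                ? Paths.get(args[0])
                : Files.createTempFile("mtime-check", ".tmp");

        long legacyMillis = path.toFile().lastModified();    // java.io API
        FileTime nioTime = Files.getLastModifiedTime(path);  // java.nio API

        System.out.println("File.lastModified():         " + legacyMillis
                + " (sub-second part: " + hasSubSecondPart(legacyMillis) + ")");
        System.out.println("Files.getLastModifiedTime(): " + nioTime.toMillis()
                + " (sub-second part: " + hasSubSecondPart(nioTime.toMillis()) + ")");
    }
}
```

A single file can land exactly on a second boundary by chance, so it is worth running this against several of the test_N files created by the touch loop before drawing a conclusion.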

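
The idea of detecting the target filesystem's timestamp precision from the listed files, mentioned in the quoted message above, can be sketched as follows. This is a guess at the general approach and not the logic in the actual PR: the class PrecisionSketch and the method detectPrecision are hypothetical names. The assumption is that if any listed timestamp has a non-zero millisecond part the source provides milliseconds, otherwise if any has a non-zero second part it provides seconds, and otherwise only minutes can be assumed.

```java
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; not the implementation in AbstractListProcessor.
class PrecisionSketch {

    // Infers the finest precision observed among epoch-millisecond timestamps.
    static TimeUnit detectPrecision(List<Long> epochMillis) {
        boolean sawSeconds = false;
        for (long ts : epochMillis) {
            if (ts % 1000 != 0) {
                // A non-zero millisecond part proves millisecond precision.
                return TimeUnit.MILLISECONDS;
            }
            if (ts % 60_000 != 0) {
                // A non-zero second part proves at least second precision.
                sawSeconds = true;
            }
        }
        // An empty or all-minute-aligned listing falls back to the coarsest unit.
        return sawSeconds ? TimeUnit.SECONDS : TimeUnit.MINUTES;
    }
}
```

The weakness of this kind of inference is that a small listing whose files all happen to fall on whole seconds is indistinguishable from a filesystem that only stores seconds, which is presumably why an explicit precision setting is still useful alongside auto detection.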