Thanks Joe, I agree with the idea of making ListXXX as reliable
as possible. Once that's done, I'm also interested in providing a
different mechanism based on watch APIs to cover use cases that ListXXX
can't handle with timestamps alone.
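
To make the watch API idea concrete, here is a rough sketch using the JDK's java.nio.file.WatchService. This is only an illustration under my own assumptions, not a proposed processor design: the class name WatchSketch and the helper pollCreated are hypothetical, and it only reacts to ENTRY_CREATE events in a single directory.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: report newly created entries via WatchService
// instead of comparing last-modified timestamps.
public class WatchSketch {

    /** Waits up to timeoutMillis for one batch of events and returns the created entry names. */
    public static List<Path> pollCreated(WatchService watcher, long timeoutMillis)
            throws InterruptedException {
        List<Path> created = new ArrayList<>();
        WatchKey key = watcher.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        if (key == null) {
            return created; // no events arrived within the timeout
        }
        for (WatchEvent<?> event : key.pollEvents()) {
            if (event.kind() == StandardWatchEventKinds.OVERFLOW) {
                continue; // events were lost; a real processor would need a full re-listing here
            }
            created.add((Path) event.context()); // name relative to the watched directory
        }
        key.reset(); // re-arm the key so that later events are delivered
        return created;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        try (WatchService watcher = FileSystems.getDefault().newWatchService()) {
            dir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
            while (true) {
                for (Path name : pollCreated(watcher, 1000)) {
                    System.out.println("created: " + dir.resolve(name));
                }
            }
        }
    }
}
```

Note that a watcher only sees events that happen while it is running, so it would complement a timestamp-based listing rather than replace it.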

Roman, thanks for testing the change.
Test 1 and 2 results are expected.
Test 3 ... this might have been affected by the issue reported in
NIFI-3332 (files that have the same timestamp as the latest one
processed in the previous cycle). I'll take a look to see if there's
anything we can do.
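
One possible direction for the same-timestamp case, sketched below, is to remember the names of the entries already listed at the latest timestamp, so that a file arriving later with that same timestamp is still picked up. This is a minimal illustration and not the actual AbstractListProcessor code; the class SameTimestampTracker and the method selectNew are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: keep the latest listed timestamp plus the names seen at that timestamp.
public class SameTimestampTracker {
    private long latestTimestamp = -1L;
    private final Set<String> namesAtLatest = new HashSet<>();

    /** Takes one listing cycle (name -> last-modified millis) and returns the names not emitted before. */
    public List<String> selectNew(Map<String, Long> listing) {
        List<String> fresh = new ArrayList<>();
        for (Map.Entry<String, Long> entry : listing.entrySet()) {
            long ts = entry.getValue();
            boolean newer = ts > latestTimestamp;
            boolean sameButUnseen = ts == latestTimestamp && !namesAtLatest.contains(entry.getKey());
            if (newer || sameButUnseen) {
                fresh.add(entry.getKey());
            }
            // Entries older than latestTimestamp are still skipped (that is the NIFI-2383 case).
        }
        for (String name : fresh) {
            long ts = listing.get(name);
            if (ts > latestTimestamp) {
                latestTimestamp = ts;  // a newer timestamp invalidates the remembered names
                namesAtLatest.clear();
            }
            if (ts == latestTimestamp) {
                namesAtLatest.add(name);
            }
        }
        Collections.sort(fresh); // stable output order for readability
        return fresh;
    }
}
```

The cost is extra state: the set grows with the number of files sharing the latest timestamp, which is why this would probably need to be optional.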

> 2. I still do not see milliseconds, even though my ext4 file system shows
> modification dates in nanoseconds

Roman, would you try creating a simple Java program to see whether the
issue resides in the NiFi codebase or in native code for your environment?
There is a similar issue reported on Stack Overflow:
https://stackoverflow.com/questions/24804618/get-file-mtime-with-millisecond-resolution-from-java

If the simple program can return timestamps with millisecond precision,
then we need to fix something in NiFi.
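
Something along these lines should be enough for the check. It is only a sketch: the class name MtimeCheck and its helpers are made up for this example. It reads the modification time through both java.io.File and java.nio.file.Files, since the two APIs may report different precision on some JDK and platform combinations.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical test program: print a file's last-modified time in epoch milliseconds.
public class MtimeCheck {

    /** True if the epoch-millisecond value has a non-zero sub-second part. */
    public static boolean hasSubSecondPart(long epochMillis) {
        return epochMillis % 1000 != 0;
    }

    /** Returns { File.lastModified(), Files.getLastModifiedTime().toMillis() } for the path. */
    public static long[] readMillis(Path path) throws IOException {
        long legacy = path.toFile().lastModified();
        long nio = Files.getLastModifiedTime(path).toMillis();
        return new long[] {legacy, nio};
    }

    public static void main(String[] args) throws IOException {
        // Use the given file, or fall back to a freshly created temp file.
        Path path = args.length > 0
                ? Paths.get(args[0])
                : Files.createTempFile("mtime-check", ".tmp");
        long[] millis = readMillis(path);
        System.out.println("File.lastModified():         " + millis[0]
                + " sub-second=" + hasSubSecondPart(millis[0]));
        System.out.println("Files.getLastModifiedTime(): " + millis[1]
                + " sub-second=" + hasSubSecondPart(millis[1]));
    }
}
```

A single file can land exactly on a second boundary by chance, so it is worth running this against a few of the test_N files before drawing a conclusion.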

I really appreciate your feedback! Thanks!
Koji

On Tue, Jun 20, 2017 at 9:17 PM, Roman <[email protected]> wrote:
> Hello Koji,
>
> Thanks for NIFI-4069 (not NIFI-4096 =))
>
> I tested your PR in several ways on this version: a0f2834 on branch
> nifi-4069
>
> Test 1:
> 1. set Target System Timestamp Precision: Auto Detect
> 2. start ListFile
> 3. start script for i in {1..10000}; do touch ./test_$i; done
>
> Result: no missing files
>
>
> Test 2:
> 1. set Target System Timestamp Precision: Milliseconds
> 2. start ListFile
> 3. start script for i in {1..10000}; do touch ./test_$i; done
>
> Result: there are missing files
>
>
> Test 3 and 4 (100k files):
> 1. set Target System Timestamp Precision: Auto Detect
> 2. start ListFile
> 3. start script for i in {1..100000}; do touch ./test_$i; done
>
> Result: missing 68 and 40 files
>
>
> In all tests, listing.timestamp and processed.timestamp still do not have
> milliseconds
>
>
>
> Summary:
> 1. It is now much better than it was. Thanks, Koji, for the good job!
> 2. I still do not see milliseconds, even though my ext4 file system shows
> modification dates in nanoseconds
>
>
> Koji Kawamura-2 wrote
>> Hi Roman and all,
>>
>> As I investigated the ListFile processor further, I found that these are
>> two different issues.
>> I also found another JIRA related to ListFile, so there currently seem
>> to be three issues:
>>
>> 1. ListFile can miss files on filesystems that do not provide
>> timestamps with millisecond precision (NIFI-4096)
>> 2. ListFile can miss files that have the same timestamp as the
>> previously processed latest timestamp (NIFI-3332)
>> 3. ListFile cannot pick up files whose timestamp is older than the
>> previously processed latest timestamp (NIFI-2383)
>>
>> # NIFI-4096
>> I created JIRA NIFI-4096 to address issue #1 above by adding
>> deterministic logic to detect the target filesystem's timestamp precision.
>> With NIFI-4096, ListFile can list all 10,000 files created by the
>> command you shared before without missing any:
>>
>> ```
>> for i in {1..10000}; do touch ./test_$i; done
>> ```
>>
>> The PR is ready for review. I'd appreciate it if you could test the fix
>> with your use case.
>>
>> Additionally, I renamed variables in AbstractListProcessor to better
>> explain their purpose and timestamp units. I hope it makes the code
>> more readable and maintainable.
>>
>> # NIFI-3332
>> I'm thinking about adding a processor property to specify whether to
>> track the listed filenames along with the latest processed timestamp.
>> Although it will be less efficient, it'd be good for some use cases.
>>
>> # NIFI-2383
>> This is the most difficult case to handle correctly with timestamps alone.
>> We need a different processor that can use a watch API.
>>
>> Any comment would be appreciated.
>>
>> Thanks,
>> Koji
>>
>> On Tue, Jun 6, 2017 at 9:18 PM, Koji Kawamura <ijokarumawak@> wrote:
>>> Hi Roman,
>>>
>>> I think NIFI-3332 is probably related, as I can see that the timestamps
>>> in the logs don't have milliseconds.
>>>
>>> I've been considering how we can support all corner cases with minimal
>>> state to persist, and make it work even if the filesystem only
>>> provides last-modified timestamps with seconds precision.
>>> I'm changing code and testing locally, but it's not ready for a PR yet,
>>> and I am not fully confident about how to fix it.
>>>
>>> Any suggestion or insight would be appreciated to make these ListXXXX
>>> processors better.
>>>
>>> Thanks,
>>> Koji
>>>
>>> On Tue, Jun 6, 2017 at 8:54 PM, Roman <ramon9869@> wrote:
>>>> Hi there,
>>>>
>>>> While digging into this issue, I found an open issue in JIRA, NIFI-3332
>>>> <https://issues.apache.org/jira/browse/NIFI-3332>. Can it be
>>>> related to my situation with the missing milliseconds?
>>>>
>>>> Thanks
>>>> Roman
>>>>
>>>>
>>>> Koji Kawamura-2 wrote
>>>>> Hello Roman,
>>>>>
>>>>> It seems the resolution of last modified timestamp depends on the file
>>>>> system implementation.
>>>>> https://stackoverflow.com/questions/3805201/how-to-get-ubuntu-file-timestamp-in-millisecond
>>>>>
>>>>> I reproduced the same behavior on OS X, which uses HFS that has the
>>>>> same limitation of resolution in seconds.
>>>>> https://stackoverflow.com/questions/18403588/how-to-return-millisecond-information-for-file-access-on-mac-os-x-in-java
>>>>>
>>>>> Which file system are you using on your Ubuntu? If it is ext3, then
>>>>> changing it to ext4 may address the issue.
>>>>>
>>>>> Thanks,
>>>>> Koji
>>>>>
>>>>> On Thu, Jun 1, 2017 at 1:25 AM, Roman <ramon9869@> wrote:
>>>>>> Hi there, I need help.
>>>>>>
>>>>>> We are preparing a high-load project and tested these processors. We
>>>>>> always see the listing.timestamp and processed.timestamp keys without
>>>>>> milliseconds (xxxxxxxxxx000). As a result, if several files are
>>>>>> generated within one second, not all of them are listed.
>>>>>>
>>>>>>
>>>>>> Test:
>>>>>> 1. start processor ListFile/ListSFTP
>>>>>> 2. generate 10000 zero-size files. My command:
>>>>>> for i in {1..10000}; do touch ./test_$i; done
>>>>>> 3. see processor stats: out 3952 (0 bytes)
>>>>>>
>>>>>>
>>>>>> Am I doing something wrong, or is it a bug in NiFi/Java/etc.?
>>>>>>
>>>>>> Environment
>>>>>>
>>>>>> Ubuntu 14.04.5 LTS, x64, ext4 file system
>>>>>> Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
>>>>>> NiFi 1.2.0 From 3a605af, Tagged nifi-1.2.0-RC2
>>>>>>
>>>>>>
>>>>>> Thanks
>>>>>> Roman
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> View this message in context:
>>>>>> http://apache-nifi-developer-list.39713.n7.nabble.com/processors-ListFile-ListSFTP-do-not-store-milliseconds-in-timestamp-tp16037.html
>>>>>> Sent from the Apache NiFi Developer List mailing list archive at
>>>>>> Nabble.com.
