Hi Roman and all,
As I investigated the ListFile processor further, I found that these are
two different issues.
I also found another JIRA related to ListFile. Currently there seem to
be three issues:
1. ListFile can miss files on filesystems that do not provide
timestamps with millisecond precision (NIFI-4096)
2. ListFile can miss files that have the same timestamp as the
previously processed latest timestamp (NIFI-3332)
3. ListFile cannot pick up files whose timestamp is older than the
previously processed latest timestamp (NIFI-2383)
# NIFI-4096
I created JIRA NIFI-4096 to address issue #1 above by adding
deterministic logic to detect the target filesystem's timestamp precision.
With NIFI-4096, ListFile can list all 10,000 files created by the
command you shared before without missing any:
```
for i in {1..10000}; do touch ./test_$i; done
```
The PR is ready for review. I would appreciate it if you could test the
fix with your use case.
Additionally, I renamed variables in AbstractListProcessor to better
explain their purpose and timestamp unit. I hope it makes the code
more readable and maintainable.
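
To illustrate the idea of detecting timestamp precision from a listing, here is a
minimal sketch. It is not the code from the PR; the class name TimestampPrecision
and the method detect are hypothetical, and the rule it applies (look for any
non-zero milliseconds, then any non-zero seconds) is only one assumed way to do it.

```java
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration only; not taken from the NIFI-4096 PR.
class TimestampPrecision {

    /**
     * Guesses the finest unit in which the given last-modified timestamps
     * (epoch milliseconds) carry information.
     */
    static TimeUnit detect(List<Long> lastModifiedMillis) {
        // Start from the coarsest assumption and refine as evidence appears.
        TimeUnit detected = TimeUnit.MINUTES;
        for (long ts : lastModifiedMillis) {
            if (ts % 1000L != 0) {
                // A non-zero millisecond part proves millisecond precision.
                return TimeUnit.MILLISECONDS;
            }
            if (ts % 60_000L != 0) {
                // A non-zero second part proves at least second precision.
                detected = TimeUnit.SECONDS;
            }
        }
        return detected;
    }
}
```

Once the precision is known, a listing processor could hold back entries from the
current (possibly still incomplete) second or minute until a later listing, instead
of treating them as fully processed. With an empty listing this sketch returns
MINUTES, which is simply the most conservative answer.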
# NIFI-3332
I'm thinking about adding a processor property to specify whether to
track the listed filenames along with the latest processed timestamp.
Although it will be less efficient, it would be good for some use cases.
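
A rough sketch of what tracking filenames together with the latest timestamp could
look like is below. The class LatestTimestampTracker and its method selectNew are
hypothetical names for illustration; this is not existing NiFi code, and how the
state would be persisted is left out.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch: remembers the latest timestamp AND the names seen at it.
class LatestTimestampTracker {
    private long latestTimestamp = -1L;
    private final Set<String> namesAtLatest = new HashSet<>();

    /** Returns the names in the listing (name -> last-modified millis) not emitted before. */
    List<String> selectNew(Map<String, Long> listing) {
        List<String> selected = new ArrayList<>();
        long newLatest = latestTimestamp;
        Set<String> namesAtNewLatest = new HashSet<>(namesAtLatest);

        // TreeMap only makes the output order deterministic (sorted by name).
        for (Map.Entry<String, Long> entry : new TreeMap<>(listing).entrySet()) {
            String name = entry.getKey();
            long ts = entry.getValue();
            boolean newer = ts > latestTimestamp;
            boolean sameButUnseen = ts == latestTimestamp && !namesAtLatest.contains(name);
            if (!newer && !sameButUnseen) {
                continue; // older, or already emitted at the latest timestamp
            }
            selected.add(name);
            if (ts > newLatest) {
                newLatest = ts;
                namesAtNewLatest.clear();
            }
            if (ts == newLatest) {
                namesAtNewLatest.add(name);
            }
        }

        latestTimestamp = newLatest;
        namesAtLatest.clear();
        namesAtLatest.addAll(namesAtNewLatest);
        return selected;
    }
}
```

The extra state is only the set of names sharing the single latest timestamp, so it
stays small unless many files are written within the same timestamp unit. Files
older than the latest timestamp are still skipped, so this does not help with
NIFI-2383.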
# NIFI-2383
This is the most difficult case to handle correctly with only a timestamp.
We would need a different processor that can use a watch API.
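
As one possible reading of "watch API", the sketch below uses the JDK's
java.nio.file.WatchService, which reports created or modified entries regardless of
their last-modified timestamp. The class DirectoryWatcher and its poll method are
hypothetical and only meant to show the shape of the approach; this is not an
existing NiFi processor.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of event-based detection for a single local directory.
class DirectoryWatcher implements AutoCloseable {
    private final Path dir;
    private final WatchService watchService;

    DirectoryWatcher(Path dir) throws IOException {
        this.dir = dir;
        this.watchService = dir.getFileSystem().newWatchService();
        dir.register(watchService,
                StandardWatchEventKinds.ENTRY_CREATE,
                StandardWatchEventKinds.ENTRY_MODIFY);
    }

    /** Waits up to timeoutMillis for one batch of events and returns the affected paths. */
    List<Path> poll(long timeoutMillis) throws InterruptedException {
        List<Path> changed = new ArrayList<>();
        WatchKey key = watchService.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        if (key == null) {
            return changed; // nothing happened within the timeout
        }
        for (WatchEvent<?> event : key.pollEvents()) {
            if (event.kind() == StandardWatchEventKinds.OVERFLOW) {
                continue; // events were lost; a full re-listing would be needed here
            }
            Path resolved = dir.resolve((Path) event.context());
            if (!changed.contains(resolved)) {
                changed.add(resolved);
            }
        }
        key.reset(); // re-arm the key so later events are delivered
        return changed;
    }

    @Override
    public void close() throws IOException {
        watchService.close();
    }
}
```

This only covers local directories and only sees changes made while the watcher is
registered, so it would not apply to ListSFTP and would still need an initial
listing after a restart.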
Any comment would be appreciated.
Thanks,
Koji
On Tue, Jun 6, 2017 at 9:18 PM, Koji Kawamura <[email protected]> wrote:
> Hi Roman,
>
> I think NIFI-3332 is probably related, as I can see that the timestamps in
> the logs don't have milliseconds.
>
> I've been considering how we can support all corner cases with minimal
> state to persist, and make it work even if the filesystem only
> provides the last modified timestamp with seconds precision.
> I'm changing code and testing locally, but I'm not ready to send a PR yet,
> and I am not fully confident about how to fix it.
>
> Any suggestions or insights would be appreciated to make these ListXXXX
> processors better.
>
> Thanks,
> Koji
>
> On Tue, Jun 6, 2017 at 8:54 PM, Roman <[email protected]> wrote:
>> Hi there,
>>
>> While digging into this issue, I found an open JIRA issue, NIFI-3332
>> <https://issues.apache.org/jira/browse/NIFI-3332>. Could it be related to my
>> situation with the missing milliseconds?
>>
>> Thanks
>> Roman
>>
>>
>> Koji Kawamura-2 wrote
>>> Hello Roman,
>>>
>>> It seems the resolution of last modified timestamp depends on the file
>>> system implementation.
>>> https://stackoverflow.com/questions/3805201/how-to-get-ubuntu-file-timestamp-in-millisecond
>>>
>>> I reproduced the same behavior on OS X, which uses HFS that has the
>>> same limitation of resolution in seconds.
>>> https://stackoverflow.com/questions/18403588/how-to-return-millisecond-information-for-file-access-on-mac-os-x-in-java
>>>
>>> Which file system are you using on your Ubuntu? If it is ext3, then
>>> changing it to ext4 may address the issue.
>>>
>>> Thanks,
>>> Koji
>>>
>>> On Thu, Jun 1, 2017 at 1:25 AM, Roman <ramon9869@> wrote:
>>>> Hi there, I need help.
>>>>
>>>> We are preparing a high-load project and tested these processors. We
>>>> always see the listing.timestamp and processed.timestamp keys without
>>>> milliseconds (xxxxxxxxxx000). As a result, if several files are generated
>>>> within one second, not all of them are listed.
>>>>
>>>>
>>>> Test:
>>>> 1. start processor ListFile/ListSFTP
>>>> 2. generate 10000 zero size files. my command: for i in {1..10000}; do
>>>> touch ./test_$i; done
>>>> 3. see processor stats: out 3952 (0 bytes)
>>>>
>>>>
>>>> Am I doing something wrong? Or is it a bug in NiFi/Java/etc.?
>>>>
>>>> Environment
>>>>
>>>> Ubuntu 14.04.5 LTS, x64, ext4 file system
>>>> Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
>>>> Nifi 1.2.0 From 3a605af, Tagged nifi-1.2.0-RC2
>>>>
>>>>
>>>> Thanks
>>>> Roman
>>>>
>>>>
>>>>
>>>> --
>>>> View this message in context:
>>>> http://apache-nifi-developer-list.39713.n7.nabble.com/processors-ListFile-ListSFTP-do-not-store-milliseconds-in-timestamp-tp16037.html
>>>> Sent from the Apache NiFi Developer List mailing list archive at
>>>> Nabble.com.