[ 
https://issues.apache.org/jira/browse/NIFI-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Kawamura updated NIFI-4069:
--------------------------------
    Description: 
Some filesystems, such as Mac OS X HFS (Hierarchical File System) or EXT3, 
are known to support timestamps only in seconds precision. Some FTP servers 
are also reported to provide timestamps only in minutes precision.

This can cause files to NOT be listed, because the ListXXX processors' logic 
expects timestamps in milliseconds.

Specifically, if several files are generated within one second, not all of 
them will be listed.

Steps to reproduce:
1. Start the ListFile processor
2. Generate 10000 zero-size files with the following command:
{code}
for i in {1..10000}; do touch ./test_$i; done
{code}
3. Check the processor stats: out 3952 (0 bytes), i.e. only 3952 of the 10000 files were listed
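
To check whether the directory used in the steps above actually stores 
sub-second timestamps, something like the following might help. It is only a 
sketch: it assumes GNU find (for -printf), and the function name 
count_subsecond_mtimes is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper, assuming GNU find: counts the test_* files in a
# directory whose modification time has a non-zero fractional second.
count_subsecond_mtimes() {
  # %T@ prints the mtime as seconds since the epoch with a fractional part.
  find "$1" -maxdepth 1 -type f -name 'test_*' -printf '%T@\n' |
    awk -F. '$2 + 0 > 0 { n++ } END { print n + 0 }'
}

# Run against the current directory, where the test_* files were created.
count_subsecond_mtimes .
```

If this prints 0 for a large number of freshly created files, the filesystem 
is probably truncating timestamps to whole seconds.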

The current AbstractListProcessor logic uses LISTING_LAG_NANOS (100ms) and 
postpones the files that have the latest timestamp within a listing iteration 
to the next iteration. However, on filesystems without millisecond precision, 
this logic does not work as expected.
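
The effect of coarse timestamps can be illustrated with a small simulation. 
This is a simplified model, not the AbstractListProcessor code: it assumes a 
lister that only remembers the newest timestamp it has emitted and, on each 
pass, emits files strictly newer than that value. It ignores the lag and 
postponing behaviour described above. The function names and the timestamps 
(1000, 1400 and 1800 ms) are hypothetical.

```shell
#!/bin/sh
# Simplified model of a timestamp-based lister (NOT the NiFi implementation).
last_listed=0   # newest timestamp (ms) emitted so far
emitted=0       # total number of files emitted

# list_pass TS...: one listing pass over the timestamps visible right now.
list_pass() {
  newest=$last_listed
  for ts in "$@"; do
    if [ "$ts" -gt "$last_listed" ]; then
      emitted=$((emitted + 1))
      if [ "$ts" -gt "$newest" ]; then newest=$ts; fi
    fi
  done
  last_listed=$newest
}

# simulate PRECISION_MS: three files are created at 1000, 1400 and 1800 ms,
# and the filesystem truncates each timestamp to the given precision.
# The first pass runs before the third file exists; the second sees all three.
simulate() {
  precision=$1
  last_listed=0
  emitted=0
  a=$((1000 / precision * precision))
  b=$((1400 / precision * precision))
  c=$((1800 / precision * precision))
  list_pass "$a" "$b"
  list_pass "$a" "$b" "$c"
  echo "$emitted"
}

simulate 1      # millisecond precision: prints 3
simulate 1000   # second precision: prints 2, the third file is missed
```

With second precision, the third file gets the same timestamp as files that 
were already emitted, so a purely timestamp-based comparison cannot tell it 
apart from them.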

This issue was originally reported on the nifi-dev mailing list: 
http://apache-nifi-developer-list.39713.n7.nabble.com/processors-ListFile-ListSFTP-do-not-store-milliseconds-in-timestamp-td16037.html

> ListXXX processors can miss files those created while the processor is 
> listing and filesystem does not provide timestamp milliseconds precision
> -----------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: NIFI-4069
>                 URL: https://issues.apache.org/jira/browse/NIFI-4069
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Extensions
>    Affects Versions: 1.0.0
>            Reporter: Koji Kawamura
>            Assignee: Koji Kawamura
>         Attachments: ListFilesWithoutMilliseconds.png
>



