[ 
https://issues.apache.org/jira/browse/CAMEL-8421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jyrki Ruuskanen updated CAMEL-8421:
-----------------------------------
    Description: 
I'm a fan of noop=true in file consumers since it means I don't have to worry 
about how many readers I have and where. But eventually I came across a 
scenario where current features are not sufficient.

Let's say we have a source system which writes files named 
<timestamp>_something.xml and does not use temp files, .done marker files, or 
anything like that. We want to get the latest file as soon as it's created. 
Consider the following route:

{code}
from("file:////somewhere/data?noop=true&include=.*_something[.]xml&readLock=changed&sortBy=file:name")
        .aggregate(constant(true), new UseLatestAggregationStrategy())
                .completionFromBatchConsumer()
                .to("amq:topic:something");
{code}

When this route is started it will go through the files in order and get the 
last one. Then it will wait for new files. This works fine as long as the 
writer is not "slow".

Now, we had cases of incomplete files being read, and I was asked not to read 
a file before it is 10 minutes old, just in case. If I increase 
readLockCheckInterval to 10 minutes, getting to the latest file at route 
startup will take close to forever, because the current readLock=changed 
implementation always sleeps for at least one readLockCheckInterval per file.
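
To make the per-file cost concrete, here is a deliberately simplified model of 
a "changed" check. It is only a sketch and not Camel's actual code: the class 
ChangedLockSketch and the method unchangedAfter are hypothetical names, and the 
model assumes the check compares file size and last-modified time before and 
after one wait.

```java
import java.io.File;

/** Hypothetical, simplified model of a "changed" read lock check (not Camel code). */
class ChangedLockSketch {

    /**
     * Records size and timestamp, waits one check interval, then compares.
     * Returns true if neither value changed during the wait.
     */
    public static boolean unchangedAfter(File file, long checkIntervalMillis) {
        long length = file.length();
        long modified = file.lastModified();
        try {
            // This wait is paid once per file, however old the file already is.
            Thread.sleep(checkIntervalMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return file.length() == length && file.lastModified() == modified;
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("sample_", "_something.xml");
        f.deleteOnExit();
        long start = System.nanoTime();
        boolean stable = unchangedAfter(f, 1000L);
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000L;
        System.out.println("stable=" + stable + ", waited about " + elapsedMillis + " ms");
    }
}
```

In this model the wait grows linearly with the check interval, which is why a 
10 minute interval is paid again for every file in the directory.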

If we had a readLockMinAge option to define the minimum age of the target 
file, the consumer could acquire the read lock on the first poll and breeze 
through the files until it reaches one that is too young.
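
The proposed check could look roughly like this. It is a minimal sketch of the 
idea, not an implementation for Camel: MinAgeCheck and isOldEnough are 
hypothetical names, and the file's last-modified time is assumed to be a good 
enough proxy for when the writer last touched it.

```java
import java.io.File;

/** Hypothetical helper illustrating the proposed minimum-age idea (not Camel code). */
class MinAgeCheck {

    /** True when the file was last modified at least minAgeMillis before nowMillis. */
    public static boolean isOldEnough(long lastModifiedMillis, long nowMillis, long minAgeMillis) {
        return nowMillis - lastModifiedMillis >= minAgeMillis;
    }

    /** Convenience overload that reads the timestamp from a real file. */
    public static boolean isOldEnough(File file, long minAgeMillis) {
        long lastModified = file.lastModified(); // 0 if the file does not exist
        return lastModified > 0
                && isOldEnough(lastModified, System.currentTimeMillis(), minAgeMillis);
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long tenMinutes = 600_000L; // same value as readLockMinAge=600000

        System.out.println(isOldEnough(now - 700_000L, now, tenMinutes)); // true
        System.out.println(isOldEnough(now - 5_000L, now, tenMinutes));   // false
    }
}
```

A file that is already old enough needs no waiting at all, so only the newest 
files would ever hold the consumer back.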

With the route below, the consumer would get through one file every 500ms (the 
default poll delay), whereas the current readLock=changed takes 1500ms per file 
(default poll delay + default readLockCheckInterval). The consumer goes through 
the files until it hits the end and picks up the last one as soon as it becomes 
old enough.

{code}
from("file:////somewhere/data?noop=true&include=.*_something[.]xml&readLock=changed&readLockMinAge=600000&sortBy=file:name")
        .aggregate(constant(true), new UseLatestAggregationStrategy())
                .completionFromBatchConsumer()
                .to("amq:topic:something");
{code}
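
As a back-of-the-envelope check of the timings above, the following sketch 
estimates how long route startup takes to reach the latest file. The backlog 
of 1000 files is a made-up figure, StartupEstimate is a hypothetical name, and 
the 1000ms check interval is simply the 1500ms per file quoted above minus the 
500ms poll delay.

```java
/** Hypothetical startup-time estimate based on the per-file figures in this issue. */
class StartupEstimate {

    static final long POLL_DELAY_MILLIS = 500L; // default poll delay quoted above

    /** Current behaviour: poll delay plus one check interval per file. */
    public static long withChangedLock(int files, long checkIntervalMillis) {
        return files * (POLL_DELAY_MILLIS + checkIntervalMillis);
    }

    /** Proposed behaviour: only the poll delay per file that is already old enough. */
    public static long withMinAge(int files) {
        return files * POLL_DELAY_MILLIS;
    }

    public static void main(String[] args) {
        int files = 1000; // assumed backlog size, for illustration only

        System.out.println("changed, 1s interval:     " + withChangedLock(files, 1_000L) + " ms");
        System.out.println("changed, 10 min interval: " + withChangedLock(files, 600_000L) + " ms");
        System.out.println("min age:                  " + withMinAge(files) + " ms");
    }
}
```

Under these assumptions the default settings take 25 minutes, a 10 minute 
check interval takes close to a week, and the minimum age approach takes a 
little over 8 minutes.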


  was:
I'm a fan of noop=true in file consumers since I don't have to worry about how 
many readers I have and where. Finally I came across a scenario where current 
features are not sufficient.

Let's say we have a source system which writes files with name 
<timestamp>_something.xml, and it won't use temp files or .done marker files or 
anything like that. We want to get the latest file as soon as it's created. 
Consider the following route:

{code}
from("file:////somewhere/data?noop=true&include=.*_something[.]xml&readLock=changed&sortBy=file:name")
        .aggregate(constant(true), new UseLatestAggregationStrategy())
                .completionFromBatchConsumer()
                .to("amq:topic:something");
{code}

When this route is started it will go through the files in order and get the 
last one. Then it will wait for new files. This works fine as long as the 
writer is not "slow".

Now, we had cases of incomplete files being read and I was requested to not to 
read the file before it is 10 minutes old, just in case. If I increase 
readLockCheckInterval to 10 minutes getting to the latest file at route startup 
will take close to forever. The current readLock=changed implementation always 
sleeps for at least one readLockCheckInterval per file.

If we had readLockMinAge option to define the minimum age for the target file 
the consumer could acquire readLock on the first poll and breeze through the 
files until too young a file is reached.

The route below would poll a file every 500ms (default poll delay), while the 
current readLock=changed would take 1500ms (default poll delay + default 
readLockCheckInterval) per file. Consumer goes through the files until it hits 
the end and gets the last one as soon as it becomes old enough.

{code}
from("file:////somewhere/data?noop=true&include=.*_something[.]xml&readLock=changed&readLockMinAge=600000&sortBy=file:name")
        .aggregate(constant(true), new UseLatestAggregationStrategy())
                .completionFromBatchConsumer()
                .to("amq:topic:something");
{code}



> Add minimum age option to readLock=changed
> ------------------------------------------
>
>                 Key: CAMEL-8421
>                 URL: https://issues.apache.org/jira/browse/CAMEL-8421
>             Project: Camel
>          Issue Type: Improvement
>          Components: camel-core, camel-ftp
>            Reporter: Jyrki Ruuskanen
>            Priority: Minor



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
