Re: Deprecation of file consumer timestamp

james . strachan Mon, 24 Nov 2008 22:25:17 -0800

Btw unit testing - where you want to process all files on startup
and never want to edit/delete them - was the main motivation & use case
for noop.


We definitely need to support different strategies as there are many
different use cases. Eg sometimes keeping a cache of all files
processed won't scale due to huge number of files. Sometimes you want
to process a file again if it is touched.

I understand that sometimes timestamps are dodgy, but I would rather
we support all use cases cleanly using different pluggable strategies
than disable useful functionality (like testing! :-)
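
To make the "pluggable strategies" idea concrete, here is a rough Java
sketch. All names in it (FileSelectionStrategy, ProcessOnceStrategy,
TimestampStrategy) are made up for illustration and are not existing
Camel classes. It only covers the question of whether a file should be
picked up on a given poll.

```java
import java.io.File;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical strategy: should the consumer pick up this file on this poll? */
interface FileSelectionStrategy {
    boolean shouldProcess(File file);
}

/**
 * Remembers every path it has seen and processes each one once.
 * Suits the noop/unit-test case; the set grows with the number of files,
 * so it will not scale to huge directories.
 */
class ProcessOnceStrategy implements FileSelectionStrategy {
    private final Set<String> processed = ConcurrentHashMap.newKeySet();

    @Override
    public boolean shouldProcess(File file) {
        // add() returns false if the path was already recorded
        return processed.add(file.getAbsolutePath());
    }
}

/**
 * Processes a file the first time it is seen and again whenever its
 * last-modified timestamp changes (i.e. when it is touched).
 */
class TimestampStrategy implements FileSelectionStrategy {
    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();

    @Override
    public boolean shouldProcess(File file) {
        long modified = file.lastModified();
        Long previous = lastSeen.put(file.getAbsolutePath(), modified);
        return previous == null || previous != modified;
    }
}
```

An endpoint could then be configured with whichever implementation fits
the use case, and a persistent store could back the "process once"
variant where the in-memory set is not enough.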


On 19/11/2008, Gert Vanthienen <[EMAIL PROTECTED]> wrote:
> L.S.,
>
> It almost sounds as if we need two separate strategies that
> can be configured on the file endpoint:
> - one to determine which files need to be processed (the basic one just
> takes all the files in a directory but we can build additional ones that
> use a storage mechanism)
> - another one (like we already have now) that determines what to do with
> the file after a successful or failed exchange
>
> FWIW, I actually like the simple noop one for creating unit tests
> because it allows you to just refer to the /src/test/resources folder in
> your project instead of having to copy them to a work folder first.
>
> Regards,
>
> Gert
>
> Claus Ibsen wrote:
>> Hi
>>
>> Oh I have thought that some end-users want FileConsumer to keep
>> retrying to consume the same file over and over again if it could not
>> be processed, so the postAction could have a 3rd option or we could have
>> an option to set this feature (kinda like noop but only for when the
>> file could not be processed)
>>
>>
>>
>> /Claus Ibsen
>> Apache Camel Committer
>> Blog: http://davsclaus.blogspot.com/
>>
>>
>>
>> On Wed, Nov 19, 2008 at 10:35 AM, Claus Ibsen <[EMAIL PROTECTED]>
>> wrote:
>>
>>> Hi
>>>
>>> The store idea is good as it can be used for the idempotent consumer
>>> as well so we can use it to persist as well, so it can survive
>>> restarts. We need to allow it to be pluggable so users can use a
>>> shared DB if they use a grid, or maybe some of that fancy Terracotta
>>> thing that distributes memory caches.
>>>
>>> But turning back to the file consumer. I really think the noop=true
>>> option should be deprecated as well. The file is like an inbox where
>>> if a file is dropped it is consumed once. After processing the file is
>>> deleted or moved to another destination. Now with this "remember list"
>>> we have a serious issue if the inbox receives a file with the same name
>>> but the content of the file is different. What if someone uploads a
>>> file to an FTP server and the filename is always fixed (= the same)?
>>> Now we have a complex situation as we need to hash the file content to
>>> be able to determine if the file is different, or not support it at
>>> all.
>>>
>>> I am mostly keen to keep it simpler and as Hadrian said "keep it lean".
>>>
>>> So I am voting for:
>>> a) to remove noop as well
>>> b) to always delete or move file after processing (we should support
>>> moving files to a different folder if exchange failed)
>>>
>>> Ad b)
>>> We should support moving files using different patterns depending on
>>> - exchange OK
>>> - exchange Failed
>>> I have thought about introducing some better URI options to express this
>>>
>>> Something along the lines of (think of better uri option names)
>>> postAction=delete
>>>
>>> postAction=move
>>> moveCompleteExpression=./done/${file:name}.bak
>>> moveErrorExpression=./error/${date:now:yyyyMMdd}/${file:name}.error
>>>
>>> And we should have defaults as well, so if moveErrorExpression is
>>> omitted it defaults to the completed move.
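
The fallback described just above, where an omitted moveErrorExpression
defaults to the completed move, could be sketched in Java roughly like
this. The class and method names are hypothetical and only mirror the
proposed URI options. They are not an existing Camel API, and the
expressions are kept as plain strings rather than evaluated.

```java
import java.util.Objects;
import java.util.Optional;

/** Hypothetical mirror of the proposed postAction URI option. */
enum PostAction { DELETE, MOVE }

/** Hypothetical holder for the proposed post-processing options. */
final class PostProcessingConfig {
    private final PostAction postAction;
    private final String moveCompleteExpression;
    private final String moveErrorExpression; // null when the option is omitted

    PostProcessingConfig(PostAction postAction,
                         String moveCompleteExpression,
                         String moveErrorExpression) {
        this.postAction = Objects.requireNonNull(postAction, "postAction");
        if (postAction == PostAction.MOVE) {
            // A move needs at least the "exchange OK" target
            Objects.requireNonNull(moveCompleteExpression, "moveCompleteExpression");
        }
        this.moveCompleteExpression = moveCompleteExpression;
        this.moveErrorExpression = moveErrorExpression;
    }

    /**
     * Returns the (unevaluated) move expression to apply after processing,
     * or empty when the file should simply be deleted.
     */
    Optional<String> moveExpressionFor(boolean exchangeFailed) {
        if (postAction == PostAction.DELETE) {
            return Optional.empty();
        }
        if (exchangeFailed && moveErrorExpression != null) {
            return Optional.of(moveErrorExpression);
        }
        // Exchange OK, or no error expression configured: use the completed move
        return Optional.of(moveCompleteExpression);
    }
}
```
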
>>>
>>>
>>> And then we could consider @deprecating all the other prefix and
>>> postfix URI options we have in favor of the power of the expression
>>> instead.
>>>
>>>
>>>
>>> But the list store is not wasted as we can use it for the idempotent
>>> as well and for other areas.
>>>
>>>
>>>
>>>
>>>
>>> /Claus Ibsen
>>> Apache Camel Committer
>>> Blog: http://davsclaus.blogspot.com/
>>>
>>>
>>>
>>> On Wed, Nov 19, 2008 at 4:04 AM, Jon Anstey <[EMAIL PROTECTED]> wrote:
>>>
>>>> Hmmm... yeah, I like this suggestion. It may be just what we need here!
>>>> Thanks!
>>>>
>>>> On Tue, Nov 18, 2008 at 4:11 PM, Gert Vanthienen
>>>> <[EMAIL PROTECTED]> wrote:
>>>>
>>>>
>>>>> Jon,
>>>>>
>>>>> How about if we enhance the file consumer to keep track of files that
>>>>> have already been processed instead of using a timestamp?  The
>>>>> timestamp approach is a bit error-prone (just touching the file by
>>>>> accident can set it off again).
>>>>> If we provide multiple implementations for the storage mechanism to
>>>>> keep this information, we can cover a lot of use cases (similar to the
>>>>> message id store for an idempotent consumer):
>>>>> - an in-memory store for testing purposes
>>>>> - a file-based implementation for basic production environments
>>>>> - a database- or ldap-backed implementation for clustered environments,
>>>>> where a file can arrive through multiple directories
>>>>>
>>>>> Regards,
>>>>>
>>>>> Gert
>>>>>
>>>>> Jon Anstey wrote:
>>>>>
>>>>>> The algorithm that checks whether a file should be consumed based on
>>>>>> timestamp has been deprecated for a while now (see
>>>>>> http://activemq.apache.org/camel/file.html). I've removed this on my
>>>>>> local branch only to realize that it introduces a bit of an ugly
>>>>>> problem... essentially since files will be processed always (modified
>>>>>> or not) in the case of noop=true or if a fault has been set, the same
>>>>>> file will be processed over and over again... not good!
>>>>>>
>>>>>> The original intent of removing the timestamp checking was to simplify
>>>>>> the consumer. I think that in trying to get around this new issue we
>>>>>> may make it even more complicated!
>>>>>>
>>>>>> I'm wondering if there is a simple solution to this that I'm just not
>>>>>> seeing yet or if maybe this issue was discussed before...
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>> --
>>>> Cheers,
>>>> Jon
>>>>
>>>> http://janstey.blogspot.com/
>>>>
>>>>
>>
>>
>
>


-- 
James
-------
http://macstrac.blogspot.com/

Open Source Integration
http://fusesource.com/
