Btw unit testing - where you want to process all files on startup and never want to edit/delete them - was the main motivation & use case for noop.
We definitely need to support different strategies, as there are many different use cases. E.g. sometimes keeping a cache of all processed files won't scale because of the huge number of files. Sometimes you want to process a file again if it is touched. I understand that timestamps are sometimes dodgy, but I would rather we support all use cases cleanly using different pluggable strategies than disable useful functionality (like testing! :-)

On 19/11/2008, Gert Vanthienen <[EMAIL PROTECTED]> wrote:
> L.S.,
>
> It almost sounds as if we need two separate strategies that
> can be configured on the file endpoint:
> - one to determine which files need to be processed (the basic one just
>   takes all the files in a directory but we can build additional ones that
>   use a storage mechanism)
> - another one (like we already have now) that determines what to do with
>   the file after a successful or failed exchange
>
> FWIW, I actually like the simple noop one for creating unit tests
> because it allows you to just refer to the /src/test/resources folder in
> your project instead of having to copy them to a work folder first.
>
> Regards,
>
> Gert
>
> Claus Ibsen wrote:
>> Hi
>>
>> Oh I have thought that some end-users want FileConsumer to keep retrying
>> to consume the same file over and over again if it could not be
>> processed, so the postAction could have a 3rd option or we could have
>> an option to set this feature (kinda like noop but only for when the
>> file could not be processed)
>>
>> /Claus Ibsen
>> Apache Camel Committer
>> Blog: http://davsclaus.blogspot.com/
>>
>> On Wed, Nov 19, 2008 at 10:35 AM, Claus Ibsen <[EMAIL PROTECTED]>
>> wrote:
>>> Hi
>>>
>>> The store idea is good as it can be used for the idempotent consumer
>>> as well so we can use it to persist as well, so it can survive
>>> restarts. We need to allow it to be pluggable so users can use a
>>> shared DB if they use grid, or maybe some of that fancy Terracotta
>>> thing that distributes memory caches.
>>>
>>> But turning back to the file consumer. I really think the noop=true
>>> option should be deprecated as well. The file is like an inbox where
>>> if a file is dropped it is consumed once. After processing the file is
>>> deleted or moved to another destination. Now with this "remember list"
>>> we have a serious issue if the inbox receives a file with the same name
>>> but the content of the file is different. What if someone uploads a
>>> file to an FTP server and the filename is always fixed (= the same).
>>> Now we have a complex situation as we need to hash the file content to
>>> be able to determine if the file is different, or not support it at
>>> all.
>>>
>>> I am mostly keen to keep it simpler and as Hadrian said "keep it lean".
>>>
>>> So I am voting for:
>>> a) to remove noop as well
>>> b) to always delete or move file after processing (we should support
>>> moving files to a different folder if exchange failed)
>>>
>>> Ad b)
>>> We should support moving files using different pattern depending on
>>> - exchange OK
>>> - exchange Failed
>>> I have thought about introducing some better URI options to express this
>>>
>>> Something along the lines of (think of better uri option names)
>>> postAction=delete
>>>
>>> postAction=move
>>> moveCompleteExpression=./done/${file:name}.bak
>>> moveErrorExpression=./error/${date:now:yyyyMMdd}/${file:name}.error
>>>
>>> And we should have defaults as well, so if moveErrorExpression is
>>> omitted it defaults to the completed move.
>>>
>>> And then we could consider @deprecating all the other pre and postfix
>>> URI option we have in favor of the power of the expression instead.
>>>
>>> But the list store is not wasted as we can use it for the idempotent
>>> as well and for other areas.
>>>
>>> /Claus Ibsen
>>> Apache Camel Committer
>>> Blog: http://davsclaus.blogspot.com/
>>>
>>> On Wed, Nov 19, 2008 at 4:04 AM, Jon Anstey <[EMAIL PROTECTED]> wrote:
>>>> Hmmm... yeah, I like this suggestion. It may be just what we need here!
>>>> Thanks!
>>>>
>>>> On Tue, Nov 18, 2008 at 4:11 PM, Gert Vanthienen
>>>> <[EMAIL PROTECTED]> wrote:
>>>>> Jon,
>>>>>
>>>>> How about if we enhance the file consumer to keep track of files that
>>>>> have already been processed instead of using a timestamp? The timestamp
>>>>> approach is a bit error-prone (just touching the file by accident can
>>>>> set it off again).
>>>>> If we provide multiple implementations for the storage mechanism to
>>>>> keep this information, we can cover a lot of use cases (similar to the
>>>>> message id store for an idempotent consumer):
>>>>> - an in-memory store for testing purposes
>>>>> - a file-based implementation for basic production environments
>>>>> - a database- or ldap-backed implementation for clustered environments,
>>>>>   where a file can arrive through multiple directories
>>>>>
>>>>> Regards,
>>>>>
>>>>> Gert
>>>>>
>>>>> Jon Anstey wrote:
>>>>>> The algorithm that checks whether a file should be consumed based on
>>>>>> timestamp has been deprecated for a while now (see
>>>>>> http://activemq.apache.org/camel/file.html). I've removed this on my
>>>>>> local branch only to realize that it introduces a bit of an ugly
>>>>>> problem... essentially since files will be processed always (modified
>>>>>> or not) in the case of noop=true or if a fault has been set, the same
>>>>>> file will be processed over and over again... not good!
>>>>>>
>>>>>> The original intent of removing the timestamp checking was to simplify
>>>>>> the consumer. I think that in trying to get around this new issue we
>>>>>> may make it even more complicated!
>>>>>>
>>>>>> I'm wondering if there is a simple solution to this that I'm just not
>>>>>> seeing yet or if maybe this issue was discussed before...
>>>>
>>>> --
>>>> Cheers,
>>>> Jon
>>>>
>>>> http://janstey.blogspot.com/

--
James
-------
http://macstrac.blogspot.com/

Open Source Integration
http://fusesource.com/
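
To make the two pluggable strategies discussed in this thread a bit more concrete (one that selects which files to process, one that decides what happens to a file afterwards), here is a rough Java sketch. It is only an illustration under assumptions: every name in it (`SelectionStrategy`, `PostProcessStrategy`, `InMemorySelection`, `NOOP`, `DELETE`, `poll`) is hypothetical and is not Camel's actual API, and the "exchange" is reduced to recording the file name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch only; none of these types exist in Camel.
final class FileStrategySketch {

    /** Strategy 1: decides which files in the directory still need processing. */
    interface SelectionStrategy {
        boolean shouldProcess(Path file);
        void markProcessed(Path file);
    }

    /** Strategy 2: decides what happens to a file after a successful exchange. */
    interface PostProcessStrategy {
        void onComplete(Path file) throws IOException;
    }

    /**
     * In-memory "remember list" keyed by file name (fine for tests, lost on restart).
     * Keying by name means a new file with the same name but different content
     * is skipped, which is the limitation raised in the thread.
     */
    static final class InMemorySelection implements SelectionStrategy {
        private final Set<String> seen = new HashSet<>();

        @Override
        public boolean shouldProcess(Path file) {
            return !seen.contains(file.getFileName().toString());
        }

        @Override
        public void markProcessed(Path file) {
            seen.add(file.getFileName().toString());
        }
    }

    /** Leaves the file untouched, as noop does. */
    static final PostProcessStrategy NOOP = file -> { };

    /** Removes the file once it has been processed. */
    static final PostProcessStrategy DELETE = Files::delete;

    /** One polling pass over the directory; returns the names of the files processed. */
    static List<String> poll(Path dir, SelectionStrategy selection, PostProcessStrategy post)
            throws IOException {
        List<Path> candidates;
        try (Stream<Path> files = Files.list(dir)) {
            candidates = files.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        List<String> processed = new ArrayList<>();
        for (Path file : candidates) {
            if (!selection.shouldProcess(file)) {
                continue; // already handled in an earlier pass
            }
            processed.add(file.getFileName().toString()); // stand-in for the real exchange
            selection.markProcessed(file);
            post.onComplete(file);
        }
        return processed;
    }
}
```

With `NOOP` and the in-memory selection, a second `poll` over the same directory processes nothing and the files stay where they are, which is the unit-test case; swapping in `DELETE` gives the "inbox" behaviour without touching the selection logic.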
