L.S.,
It almost sounds as if we need two separate strategies that
can be configured on the file endpoint:
- one to determine which files need to be processed (the basic one just
takes all the files in a directory but we can build additional ones that
use a storage mechanism)
- another one (like we already have now) that determines what to do with
the file after a successful or failed exchange
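
A minimal sketch of how the two strategies above could be split, assuming
hypothetical interface and class names (none of these types are taken from
the Camel code base). It only shows the "take all files" selection and the
noop completion case:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// All names below are hypothetical and only illustrate the split.

/** Decides which files in a directory are candidates for processing. */
interface FileSelectionStrategy {
    List<File> select(File directory);
}

/** Decides what happens to a file once its exchange has finished. */
interface FileCompletionStrategy {
    void onComplete(File file, boolean exchangeFailed);
}

/** The basic selection strategy: take every regular file in the directory. */
class AllFilesSelectionStrategy implements FileSelectionStrategy {
    @Override
    public List<File> select(File directory) {
        // listFiles returns null if the directory does not exist or cannot be read
        File[] children = directory.listFiles(File::isFile);
        List<File> result = new ArrayList<>();
        if (children != null) {
            result.addAll(Arrays.asList(children));
        }
        return result;
    }
}

/** The noop completion strategy: leave the file where it is. */
class NoopCompletionStrategy implements FileCompletionStrategy {
    @Override
    public void onComplete(File file, boolean exchangeFailed) {
        // intentionally empty: the file is neither moved nor deleted
    }
}
```

A store-backed selection strategy would implement the same first interface
and filter out files it has already seen.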
FWIW, I actually like the simple noop one for creating unit tests
because it allows you to just refer to the /src/test/resources folder in
your project instead of having to copy them to a work folder first.
Regards,
Gert
Claus Ibsen wrote:
Hi
Oh, I have thought that some end users want FileConsumer to keep retrying
the same file over and over again if it could not be
processed, so the postAction could have a 3rd option, or we could have
an option to enable this feature (kind of like noop, but only for when the
file could not be processed).
/Claus Ibsen
Apache Camel Committer
Blog: http://davsclaus.blogspot.com/
On Wed, Nov 19, 2008 at 10:35 AM, Claus Ibsen <[EMAIL PROTECTED]> wrote:
Hi
The store idea is good, as it can be used for the idempotent consumer
as well. We can use it to persist the state, so it can survive
restarts. We need to allow it to be pluggable so users can use a
shared DB if they use a grid, or maybe some of that fancy Terracotta
thing that distributes memory caches.
But turning back to the file consumer. I really think the noop=true
option should be deprecated as well. The file endpoint is like an inbox where
if a file is dropped it is consumed once. After processing, the file is
deleted or moved to another destination. Now with this "remember list"
we have a serious issue if the inbox receives a file with the same name
but the content of the file is different. What if someone uploads a
file to an FTP server and the filename is always fixed (= the same)?
Now we have a complex situation as we need to hash the file content to
be able to determine if the file is different, or not support it at
all.
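
To illustrate the hashing option: a rough sketch that builds a lookup key
from the file name plus a SHA-256 digest of the content, so that a re-upload
under the same name with different content yields a different key. The class
and method names are hypothetical, and reading the whole file into a byte
array is an assumption that only suits small files:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Camel.
class ContentFingerprint {

    /** Returns the SHA-256 digest of the content as lowercase hex. */
    static String sha256Hex(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is one of the algorithms every Java platform must provide
            throw new IllegalStateException(e);
        }
    }

    /** Key for the "remember list": same name and same content give the same key. */
    static String key(String fileName, byte[] content) {
        return fileName + ":" + sha256Hex(content);
    }
}
```

The cost is that every poll has to read each candidate file in full, which is
part of why not supporting this case at all is the simpler choice.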
I am mostly keen to keep it simpler and as Hadrian said "keep it lean".
So I am voting for:
a) to remove noop as well
b) to always delete or move file after processing (we should support
moving files to a different folder if exchange failed)
Ad b)
We should support moving files using different pattern depending on
- exchange OK
- exchange Failed
I have thought about introducing some better URI options to express this.
Something along the lines of (we can think of better URI option names):
postAction=delete
postAction=move
moveCompleteExpression=./done/${file:name}.bak
moveErrorExpression=./error/${date:now:yyyyMMdd}/${file:name}.error
And we should have defaults as well, so if moveErrorExpression is
omitted it defaults to the completed move.
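
A small sketch of the defaulting rule, using the two option names proposed
above. The resolver class is hypothetical, and the expansion below is a
hand-rolled stand-in that only understands ${file:name} and
${date:now:<pattern>}; it is not Camel's expression language:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, only meant to show the fallback behaviour.
class MoveTargetResolver {
    private final String moveCompleteExpression;
    private final String moveErrorExpression; // may be null when the option is omitted

    MoveTargetResolver(String moveCompleteExpression, String moveErrorExpression) {
        this.moveCompleteExpression = moveCompleteExpression;
        this.moveErrorExpression = moveErrorExpression;
    }

    /** Picks the expression for the outcome and expands it for the given file. */
    String resolve(String fileName, boolean exchangeFailed, LocalDate today) {
        // Fall back to the "complete" expression when no error expression is set
        String expression = (exchangeFailed && moveErrorExpression != null)
                ? moveErrorExpression
                : moveCompleteExpression;
        return expand(expression, fileName, today);
    }

    /** Replaces ${file:name} and every ${date:now:<pattern>} token. */
    static String expand(String expression, String fileName, LocalDate today) {
        String result = expression.replace("${file:name}", fileName);
        String prefix = "${date:now:";
        int start = result.indexOf(prefix);
        while (start >= 0) {
            int end = result.indexOf('}', start);
            if (end < 0) {
                break; // unterminated token: leave the rest untouched
            }
            String pattern = result.substring(start + prefix.length(), end);
            String formatted = today.format(DateTimeFormatter.ofPattern(pattern));
            result = result.substring(0, start) + formatted + result.substring(end + 1);
            start = result.indexOf(prefix);
        }
        return result;
    }
}
```

Passing the date in as a parameter keeps the sketch deterministic; a real
implementation would presumably use the current time.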
And then we could consider @deprecating all the other pre- and postfix
URI options we have in favor of the power of the expression instead.
But the list store is not wasted, as we can use it for the idempotent
consumer as well and for other areas.
/Claus Ibsen
Apache Camel Committer
Blog: http://davsclaus.blogspot.com/
On Wed, Nov 19, 2008 at 4:04 AM, Jon Anstey <[EMAIL PROTECTED]> wrote:
Hmmm... yeah, I like this suggestion. It may be just what we need here!
Thanks!
On Tue, Nov 18, 2008 at 4:11 PM, Gert Vanthienen
<[EMAIL PROTECTED]> wrote:
Jon,
How about if we enhance the file consumer to keep track of files that have
already been processed instead of using a timestamp? The timestamp approach
is a bit error-prone (just touching the file by accident can set it off
again).
If we provide multiple implementations for the storage mechanism to keep
this information, we can cover a lot of use cases (similar to the message id
store for an idempotent consumer):
- an in-memory store for testing purposes
- a file-based implementation for basic production environments
- a database- or ldap-backed implementation for clustered environments,
where a file can arrive through multiple directories
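
As a rough sketch of the pluggable store listed above, with hypothetical
names (this is not an existing Camel interface), here is the contract plus
the in-memory variant intended for testing:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical contract; file-based or database-backed stores would implement it too.
interface ProcessedFileStore {
    /** Records the file; returns true only if it was not recorded before. */
    boolean markProcessed(String fileKey);

    /** Returns true if the file has already been recorded. */
    boolean isProcessed(String fileKey);
}

/** In-memory store for tests: contents are lost on restart. */
class InMemoryProcessedFileStore implements ProcessedFileStore {
    // Thread-safe set, since a consumer may poll from more than one thread
    private final Set<String> seen = ConcurrentHashMap.newKeySet();

    @Override
    public boolean markProcessed(String fileKey) {
        return seen.add(fileKey);
    }

    @Override
    public boolean isProcessed(String fileKey) {
        return seen.contains(fileKey);
    }
}
```

The consumer would call isProcessed before picking up a file and
markProcessed after a successful exchange, so an accidental touch of the file
no longer triggers reprocessing.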
Regards,
Gert
Jon Anstey wrote:
The algorithm that checks whether a file should be consumed based on
timestamp has been deprecated for a while now (see
http://activemq.apache.org/camel/file.html). I've removed this on my local
branch only to realize that it introduces a bit of an ugly problem...
essentially since files will be processed always (modified or not) in the
case of noop=true or if a fault has been set, the same file will be
processed over and over again... not good!
The original intent of removing the timestamp checking was to simplify the
consumer. I think that in trying to get around this new issue we may make it
even more complicated!
I'm wondering if there is a simple solution to this that I'm just not seeing
yet or if maybe this issue was discussed before...
--
Cheers,
Jon
http://janstey.blogspot.com/