File has been edited by Claus Ibsen (Sep 16, 2008).

Change summary:

CAMEL-909

Content:

File Component

The File component provides access to file systems, allowing files to be processed by any other Camel component and messages from other components to be saved to disk.

URI format

file:fileOrDirectoryName[?options]

or


file://fileOrDirectoryName[?options]

Where fileOrDirectoryName represents the underlying file or directory name. Camel determines whether fileOrDirectoryName is a file or a directory.

Important Information

See the section "Common gotchas with folder and filenames" below.

Timestamp

In Camel 1.4 or older the file consumer keeps an internal timestamp of the last poll. It uses this timestamp to detect new or changed files: a file can be consumed if its modified timestamp is later than the last poll timestamp.

You can disable this algorithm with the new option consumer.timestamp=false or by setting consumer.alwaysConsume=true.
This algorithm has been marked as deprecated and will be removed in Camel 2.0.

We encourage you to use a different strategy for matching new files, such as deleting or moving each file after processing, so that any file present in the polled directory is a new file.
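
The timestamp rule described above can be sketched in plain Java. This is only an illustration of the comparison, not Camel's actual code; the class and method names (TimestampMatchSketch, isConsumable, alwaysConsumeFor) are made up for this example.

```java
final class TimestampMatchSketch {

    // A file qualifies when it was modified after the last poll,
    // or unconditionally when alwaysConsume is set.
    static boolean isConsumable(long fileModified, long lastPollTime, boolean alwaysConsume) {
        return alwaysConsume || fileModified > lastPollTime;
    }

    // Mirrors the documented link between the two options:
    // consumer.timestamp=false behaves like consumer.alwaysConsume=true.
    static boolean alwaysConsumeFor(boolean timestampOption) {
        return !timestampOption;
    }
}
```

With this rule, a file whose modified timestamp is equal to or older than the last poll time is skipped unless alwaysConsume is true.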

URI Options

Name | Default Value | Description
initialDelay | 1000 | Camel 1.3 or older: milliseconds before polling the file/directory starts
delay | 500 | Camel 1.3 or older: milliseconds before the next poll of the file/directory
useFixedDelay | false | Camel 1.3 or older: true to use a fixed delay between polls, otherwise a fixed rate is used. See ScheduledExecutorService in the JDK for details.
consumer.initialDelay | 1000 | Camel 1.4: milliseconds before polling the file/directory starts
consumer.delay | 500 | Camel 1.4: milliseconds before the next poll of the file/directory
consumer.useFixedDelay | false | Camel 1.4: true to use a fixed delay between polls, otherwise a fixed rate is used. See ScheduledExecutorService in the JDK for details.
consumer.exclusiveReadLock | true | Camel 1.5: Used by FileConsumer. If set to true, Camel will only poll a file if it has an exclusive read lock on the file (= the file is not in the process of being written). Camel will wait until the file lock is granted. If set to false, Camel will poll the file even if it is in the process of being written (= this is the behavior of Camel 1.4).
consumer.recursive | true/false | If a directory, will look for changes in files in all the sub directories. Notice: the default value in Camel 1.4 or older is true. In Camel 1.5 the default value is changed to false.
consumer.regexPattern | null | Will only fire an exchange for a file that matches the regex pattern
consumer.alwaysConsume | false | Camel 1.5: @deprecated. Used to force consuming the file even if it hasn't changed since the last time it was consumed. Useful if you, for instance, move files back into a folder and the file keeps its original timestamp.
consumer.timestamp | true | Camel 1.5: @deprecated. This option was introduced to have a similar name to the same option in the FTP component. Setting this option will internally set the consumer.alwaysConsume option to the negated value. So if this option is true, then alwaysConsume is false, and vice versa.
lock | true | If true, will lock the file for the duration of the processing
delete | false | If delete is true then the file will be deleted when it is processed (the default is to move it, see below)
noop | false | If true then the file is not moved or deleted in any way (see below). This option is good for read only data, or for ETL type requirements
moveNamePrefix | .camel/ | The prefix String prepended to the filename when moving it. For example, to move processed files into the done directory, set this value to 'done/'
moveNamePostfix | null | The postfix String appended to the filename when moving it. For example, to rename processed files from foo to foo.old, set this value to '.old'
append | true | When writing, do we append to the end of the file, or replace it?
autoCreate | true | If set to true, Camel will create the directory to the file if the file path does not exist. Uses File#mkdirs()
bufferSize | 128kb | Write buffer size in bytes. Camel uses a default of 128 * 1024 bytes.
ignoreFileNameHeader | false | If this flag is enabled then producers will ignore the 'org.apache.camel.file.name' header and generate a new dynamic filename
excludedNamePrefixes | null | @Deprecated. Used to exclude files if the filename starts with any of the given prefixes. The parameter is a String[]
excludedNamePostfixes | null | @Deprecated. Used to exclude files if the filename ends with any of the given postfixes. The parameter is a String[]
excludedNamePrefix | null | Camel 1.5: Used to exclude files if the filename starts with the given prefix.
excludedNamePostfix | null | Camel 1.5: Used to exclude files if the filename ends with the given postfix.
generateEmptyExchangeWhenIdle | false | Option only for the FileConsumer. If this option is true and there were no files to process, we simulate processing a single empty file, so an exchange is fired. Note: In this situation the File attribute in FileExchange is null.
expression | null | Camel 1.5: Use expression to dynamically set the filename. This allows you to very easily set dynamic pattern style filenames. If an expression is set it takes precedence over the org.apache.camel.file.name header. (Note: The header can itself also be an expression). The expression option supports both String and Expression types. If the expression is a String type then it is always evaluated using the File Language. If the expression is an Expression type then this type is of course used as is, which allows, for instance, using OGNL as the expression too.
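
The difference between a fixed delay and a fixed rate, as selected by the useFixedDelay options, comes from the JDK's ScheduledExecutorService. The following is a minimal sketch of how such a choice could be wired up; PollScheduleSketch and its methods are hypothetical names for this example and do not reflect Camel's internal classes.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class PollScheduleSketch {

    // Schedules the poll task: first run after initialDelayMs, then repeatedly.
    static void schedule(ScheduledExecutorService executor, Runnable poll,
                         long initialDelayMs, long delayMs, boolean useFixedDelay) {
        if (useFixedDelay) {
            // the delay is measured from the end of one poll to the start of the next
            executor.scheduleWithFixedDelay(poll, initialDelayMs, delayMs, TimeUnit.MILLISECONDS);
        } else {
            // the period is measured from the start of one poll to the start of the next
            executor.scheduleAtFixedRate(poll, initialDelayMs, delayMs, TimeUnit.MILLISECONDS);
        }
    }

    // Runs a counting task until it has fired 'polls' times (or 5 seconds pass)
    // and returns how many times it ran.
    static int countPolls(int polls, long initialDelayMs, long delayMs, boolean useFixedDelay) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger count = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(polls);
        schedule(executor, () -> {
            count.incrementAndGet();
            done.countDown();
        }, initialDelayMs, delayMs, useFixedDelay);
        try {
            done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            executor.shutdownNow();
        }
        return count.get();
    }
}
```

With a slow poll task, a fixed delay keeps the pause between polls constant, whereas a fixed rate tries to keep the start times evenly spaced.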

By default the file is locked for the duration of the processing. Also, when files are processed they are moved into the .camel subdirectory, so that they appear to be deleted.

The File Consumer will always skip any file whose name starts with a dot, such as ".", ".camel", ".m2" or ".groovy".
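
As a rough sketch, the dot-file rule can be combined with the regexPattern and excludedNamePrefix/excludedNamePostfix options into a single name filter. The code below is an illustration only: FileNameFilterSketch is a hypothetical class, and it assumes the regex must match the whole file name, which this page does not state.

```java
import java.util.regex.Pattern;

final class FileNameFilterSketch {

    // Returns true when a file with this name would be considered for consumption.
    // Any of the three option arguments may be null, meaning "not configured".
    static boolean accept(String name, String regexPattern,
                          String excludedPrefix, String excludedPostfix) {
        if (name.startsWith(".")) {
            return false; // dot files are always skipped
        }
        if (regexPattern != null && !Pattern.matches(regexPattern, name)) {
            return false; // assumption: the pattern must match the entire name
        }
        if (excludedPrefix != null && name.startsWith(excludedPrefix)) {
            return false;
        }
        if (excludedPostfix != null && name.endsWith(excludedPostfix)) {
            return false;
        }
        return true;
    }
}
```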

The File Consumer stores the last poll time internally. It compares this lastpolltime with each file's modification timestamp to avoid polling files it has already polled. Beware that the value is not persisted in any way, so restarting Camel resets the lastpolltime variable and the same file can potentially be consumed again. Therefore you should either delete consumed files or move them to a different folder.

By default Camel will move consumed files to the sub folder .camel, relative to where the file was consumed.
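
How moveNamePrefix and moveNamePostfix could combine into a target location can be sketched as follows. This is an assumption-based illustration (MoveNameSketch is a made-up helper), showing only the path arithmetic and not the actual move.

```java
import java.io.File;

final class MoveNameSketch {

    // Builds the target file: prefix + original name + postfix,
    // resolved against the directory the file was consumed from.
    static File moveTarget(File consumed, String prefix, String postfix) {
        String name = consumed.getName();
        if (prefix != null) {
            name = prefix + name;
        }
        if (postfix != null) {
            name = name + postfix;
        }
        return new File(consumed.getParentFile(), name);
    }
}
```

With the default prefix ".camel/" and no postfix, inbox/foo.txt maps to inbox/.camel/foo.txt. With no prefix and the postfix ".old", it maps to inbox/foo.txt.old.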

Message Headers

The following message headers can be used to affect the behavior of the component

Header | Description
org.apache.camel.file.name | Specifies the output file name (relative to the endpoint directory) to be used for the output message when sending to the endpoint. If this is not present and no expression either, then a generated message Id is used as the filename instead.
org.apache.camel.file.name.produced | New in Camel 1.4: The actual absolute filepath (path + name) for the output file that was written. This header is set by Camel and its purpose is providing end-users the name of the file that was written.
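
The naming rules stated on this page (an expression takes precedence over the header, and the generated message id is the fallback) can be summarized in a small sketch. FileNameResolutionSketch is a hypothetical helper for illustration, not part of Camel.

```java
final class FileNameResolutionSketch {

    // expressionResult: the evaluated expression option, or null if none is set
    // headerValue: the org.apache.camel.file.name header, or null if absent
    // messageId: the generated message id, used as the last resort
    static String resolve(String expressionResult, String headerValue, String messageId) {
        if (expressionResult != null) {
            return expressionResult;
        }
        if (headerValue != null) {
            return headerValue;
        }
        return messageId;
    }
}
```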

Default Behavior Changed in Camel 1.5

In Camel 1.5 the file consumer will avoid polling files that are currently in the process of being written (see option consumer.exclusiveReadLock). However, this requires that Camel is able to rename the file for its testing. If the Camel user does not have these rights on the file system, you can set this option to false to revert to the default behavior of Camel 1.4 or older.
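
A rename-based check of this kind could look roughly like the sketch below. It is an assumption about the general technique, not Camel's implementation; RenameProbeSketch and the ".renameprobe" suffix are invented for this example. Note that on some platforms a rename succeeds even while another process has the file open, so such a probe is not a reliable lock test everywhere.

```java
import java.io.File;

final class RenameProbeSketch {

    // Tries to rename the file to a temporary name and back again.
    // Returns true only if both renames succeed.
    static boolean canRename(File file) {
        File probe = new File(file.getParentFile(), file.getName() + ".renameprobe");
        if (!file.renameTo(probe)) {
            return false; // no rights, file missing, or file in use
        }
        // restore the original name; if this fails the file stays under the probe name
        return probe.renameTo(file);
    }
}
```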

The recursive option has changed its default value from true to false in Camel 1.5.

Common gotchas with folder and filenames

When Camel is producing files (writing files) there are a few gotchas in how to set a filename of your choice. By default Camel will use the message id as the filename, and since the message id is normally a unique generated id you will end up with filenames such as: ID-MACHINENAME\2443-1211718892437\1-0. Such a filename is rarely desired, so the best practice is to provide the filename in the message header "org.apache.camel.file.name".

The sample code below produces files using the message id as the filename:

from("direct:report").to("file:target/reports");

To use report.txt as the filename you have to do:

from("direct:report").setHeader(FileComponent.HEADER_FILE_NAME, constant("report.txt")).to("file:target/reports");

By default Camel will try to auto create the folder if it does not exist, and this is a bad combination with the generated message id filename from above. So if you have:

from("direct:report").to("file:target/reports/report.txt");

And you want Camel to store the message in the file report.txt while autoCreate is true, then Camel will instead create the folder target/reports/report.txt/. To fix this, set autoCreate=false and create the folder target/reports manually.

from("direct:report").to("file:target/reports/report.txt?autoCreate=false");

With auto create disabled Camel will store the report in report.txt as expected.
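
The folder gotcha follows directly from how File#mkdirs() behaves: it creates every missing path segment as a directory, including the last one. A minimal plain-JDK sketch (AutoCreateSketch is a made-up name for this example):

```java
import java.io.File;

final class AutoCreateSketch {

    // Creates all missing segments of the path as directories, like File#mkdirs(),
    // and reports whether the last segment ended up being a directory.
    static boolean lastSegmentBecameDirectory(File endpointPath) {
        endpointPath.mkdirs();
        return endpointPath.isDirectory();
    }
}
```

Calling lastSegmentBecameDirectory(new File("target/reports/report.txt")) in an empty working directory leaves report.txt as a folder, so anything written "to" it ends up inside that folder.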

File consumer, scanning for new files gotcha

The file consumer scans for new files by keeping an internal modified timestamp of the last consumed file. So if you copy in a new file that has an older modified timestamp, Camel will not pick up this file. This can happen if you are testing and you copy a file that has just been consumed back into the folder. To remedy this, modify the timestamp before copying the file back.
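
One way to refresh a file's modified timestamp from Java is File#setLastModified. The helper below is a small sketch in the spirit of the Unix touch command; TouchSketch is a hypothetical name and not part of Camel.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

final class TouchSketch {

    // Creates the file if it does not exist, then sets its modified
    // timestamp to the given time in milliseconds since the epoch.
    static boolean touch(File file, long timeMillis) {
        try {
            file.createNewFile(); // does nothing when the file already exists
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return file.setLastModified(timeMillis);
    }
}
```

For a test, calling TouchSketch.touch(file, System.currentTimeMillis()) on the file makes it look newly modified.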

Filename expression

In Camel 1.5 we have support for setting the filename using an expression. This can be set either using the expression option or as a string based File Language expression in the org.apache.camel.file.name header. See the File Language for some samples.

Samples

Read from a directory and write to another directory

from("file://inputdir/?delete=true").to("file://outputdir");

Listen on a directory and create a message for each file dropped there. Copy the contents to the outputdir and delete the file in the inputdir.

Read from a directory and process the message in Java

from("file://inputdir/").process(new Processor() {
  public void process(Exchange exchange) throws Exception {
    Object body = exchange.getIn().getBody();
    // do some business logic with the input body
  }
});

The body will be a File object pointing to the file that was just dropped into the inputdir directory.

Read files from a directory and send the content to a jms queue

from("file://inputdir/").convertBodyTo(String.class).to("jms:test.queue");

By default the file endpoint sends a FileMessage which contains a File as its body. If you send this directly to the jms component, the jms message will only contain the File object but not the content. By converting the File to a String, the message will contain the file contents, which is probably what you want.
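
The distinction between a File object and its contents can be seen with the plain JDK. The sketch below is not Camel's type converter; FileBodySketch is a made-up helper, and the use of UTF-8 is an assumption of this example.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

final class FileBodySketch {

    // Writes sample text so the example is self-contained.
    static File writeSample(File file, String text) {
        try {
            Files.write(file.toPath(), text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return file;
    }

    // What a File-to-String conversion conceptually has to do:
    // read the bytes of the file and decode them as text.
    static String contentOf(File file) {
        try {
            return new String(Files.readAllBytes(file.toPath()), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Here file.toString() only yields the path, while contentOf(file) yields the text stored in the file.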

Writing to files

Camel is of course also able to write files, i.e. to produce files. In the sample below we receive some reports on the SEDA queue that we process before they are written to a directory.

public void testToFile() throws Exception {
    template.sendBody("seda:reports", "This is a great report");

    // give time for the file to be written before assertions
    Thread.sleep(1000);

    // assert the file exists
    File file = new File("target/test-reports/report.txt");
    file = file.getAbsoluteFile();
    assertTrue("The file should have been written", file.exists());
}

protected JndiRegistry createRegistry() throws Exception {
    // bind our processor in the registry with the given id
    JndiRegistry reg = super.createRegistry();
    reg.bind("processReport", new ProcessReport());
    return reg;
}

protected RouteBuilder createRouteBuilder() throws Exception {
    return new RouteBuilder() {
        public void configure() throws Exception {
            // the reports from the seda queue are processed by our processor
            // before they are written to files in the target/test-reports directory
            from("seda:reports").processRef("processReport").to("file://target/test-reports");
        }
    };
}

private class ProcessReport implements Processor {

    public void process(Exchange exchange) throws Exception {
        String body = exchange.getIn().getBody(String.class);
        // do some business logic here

        // set the output to the file
        exchange.getOut().setBody(body);

        // set the output filename using java code logic, notice that this is done by setting
        // a special header property of the out exchange
        exchange.getOut().setHeader(FileComponent.HEADER_FILE_NAME, "report.txt");
    }

}

FileProducer filename gotchas

This unit test demonstrates some of the gotchas with filenames for the File Producer.

public void testProducerWithMessageIdAsFileName() throws Exception {
    Endpoint endpoint = context.getEndpoint("direct:report");
    Exchange exchange = endpoint.createExchange();
    exchange.getIn().setBody("This is a good report");

    FileEndpoint fileEndpoint = resolveMandatoryEndpoint("file:target/reports/report.txt", FileEndpoint.class);
    String id = fileEndpoint.getGeneratedFileName(exchange.getIn());

    template.send("direct:report", exchange);

    File file = new File("target/reports/report.txt/" + id);
    assertEquals("File should exist", true, file.exists());
}

public void testProducerWithConfiguredFileNameInEndpointURI() throws Exception {
    template.sendBody("direct:report2", "This is another good report");
    File file = new File("target/report2.txt");
    assertEquals("File should exist", true, file.exists());
}

public void testProducerWithHeaderFileName() throws Exception {
    template.sendBody("direct:report3", "This is super good report");
    File file = new File("target/report-super.txt");
    assertEquals("File should exist", true, file.exists());
}

protected RouteBuilder createRouteBuilder() throws Exception {
    return new RouteBuilder() {
        public void configure() throws Exception {
            from("direct:report").to("file:target/reports/report.txt");

            from("direct:report2").to("file:target/report2.txt?autoCreate=false");

            from("direct:report3").setHeader(FileComponent.HEADER_FILE_NAME, constant("report-super.txt")).to("file:target/");
        }
    };
}

Using expression for filenames

In this sample we want to move consumed files to a backup folder, using today's date as the sub folder name:

from("file://inbox?expression=backup/${date:now:yyyyMMdd}/${file:name}").to("...");

See File Language for more samples.
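
To make the pattern in the sample above concrete, here is a toy evaluator for just the two tokens it uses, ${date:now:...} and ${file:name}. It is only a sketch to show what the resulting name looks like; it is not the File Language implementation, and FileNameExpressionSketch is an invented name.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FileNameExpressionSketch {

    private static final Pattern TOKEN = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replaces ${file:name} with fileName and ${date:now:<pattern>} with the
    // formatted date. Any other token is left in the output unchanged.
    static String evaluate(String expression, String fileName, Date now) {
        Matcher m = TOKEN.matcher(expression);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String token = m.group(1);
            String value;
            if (token.equals("file:name")) {
                value = fileName;
            } else if (token.startsWith("date:now:")) {
                String pattern = token.substring("date:now:".length());
                value = new SimpleDateFormat(pattern, Locale.ROOT).format(now);
            } else {
                value = m.group(); // unknown token: keep it as written
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

For a file named report.txt consumed on Sep 16, 2008 (local time), the pattern backup/${date:now:yyyyMMdd}/${file:name} evaluates to backup/20080916/report.txt in this sketch.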

See Also
