L.S.,

Just added my pair of eyes ;). One part of the problem is indeed the list of exchanges returned by the expression, but I think you're also reading the entire file into memory a first time when tokenizing it. ExpressionBuilder.tokenizeExpression() converts the body to a String and then runs a StringTokenizer over it. I think we could add support there for tokenizing File, InputStream and Reader bodies directly using a Scanner.
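For illustration, a Scanner-based tokenizer could look roughly like this. This is a minimal sketch in plain Java; the class and method names are made up, and the actual wiring into ExpressionBuilder would obviously differ:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.Scanner;

public class ScannerTokenize {

    // Returns a Scanner that yields tokens lazily from the stream,
    // instead of converting the whole payload to a String first.
    public static Scanner tokenize(InputStream in, String delimiter) {
        Scanner scanner = new Scanner(in);
        scanner.useDelimiter(delimiter);
        return scanner;
    }

    public static void main(String[] args) {
        // Stands in for a large file stream; tokens are pulled one at a time.
        InputStream in = new ByteArrayInputStream("a\r\nb\r\nc".getBytes());
        Scanner s = tokenize(in, "\r\n");
        while (s.hasNext()) {
            System.out.println(s.next());
        }
    }
}
```

Since the Scanner only buffers a bounded window of input, the splitter would never need to hold the full file body in memory just to produce tokens.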

Regards,

Gert

Claus Ibsen wrote:
Hi

Looking into the source code of the splitter, it looks like it creates the full list 
of split exchanges before they are processed. That is why it consumes so much 
memory for big files.

Maybe some kind of batch size option is needed, so you could set a number, 
say 20, as the batch size.

   .splitter(body(InputStream.class).tokenize("\r\n").batchSize(20))
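Note that batchSize() above is only a proposal, not an existing Camel 1.4 API. The underlying idea could be sketched in plain Java as draining at most N elements at a time from an iterator, so only one batch is ever materialized (class and method names here are illustrative):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class Batcher {

    // Drains at most batchSize elements from the source iterator into a list,
    // so only one batch is held in memory at a time.
    public static <T> List<T> nextBatch(Iterator<T> source, int batchSize) {
        List<T> batch = new ArrayList<T>();
        while (batch.size() < batchSize && source.hasNext()) {
            batch.add(source.next());
        }
        return batch;
    }

    public static void main(String[] args) {
        // Stands in for a lazy stream of tokenized lines.
        Iterator<Integer> source = Arrays.asList(1, 2, 3, 4, 5).iterator();
        while (source.hasNext()) {
            System.out.println(nextBatch(source, 2));
        }
    }
}
```

The splitter could then fire each batch as it is produced instead of building the entire exchange list up front.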

Could you create a JIRA ticket for this improvement?
Btw, how big are the files you use? The file component uses a File as the message body, so when you split using an InputStream, Camel should use the type converter from File to InputStream, which does not read the entire content into memory. The memory is consumed in the splitter, where it creates the entire list of new exchanges to fire.

At least that is what I can read from the source code after a long day's work, 
so please double-check the code, as 4 eyes are better than 2 ;)



Med venlig hilsen
Claus Ibsen
......................................
Silverbullet
Skovsgårdsvænget 21
8362 Hørning
Tlf. +45 2962 7576
Web: www.silverbullet.dk

-----Original Message-----
From: Bart Frackiewicz [mailto:[EMAIL PROTECTED]
Sent: 2. September 2008 17:40
To: camel-user@activemq.apache.org
Subject: Splitter for big files

Hi,

I am using this route for a couple of CSV files:

   from("file:/tmp/input/?delete=true")
   .splitter(body(InputStream.class).tokenize("\r\n"))
   .beanRef("myBean", "process")
   .to("file:/tmp/output/?append=true")

This works fine for small CSV files, but for big files I noticed
that Camel uses a lot of memory; it seems that Camel reads the
whole file into memory. How do I configure the splitter to use
a stream?

I noticed the same behaviour with the XPath splitter:

   from("file:/tmp/input/?delete=true")
   .splitter(ns.xpath("//member"))
   ...

BTW, I found a posting from March where James suggests the following
implementation for a custom splitter:

-- quote --

   from("file:///c:/temp?noop=true").
     splitter().method("myBean", "split").
     to("activemq:someQueue")

Then register "myBean" with a split method...

class SomeBean {
   public Iterator split(File file) {
      /// figure out how to split this file into rows...
   }
}
-- quote --
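A fleshed-out version of that split method might look like this, reading lines lazily so the file is never fully in memory. This is illustrative only; LineSplitter is a made-up name and error handling is kept minimal:

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

public class LineSplitter {

    // Returns an Iterator that reads one line at a time from the file,
    // so the splitter never materializes the whole content.
    public Iterator<String> split(File file) throws IOException {
        final BufferedReader reader = new BufferedReader(new FileReader(file));
        return new Iterator<String>() {
            private String next = reader.readLine();

            public boolean hasNext() {
                return next != null;
            }

            public String next() {
                if (next == null) {
                    throw new NoSuchElementException();
                }
                String current = next;
                try {
                    next = reader.readLine();
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
                return current;
            }

            public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }
}
```

BufferedReader.readLine() treats both \n and \r\n as line terminators, so this also covers the CSV case above.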

But this won't work for me (Camel 1.4).

Bart

