On 03/07/12 22:20, Rob Vesse wrote:
Ok, you'll be able to get the file from
https://dl.dropbox.com/u/590790/test.tsv now

Yes if that code could go into NodeFactory in some form or other that'd be
great

Rob


Rob,

I've grabbed a local copy - thanks.  It's 20x smaller when compressed!

I'll profile the new code to see where the time is going. A quick code inspection suggests that creating the tokenizer isn't too expensive, but a number of (cheap?) objects are created for every item, so maybe that adds up to just a little too much.

If we wanted it to go as fast as possible, we could have a version of TokenizerText that outputs tokens for raw tabs and newlines (rather than treating them as suppressible whitespace) and then use a tokenizer on the input data directly. That's more than just a little tweaking, though, so first we should understand and improve what we've got.
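
To illustrate the idea only, here is a minimal sketch in plain Java. It is not the TokenizerText code and uses no Jena classes: the names TsvTokenSketch, Kind, Token and tokenize are hypothetical, and it assumes that a raw tab or newline in the input is always a separator. Everything between separators is passed through as one uninterpreted term.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a single pass over the input that reports tabs and
// newlines as tokens of their own instead of discarding them as whitespace.
class TsvTokenSketch {
    enum Kind { TERM, TAB, NEWLINE }

    static final class Token {
        final Kind kind;
        final String text;   // null for TAB and NEWLINE

        Token(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }

        @Override
        public String toString() {
            return kind == Kind.TERM ? "TERM(" + text + ")" : kind.toString();
        }
    }

    // Assumption: raw '\t' and '\n' only ever appear as separators.
    static List<Token> tokenize(String input) {
        List<Token> out = new ArrayList<>();
        StringBuilder term = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\t' || c == '\n') {
                // Flush any pending term, then emit the separator itself.
                if (term.length() > 0) {
                    out.add(new Token(Kind.TERM, term.toString()));
                    term.setLength(0);
                }
                out.add(new Token(c == '\t' ? Kind.TAB : Kind.NEWLINE, null));
            } else if (c != '\r') {
                // Drop carriage returns so CRLF input behaves like LF input.
                term.append(c);
            }
        }
        if (term.length() > 0) {
            out.add(new Token(Kind.TERM, term.toString()));
        }
        return out;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("?x\t?y\n<http://example/a>\t\"b\"\n")) {
            System.out.println(t);
        }
    }
}
```

With explicit separator tokens, an empty cell shows up as two TAB tokens in a row, so the reader of the token stream can tell an unbound value from a missing column without splitting each line first.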

I'll move the code to NodeFactory - it'll do variables as well so the header line can use it.

        Andy
