On 03/07/12 22:20, Rob Vesse wrote:
> Ok, you'll be able to get the file from
> https://dl.dropbox.com/u/590790/test.tsv now
>
> Yes if that code could go into NodeFactory in some form or other that'd be
> great
>
> Rob

Rob,
I've grabbed a local copy - thanks. It's 20x smaller when compressed!
I'll profile the new code to see where the time is going. A quick code
inspection suggests that creating the tokenizer isn't too expensive, but
a number of (cheap?) objects are created for every item, so maybe that
adds up to just a little too much.
If we wanted it to go as fast as possible, we could have a version of
TokenizerText that outputs tokens for raw tabs and newlines (rather than
treating them as suppressible white space) and then run that tokenizer
on the input data directly. That's more than just a little tweaking
though, so first let's understand and improve what we've got.
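
To make the idea concrete, here is a rough standalone sketch. It is not
the TokenizerText code and does not use any Jena classes: the names
TsvTokenSketch, Kind, Token and tokenize are made up for illustration.
It assumes raw tabs and newlines never appear inside a term (they would
be escaped), and it leaves each term as unparsed text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a single-pass tokenizer that reports tabs and
// newlines as tokens instead of skipping them as white space.
public class TsvTokenSketch {
    public enum Kind { TAB, NEWLINE, TERM }

    public static final class Token {
        public final Kind kind;
        public final String text; // empty for TAB and NEWLINE

        Token(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }

        @Override
        public String toString() {
            return text.isEmpty() ? kind.toString() : kind + "(" + text + ")";
        }
    }

    // Assumption: a raw tab or newline is always a separator, never part
    // of a term.
    public static List<Token> tokenize(CharSequence input) {
        List<Token> tokens = new ArrayList<>();
        int n = input.length();
        int i = 0;
        while (i < n) {
            char c = input.charAt(i);
            if (c == '\t') {
                tokens.add(new Token(Kind.TAB, ""));
                i++;
            } else if (c == '\n') {
                tokens.add(new Token(Kind.NEWLINE, ""));
                i++;
            } else if (c == '\r') {
                i++; // drop CR so that CRLF yields a single NEWLINE token
            } else {
                int start = i;
                while (i < n && !isSeparator(input.charAt(i))) {
                    i++;
                }
                tokens.add(new Token(Kind.TERM,
                        input.subSequence(start, i).toString()));
            }
        }
        return tokens;
    }

    private static boolean isSeparator(char c) {
        return c == '\t' || c == '\n' || c == '\r';
    }

    public static void main(String[] args) {
        // Header line with two variables, then one data row.
        String data = "?x\t?y\n<http://example/a>\t\"b\"\n";
        for (Token t : tokenize(data)) {
            System.out.println(t);
        }
    }
}
```

A reader built on this would start a new row at each NEWLINE and a new
column at each TAB, so the per-line split and per-item tokenizer
creation go away. Whether that is actually where the time goes is what
the profiling needs to show.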
I'll move the code to NodeFactory - it'll do variables as well so the
header line can use it.
Andy