On 12 March 2012 10:31, Emmanuel Bourg <ebo...@apache.org> wrote:
> I have identified the performance killer: it's the ExtendedBufferedReader.
> It implements complex logic to fetch one character ahead, but this extra
> character is rarely used. I have implemented a simpler look-ahead using
> mark/reset, as suggested by Bob Smith in CSV-42, and the performance improved
> by 30%.
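
For readers unfamiliar with the mark/reset approach mentioned above, here is a minimal sketch of a one-character look-ahead on a plain BufferedReader. It is not the actual Commons CSV patch; the class name MarkResetLookAhead and the helper lookAhead are made up for illustration, and it assumes the underlying Reader supports mark().

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical illustration only, not the code committed to Commons CSV.
class MarkResetLookAhead {

    /** Returns the next character without consuming it, or -1 at end of stream. */
    static int lookAhead(Reader in) throws IOException {
        in.mark(1);         // remember the current position; one char of read-ahead is enough
        int c = in.read();  // read the next character (or -1 at end of stream)
        in.reset();         // rewind to the marked position
        return c;
    }

    public static void main(String[] args) throws IOException {
        Reader in = new BufferedReader(new StringReader("a,b"));
        System.out.println((char) lookAhead(in)); // peeks at the first character
        System.out.println((char) in.read());     // the same character is still there to be read
    }
}
```

The look-ahead cost is only paid when the parser actually asks for the next character, rather than on every read.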

Java has a PushbackReader class - could that not be used?
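
To make the suggestion concrete, a rough sketch of the same one-character look-ahead with java.io.PushbackReader might look like the following. The class name PushbackLookAhead and the helper lookAhead are hypothetical, and whether this is faster or slower than mark/reset would need to be measured.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;

// Hypothetical illustration only; untested against the Commons CSV parser.
class PushbackLookAhead {

    /** Returns the next character without consuming it, or -1 at end of stream. */
    static int lookAhead(PushbackReader in) throws IOException {
        int c = in.read();  // consume the next character
        if (c != -1) {
            in.unread(c);   // push it back so the next read() sees it again
        }
        return c;
    }

    public static void main(String[] args) throws IOException {
        // The single-argument constructor gives a pushback buffer of one character.
        PushbackReader in = new PushbackReader(new StringReader("a,b"));
        System.out.println((char) lookAhead(in)); // peeks at the first character
        System.out.println((char) in.read());     // the same character is still there to be read
    }
}
```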

> Now the parsing is down to 3406 ms, and that's almost without touching the
> parser yet.
>
> Emmanuel Bourg
>
>
> On 11/03/2012 15:05, Emmanuel Bourg wrote:
>
>> Hi,
>>
>> I compared the performance of Commons CSV with the other CSV parsers
>> available. I took the world cities file from Maxmind as a test file [1],
>> it's a big file of 130M with 2.8 million records.
>>
>> Here are the results obtained on a Core 2 Duo E8400 after several
>> iterations to let the JIT compiler kick in:
>>
>> Direct read     750 ms
>> Java CSV       3328 ms
>> Super CSV      3562 ms  (+7%)
>> OpenCSV        3609 ms  (+8.4%)
>> GenJava CSV    3844 ms  (+15.5%)
>> Commons CSV    4656 ms  (+39.9%)
>> Skife CSV      4813 ms  (+44.6%)
>>
>> I also tried Nuiton CSV and Esperio CSV but I couldn't figure out how to
>> use them.
>>
>> I haven't analyzed why Commons CSV is slower yet, but it seems there is
>> room for improvement. The memory usage will have to be compared too;
>> I'm looking for a way to measure it.
>>
>>
>> Emmanuel Bourg
>>
>> [1] http://www.maxmind.com/download/worldcities/worldcitiespop.txt.gz
>>
>
>
