On 12 March 2012 10:31, Emmanuel Bourg <ebo...@apache.org> wrote:
> I have identified the performance killer, it's the ExtendedBufferedReader.
> It implements a complex logic to fetch one character ahead, but this extra
> character is rarely used. I have implemented a simpler look ahead using
> mark/reset as suggested by Bob Smith in CSV-42 and the performance improved
> by 30%.

Java has a PushbackReader class - could that not be used?

> Now the parsing is down to 3406 ms, and that's almost without touching the
> parser yet.
>
> Emmanuel Bourg
>
>
> On 11/03/2012 15:05, Emmanuel Bourg wrote:
>
>> Hi,
>>
>> I compared the performance of Commons CSV with the other CSV parsers
>> available. I took the world cities file from Maxmind as a test file [1],
>> it's a big file of 130M with 2.8 million records.
>>
>> Here are the results obtained on a Core 2 Duo E8400 after several
>> iterations to let the JIT compiler kick in:
>>
>> Direct read       750 ms
>> Java CSV         3328 ms
>> Super CSV        3562 ms  (+7%)
>> OpenCSV          3609 ms  (+8.4%)
>> GenJava CSV      3844 ms  (+15.5%)
>> Commons CSV      4656 ms  (+39.9%)
>> Skife CSV        4813 ms  (+44.6%)
>>
>> I also tried Nuiton CSV and Esperio CSV but I couldn't figure how to use
>> them.
>>
>> I haven't analyzed why Commons CSV is slower yet, but it seems there is
>> room for improvements. The memory usage will have to be compared too,
>> I'm looking for a way to measure it.
>>
>>
>> Emmanuel Bourg
>>
>> [1] http://www.maxmind.com/download/worldcities/worldcitiespop.txt.gz
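
For reference, a minimal sketch of the two one-character look-ahead approaches mentioned in this thread: mark/reset on a BufferedReader, and java.io.PushbackReader. The class and method names (LookAheadSketch, peekWithMarkReset, peekWithPushback) are made up for illustration and are not taken from Commons CSV. The sketch only shows that both give the same "peek" behaviour; it makes no claim about which one is faster.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper class, for illustration only (not Commons CSV code).
final class LookAheadSketch {

    /**
     * Peeks at the next character using mark/reset.
     * The reader must support mark(), e.g. a BufferedReader.
     * Returns -1 at end of stream.
     */
    public static int peekWithMarkReset(Reader in) throws IOException {
        in.mark(1);         // remember the position, allow 1 char of read-ahead
        int c = in.read();  // consume one character (or get -1 at EOF)
        in.reset();         // rewind to the marked position
        return c;
    }

    /**
     * Peeks at the next character using PushbackReader.unread().
     * Returns -1 at end of stream.
     */
    public static int peekWithPushback(PushbackReader in) throws IOException {
        int c = in.read();
        if (c != -1) {
            in.unread(c);   // push the character back so the next read() sees it
        }
        return c;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader buffered = new BufferedReader(new StringReader("a,b"));
        System.out.println((char) peekWithMarkReset(buffered)); // peek
        System.out.println((char) buffered.read());             // same char again

        // Default PushbackReader has a pushback buffer of one character.
        PushbackReader pushback = new PushbackReader(new StringReader("a,b"));
        System.out.println((char) peekWithPushback(pushback));  // peek
        System.out.println((char) pushback.read());             // same char again
    }
}
```

In both cases the peeked character is still returned by the next read(). Note that PushbackReader itself does not buffer, so in practice it would probably be wrapped around a BufferedReader.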