After more experiments I'm less enthusiastic about providing an
optimized BufferedReader. The result of the performance test differs
significantly depending on whether the test is run alone or after all the
other unit tests (about 30% slower). When all the tests are executed, the
removal of the
On Tue, Mar 13, 2012 at 4:33 AM, Ralph Goers ralph.go...@dslextreme.com wrote:
I don't think we should be trying to recode JDK classes.
If the implementations suck, why not?
+1
--
http://www.grobmeier.de
https://www.timeandbill.de
On 13/03/2012 01:44, sebb wrote:
I don't think we should be trying to recode JDK classes.
I'd rather not, but we have done that in the past. FastDateFormat and
StrBuilder come to mind.
Emmanuel Bourg
On 13 March 2012 09:01, Emmanuel Bourg ebo...@apache.org wrote:
On 13/03/2012 01:44, sebb wrote:
I don't think we should be trying to recode JDK classes.
I'd rather not, but we have done that in the past. FastDateFormat and
StrBuilder come to mind.
And now Java has StringBuilder, which
On 13/03/2012 02:47, Niall Pemberton wrote:
IMO performance should be taken out of the equation by using the
Readable interface[1]. That way the users can use whatever
implementation suits them (for example using an underlying buffered
InputStream) to change/improve performance.
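A minimal sketch of the idea in the suggestion above, assuming a hypothetical helper named countSeparators (it is not part of Commons CSV): code written against java.lang.Readable accepts any source the caller picks, so the choice of buffering is left to the user.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.nio.CharBuffer;

public class ReadableFieldCounter {

    /**
     * Hypothetical helper: counts comma characters in whatever Readable
     * the caller supplies. Only Readable.read(CharBuffer) is used.
     */
    public static int countSeparators(Readable source) throws IOException {
        CharBuffer buffer = CharBuffer.allocate(4096);
        int count = 0;
        // read(CharBuffer) returns -1 once the source is exhausted
        while (source.read(buffer) != -1) {
            buffer.flip();               // switch the buffer from filling to draining
            while (buffer.hasRemaining()) {
                if (buffer.get() == ',') {
                    count++;
                }
            }
            buffer.clear();              // make the whole buffer writable again
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        String data = "a,b,c\n1,2,3";
        // The caller decides whether to buffer or not; both are Readable.
        System.out.println(countSeparators(new StringReader(data)));
        System.out.println(countSeparators(new BufferedReader(new StringReader(data))));
    }
}
```

Both calls in main go through the same method; only the Readable implementation passed in changes.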
If you mean
I have identified the performance killer: it's the
ExtendedBufferedReader. It implements complex logic to fetch one
character ahead, but this extra character is rarely used. I have
implemented a simpler look ahead using mark/reset as suggested by Bob
Smith in CSV-42 and the performance
On 12 March 2012 10:31, Emmanuel Bourg ebo...@apache.org wrote:
I have identified the performance killer: it's the ExtendedBufferedReader.
It implements complex logic to fetch one character ahead, but this extra
character is rarely used. I have implemented a simpler look ahead using
On 12/03/2012 16:44, sebb wrote:
Java has a PushbackReader class - could that not be used?
I considered it, but it doesn't mix well with line reading. The
mark/reset solution is really simple and efficient.
Emmanuel Bourg
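A minimal sketch of a mark/reset look-ahead of the kind described above. This is an illustration only, not the code from CSV-42 or from Commons CSV; the class name LookAheadReader and the method lookAhead() are assumptions made for the example. It relies only on BufferedReader.mark(int), read() and reset().

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical class name, used for illustration only.
public class LookAheadReader extends BufferedReader {

    public LookAheadReader(Reader in) {
        super(in);
    }

    /** Returns the next character without consuming it, or -1 at end of stream. */
    public int lookAhead() throws IOException {
        mark(1);         // remember the position; one character of read-ahead is enough
        int c = read();  // peek at the next character (or -1 at end of stream)
        reset();         // rewind to the marked position
        return c;
    }

    public static void main(String[] args) throws IOException {
        try (LookAheadReader reader = new LookAheadReader(new StringReader("a,b\nc"))) {
            System.out.println((char) reader.lookAhead()); // peek, nothing consumed
            System.out.println((char) reader.read());      // consumes the same character
            System.out.println(reader.readLine());         // line reading still works
        }
    }
}
```

Because the class stays a plain BufferedReader, readLine() keeps working unchanged, which is the property the reply above says a PushbackReader does not offer as conveniently.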
On 12 March 2012 11:31, Emmanuel Bourg ebo...@apache.org wrote:
I have identified the performance killer: it's the ExtendedBufferedReader.
It implements complex logic to fetch one character ahead, but this extra
character is rarely used. I have implemented a simpler look ahead using
On 12/03/2012 17:03, Benedikt Ritter wrote:
The whole logic behind CSVLexer.nextToken() is very hard to read
(IMHO). Maybe some refactoring would help to make it easier to
identify bottlenecks?
Yes, I started investigating in this direction. I filed a few bugs
regarding the behavior of
Would one of the parser libraries not work here?
On Mar 12, 2012 12:22 PM, Emmanuel Bourg ebo...@apache.org wrote:
On 12/03/2012 17:03, Benedikt Ritter wrote:
The whole logic behind CSVLexer.nextToken() is very hard to read
(IMHO). Maybe some refactoring would help to make it easier to
On 12 March 2012 17:22, Emmanuel Bourg ebo...@apache.org wrote:
On 12/03/2012 17:03, Benedikt Ritter wrote:
The whole logic behind CSVLexer.nextToken() is very hard to read
(IMHO). Maybe some refactoring would help to make it easier to
identify bottlenecks?
Yes, I started
On 12/03/2012 17:28, James Carman wrote:
Would one of the parser libraries not work here?
Are you thinking of something like JavaCC or ANTLR? I'm not sure it would
be more efficient than a handcrafted parser. The CSV format is simple
enough to do it manually.
Emmanuel Bourg
On Mon, Mar 12, 2012 at 5:41 PM, Emmanuel Bourg ebo...@apache.org wrote:
On 12/03/2012 17:28, James Carman wrote:
Would one of the parser libraries not work here?
Are you thinking of something like JavaCC or ANTLR? I'm not sure it would be more
efficient than a handcrafted parser. The CSV format is
Yes, this is what I mean. It might be worth a shot. Folks who specialize
in parsing have spent much time on these libraries. It would make sense
that they are quite fast. It gets us out of the parsing business.
On Mar 12, 2012 12:41 PM, Emmanuel Bourg ebo...@apache.org wrote:
On 12/03/2012
I kept tinkering with ExtendedBufferedReader and I have some interesting results.
First I tried to simplify it by extending java.io.LineNumberReader
instead of BufferedReader. The performance decreased by 20%, probably
because the class is synchronized internally.
But wait, isn't BufferedReader
On 13 March 2012 00:12, Emmanuel Bourg ebo...@apache.org wrote:
I kept tinkering with ExtendedBufferedReader and I have some interesting results.
First I tried to simplify it by extending java.io.LineNumberReader instead
of BufferedReader. The performance decreased by 20%, probably because the
On 13/03/2012 01:25, sebb wrote:
I'm concerned that the CSV code may grow and grow with private
versions of code that could be provided by the JDK.
By all means make sure the code is efficient in the way it uses the
JDK classes, but I don't think we should be recoding standard classes.
I
On Mar 12, 2012, at 20:25, sebb seb...@gmail.com wrote:
On 13 March 2012 00:12, Emmanuel Bourg ebo...@apache.org wrote:
I kept tinkering with ExtendedBufferedReader and I have some interesting results.
First I tried to simplify it by extending java.io.LineNumberReader instead
of BufferedReader.
On Mar 12, 2012, at 20:30, Emmanuel Bourg ebo...@apache.org wrote:
On 13/03/2012 01:25, sebb wrote:
I'm concerned that the CSV code may grow and grow with private
versions of code that could be provided by the JDK.
By all means make sure the code is efficient in the way it uses the
JDK
On 13 March 2012 00:29, Emmanuel Bourg ebo...@apache.org wrote:
On 13/03/2012 01:25, sebb wrote:
I'm concerned that the CSV code may grow and grow with private
versions of code that could be provided by the JDK.
By all means make sure the code is efficient in the way it uses the
JDK
On Tue, Mar 13, 2012 at 12:29 AM, Emmanuel Bourg ebo...@apache.org wrote:
On 13/03/2012 01:25, sebb wrote:
I'm concerned that the CSV code may grow and grow with private
versions of code that could be provided by the JDK.
By all means make sure the code is efficient in the way it uses
On 13 March 2012 01:47, Niall Pemberton niall.pember...@gmail.com wrote:
On Tue, Mar 13, 2012 at 12:29 AM, Emmanuel Bourg ebo...@apache.org wrote:
On 13/03/2012 01:25, sebb wrote:
I'm concerned that the CSV code may grow and grow with private
versions of code that could be provided by the
On Mar 12, 2012, at 5:44 PM, sebb wrote:
On 13 March 2012 00:29, Emmanuel Bourg ebo...@apache.org wrote:
On 13/03/2012 01:25, sebb wrote:
I'm concerned that the CSV code may grow and grow with private
versions of code that could be provided by the JDK.
By all means make sure the
Hi,
I compared the performance of Commons CSV with the other available CSV
parsers. I took the world cities file from Maxmind as a test file [1];
it's a large file, 130 MB with 2.8 million records.
Here are the results obtained on a Core 2 Duo E8400 after several
iterations to let the JIT
On 11 March 2012 15:05, Emmanuel Bourg ebo...@apache.org wrote:
Hi,
I compared the performance of Commons CSV with the other available CSV
parsers. I took the world cities file from Maxmind as a test file [1];
it's a large file, 130 MB with 2.8 million records.
Here are the results obtained
On 11/03/2012 16:53, Benedikt Ritter wrote:
I have some spare time to help you with this. I'll check out the
latest source tonight. Any suggestion where to start?
Hi Benedikt, thank you for helping. You can start by looking at the source
of CSVParser to see if anything catches your eye, and then run
On 11 March 2012 21:21, Emmanuel Bourg ebo...@apache.org wrote:
On 11/03/2012 16:53, Benedikt Ritter wrote:
I have some spare time to help you with this. I'll check out the
latest source tonight. Any suggestion where to start?
Hi Benedikt, thank you for helping. You can start looking
On 12/03/2012 00:02, Benedikt Ritter wrote:
I've started to dig my way through the source. I've not done too much
performance measuring in my career yet. I would use VisualVM for
profiling, if you don't know anything better.
Usually I work with JProfiler, it identifies the hotspots pretty