For a long time I have been looking for a high-performance solution for processing delimited 
data.

To illustrate the performance problem, please consider this example:

one,two,three,four,five      (',' is used as the delimiter)

While parsing this string, one has to traverse it character by character to find each 
delimiter and then act on the segment in between; a baseline sketch of that approach is shown below.
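
Here is a minimal baseline sketch of that character-by-character approach in plain Java (the class and method names are just my own, for illustration):

import java.util.ArrayList;
import java.util.List;

public class CharByCharSplit {

    // Split one row on a single-character delimiter by visiting every char once.
    public static List<String> split(String row, char delimiter) {
        List<String> segments = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < row.length(); i++) {
            char c = row.charAt(i);
            if (c == delimiter) {
                segments.add(current.toString()); // act on the finished segment
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        segments.add(current.toString()); // trailing segment after the last delimiter
        return segments;
    }

    public static void main(String[] args) {
        System.out.println(split("one,two,three,four,five", ','));
        // prints: [one, two, three, four, five]
    }
}

Every character is inspected and compared exactly once; that per-character cost is what I would like to reduce.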

I am trying to find a solution that simulates the way humans read.
Our reading habits change as we read more.

Initially we read character by character, form a word in our head, and attach meaning to it.
As we grow older we start picking up two or three words in a single glance and process them 
quite fast.

The point I am trying to make is: can we make our code intelligent enough to take a snapshot 
of the data and identify a pattern?
I know this sounds hazy, but the idea is to stop parsing character by character and instead 
develop an algorithm that reads the memory block in chunks, checks whether a chunk contains 
any delimiters at all, and only falls back to character-by-character parsing to find the 
exact positions when it does. A rough sketch of that idea follows.
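
To make the idea concrete, here is a rough sketch of that chunked two-pass scan in plain Java. The class name, the chunk size, and the helper method are all my own invention, and in plain Java the "cheap" containment check is itself still a linear scan, so this particular version will not beat a single pass by itself; any real gain would have to come from a faster block-level test (for example a word-at-a-time or vectorised check).

import java.util.ArrayList;
import java.util.List;

public class ChunkedDelimiterScanner {

    private static final int CHUNK_SIZE = 64; // arbitrary chunk size, for illustration only

    // Return the position of every occurrence of 'delimiter' in 'data'.
    public static List<Integer> findDelimiters(char[] data, char delimiter) {
        List<Integer> positions = new ArrayList<Integer>();
        for (int start = 0; start < data.length; start += CHUNK_SIZE) {
            int end = Math.min(start + CHUNK_SIZE, data.length);

            // Pass 1: does this chunk contain the delimiter at all?
            if (chunkContains(data, start, end, delimiter)) {
                // Pass 2: only now locate the exact positions inside the chunk.
                for (int i = start; i < end; i++) {
                    if (data[i] == delimiter) {
                        positions.add(Integer.valueOf(i));
                    }
                }
            }
        }
        return positions;
    }

    // In plain Java this check is itself a char-by-char loop; the idea only pays off
    // if it can be replaced by something that tests a whole block of memory at once.
    private static boolean chunkContains(char[] data, int start, int end, char delimiter) {
        for (int i = start; i < end; i++) {
            if (data[i] == delimiter) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        char[] row = "one,two,three,four,five".toCharArray();
        System.out.println(findDelimiters(row, ',')); // prints: [3, 7, 13, 18]
    }
}

Chunks that contain no delimiter at all would be skipped by the detailed pass entirely; whether that ever wins in practice is exactly the question I am asking.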

Take a small test here: count the number of commas in each row.

one,tw
one,two,thr
one,,,two

While looking at this test data, did you parse it character by character, or did you take in 
each row at a glance?

Regards 
Chetan 


-----Original Message-----
From: Simon Kitching [mailto:[EMAIL PROTECTED]]
Sent: Thursday, May 26, 2005 12:23 PM
To: Jakarta Commons Users List
Subject: RE: CSV parsing/writing?


If the goal of the project is small, ie just a class to parse csv, then
commons-io, commons-codec, commons-lang are the obvious parties. So it's
a matter of seeing if the committers on those projects are interested.

If the goal is larger, ie creating a new commons component itself then
it is likely to be hard work. The way things usually become commons
components is that they are initially a successful part of some other
successful apache project and are spun off into a separate component
here. So one solution might be to find an apache project that would find
csv functionality useful, and then get the developers of that project to
join commons and become the "mentors" of a csv (or more ambitious)
project here.

Projects that might find csv handling useful include
 * workflow projects
 * B2B projects (geronimo?)
 * data import/export: POI?

It seems clear from the mails here that although there is some user
interest in this, there just aren't any existing committers willing to
dedicate the necessary time to mentoring this new project.

As another alternative, a project can be created on Sourceforge, using
the Apache Public License (APL). That way, apache projects like the ones
listed above can happily use the code if they find a need to process csv
in the future. And at that point, friendly discussions might occur about
moving the project to apache commons.

Apache commons really isn't in the same business as sourceforge. This
means that not every good idea gets a home here. Or to look at it the
other way, if it doesn't find a home here that doesn't mean it isn't a
good idea.

(man, csv is a hard acronym to type. At least half the time it comes out
cvs :-).


Cheers,

Simon






---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
