For a long time I have been looking for a high-performance solution for processing delimited data.
To illustrate the performance problem, consider this example: one,two,three,four,five (with "," as the delimiter). While parsing this string, one needs to traverse it char by char to find each delimiter and then act on the segment.

I am trying to work out a solution that simulates the way humans read. We change our reading habits as we read more: initially we read char by char, then form a word in our brain and attach meaning to it. As we grow older we start picking up two or three words in one glance and process them quite fast.

The point I am trying to make is: can we make our code intelligent enough to take a snapshot of the data and identify a pattern? I know this sounds pretty hazy, but the idea is to stop parsing char by char and instead develop an algorithm that reads the memory block in chunks and checks whether a chunk contains any delimiters at all. Only if a delimiter is found do we parse char by char to get its exact position.

Take a small test here -- count the number of commas in each row:

one,tw
one,two,thr
one,,,two

While looking at this test data, did you parse char by char or take a snapshot?

Regards,
Chetan

-----Original Message-----
From: Simon Kitching [mailto:[EMAIL PROTECTED]
Sent: Thursday, May 26, 2005 12:23 PM
To: Jakarta Commons Users List
Subject: RE: CSV parsing/writing?

If the goal of the project is small, i.e. just a class to parse CSV, then commons-io, commons-codec and commons-lang are the obvious parties, so it's a matter of seeing whether the committers on those projects are interested. If the goal is larger, i.e. creating a new commons component itself, then it is likely to be hard work. The way things usually become commons components is that they are initially a successful part of some other successful Apache project and are then spun off into a separate component here.
So one solution might be to find an Apache project that would find CSV functionality useful, and then get the developers of that project to join commons and become the "mentors" of a CSV (or more ambitious) project here. Projects that might find CSV handling useful include:

* workflow projects
* B2B projects (geronimo?)
* data import/export: POI?

It seems clear from the mails here that although there is some user interest in this, there just aren't any existing committers willing to dedicate the necessary time to mentoring this new project.

As another alternative, a project can be created on SourceForge, using the Apache Public License (APL). That way, Apache projects like the ones listed above can happily use the code if they find a need to process CSV in the future. And at that point, friendly discussions might occur about moving the project to Apache Commons.

Apache Commons really isn't in the same business as SourceForge. This means that not every good idea gets a home here. Or to look at it the other way: if it doesn't find a home here, that doesn't mean it isn't a good idea.

(man, csv is a hard acronym to type. At least half the time it comes out cvs :-)

Cheers,
Simon

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
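[Editorial note: the chunked "snapshot" scan Chetan describes above can be sketched with a word-at-a-time (SWAR) byte test. This is only an illustrative sketch, not code from either mail: it packs 8 bytes into a long, uses the classic zero-byte bit trick to ask "does this chunk contain the delimiter at all?", and falls back to a per-byte scan only when the answer is yes. The class name ChunkScan and method names are invented here for illustration.]

```java
// Sketch of "snapshot" delimiter scanning: test 8 bytes at a time,
// and only look byte-by-byte inside a chunk known to hold a delimiter.
public class ChunkScan {
    private static final long ONES  = 0x0101010101010101L;
    private static final long HIGHS = 0x8080808080808080L;

    // True if any byte of 'word' equals 'b'.
    // XOR turns matching bytes into 0x00; the (x - 1) & ~x & 0x80
    // pattern (applied per byte lane) then detects any zero byte.
    static boolean containsByte(long word, byte b) {
        long x = word ^ (ONES * (b & 0xFFL)); // broadcast b to all 8 lanes
        return ((x - ONES) & ~x & HIGHS) != 0;
    }

    // Count occurrences of 'delim' in 'data', scanning 8 bytes per step.
    static int countDelimiters(byte[] data, byte delim) {
        int count = 0;
        int i = 0;
        for (; i + 8 <= data.length; i += 8) {
            long word = 0;
            for (int j = 0; j < 8; j++) {
                word = (word << 8) | (data[i + j] & 0xFFL);
            }
            if (containsByte(word, delim)) {  // snapshot says "delimiter here"
                for (int j = 0; j < 8; j++) { // only now parse byte by byte
                    if (data[i + j] == delim) count++;
                }
            }
        }
        for (; i < data.length; i++) {        // tail shorter than one chunk
            if (data[i] == delim) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String s = "one,two,three,four,five";
        System.out.println(countDelimiters(s.getBytes(), (byte) ',')); // 4
    }
}
```

For delimiter-free chunks (the common case in wide CSV fields) the inner loop never runs, which is exactly the "read in one glance" effect described above. A real parser would still need the char-by-char pass to find exact positions, as the mail notes, and to handle quoting rules this sketch ignores.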
