[ http://issues.apache.org/jira/browse/SANDBOX-166?page=comments#action_12426905 ]

Ortwin Glück commented on SANDBOX-166:
--------------------------------------

The most important optimization is to reuse Token objects (and their StringBuffers).

CSV files usually contain the same number of columns throughout the file. The parser should adapt itself dynamically after the first line and size its internal arrays correctly.

Columns also have maximum lengths. The parser should adapt itself dynamically and size its StringBuffers correctly.

For JDK 1.3 compatibility the code calls StringBuffer.append(StringBuffer.toString()), which copies the data twice. Using a better character buffer can alleviate the problem.

CSVParser:
getLine(): String[0] is immutable and should be a constant. Token objects should be reused!
nextToken(): reuse the intermediate StringBuffer wsBuf. Don't create a new instance on every call.
simpleTokenLexer(): reuse the intermediate StringBuffer wsBuf. Don't create a new instance on every call.

I'll attach a patch that addresses those issues shortly.

> Improve memory use
> ------------------
>
> Key: SANDBOX-166
> URL: http://issues.apache.org/jira/browse/SANDBOX-166
> Project: Commons Sandbox
> Issue Type: Improvement
> Components: CSV
> Affects Versions: Nightly Builds
> Reporter: Ortwin Glück
> Attachments: profile.png
>
>
> The parser is currently a real memory burner. I fed it a 4MB CSV file and ran
> the TPTP profiler. As you can see the parser creates around 100MB of garbage
> whereas it could (if really optimized) use around 4MB. Such figures are not
> acceptable within a server environment. Please attach insights and patches to
> this issue report.
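
To illustrate the reuse ideas in the comment, here is a minimal sketch. It is not the sandbox CSVParser and not the announced patch: the class ReusingLineSplitter, its fields, and its split() method are hypothetical names made up for this example, and it only splits on commas without any quoting support. It shows a shared constant for the empty array, a single StringBuffer that is cleared instead of reallocated, and a result list sized from the column count of the previous line.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch, not the sandbox CSVParser: comma splitting only, no quotes. */
class ReusingLineSplitter {
    /** A zero-length array cannot be modified, so one shared instance is enough. */
    static final String[] EMPTY_STRING_ARRAY = new String[0];

    /** One buffer for all fields; setLength(0) empties it without allocating a new one. */
    private final StringBuffer fieldBuf = new StringBuffer(32);

    /** Column count of the previous line, used to size the next result list. */
    private int expectedColumns = 8; // arbitrary starting guess (assumption)

    String[] split(String line) {
        if (line == null || line.length() == 0) {
            return EMPTY_STRING_ARRAY; // no allocation for empty input
        }
        List<String> fields = new ArrayList<String>(expectedColumns);
        fieldBuf.setLength(0);
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == ',') {
                fields.add(fieldBuf.toString());
                fieldBuf.setLength(0); // reuse the same buffer for the next field
            } else {
                fieldBuf.append(c);
            }
        }
        fields.add(fieldBuf.toString()); // last field has no trailing comma
        expectedColumns = fields.size(); // adapt to the file's column count
        return fields.toArray(new String[fields.size()]);
    }
}
```

A caller would create one ReusingLineSplitter per file and call split() for each line, so the buffer and the column estimate carry over between lines. How much garbage this saves in practice would have to be confirmed with a profiler run.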
