[ http://issues.apache.org/jira/browse/SANDBOX-166?page=comments#action_12426905 ]
Ortwin Glück commented on SANDBOX-166:
--------------------------------------

The most important optimization is to reuse Token objects (and their
StringBuffers).
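A minimal sketch of token reuse, assuming a simplified token type. `ReusableToken` and `TokenReuseDemo` are hypothetical names, not the actual CSVParser classes, and the comma splitting ignores quoting. One token and its StringBuffer are allocated once per lexer. `reset()` clears the content with `setLength(0)`, which keeps the already-grown character array for the next field.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simplified token; the real CSVParser Token may differ.
final class ReusableToken {
    static final int TT_INVALID = -1;
    static final int TT_TOKEN = 0;

    int type = TT_INVALID;
    // Allocated once and kept for the lifetime of the token.
    final StringBuffer content = new StringBuffer(50);

    /** Clears the token for reuse; setLength(0) keeps the buffer's capacity. */
    ReusableToken reset() {
        content.setLength(0);
        type = TT_INVALID;
        return this;
    }
}

// Hypothetical demo lexer: splits on ',' only (no quotes, no escapes).
final class TokenReuseDemo {
    // One token per lexer instead of one per field.
    private final ReusableToken reusableToken = new ReusableToken();

    String[] splitLine(String line) {
        List<String> fields = new ArrayList<String>();
        ReusableToken tkn = reusableToken.reset();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == ',') {
                tkn.type = ReusableToken.TT_TOKEN;
                fields.add(tkn.content.toString()); // copy the value out before reuse
                tkn.reset();
            } else {
                tkn.content.append(c);
            }
        }
        tkn.type = ReusableToken.TT_TOKEN;
        fields.add(tkn.content.toString());
        return fields.toArray(new String[fields.size()]);
    }
}
```

The `toString()` call before `reset()` matters: the resulting String is the only per-field allocation, while the token and its buffer survive from field to field.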

CSV files usually contain the same number of columns throughout the file. The
parser should adapt itself dynamically after the first line and size its
internal arrays correctly. Columns also have maximum lengths, so the parser
should likewise adapt and size its StringBuffers correctly.
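One possible way to implement this adaptation, sketched with a hypothetical `AdaptiveSizer` helper that is not part of CSVParser: after each record it remembers the column count and the longest value seen per column, then uses those numbers as initial capacities for the next record's list and for the reused field buffers. The starting guesses (10 columns, 16 characters) are arbitrary assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper (not part of the CSVParser API): learns the shape of the
 * file from the records parsed so far and uses it to pre-size containers.
 */
final class AdaptiveSizer {
    private static final int DEFAULT_FIELD_CAPACITY = 16;

    private int expectedColumns = 10;           // guess used until the first record is done
    private int[] maxColumnLength = new int[0]; // longest value seen so far, per column

    /** Returns a record list whose backing array already fits the usual column count. */
    List<String> newRecord() {
        return new ArrayList<String>(expectedColumns);
    }

    /** Clears a reused field buffer and grows it once to the column's known maximum. */
    void prepareFieldBuffer(StringBuffer buf, int column) {
        buf.setLength(0);
        buf.ensureCapacity(capacityHint(column));
    }

    int capacityHint(int column) {
        return column < maxColumnLength.length
                ? Math.max(maxColumnLength[column], DEFAULT_FIELD_CAPACITY)
                : DEFAULT_FIELD_CAPACITY;
    }

    int expectedColumns() {
        return expectedColumns;
    }

    /** Updates the statistics after a complete record has been parsed. */
    void recordFinished(List<String> record) {
        expectedColumns = record.size();
        if (maxColumnLength.length < record.size()) {
            maxColumnLength = Arrays.copyOf(maxColumnLength, record.size());
        }
        for (int i = 0; i < record.size(); i++) {
            maxColumnLength[i] = Math.max(maxColumnLength[i], record.get(i).length());
        }
    }
}
```

Because the buffers only ever grow to the longest value seen, a file with stable column widths stops triggering buffer reallocations after the first few lines.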

Because of JDK 1.3 compatibility, the code uses
StringBuffer.append(StringBuffer.toString()), which copies the data twice
(StringBuffer.append(StringBuffer) only exists since JDK 1.4). Using a better
character buffer can alleviate the problem.
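A minimal sketch of such a buffer, assuming a hypothetical `CharBuf` class (not an existing Commons API): a growable `char[]` that appends another `CharBuf` with a single `System.arraycopy`, so no intermediate String is created, and that keeps its array when cleared.

```java
/**
 * Hypothetical growable character buffer, one possible "better character buffer".
 * Not synchronized; intended for use by a single parser thread.
 */
final class CharBuf {
    private char[] data;
    private int length;

    CharBuf(int initialCapacity) {
        data = new char[Math.max(initialCapacity, 1)];
    }

    /** Resets the content but keeps the allocated array for reuse. */
    void clear() {
        length = 0;
    }

    int length() {
        return length;
    }

    int capacity() {
        return data.length;
    }

    void append(char c) {
        ensureCapacity(length + 1);
        data[length++] = c;
    }

    void append(String s) {
        ensureCapacity(length + s.length());
        s.getChars(0, s.length(), data, length);
        length += s.length();
    }

    /** Appends another buffer with one array copy and no intermediate String. */
    void append(CharBuf other) {
        ensureCapacity(length + other.length);
        System.arraycopy(other.data, 0, data, length, other.length);
        length += other.length;
    }

    /** Grows geometrically so that repeated appends stay amortised O(1). */
    private void ensureCapacity(int minCapacity) {
        if (minCapacity > data.length) {
            char[] bigger = new char[Math.max(minCapacity, data.length * 2)];
            System.arraycopy(data, 0, bigger, 0, length);
            data = bigger;
        }
    }

    @Override
    public String toString() {
        return new String(data, 0, length);
    }
}
```

Unlike `StringBuffer`, this class is not synchronized, which is usually acceptable for a parser used from one thread.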

CSVParser:
 getLine(): the empty String[0] result is immutable and should be a constant.
Token objects should be reused!

 nextToken(): reuse the intermediate StringBuffer wsBuf. Don't create a new
instance on every call.

 simpleTokenLexer(): reuse the intermediate StringBuffer wsBuf. Don't create a
new instance on every call.
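A sketch of these CSVParser suggestions, using a hypothetical `SimpleLineLexer` that only handles unquoted, comma-separated fields. The empty result is a shared `EMPTY_STRING_ARRAY` constant (a zero-length array has no elements a caller could modify), and the whitespace buffer `wsBuf` is a field cleared with `setLength(0)` at the start of each token rather than a new `StringBuffer` per call. The whitespace handling (leading and trailing blanks dropped, interior blanks kept) is an assumption for illustration. The sketch targets a modern JDK and therefore calls `StringBuffer.append(StringBuffer)` directly.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simplified lexer: unquoted fields separated by ',' only.
final class SimpleLineLexer {
    // Shared constant: a zero-length array cannot be modified by callers.
    static final String[] EMPTY_STRING_ARRAY = new String[0];

    // Allocated once per lexer and cleared before each token instead of per call.
    private final StringBuffer wsBuf = new StringBuffer();
    private final StringBuffer field = new StringBuffer();
    private final List<String> record = new ArrayList<String>();

    String[] getLine(String line) {
        if (line.length() == 0) {
            return EMPTY_STRING_ARRAY; // no allocation for empty lines
        }
        record.clear();
        int pos = 0;
        while (true) {
            pos = simpleToken(line, pos);
            record.add(field.toString());
            if (pos >= line.length()) {
                break;
            }
            pos++; // skip the ',' delimiter
        }
        return record.toArray(new String[record.size()]);
    }

    /** Lexes one field starting at pos; returns the index of the ',' or the line end. */
    private int simpleToken(String line, int pos) {
        field.setLength(0);
        wsBuf.setLength(0); // reuse the buffer instead of new StringBuffer()
        while (pos < line.length() && line.charAt(pos) != ',') {
            char c = line.charAt(pos);
            if (Character.isWhitespace(c)) {
                // Leading blanks are dropped; later blanks are held back
                // until we know whether more text follows.
                if (field.length() > 0) {
                    wsBuf.append(c);
                }
            } else {
                field.append(wsBuf); // blanks followed by text are kept
                wsBuf.setLength(0);
                field.append(c);
            }
            pos++;
        }
        return pos; // anything left in wsBuf was trailing whitespace and is dropped
    }
}
```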

I'll attach a patch that addresses those issues shortly.



> Improve memory use
> ------------------
>
>                 Key: SANDBOX-166
>                 URL: http://issues.apache.org/jira/browse/SANDBOX-166
>             Project: Commons Sandbox
>          Issue Type: Improvement
>          Components: CSV
>    Affects Versions: Nightly Builds
>            Reporter: Ortwin Glück
>         Attachments: profile.png
>
>
> The parser is currently a real memory burner. I fed it a 4MB CSV file and ran
> the TPTP profiler. As you can see, the parser creates around 100MB of garbage,
> whereas it could (if really optimized) use around 4MB. Such figures are not
> acceptable within a server environment. Please attach insights and patches to
> this issue report.
