[jira] [Updated] (CSV-277) Review Lexer simpleToken for Performance

David Mollitor (Jira) Mon, 12 Jul 2021 13:59:26 -0700


     [ https://issues.apache.org/jira/browse/CSV-277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


David Mollitor updated CSV-277:
-------------------------------
    Description: 
While running the Apache ORC benchmarks, which have {{commons-csv}} as a 
dependency, I noticed that most of the running time is spent in {{commons-csv}}.

I have attached the VisualVM output ({{CSVCapture.PNG}}). Here is my test setup:

{code:none}
JVM: OpenJDK 64-Bit Server VM (25.292-b10, mixed mode)
Java: version 1.8.0_292, vendor Private Build
Java Home: /usr/lib/jvm/java-8-openjdk-amd64/jre
JVM Flags: <none>
{code}


I suspect this is partly because {{ExtendedBufferedReader}} extends 
{{BufferedReader}}. {{BufferedReader}} synchronizes on an internal lock inside 
its {{read}} methods, which means that every call to {{read}} has to acquire 
that lock. Usually this is not an issue, but {{commons-csv}} reads one 
character at a time, so the overhead adds up: even though the input is 
buffered, every character read goes through a synchronization step. Each 
character read also requires a call into the parent class's {{read}} method.
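A minimal sketch of the access pattern described above, assuming a simplified subclass: {{PerCharReaderDemo}}, {{LastCharReader}} and {{countLines}} are hypothetical names, and this is not the actual {{ExtendedBufferedReader}} source. Each {{read()}} call goes through {{super.read()}}, and in OpenJDK 8 {{BufferedReader.read()}} enters a {{synchronized (lock)}} block before touching its buffer, so a loop that pulls one character per call takes the lock once per character.

{code:java}
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical, simplified illustration only; NOT the commons-csv source.
public class PerCharReaderDemo {

    /** A BufferedReader subclass that remembers the result of its last read(). */
    static final class LastCharReader extends BufferedReader {
        private int lastChar = -1;

        LastCharReader(Reader in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            // Delegates to BufferedReader.read(), which synchronizes on its
            // internal lock on every call, even when the char is already buffered.
            int c = super.read();
            lastChar = c;
            return c;
        }

        /** Value returned by the most recent read() call (-1 once EOF is hit). */
        int getLastChar() {
            return lastChar;
        }
    }

    /** Counts '\n' characters by pulling exactly one character per read() call. */
    static int countLines(Reader in) throws IOException {
        int lines = 0;
        int c;
        while ((c = in.read()) != -1) {
            if (c == '\n') {
                lines++;
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        try (LastCharReader reader = new LastCharReader(new StringReader("a,b\nc,d\n"))) {
            System.out.println(countLines(reader));
        }
    }
}
{code}

By contrast, a bulk {{read(char[], int, int)}} call on {{BufferedReader}} takes the lock once per call, so the cost is paid per chunk rather than per character.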

Nothing else stands out to me as being "slow."

  was:
{{BufferedReader}} synchronizes on an internal lock inside its {{read}} methods, 
which means that every call to {{read}} has to acquire that lock.

Usually this is not an issue, but {{commons-csv}} reads one character at a time, 
so it adds a lot of overhead: even though the input is buffered, every character 
read goes through a synchronization step.

Change {{ExtendedBufferedReader}} to implement its own internal buffer.
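One way the proposed change could look, sketched under stated assumptions: {{UnsyncBufferedReader}} is a hypothetical name, the 8192-character default simply mirrors {{BufferedReader}}'s default buffer size, and the class is deliberately not thread-safe. A single-character {{read()}} becomes an array access plus an occasional refill, with no lock and no call into a parent class's {{read}}.

{code:java}
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical sketch: a Reader with its own char[] buffer and no locking.
// NOT thread-safe; intended for a single reading thread.
public class UnsyncBufferedReader extends Reader {
    private static final int DEFAULT_SIZE = 8192;

    private final Reader in;
    private final char[] buf;
    private int pos;   // index of the next char to hand out
    private int limit; // number of valid chars currently in buf

    public UnsyncBufferedReader(Reader in) {
        this(in, DEFAULT_SIZE);
    }

    public UnsyncBufferedReader(Reader in, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be > 0");
        }
        this.in = in;
        this.buf = new char[size];
    }

    /** Refills buf from the underlying reader; returns false at end of stream. */
    private boolean fill() throws IOException {
        int n = in.read(buf, 0, buf.length);
        // A conforming Reader returns at least 1 char or -1 for a non-empty buffer.
        if (n <= 0) {
            return false;
        }
        pos = 0;
        limit = n;
        return true;
    }

    @Override
    public int read() throws IOException {
        // Fast path is a plain array access: no lock, no super.read().
        if (pos >= limit && !fill()) {
            return -1;
        }
        return buf[pos++];
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (pos >= limit && !fill()) {
            return -1;
        }
        int n = Math.min(len, limit - pos);
        System.arraycopy(buf, pos, cbuf, off, n);
        pos += n;
        return n;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    public static void main(String[] args) throws IOException {
        // A tiny buffer forces several refills while reading character by character.
        try (UnsyncBufferedReader r = new UnsyncBufferedReader(new StringReader("x,y\n"), 2)) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
            System.out.print(sb);
        }
    }
}
{code}

Dropping synchronization is only safe if one thread at a time reads from the instance; a parser that owns its reader exclusively would satisfy that assumption.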


> Review Lexer simpleToken for Performance
> ----------------------------------------
>
>                 Key: CSV-277
>                 URL: https://issues.apache.org/jira/browse/CSV-277
>             Project: Commons CSV
>          Issue Type: Improvement
>            Reporter: David Mollitor
>            Priority: Major
>         Attachments: CSVCapture.PNG
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
