[ 
https://issues.apache.org/jira/browse/SQOOP-2906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joeri Hermans updated SQOOP-2906:
---------------------------------
    Description: 
Hi all,

Our distributed profiler indicated some inefficiencies in the
AvroUtil.toAvroIdentifier method, specifically in its use of regex patterns.
This can be observed directly in the FlameGraph generated by the profiler
(https://jhermans.web.cern.ch/jhermans/sqoop_avro_flamegraph.svg). We
implemented an optimization and compared it with the original method. On our
testing machine, the optimized method by itself is about 500% (on average) more
efficient than the original implementation. We have yet to test how this
optimization affects the performance of user jobs.
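
For readers unfamiliar with this kind of change, the following is a minimal
sketch of the general idea, not the code from the pull request. It assumes the
method replaces each run of non-word characters with a single underscore and
prefixes "AVRO_" when the result does not start with a letter or underscore;
that behavior, the class name AvroIdentifierSketch, and all method names are
hypothetical. The regex variant uses String.replaceAll, which compiles its
pattern on every call, while the loop variant makes a single pass over the
characters.

```java
// Hypothetical sketch; names and exact semantics are assumptions, not Sqoop code.
final class AvroIdentifierSketch {

    // Regex-based variant: String.replaceAll compiles the pattern on each call.
    static String viaRegex(String candidate) {
        String formatted = candidate.replaceAll("\\W+", "_");
        return addPrefixIfNeeded(formatted);
    }

    // Loop-based variant: one pass over the input, no regex involved.
    static String viaLoop(String candidate) {
        StringBuilder sb = new StringBuilder(candidate.length());
        boolean inRun = false; // true while inside a run of non-word characters
        for (int i = 0; i < candidate.length(); i++) {
            char c = candidate.charAt(i);
            if (isWordChar(c)) {
                sb.append(c);
                inRun = false;
            } else if (!inRun) {
                sb.append('_'); // collapse the whole run into one underscore
                inRun = true;
            }
        }
        return addPrefixIfNeeded(sb.toString());
    }

    // Matches the default (ASCII-only) meaning of \w: [a-zA-Z_0-9].
    private static boolean isWordChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9') || c == '_';
    }

    // Assumed rule: the result must start with a letter or underscore.
    private static String addPrefixIfNeeded(String s) {
        if (!s.isEmpty()) {
            char first = s.charAt(0);
            boolean validStart = first == '_'
                    || (first >= 'a' && first <= 'z')
                    || (first >= 'A' && first <= 'Z');
            if (validStart) {
                return s;
            }
        }
        return "AVRO_" + s;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"order id", "2nd-col", "_ok"}) {
            System.out.println(viaRegex(s) + " " + viaLoop(s));
        }
    }
}
```

Both variants should return the same string for any input, so a comparison like
the one reported above can be reproduced by timing the two methods over the same
set of column names.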

Any suggestions or remarks are welcome.

Kind regards,


Joeri

https://github.com/apache/sqoop/pull/18


Writeup:

https://db-blog.web.cern.ch/blog/joeri-hermans/2016-04-hadoop-performance-troubleshooting-stack-tracing-introduction

  was:
Hi all

Our distributed profiler indicated some inefficiencies in the 
AvroUtil.toAvroIdentifier method, more specifically, the use of Regex patterns. 
This can be directly observed from the FlameGraph generated by this profiler 
(https://jhermans.web.cern.ch/jhermans/sqoop_avro_flamegraph.svg). We 
implemented an optimization, and compared this with the original method. On our 
testing machine, the optimization by itself is about 500% (on average) more 
efficient compared to the original implementation. We have yet to test how this 
optimization will influence the performance of user jobs.

Any suggestions or remarks are welcome.

Kind regards,


Joeri

https://github.com/apache/sqoop/pull/18


> Optimization of AvroUtil.toAvroIdentifier
> -----------------------------------------
>
>                 Key: SQOOP-2906
>                 URL: https://issues.apache.org/jira/browse/SQOOP-2906
>             Project: Sqoop
>          Issue Type: Improvement
>            Reporter: Joeri Hermans
>            Assignee: Joeri Hermans
>              Labels: avro, hadoop, optimization
>         Attachments: diff.txt
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
