GitHub user JoeriHermans opened a pull request:

    https://github.com/apache/sqoop/pull/18

    Optimize toAvroIdentifier

    Our distributed profiler pointed to an inefficiency in the
AvroUtil.toAvroIdentifier method, specifically its use of regular-expression
patterns. This is directly visible in the FlameGraph generated by the profiler
(https://jhermans.web.cern.ch/jhermans/sqoop_avro_flamegraph.svg). We
implemented an optimization and compared it with the original method. On our
testing machine, the optimized method in isolation is on average 230% more
efficient than the original implementation. We have not yet measured how this
optimization affects the performance of user jobs.
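
For readers without the diff at hand, the following is a minimal sketch of the kind of change described above, not the code in the patch itself. It assumes the identifier rule is "replace each run of characters outside [A-Za-z0-9_] with a single underscore, and add a prefix when the result does not start with a letter or underscore". The class name, both method names, and the "AVRO_" prefix are hypothetical choices for illustration, and both methods assume a non-empty input.

```java
// Illustrative only: the names and the exact sanitizing rule are assumptions.
final class AvroIdentifierSketch {

    // Regex-based variant: String.replaceAll and String.matches each
    // compile their pattern on every call, which is costly in a hot path.
    static String withRegex(String candidate) {
        String formatted = candidate.replaceAll("\\W+", "_");
        if (formatted.substring(0, 1).matches("[a-zA-Z_]")) {
            return formatted;
        }
        return "AVRO_" + formatted;
    }

    // Loop-based variant: a single pass over the characters, no regex engine.
    static String withLoop(String candidate) {
        StringBuilder sb = new StringBuilder(candidate.length() + 5);
        boolean inInvalidRun = false;
        for (int i = 0; i < candidate.length(); i++) {
            char c = candidate.charAt(i);
            if (isWordChar(c)) {
                sb.append(c);
                inInvalidRun = false;
            } else if (!inInvalidRun) {
                // Collapse a whole run of invalid characters into one '_'.
                sb.append('_');
                inInvalidRun = true;
            }
        }
        char first = sb.charAt(0);
        if (isLetter(first) || first == '_') {
            return sb.toString();
        }
        // Only a leading digit reaches this point.
        return sb.insert(0, "AVRO_").toString();
    }

    private static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Same ASCII character set that \w matches by default in java.util.regex.
    private static boolean isWordChar(char c) {
        return isLetter(c) || (c >= '0' && c <= '9') || c == '_';
    }
}
```

Under these assumptions both variants map "my column" to "my_column" and "1col" to "AVRO_1col". The loop variant avoids compiling a pattern per call, which is the kind of overhead a profiler would attribute to the regex path.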
    
    Any suggestions or remarks are welcome.
    
    
    Kind regards,
    
    Joeri

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JoeriHermans/sqoop patch-1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/sqoop/pull/18.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18
    
----
commit e8a3eaf872fe9804375c736a6e2603015e3a36a2
Author: Joeri Hermans <[email protected]>
Date:   2016-04-11T13:46:07Z

    Optimize toAvroIdentifier
    

----

