Yeah, a hot function deserves a good optimization. It would be great if you
could create an issue in JIRA and change your code to follow the common Sqoop
(Java) code conventions, e.g. always wrap even single-line statements in {}.
Stanley
-----Original Message-----
From: JoeriHermans [mailto:[email protected]]
Sent: Monday, April 11, 2016 9:50 PM
To: [email protected]
Subject: [GitHub] sqoop pull request: Optimize toAvroIdentifier
GitHub user JoeriHermans opened a pull request:
https://github.com/apache/sqoop/pull/18
Optimize toAvroIdentifier
Our distributed profiler indicated some inefficiencies in the
AvroUtil.toAvroIdentifier method, specifically in its use of regex patterns.
This can be observed directly in the FlameGraph generated by the profiler
(https://jhermans.web.cern.ch/jhermans/sqoop_avro_flamegraph.svg). We
implemented an optimization and compared it with the original method. On our
test machine, the optimized method by itself is on average 230% more efficient
than the original implementation. We have yet to test how this optimization
will influence the performance of user jobs.
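
To illustrate the kind of change described above, here is a minimal sketch in
Java. It is not the code from the pull request. It assumes that the method
collapses each run of non-word characters into a single underscore and prepends
"AVRO_" when the result does not start with a letter or underscore. The class
name AvroIdentifierSketch and both method names are hypothetical, and the
regex-based baseline was written for this comparison only.

```java
public final class AvroIdentifierSketch {

    private AvroIdentifierSketch() {
    }

    // Regex-based baseline (assumed behaviour, written for this sketch).
    // Each call compiles and applies two patterns. Requires non-empty input.
    public static String toAvroIdentifierRegex(String candidate) {
        String formatted = candidate.replaceAll("\\W+", "_");
        if (formatted.substring(0, 1).matches("[a-zA-Z_]")) {
            return formatted;
        } else {
            return "AVRO_" + formatted;
        }
    }

    // Loop-based variant: a single pass over the characters, no regex.
    public static String toAvroIdentifierLoop(String candidate) {
        StringBuilder sb = new StringBuilder(candidate.length() + 5);
        boolean inNonWordRun = false;
        for (int i = 0; i < candidate.length(); i++) {
            char c = candidate.charAt(i);
            if (isWordChar(c)) {
                sb.append(c);
                inNonWordRun = false;
            } else if (!inNonWordRun) {
                // First character of a non-word run: emit one underscore.
                sb.append('_');
                inNonWordRun = true;
            }
        }
        // Unlike the baseline, empty input yields "AVRO_" instead of throwing.
        if (sb.length() > 0 && isLetterOrUnderscore(sb.charAt(0))) {
            return sb.toString();
        } else {
            return "AVRO_" + sb;
        }
    }

    // Same character set as Java's default \w: [a-zA-Z_0-9].
    private static boolean isWordChar(char c) {
        return isLetterOrUnderscore(c) || (c >= '0' && c <= '9');
    }

    private static boolean isLetterOrUnderscore(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
    }

    public static void main(String[] args) {
        String[] samples = {"my-col  name", "9 lives", "order.id", "_ok"};
        for (String s : samples) {
            System.out.println(s + " -> " + toAvroIdentifierLoop(s)
                + " (regex: " + toAvroIdentifierRegex(s) + ")");
        }
    }
}
```

Both variants should agree on non-empty input. Any speed-up from the loop
version would need to be measured on the target workload, as noted above for
user jobs.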
Any suggestions or remarks are welcome.
Kind regards,
Joeri
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/JoeriHermans/sqoop patch-1
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/sqoop/pull/18.patch
To close this pull request, make a commit to your master/trunk branch with (at
least) the following in the commit message:
This closes #18
----
commit e8a3eaf872fe9804375c736a6e2603015e3a36a2
Author: Joeri Hermans <[email protected]>
Date: 2016-04-11T13:46:07Z
Optimize toAvroIdentifier
----