[jira] [Commented] (SQOOP-2906) Optimization of AvroUtil.toAvroIdentifier

Joeri Hermans (JIRA) Fri, 13 May 2016 01:44:55 -0700

    [ 
https://issues.apache.org/jira/browse/SQOOP-2906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15282536#comment-15282536
 ]


Joeri Hermans commented on SQOOP-2906:
--------------------------------------

Hi Attila,

It is probably partly my fault as well; I'm still working on it :) I've added
a build script to prepare the dependencies.

These are actually very good questions! Is it OK if I include these in the 
readme?

> How did you ensure that the extra parameters are passed to the mappers and 
> the Sqoop1 cmd line tool?
sqoop import \
  -Dyarn.app.mapreduce.am.env="JAVA_HOME=/usr/lib/jvm/jdk1.8.0_60/jre" \
  -Dmapreduce.map.java.opts="-XX:+PreserveFramePointer -XX:InlineSmallCode=200" \
  -Dmapreduce.map.env="JAVA_HOME=/usr/lib/jvm/jdk1.8.0_60/jre" \
  --connect [server] --username [username] --target-dir [hdfs dir] \
  --table [table] -P -m [number of mappers]
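
If you launch the import from a script, the same command can be assembled as
an argument list. This is only a rough sketch: the helper name
build_sqoop_import_argv is made up for illustration, and the bracketed
placeholders are kept exactly as in the command above. The one thing it tries
to make explicit is that the -D generic options are placed before the
tool-specific arguments such as --connect.

```python
import shlex


def build_sqoop_import_argv(java_home, map_java_opts, tool_args):
    """Return an argv list with the -D generic options ahead of tool_args.

    Hypothetical helper, not part of Sqoop or of the profiler.
    """
    generic = [
        # Environment for the MapReduce application master.
        f"-Dyarn.app.mapreduce.am.env=JAVA_HOME={java_home}",
        # JVM flags for the map tasks.
        f"-Dmapreduce.map.java.opts={map_java_opts}",
        # Environment for the map tasks.
        f"-Dmapreduce.map.env=JAVA_HOME={java_home}",
    ]
    return ["sqoop", "import", *generic, *tool_args]


argv = build_sqoop_import_argv(
    "/usr/lib/jvm/jdk1.8.0_60/jre",
    "-XX:+PreserveFramePointer -XX:InlineSmallCode=200",
    ["--connect", "[server]", "--username", "[username]",
     "--target-dir", "[hdfs dir]", "--table", "[table]",
     "-P", "-m", "[number of mappers]"],
)

# Print a shell-quoted version for inspection; pass argv itself to
# subprocess.run(argv) to avoid quoting problems with the JVM flags.
print(shlex.join(argv))
```

Keeping the arguments as a list means the space inside the map JVM options
does not need any manual quoting.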

> In case of -c which server/service IP is needed there and in what kind of 
> format?
In the case of YARN, you have to specify the REST address used to fetch the
nodes: http://[namenode]:8088/ws/v1/cluster/nodes
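
To check what that address returns before handing it to the profiler,
something like the following might help. It is a minimal sketch, assuming the
endpoint answers with the usual YARN JSON layout (a "nodes" object holding a
"node" list whose entries carry "nodeHostName" and "state"). The function
names and the sample hostnames are made up for illustration; this is not how
the profiler itself parses the response.

```python
import json
from urllib.request import Request, urlopen


def extract_hostnames(payload, state="RUNNING"):
    """Return the hostnames of all nodes in the given state."""
    nodes = (payload.get("nodes") or {}).get("node") or []
    return [n["nodeHostName"] for n in nodes if n.get("state") == state]


def fetch_node_hostnames(url, timeout=10):
    """Fetch the cluster node list from the REST address and parse it."""
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return extract_hostnames(json.load(response))


# Offline example with a hand-written payload (hostnames are invented).
sample = {"nodes": {"node": [
    {"nodeHostName": "worker1.example.org", "state": "RUNNING"},
    {"nodeHostName": "worker2.example.org", "state": "LOST"},
]}}
print(extract_hostnames(sample))  # ['worker1.example.org']
```

Against a real cluster you would call fetch_node_hostnames() with the address
above, replacing [namenode] with your own host.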

> Is it enough to use only with -h?
Yes, but the sampling frequency will then be only 99 Hz, with a sampling
duration of 5 seconds. These default values can be changed by specifying the
corresponding parameters or by editing hprofiler.sh.
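
As a rough feel for what the defaults give you: assuming a timer-based
sampler that records one stack per tick on each busy CPU (an assumption about
how sampling works in general, not a description of hprofiler internals),
the number of samples is an upper bound of frequency times duration. The
function name below is hypothetical.

```python
def expected_samples(frequency_hz, duration_s, busy_cpus=1):
    """Upper bound on stack samples for a timer-based sampler (assumed model)."""
    return frequency_hz * duration_s * busy_cpus


# Defaults mentioned above: 99 Hz for 5 seconds on a single busy CPU.
print(expected_samples(99, 5))  # 495
```

A few hundred samples per CPU is not much for a detailed flame graph, so a
longer duration may be worth trying.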

> If I don't wanna use SSH keys is it a valid solution if I type my root 
> password all the time when it executes SSH?
I think this might work, but I'm not sure: the processes are started in
parallel, so I don't know how stdin will handle this. However, you can
create a file with your password, then cat the file and pipe the password to
stdin of the process. You can do this by editing src/host_executor.sh.

> Should it work just out of box ( e.g. defining -h -j -f -t ) or is any 
> postprocess needed?
No post-processing is required. However, interpreting the results might be 
troublesome.

> Did you use a hadoop distribution like CDH or just pure hdfs and sqoop1?
We use a distribution: Sqoop 1.4.6 on CDH 5.5.1.

I hope this helps. If you still have any issues, feel free to contact me! I'm
definitely willing to improve this tool :)


Kind regards,

Joeri

> Optimization of AvroUtil.toAvroIdentifier
> -----------------------------------------
>
>                 Key: SQOOP-2906
>                 URL: https://issues.apache.org/jira/browse/SQOOP-2906
>             Project: Sqoop
>          Issue Type: Improvement
>            Reporter: Joeri Hermans
>            Assignee: Joeri Hermans
>              Labels: avro, hadoop, optimization
>         Attachments: diff.txt
>
>
> Hi all
> Our distributed profiler indicated some inefficiencies in the 
> AvroUtil.toAvroIdentifier method, more specifically, the use of Regex 
> patterns. This can be directly observed from the FlameGraph generated by this 
> profiler (https://jhermans.web.cern.ch/jhermans/sqoop_avro_flamegraph.svg). 
> We implemented an optimization, and compared this with the original method. 
> On our testing machine, the optimization by itself is about 500% (on average) 
> more efficient compared to the original implementation. We have yet to test 
> how this optimization will influence the performance of user jobs.
> Any suggestions or remarks are welcome.
> Kind regards,
> Joeri
> https://github.com/apache/sqoop/pull/18
> Writeup:
> https://db-blog.web.cern.ch/blog/joeri-hermans/2016-04-hadoop-performance-troubleshooting-stack-tracing-introduction



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
