[
https://issues.apache.org/jira/browse/SQOOP-2906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15282536#comment-15282536
]
Joeri Hermans commented on SQOOP-2906:
--------------------------------------
Hi Attila,
It is probably partly my fault as well; I'm still working on it :) I've added a
build script to prepare the dependencies.
These are actually very good questions! Is it OK if I include these in the
readme?
> How did you ensure that the extra parameters are passed to the mappers and
> the Sqoop1 cmd line tool?
sqoop import \
  -Dyarn.app.mapreduce.am.env="JAVA_HOME=/usr/lib/jvm/jdk1.8.0_60/jre" \
  -Dmapreduce.map.java.opts="-XX:+PreserveFramePointer -XX:InlineSmallCode=200" \
  -Dmapreduce.map.env="JAVA_HOME=/usr/lib/jvm/jdk1.8.0_60/jre" \
  --connect [server] --username [username] --target-dir [hdfs dir] \
  --table [table] -P -m [number of mappers]
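To make the structure of that command easier to see, here is a small sketch. It relies on the Sqoop convention that generic Hadoop `-D` options go directly after the tool name and before the tool-specific arguments. The function name `print_sqoop_import_args` is hypothetical and not part of the profiler or of Sqoop; it only prints the argument list, one argument per line, so you can check it before running anything.

```shell
# Hypothetical helper (not part of Sqoop or the profiler): prints the
# argument list of the import command, one argument per line.
# $1 is the JAVA_HOME to hand to the application master and the mappers;
# all remaining arguments are the usual tool-specific Sqoop options.
print_sqoop_import_args() {
    java_home=$1
    shift
    # The -D options come right after "import" and before "$@".
    printf '%s\n' \
        sqoop import \
        "-Dyarn.app.mapreduce.am.env=JAVA_HOME=$java_home" \
        "-Dmapreduce.map.java.opts=-XX:+PreserveFramePointer -XX:InlineSmallCode=200" \
        "-Dmapreduce.map.env=JAVA_HOME=$java_home" \
        "$@"
}
```

For example, `print_sqoop_import_args /usr/lib/jvm/jdk1.8.0_60/jre --table [table] -m 4` lists the three `-D` options first, followed by the table and mapper options.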
> In case of -c which server/service IP is needed there and in what kind of
> format?
In the case of YARN, you will have to specify the REST address used to
fetch the nodes: http://[namenode]:8088/ws/v1/cluster/nodes
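If you want to check what that address returns before handing it to the profiler, something like the following sketch may help. It assumes `curl`, `grep -o` and `sed` are available, and it relies on the `nodeHostName` field that the YARN cluster nodes REST response carries for each node. The function names are hypothetical and this is not how the profiler itself parses the response.

```shell
# Hypothetical helpers for inspecting the YARN nodes endpoint by hand.

# Reads the JSON response on stdin and prints one node host name per line.
extract_node_hosts() {
    grep -o '"nodeHostName"[[:space:]]*:[[:space:]]*"[^"]*"' |
        sed 's/.*:[[:space:]]*"\([^"]*\)"$/\1/'
}

# $1 is host:port of the REST service, e.g. the [namenode]:8088 part above.
list_cluster_nodes() {
    curl -s "http://$1/ws/v1/cluster/nodes" | extract_node_hosts
}
```

Calling `list_cluster_nodes "[namenode]:8088"` with your own host filled in should print the hosts that the profiler would be pointed at.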
> Is it enough to use only with -h?
Yes, but the sampling frequency will then be only 99Hz, with a sampling
duration of 5 seconds. These default values can be changed by specifying the
corresponding parameters or by editing hprofiler.sh.
> If I don't wanna use SSH keys is it a valid solution if I type my root
> password all the time when it executes SSH?
I think this might work, but I'm not sure: the processes are started in
parallel, so I don't know how stdin will handle this. However, you can
create a file with your password, then cat the file and pipe the password to
stdin of the process. You can do this by editing src/host_executor.sh.
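One caveat: OpenSSH normally reads the password from the terminal rather than from stdin, so a plain pipe may not be enough on its own. A possible alternative, which is an assumption on my side and not something the tool ships with, is the separate `sshpass` utility, which can read the password from a file. The sketch below uses a hypothetical function `run_on_host` and a hypothetical password file location; it is not taken from src/host_executor.sh.

```shell
# Hypothetical helper, not taken from src/host_executor.sh.
# PASSWORD_FILE is an assumed location; keep it readable only by you
# (chmod 600), since it holds the password in plain text.
PASSWORD_FILE="${PASSWORD_FILE:-$HOME/.hprofiler_pass}"

# $1 is the host, the remaining arguments are the remote command.
run_on_host() {
    host=$1
    shift
    if [ -n "${DRY_RUN:-}" ]; then
        # Only print what would be executed.
        echo "sshpass -f $PASSWORD_FILE ssh root@$host $*"
    else
        # sshpass -f reads the password from the first line of the file.
        sshpass -f "$PASSWORD_FILE" ssh "root@$host" "$@"
    fi
}
```

Setting `DRY_RUN=1` lets you see the commands without connecting anywhere. SSH keys remain the safer option, since the password file sits on disk unencrypted.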
> Should it work just out of box ( e.g. defining -h -j -f -t ) or is any
> postprocess needed?
No post-processing is required. However, interpreting the results might be
troublesome.
> Did you use a hadoop distribution like CDH or just pure hdfs and sqoop1?
We use a distribution: Sqoop 1.4.6 on CDH 5.5.1.
I hope this helps. If you still have any issues, feel free to contact me! I'm
definitely willing to improve this tool :)
Kind regards,
Joeri
> Optimization of AvroUtil.toAvroIdentifier
> -----------------------------------------
>
> Key: SQOOP-2906
> URL: https://issues.apache.org/jira/browse/SQOOP-2906
> Project: Sqoop
> Issue Type: Improvement
> Reporter: Joeri Hermans
> Assignee: Joeri Hermans
> Labels: avro, hadoop, optimization
> Attachments: diff.txt
>
>
> Hi all
> Our distributed profiler indicated some inefficiencies in the
> AvroUtil.toAvroIdentifier method, more specifically, the use of Regex
> patterns. This can be directly observed from the FlameGraph generated by this
> profiler (https://jhermans.web.cern.ch/jhermans/sqoop_avro_flamegraph.svg).
> We implemented an optimization, and compared this with the original method.
> On our testing machine, the optimization by itself is about 500% (on average)
> more efficient compared to the original implementation. We have yet to test
> how this optimization will influence the performance of user jobs.
> Any suggestions or remarks are welcome.
> Kind regards,
> Joeri
> https://github.com/apache/sqoop/pull/18
> Writeup:
> https://db-blog.web.cern.ch/blog/joeri-hermans/2016-04-hadoop-performance-troubleshooting-stack-tracing-introduction