[
https://issues.apache.org/jira/browse/SQOOP-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15363788#comment-15363788
]
Attila Szabo edited comment on SQOOP-2981 at 7/6/16 5:41 AM:
-------------------------------------------------------------
Hi [~Tagar],
Could you please also run a profiling session with the help of Joeri's tool (
https://github.com/cerndb/Hadoop-Profiler )? It requires JDK 8u60 or above. Would
that be doable on your side?
The reason I'm asking is the following:
JdbcWritableBridge#readBigDecimal is just a shorthand for
java.sql.ResultSet#getBigDecimal(int), so the implementation lives inside the
Oracle JDBC driver, and thus Sqoop does not have much control over that
method.
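To illustrate why there is so little room on the Sqoop side, the bridge call boils
down to roughly the following (a sketch of its shape, not the exact Sqoop source):
{quote}
import java.math.BigDecimal;
import java.sql.ResultSet;
import java.sql.SQLException;

// Sketch only: the bridge method is essentially a one-line delegation, so
// virtually all of the CPU reported under readBigDecimal() is actually spent
// inside the driver's ResultSet.getBigDecimal() implementation.
public class JdbcWritableBridgeSketch {
  public static BigDecimal readBigDecimal(int colNum, ResultSet r) throws SQLException {
    return r.getBigDecimal(colNum);
  }
}
{quote}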
Meanwhile I'll try to investigate your scenario on my side and check whether I can
find any clue as to what is causing your performance issue (or whether there is a
workaround, e.g. mapping the affected columns to a different type instead of
BigDecimal). See the sketch below.
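As a concrete but untested sketch of that kind of workaround, Sqoop's
--map-column-java option can override the default mapping per column. The column
names here are taken from the DDL further below; whether String is acceptable for
your downstream Parquet consumers is an assumption, not a recommendation:
{quote}
sqoop import \
  --connect "jdbc:oracle:thin:@jdbc_tns" \
  --username username --password password \
  --as-parquetfile \
  --target-dir hdfs_dir \
  --num-mappers num_mappers \
  --map-column-java MERCHID=String,RECENCY=String,FREQUENCY=String,SPEND=String,SPENDING=String,FREQ=String \
  --query "SELECT * FROM someschema.sometable WHERE \$CONDITIONS"
{quote}
The idea is that Sqoop's generated code would then read those columns via
getString() instead of getBigDecimal(), at the cost of storing them as strings in
the Parquet output.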
Cheers,
Maugli
> sqoop import from jdbc: JdbcWritableBridge.readBigDecimal() takes a ton of cpu
> ------------------------------------------------------------------------------
>
> Key: SQOOP-2981
> URL: https://issues.apache.org/jira/browse/SQOOP-2981
> Project: Sqoop
> Issue Type: Bug
> Components: codegen, connectors/oracle, sqoop2-jdbc-connector
> Affects Versions: 1.4.5, 1.4.6
> Environment: sqoop import from Oracle; saves as parquet file
> Reporter: Ruslan Dautkhanov
> Labels: decimal, import, jdbc, oracle, parquet
>
> The majority of the time spent in the sqoop import from Oracle was on converting Decimal values.
> That was 2.5x more than the next-biggest CPU consumer (snappy compression).
> Sqoop was at 100% CPU overall, while the Oracle side was pretty bored.
> {quote}
> JvmTop 0.8.0 alpha - 11:56:45, amd64, 48 cpus, Linux 2.6.32-57, load avg 0.92
> http://code.google.com/p/jvmtop
> Profiling PID 25489: org.apache.hadoop.mapred.YarnChild 10.20
> 38.78% ( 7.68s) com.cloudera.sqoop.lib.JdbcWritableBridge.readBigDecimal()
> 14.27% ( 2.82s) org.xerial.snappy.SnappyNative.rawCompress()
> 12.67% ( 2.51s) parquet.io.api.Binary$FromStringBinary.encodeUTF8()
> 10.28% ( 2.04s) ...quet.column.values.dictionary.DictionaryValuesWriter$()
>  4.80% ( 0.95s) ...quet.column.values.fallback.FallbackValuesWriter.writ()
>  3.69% ( 0.73s) com.cloudera.sqoop.lib.JdbcWritableBridge.readString()
>  2.51% ( 0.50s) parquet.avro.AvroWriteSupport.writeRecordFields()
>  2.30% ( 0.46s) parquet.column.impl.ColumnWriterV1.write()
>  1.90% ( 0.38s) ...quet.it.unimi.dsi.fastutil.objects.Object2IntLinkedOp()
>  1.31% ( 0.26s) ...quet.column.values.rle.RunLengthBitPackingHybridEncod()
>  1.27% ( 0.25s) ...quet.column.values.dictionary.DictionaryValuesWriter$()
>  1.22% ( 0.24s) parquet.hadoop.CodecFactory$BytesCompressor.compress()
>  0.65% ( 0.13s) ...quet.column.values.dictionary.DictionaryValuesWriter$()
>  0.64% ( 0.13s) ...quet.it.unimi.dsi.fastutil.objects.Object2IntLinkedOp()
>  0.64% ( 0.13s) parquet.bytes.CapacityByteArrayOutputStream.addSlab()
>  0.63% ( 0.12s) parquet.io.api.Binary$ByteArrayBackedBinary.getBytes()
>  0.62% ( 0.12s) ...quet.column.values.dictionary.DictionaryValuesWriter.()
>  0.58% ( 0.12s) parquet.bytes.CapacityByteArrayOutputStream.setByte()
>  0.49% ( 0.10s) parquet.hadoop.codec.SnappyUtil.validateBuffer()
>  0.44% ( 0.09s) parquet.hadoop.InternalParquetRecordWriter.write()
> {quote}
> DDL of the table on Oracle side:
> {quote}
> CREATE TABLE someschema.sometable
> (
>   ID        NUMBER NOT NULL,
>   psn       VARCHAR2(50 BYTE),
>   MERCHID   NUMBER,
>   RECENCY   NUMBER,
>   FREQUENCY NUMBER,
>   SPEND     NUMBER,
>   SPENDING  NUMBER,
>   FREQ      NUMBER
> )
> Sqoop parameters:
> {quote}
> sqoop import \
>   -Dmapred.job.name="sqoop import into out_table" \
>   --connect "jdbc:oracle:thin:@jdbc_tns" \
>   --username username --password password \
>   --direct \
>   --compress --compression-codec snappy \
>   --as-parquetfile \
>   --target-dir hdfs_dir \
>   --num-mappers num_mappers \
>   --query "SELECT * FROM someschema.sometable WHERE \$CONDITIONS"
> {quote}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)