[
https://issues.apache.org/jira/browse/SQOOP-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15413926#comment-15413926
]
Ruslan Dautkhanov commented on SQOOP-2981:
------------------------------------------
Asked Rosmari to follow up directly here, as they've put the SR into autoclose,
and got the following response:
{quote} Oracle Support [ODM Action Plan]
Hi Ruslan,
I apologize that I can't make SR updates via third-party message boards. Please
make any updates directly within the Service Request.
What we need is to understand what the import does in relation to the use
of getBigDecimal. How is JDBC getBigDecimal used here?
Basically, we need a JDBC testcase showing the performance issue when using
getBigDecimal, or the information to build the testcase based on what your
application does and how it uses getBigDecimal.
Thanks and regards,
Rosmari
{quote}
@maugli, if you could provide a response to Rosmari, that would be great. Thank you.
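In the meantime, here is a minimal standalone JDBC testcase along the lines Rosmari
is asking for. This is only a sketch: the connection URL, credentials, and table are
the placeholders from this ticket, and it assumes the Oracle thin driver is on the
classpath. It just fetches rows and calls ResultSet.getBigDecimal() on every NUMBER
column, which is what Sqoop's generated readFields() does via
JdbcWritableBridge.readBigDecimal():
{quote}
import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class GetBigDecimalTestcase {
  public static void main(String[] args) throws Exception {
    // Placeholders: same URL/credentials/table as the sqoop invocation below.
    try (Connection conn = DriverManager.getConnection(
             "jdbc:oracle:thin:@jdbc_tns", "username", "password");
         Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery(
             "SELECT ID, MERCHID, RECENCY, FREQUENCY, SPEND, SPENDING, FREQ"
                 + " FROM someschema.sometable")) {
      long rows = 0;
      long start = System.nanoTime();
      while (rs.next()) {
        // Mirrors the per-column call in the generated code:
        // JdbcWritableBridge.readBigDecimal() -> ResultSet.getBigDecimal().
        for (int col = 1; col <= 7; col++) {
          BigDecimal unused = rs.getBigDecimal(col);
        }
        rows++;
      }
      System.out.println(rows + " rows in "
          + (System.nanoTime() - start) / 1_000_000 + " ms");
    }
  }
}
{quote}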
> sqoop import from jdbc: JdbcWritableBridge.readBigDecimal() takes a ton of cpu
> ------------------------------------------------------------------------------
>
> Key: SQOOP-2981
> URL: https://issues.apache.org/jira/browse/SQOOP-2981
> Project: Sqoop
> Issue Type: Bug
> Components: codegen, connectors/oracle, sqoop2-jdbc-connector
> Affects Versions: 1.4.5, 1.4.6
> Environment: sqoop import from Oracle; saves as parquet file
> Reporter: Ruslan Dautkhanov
> Labels: decimal, import, jdbc, oracle, parquet
>
> The majority of CPU time during a sqoop import from Oracle was spent converting Decimal values.
> That was 2.5x more than the next-biggest CPU consumer (snappy compression).
> Sqoop was at 100% CPU overall, while the Oracle side was mostly idle.
> {quote}
> JvmTop 0.8.0 alpha - 11:56:45, amd64, 48 cpus, Linux 2.6.32-57, load avg 0.92
> http://code.google.com/p/jvmtop
> Profiling PID 25489: org.apache.hadoop.mapred.YarnChild 10.20
> 38.78% ( 7.68s) com.cloudera.sqoop.lib.JdbcWritableBridge.readBigDecimal()
> 14.27% ( 2.82s) org.xerial.snappy.SnappyNative.rawCompress()
> 12.67% ( 2.51s) parquet.io.api.Binary$FromStringBinary.encodeUTF8()
> 10.28% ( 2.04s) ...quet.column.values.dictionary.DictionaryValuesWriter$()
> 4.80% ( 0.95s) ...quet.column.values.fallback.FallbackValuesWriter.writ()
> 3.69% ( 0.73s) com.cloudera.sqoop.lib.JdbcWritableBridge.readString()
> 2.51% ( 0.50s) parquet.avro.AvroWriteSupport.writeRecordFields()
> 2.30% ( 0.46s) parquet.column.impl.ColumnWriterV1.write()
> 1.90% ( 0.38s) ...quet.it.unimi.dsi.fastutil.objects.Object2IntLinkedOp()
> 1.31% ( 0.26s) ...quet.column.values.rle.RunLengthBitPackingHybridEncod()
> 1.27% ( 0.25s) ...quet.column.values.dictionary.DictionaryValuesWriter$()
> 1.22% ( 0.24s) parquet.hadoop.CodecFactory$BytesCompressor.compress()
> 0.65% ( 0.13s) ...quet.column.values.dictionary.DictionaryValuesWriter$()
> 0.64% ( 0.13s) ...quet.it.unimi.dsi.fastutil.objects.Object2IntLinkedOp()
> 0.64% ( 0.13s) parquet.bytes.CapacityByteArrayOutputStream.addSlab()
> 0.63% ( 0.12s) parquet.io.api.Binary$ByteArrayBackedBinary.getBytes()
> 0.62% ( 0.12s) ...quet.column.values.dictionary.DictionaryValuesWriter.()
> 0.58% ( 0.12s) parquet.bytes.CapacityByteArrayOutputStream.setByte()
> 0.49% ( 0.10s) parquet.hadoop.codec.SnappyUtil.validateBuffer()
> 0.44% ( 0.09s) parquet.hadoop.InternalParquetRecordWriter.write()
> {quote}
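> For reference, JdbcWritableBridge.readBigDecimal() is essentially a one-line wrapper
> (a paraphrased sketch of the 1.4.x source), so the 38.78% above is really the Oracle
> driver's NUMBER-to-BigDecimal conversion:
> {quote}
> // Paraphrased sketch of org.apache.sqoop.lib.JdbcWritableBridge:
> public static BigDecimal readBigDecimal(int colNum, ResultSet r)
>     throws SQLException {
>   // All of the profiled time sits inside the driver's getBigDecimal().
>   return r.getBigDecimal(colNum);
> }
> {quote}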
> DDL of the table on Oracle side:
> {quote}
> CREATE TABLE someschema.sometable
> (
>   ID        NUMBER NOT NULL,
>   psn       VARCHAR2(50 BYTE),
>   MERCHID   NUMBER,
>   RECENCY   NUMBER,
>   FREQUENCY NUMBER,
>   SPEND     NUMBER,
>   SPENDING  NUMBER,
>   FREQ      NUMBER
> )
> {quote}
> Sqoop parameters:
> {quote}
> sqoop import
> -Dmapred.job.name="sqoop import into out_table"
> --connect "jdbc:oracle:thin:@jdbc_tns"
> --username username --password password
> --direct
> --compress --compression-codec snappy
> --as-parquetfile
> --target-dir hdfs_dir
> --num-mappers num_mappers
> --query "SELECT * FROM someschema.sometable WHERE \$CONDITIONS"
> {quote}
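> Since every column except psn is an unconstrained NUMBER, Sqoop maps each of them to
> java.math.BigDecimal, so every value of every row goes through readBigDecimal(). One
> mitigation worth testing (an untested suggestion, not part of the original report, and
> it assumes the values actually fit the narrower Java types) is to override the type
> mapping with --map-column-java so the generated code bypasses getBigDecimal, e.g.:
> {quote}
> sqoop import ... --query "SELECT * FROM someschema.sometable WHERE \$CONDITIONS"
> --map-column-java ID=Long,MERCHID=Long,RECENCY=Long,FREQUENCY=Long,SPEND=Double,SPENDING=Double,FREQ=Long
> {quote}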