[ 
https://issues.apache.org/jira/browse/KYLIN-4362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17024842#comment-17024842
 ] 

 Kaige Liu commented on KYLIN-4362:
-----------------------------------

The split-by column is missing from the generated Sqoop command. Could you 
please share your table DDL so we can debug this issue? 
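
For anyone triaging similar failures, here is a minimal sketch of how the "Missing argument for option" condition could be spotted in a generated command line before it is run. It is not Kylin or Sqoop code. The function name `options_missing_values` and the shortened sample command are made up for illustration, and the check assumes that an option value never starts with `--`.

```python
import shlex


def options_missing_values(command, value_options):
    """Return the options from value_options that are not followed by a value.

    Hypothetical helper: an option counts as missing its value when it is the
    last token or when the next token is itself a long option ("--...").
    """
    tokens = shlex.split(command)  # shell-style tokenizing, honours quotes
    missing = []
    for i, token in enumerate(tokens):
        if token in value_options:
            following = tokens[i + 1] if i + 1 < len(tokens) else None
            if following is None or following.startswith("--"):
                missing.append(token)
    return missing


# Shortened, made-up command with the same shape as the failing one.
cmd = (
    'sqoop import --target-dir /tmp/out --split-by '
    '--boundary-query "SELECT 1" --num-mappers 4'
)
needs_value = {"--target-dir", "--split-by", "--boundary-query", "--num-mappers"}
print(options_missing_values(cmd, needs_value))  # ['--split-by']
```

In the command quoted below, `--split-by` is followed directly by `--boundary-query`, which is the pattern this check flags.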

> Kylin 3.0.0 Release: MR & Spark Job is failing with JDBC connection and Sqoop.
> ------------------------------------------------------------------------------
>
>                 Key: KYLIN-4362
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4362
>             Project: Kylin
>          Issue Type: Bug
>            Reporter: Sonu Singh
>            Priority: Blocker
>             Fix For: v3.0.0
>
>
> MR and Spark jobs are failing on HDP3.1 with the error below:
> -00 execute finished with exception
> java.io.IOException: OS command error exit with return code: 1, error 
> message: Warning: /usr/hdp/3.0.1.0-187/accumulo does not exist! Accumulo 
> imports will fail.
> Please set $ACCUMULO_HOME to the root of your Accumulo installation.
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.0.1.0-187/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.0.1.0-187/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 20/01/27 17:09:19 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7.3.0.1.0-187
> Missing argument for option: split-by
> The command is:
> /usr/hdp/current/sqoop-client/bin/sqoop import 
> -Dorg.apache.sqoop.splitter.allow_text_splitter=true 
> -Dmapreduce.job.queuename=default --connect 
> "jdbc:vdb://XX.XX.XX.XX:XX/XXXXX" --driver com.XXXX.XX.jdbc.Driver --username XXXXX 
> --password "XXXXXXX" --query "SELECT \`sales\`.\`locationdimensionid\` as 
> \`SALES_LOCATIONDIMENSIONID\` ,\`sales\`.\`storeitemdimensionid\` as 
> \`SALES_STOREITEMDIMENSIONID\` ,\`sales\`.\`basecostperunit\` as 
> \`SALES_BASECOSTPERUNIT\` ,\`sales\`.\`createdby\` as \`SALES_CREATEDBY\` 
> ,\`sales\`.\`updateddate\` as \`SALES_UPDATEDDATE\` FROM \`XXXX\`.\`sales\` 
> \`sales\` WHERE 1=1 AND \$CONDITIONS" --target-dir 
> hdfs://XX-master:8020/apps/XXX/XXX/kylin-4f367799-4993-bb67-da69-a9a147c62a1e/kylin_intermediate_cube_11_27012020_1d0a2dfd_bd66_d3e3_304b_9cd7f2018dbc 
> --split-by --boundary-query 
> "SELECT min(\`\`), max(\`\`) FROM \`XXXXXX\`.\`sales\` " --null-string '\\N' 
> --null-non-string '\\N' --fields-terminated-by '|' --num-mappers 4
> at org.apache.kylin.common.util.CliCommandExecutor.execute(CliCommandExecutor.java:88)
> at org.apache.kylin.source.jdbc.CmdStep.sqoopFlatHiveTable(CmdStep.java:43)
> at org.apache.kylin.source.jdbc.CmdStep.doWork(CmdStep.java:54)
> at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:171)
> at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:62)
> at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:171)
> at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:106)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 2020-01-27 17:09:19,362 INFO [Scheduler 1642300543 Job 
> 4f367799-4993-bb67-da69-a9a147c62a1e-160] execution.ExecutableManager:466 : 
> job id:4f367799-4993-bb67-da69-a9a147c62a1e-00 from RUNNING to ERROR
> 2020-01-27 17:09:19,365 ERROR [Scheduler 1642300543 Job 
> 4f367799-4993-bb67-da69-a9a147c62a1e-160] execution.AbstractExecutable:173 : 
> error running Executable: 
> CubingJob{id=4f367799-4993-bb67-da69-a9a147c62a1e, name=BUILD CUBE - 
> cube_11_27012020 - FULL_BUILD - UTC 2020-01-27 17:09:00, state=RUNNING}
> 2020-01-27 17:09:19,372 DEBUG [pool-7-thread-1] cachesync.Broadcaster:111 : 
> Servers in the cluster: [localhost:7070]
> 2020-01-27 17:09:19,373 DEBUG [pool-7-thread-1] cachesync.Broadcaster:121 : 
> Announcing new bro
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
