[ 
https://issues.apache.org/jira/browse/KYLIN-4370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041975#comment-17041975
 ] 

wangrupeng commented on KYLIN-4370:
-----------------------------------

Here is what I found about why Spark cannot read the sequence file when using a JDBC source.

When creating the flat table, Kylin checks whether the source is a JDBC source and, if so, 
forces the storage format to "TEXTFILE". This format can be configured in 
kylin.properties by setting 
kylin.source.hive.flat-table-storage-format=textfile.
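
For reference, the workaround described above is a one-line change in kylin.properties (the property name is taken from the comment itself; textfile is the format both Sqoop and Spark SQL can handle):

```properties
# Force the intermediate flat table into a format that Sqoop can
# export and Spark SQL can read back (workaround for this issue).
kylin.source.hive.flat-table-storage-format=textfile
```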

!image-2020-02-21-23-47-52-247.png|width=614,height=62!

As we can see from the following code, Sqoop does not currently support exporting 
SEQUENCEFILE tables to Hive. So when building a cube from a JDBC source with the Spark 
engine, Kylin uses Spark SQL to read the Hive table, but because the SparkSession does 
not actually pick up the Hive metastore configuration (a bug), we hit this 
problem.
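
To illustrate the Sqoop limitation mentioned above: Sqoop's Hive import path accepts text-based (and Parquet) storage only. A hypothetical invocation (connection string, table, and options below are placeholders, not from this issue) that combines --hive-import with a sequence-file format is rejected:

```shell
# Hypothetical example -- host, database and table are placeholders.
# Sqoop rejects this combination: Hive import is not compatible
# with the SequenceFile format.
sqoop import \
  --connect jdbc:mysql://db-host/sales \
  --table KYLIN_SALES \
  --hive-import \
  --as-sequencefile
```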

!image-2020-02-21-23-44-53-459.png|width=566,height=150!

The following code is in SparkUtil.getOtherFormatHiveInput().

!image-2020-02-19-17-20-20-076.png|width=785,height=119!
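
A sketch of what goes wrong in that method, using a simplified shape rather than the actual Kylin source: getOtherFormatHiveInput() resolves the intermediate table through the SparkSession catalog, which only reaches the Hive metastore if the session was built with Hive support and can see hive-site.xml. If the session falls back to its default in-memory catalog, the lookup fails with exactly the "Table or view not found" error in the log below. Class and table names here are illustrative:

```java
// Simplified sketch, NOT the real Kylin code. Assumes Spark SQL on the classpath.
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class HiveInputSketch {

    static Dataset<Row> getOtherFormatHiveInput(SparkSession spark, String hiveTable) {
        // spark.table() resolves the name through the session catalog. If the
        // session was not created with enableHiveSupport() (or cannot locate
        // hive-site.xml), the Hive metastore is never consulted and Spark
        // throws AnalysisException: "Table or view not found".
        return spark.table(hiveTable);
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("kylin-sketch")
                .enableHiveSupport() // without this, the lookup below fails
                .getOrCreate();
        getOtherFormatHiveInput(spark, "default.kylin_intermediate_example").show();
    }
}
```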

> Spark job failing with JDBC source on 8th step with error :  
> org.apache.kylin.engine.spark.SparkCubingByLayer. Root cause: Table or view 
> not found: `default`.`kylin_intermediate table'
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KYLIN-4370
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4370
>             Project: Kylin
>          Issue Type: Bug
>    Affects Versions: v3.0.0
>            Reporter: Sonu Singh
>            Assignee: wangrupeng
>            Priority: Blocker
>             Fix For: v3.0.0
>
>         Attachments: image-2020-02-19-17-20-20-076.png, 
> image-2020-02-19-18-48-00-857.png, image-2020-02-21-23-44-53-459.png, 
> image-2020-02-21-23-47-52-247.png
>
>
> 2020-02-03 11:18:45,899 INFO [pool-25-thread-1] spark.SparkExecutable:33 : 
> 2020-02-03 11:18:45 INFO Client:54 - Application report for 
> application_1580106368736_0431 (state: RUNNING)
> 2020-02-03 11:18:46,901 INFO [pool-25-thread-1] spark.SparkExecutable:33 : 
> 2020-02-03 11:18:46 INFO Client:54 - Application report for 
> application_1580106368736_0431 (state: FINISHED)
> 2020-02-03 11:18:46,906 INFO [pool-25-thread-1] spark.SparkExecutable:33 : 
> 2020-02-03 11:18:46 INFO Client:54 -
> 2020-02-03 11:18:46,906 INFO [pool-25-thread-1] spark.SparkExecutable:33 : 
> client token: N/A
> 2020-02-03 11:18:46,906 INFO [pool-25-thread-1] spark.SparkExecutable:33 : 
> diagnostics: User class threw exception: java.lang.RuntimeException: error 
> execute org.apache.kylin.engine.spark.SparkCubingByLayer. Root cause: Table 
> or view not found: 
> `default`.`kylin_intermediate_cube_2_30012020_spark_ded81d01_32e9_4f82_ea97_61788fdfd59b`;;
> 2020-02-03 11:18:46,906 INFO [pool-25-thread-1] spark.SparkExecutable:33 : 
> 'UnresolvedRelation 
> `default`.`kylin_intermediate_cube_2_30012020_spark_ded81d01_32e9_4f82_ea97_61788fdfd59b`
> 2020-02-03 11:18:46,906 INFO [pool-25-thread-1] spark.SparkExecutable:33 :
> 2020-02-03 11:18:46,906 INFO [pool-25-thread-1] spark.SparkExecutable:33 : at 
> org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:34)
> 2020-02-03 11:18:46,907 INFO [pool-25-thread-1] spark.SparkExecutable:33 : at 
> org.apache.kylin.common.util.SparkEntry.main(SparkEntry.java:36)
> 2020-02-03 11:18:46,907 INFO [pool-25-thread-1] spark.SparkExecutable:33 : at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2020-02-03 11:18:46,907 INFO [pool-25-thread-1] spark.SparkExecutable:33 : at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2020-02-03 11:18:46,907 INFO [pool-25-thread-1] spark.SparkExecutable:33 : at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2020-02-03 11:18:46,907 INFO [pool-25-thread-1] spark.SparkExecutable:33 : at 
> java.lang.reflect.Method.invoke(Method.java:498)
> 2020-02-03 11:18:46,907 INFO [pool-25-thread-1] spark.SparkExecutable:33 : at 
> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$4.run(ApplicationMaster.scala:721)
> 2020-02-03 11:18:46,907 INFO [pool-25-thread-1] spark.SparkExecutable:33 : 
> Caused by: org.apache.spark.sql.AnalysisException: Table or view not found: 
> `default`.`kylin_intermediate_cube_2_30012020_spark_ded81d01_32e9_4f82_ea97_61788fdfd59b`;;
> 2020-02-03 11:18:46,907 INFO [pool-25-thread-1] spark.SparkExecutable:33 : 
> 'UnresolvedRelation 
> `default`.`kylin_intermediate_cube_2_30012020_spark_ded81d01_32e9_4f82_ea97_61788fdfd59b`
> 2020-02-03 11:18:46,907 INFO [pool-25-thread-1] spark.SparkExecutable:33 :
> 2020-02-03 11:18:46,907 INFO [pool-25-thread-1] spark.SparkExecutable:33 : at 
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
> 2020-02-03 11:18:46,907 INFO [pool-25-thread-1] spark.SparkExecutable:33 : at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:86)
> 2020-02-03 11:18:46,907 INFO [pool-25-thread-1] spark.SparkExecutable:33 : at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:84)
> 2020-02-03 11:18:46,907 INFO [pool-25-thread-1] spark.SparkExecutable:33 : at 
> org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
> 2020-02-03 11:18:46,908 INFO [pool-25-thread-1] spark.SparkExecutable:33 : at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:84)
> 2020-02-03 11:18:46,908 INFO [pool-25-thread-1] spark.SparkExecutable:33 : at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:92)
> 2020-02-03 11:18:46,908 INFO [pool-25-thread-1] spark.SparkExecutable:33 : at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:105)
> 2020-02-03 11:18:46,908 INFO [pool-25-thread-1] spark.SparkExecutable:33 : at 
> org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
> 2020-02-03 11:18:46,908 INFO [pool-25-thread-1] spark.SparkExecutable:33 : at 
> org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
> 2020-02-03 11:18:46,908 INFO [pool-25-thread-1] spark.SparkExecutable:33 : at 
> org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
> 2020-02-03 11:18:46,908 INFO [pool-25-thread-1] spark.SparkExecutable:33 : at 
> org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:74)
> 2020-02-03 11:18:46,908 INFO [pool-25-thread-1] spark.SparkExecutable:33 : at 
> org.apache.spark.sql.SparkSession.table(SparkSession.scala:628)
> 2020-02-03 11:18:46,908 INFO [pool-25-thread-1] spark.SparkExecutable:33 : at 
> org.apache.spark.sql.SparkSession.table(SparkSession.scala:624)
> 2020-02-03 11:18:46,908 INFO [pool-25-thread-1] spark.SparkExecutable:33 : at 
> org.apache.kylin.engine.spark.SparkUtil.getOtherFormatHiveInput(SparkUtil.java:163)
> 2020-02-03 11:18:46,908 INFO [pool-25-thread-1] spark.SparkExecutable:33 : at 
> org.apache.kylin.engine.spark.SparkUtil.hiveRecordInputRDD(SparkUtil.java:144)
> 2020-02-03 11:18:46,908 INFO [pool-25-thread-1] spark.SparkExecutable:33 : at 
> org.apache.kylin.engine.spark.SparkCubingByLayer.execute(SparkCubingByLayer.java:161)
> 2020-02-03 11:18:46,908 INFO [pool-25-thread-1] spark.SparkExecutable:33 : at 
> org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:29)
> 2020-02-03 11:18:46,908 INFO [pool-25-thread-1] spark.SparkExecutable:33 : 
> ... 6 more
> 2020-02-03 11:18:46,909 INFO [pool-25-thread-1] spark.SparkExecutable:33 :
> 2020-02-03 11:18:46,909 INFO [pool-25-thread-1] spark.SparkExecutable:33 : 
> ApplicationMaster host: 172.31.43.179
> 2020-02-03 11:18:46,909 INFO [pool-25-thread-1] spark.SparkExecutable:33 : 
> ApplicationMaster RPC port: 0
> 2020-02-03 11:18:46,909 INFO [pool-25-thread-1] spark.SparkExecutable:33 : 
> queue: default
> 2020-02-03 11:18:46,912 INFO [pool-25-thread-1] spark.SparkExecutable:33 : 
> start time: 1580728698758
> 2020-02-03 11:18:46,912 INFO [pool-25-thread-1] spark.SparkExecutable:33 : 
> final status: FAILED
> 2020-02-03 11:18:46,912 INFO [pool-25-thread-1] spark.SparkExecutable:33 : 
> tracking URL: http://XXXXXXXX:8088/proxy/application_1580106368736_0431/
> 2020-02-03 11:18:46,918 INFO [pool-25-thread-1] spark.SparkExecutable:33 : 
> user: root
> 2020-02-03 11:18:46,918 INFO [pool-25-thread-1] spark.SparkExecutable:33 : 
> Exception in thread "main" org.apache.spark.SparkException: Application 
> application_1580106368736_0431 finished with failed status
> 2020-02-03 11:18:46,919 INFO [pool-25-thread-1] spark.SparkExecutable:33 : at 
> org.apache.spark.deploy.yarn.Client.run(Client.scala:1165)
> 2020-02-03 11:18:46,919 INFO [pool-25-thread-1] spark.SparkExecutable:33 : at 
> org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1520)
> 2020-02-03 11:18:46,919 INFO [pool-25-thread-1] spark.SparkExecutable:33 : at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
> 2020-02-03 11:18:46,919 INFO [pool-25-thread-1] spark.SparkExecutable:33 : at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
> 2020-02-03 11:18:46,919 INFO [pool-25-thread-1] spark.SparkExecutable:33 : at 
> org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
> 2020-02-03 11:18:46,919 INFO [pool-25-thread-1] spark.SparkExecutable:33 : at 
> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
> 2020-02-03 11:18:46,919 INFO [pool-25-thread-1] spark.SparkExecutable:33 : at 
> org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> 2020-02-03 11:18:46,929 INFO [pool-25-thread-1] spark.SparkExecutable:33 : 
> 2020-02-03 11:18:46 INFO ShutdownHookManager:54 - Shutdown hook called
> 2020-02-03 11:18:46,931 INFO [pool-25-thread-1] spark.SparkExecutable:33 : 
> 2020-02-03 11:18:46 INFO ShutdownHookManager:54 - Deleting directory 
> /tmp/spark-a13c48c9-967e-422e-b541-aba9d55fcd7c
> 2020-02-03 11:18:46,940 INFO [pool-25-thread-1] spark.SparkExecutable:33 : 
> 2020-02-03 11:18:46 INFO ShutdownHookManager:54 - Deleting directory



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
