Hi all,

When I load data from an HDFS CSV file, one stage of the Spark job fails with the error below. Where can I find a more detailed error message that would help me track down the cause? Or does anyone know why this happens and how to fix it?
command:

cc.sql(s"load data inpath 'hdfs://master:9000/opt/sample.csv' into table test_table")
error log:
Job aborted due to stage failure: Task 0 in stage 7.0 failed 4 times, most recent failure: Lost task 0.3 in stage 7.0 (TID 17, slave2): org.apache.carbondata.processing.etl.DataLoadingException: Due to internal errors, please check logs for more details.
	at org.apache.carbondata.processing.csvload.DataGraphExecuter.execute(DataGraphExecuter.java:212)
	at org.apache.carbondata.processing.csvload.DataGraphExecuter.executeGraph(DataGraphExecuter.java:144)
	at org.apache.carbondata.spark.load.CarbonLoaderUtil.executeGraph(CarbonLoaderUtil.java:212)
	at org.apache.carbondata.spark.rdd.SparkPartitionLoader.run(CarbonDataLoadRDD.scala:125)
	at org.apache.carbondata.spark.rdd.DataFileLoaderRDD$$anon$1.<init>(CarbonDataLoadRDD.scala:255)
	at org.apache.carbondata.spark.rdd.DataFileLoaderRDD.compute(CarbonDataLoadRDD.scala:232)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Driver stacktrace: