WARN 02-08 18:37:46,529 - Lost task 1.0 in stage 6.0 (TID 20, master):
org.carbondata.processing.graphgenerator.GraphGeneratorException: Error While Initializing the Kettel Engine
    at org.carbondata.processing.graphgenerator.GraphGenerator.validateAndInitialiseKettelEngine(GraphGenerator.java:309)
    at org.carbondata.processing.graphgenerator.GraphGenerator.generateGraph(GraphGenerator.java:278)
    at org.carbondata.spark.load.CarbonLoaderUtil.generateGraph(CarbonLoaderUtil.java:118)
    at org.carbondata.spark.load.CarbonLoaderUtil.executeGraph(CarbonLoaderUtil.java:173)
    at org.carbondata.spark.rdd.CarbonDataLoadRDD$$anon$1.<init>(CarbonDataLoadRDD.scala:196)
    at org.carbondata.spark.rdd.CarbonDataLoadRDD.compute(CarbonDataLoadRDD.scala:155)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.pentaho.di.core.exception.KettleException:
Unable to read file '/opt/incubator-carbondata/processing/carbonplugins/.kettle/kettle.properties'
/opt/incubator-carbondata/processing/carbonplugins/.kettle/kettle.properties (No such file or directory)
    at org.pentaho.di.core.util.EnvUtil.readProperties(EnvUtil.java:65)
    at org.pentaho.di.core.util.EnvUtil.environmentInit(EnvUtil.java:95)
    at org.carbondata.processing.graphgenerator.GraphGenerator.validateAndInitialiseKettelEngine(GraphGenerator.java:303)
    ... 13 more
Caused by: java.io.FileNotFoundException: /opt/incubator-carbondata/processing/carbonplugins/.kettle/kettle.properties (No such file or directory)
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:146)
    at java.io.FileInputStream.<init>(FileInputStream.java:101)
    at org.pentaho.di.core.util.EnvUtil.readProperties(EnvUtil.java:60)
    ... 15 more
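The FileNotFoundException above points at a missing Kettle configuration file rather than at the data itself: every node that executes a load task needs carbonplugins/.kettle/kettle.properties on its local disk, at the path shown in the trace. A minimal sketch of one way to create it, assuming the CarbonData tree sits at /opt/incubator-carbondata on every node and that the hosts are named master, slave1 and slave2 (both are assumptions; substitute your own paths and host list):

    # run once from any box that can ssh to all Spark nodes
    for host in master slave1 slave2; do
      ssh $host 'mkdir -p /opt/incubator-carbondata/processing/carbonplugins/.kettle &&
                 touch /opt/incubator-carbondata/processing/carbonplugins/.kettle/kettle.properties'
    done

An empty kettle.properties should be enough to get past EnvUtil.readProperties; the carbon.properties entry carbon.kettle.home should also point at the same carbonplugins directory (worth verifying against the CarbonData docs for your version).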
On 2016/8/1 11:04, 金铸 wrote:
[hadoop@master ~]$ hadoop fs -cat /opt/incubator-carbondata/sample.csv
16/08/01 18:11:00 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
id,name,city,age
1,david,shenzhen,31
2,eason,shenzhen,27
3,jarry,wuhan,35

[hadoop@master ~]$ spark-sql
spark-sql> load data inpath '../sample.csv' into table test_table;
INFO 01-08 18:19:08,914 - Job 7 finished: collect at CarbonDataRDDFactory.scala:731, took 0.088371 s
INFO 01-08 18:19:08,915 - ********starting clean up**********
INFO 01-08 18:19:08,915 - ********clean up done**********
AUDIT 01-08 18:19:08,915 - [master][hadoop][Thread-1]Data load is failed for default.test_table
INFO 01-08 18:19:08,915 - task runtime:(count: 2, mean: 58.000000, stdev: 20.000000, max: 78.000000, min: 38.000000)
INFO 01-08 18:19:08,915 - 0% 5% 10% 25% 50% 75% 90% 95% 100%
INFO 01-08 18:19:08,915 - 38.0 ms 38.0 ms 38.0 ms 38.0 ms 78.0 ms 78.0 ms 78.0 ms 78.0 ms 78.0 ms
INFO 01-08 18:19:08,916 - task result size:(count: 2, mean: 948.000000, stdev: 0.000000, max: 948.000000, min: 948.000000)
INFO 01-08 18:19:08,916 - 0% 5% 10% 25% 50% 75% 90% 95% 100%
INFO 01-08 18:19:08,916 - 948.0 B 948.0 B 948.0 B 948.0 B 948.0 B 948.0 B 948.0 B 948.0 B 948.0 B
WARN 01-08 18:19:08,915 - Unable to write load metadata file
ERROR 01-08 18:19:08,917 - main java.lang.Exception: Dataload failure
    at org.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:791)
    at org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:1161)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
    at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
    at org.carbondata.spark.rdd.CarbonDataFrameRDD.<init>(CarbonDataFrameRDD.scala:23)
    at org.apache.spark.sql.CarbonContext.sql(CarbonContext.scala:131)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:63)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:311)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:226)
    at org.apache.spark.sql.hive.cli.CarbonSQLCLIDriver$.main(CarbonSQLCLIDriver.scala:40)
    at org.apache.spark.sql.hive.cli.CarbonSQLCLIDriver.main(CarbonSQLCLIDriver.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Thanks & Regards,
金铸

On 2016/7/31 11:32, Ravindra Pesala wrote:
Hi,
The exception says the input path '/opt/incubator-carbondata/sample.csv' does not exist, so please make sure of the following:
1. Whether the sample.csv file is present at the location '/opt/incubator-carbondata/'.
2. Whether you are running Spark in local mode or cluster mode. (If it is running in cluster mode, please keep the csv file in HDFS.)
3. Please try keeping the csv file in HDFS and loading the data from there (see the sketch after this list).

Thanks & Regards,
Ravindra
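A minimal sketch of point 3, using a hypothetical /tmp/carbondata HDFS directory and assuming the NameNode is reachable as hdfs://master:9000 (both are placeholders; check fs.defaultFS in your core-site.xml for the real value):

    [hadoop@master ~]$ hadoop fs -mkdir -p /tmp/carbondata
    [hadoop@master ~]$ hadoop fs -put /opt/incubator-carbondata/sample.csv /tmp/carbondata/
    [hadoop@master ~]$ hadoop fs -ls /tmp/carbondata

    spark-sql> load data inpath 'hdfs://master:9000/tmp/carbondata/sample.csv' into table test_table;

A fully qualified hdfs:// URI also removes the working-directory ambiguity of a relative path such as '../sample.csv'.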
On 31 July 2016 at 07:37, 金铸 <[email protected]> wrote:
Hi Liang:
Thanks for your reply. I have already used '/opt/incubator-carbondata/sample.csv' (the absolute path) and it reported the same error.

On 2016/7/30 22:44, Liang Big data wrote:
Hi jinzhu 金铸:
Please check the error below: the input path has some issue. Please use the absolute path and try it again.
-----------------------------------------------
ERROR 29-07 16:39:46,904 - main org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: /opt/incubator-carbondata/sample.csv

Regards
Liang

2016-07-29 8:47 GMT+08:00 金铸 <[email protected]>:
[hadoop@slave2 ~]$ cat /opt/incubator-carbondata/sample.csv
id,name,city,age
1,david,shenzhen,31
2,eason,shenzhen,27
3,jarry,wuhan,35

[hadoop@slave2 ~]$
> load data inpath '../sample.csv' into table test_table;
INFO 29-07 16:39:46,087 - main Property file path: /opt/incubator-carbondata/bin/../../../conf/carbon.properties
INFO 29-07 16:39:46,087 - main ------Using Carbon.properties --------
INFO 29-07 16:39:46,087 - main {}
INFO 29-07 16:39:46,088 - main Query [LOAD DATA INPATH '../SAMPLE.CSV' INTO TABLE TEST_TABLE]
INFO 29-07 16:39:46,527 - Successfully able to get the table metadata file lock
INFO 29-07 16:39:46,537 - main Initiating Direct Load for the Table : (default.test_table)
INFO 29-07 16:39:46,541 - Generate global dictionary from source data files!
INFO 29-07 16:39:46,569 - [Block Distribution]
INFO 29-07 16:39:46,569 - totalInputSpaceConsumed : 74 , defaultParallelism : 6
INFO 29-07 16:39:46,569 - mapreduce.input.fileinputformat.split.maxsize : 16777216
INFO 29-07 16:39:46,689 - Block broadcast_0 stored as values in memory (estimated size 232.6 KB, free 232.6 KB)
INFO 29-07 16:39:46,849 - Block broadcast_0_piece0 stored as bytes in memory (estimated size 19.7 KB, free 252.3 KB)
INFO 29-07 16:39:46,850 - Added broadcast_0_piece0 in memory on 192.168.241.223:41572 (size: 19.7 KB, free: 511.5 MB)
INFO 29-07 16:39:46,856 - Created broadcast 0 from NewHadoopRDD at CarbonTextFile.scala:45
ERROR 29-07 16:39:46,904 - generate global dictionary failed
ERROR 29-07 16:39:46,904 - main org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: /opt/incubator-carbondata/sample.csv
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:321)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:264)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:385)
    at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:120)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1307)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
    at org.apache.spark.rdd.RDD.take(RDD.scala:1302)
    at com.databricks.spark.csv.CarbonCsvRelation.firstLine$lzycompute(CarbonCsvRelation.scala:175)
    at com.databricks.spark.csv.CarbonCsvRelation.firstLine(CarbonCsvRelation.scala:170)
    at com.databricks.spark.csv.CarbonCsvRelation.inferSchema(CarbonCsvRelation.scala:141)
    at com.databricks.spark.csv.CarbonCsvRelation.<init>(CarbonCsvRelation.scala:71)
    at com.databricks.spark.csv.newapi.DefaultSource.createRelation(DefaultSource.scala:142)
    at com.databricks.spark.csv.newapi.DefaultSource.createRelation(DefaultSource.scala:44)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109)
    at org.carbondata.spark.util.GlobalDictionaryUtil$.loadDataFrame(GlobalDictionaryUtil.scala:365)
    at org.carbondata.spark.util.GlobalDictionaryUtil$.generateGlobalDictionary(GlobalDictionaryUtil.scala:676)
    at org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:1159)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
    at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
    at org.carbondata.spark.rdd.CarbonDataFrameRDD.<init>(CarbonDataFrameRDD.scala:23)
    at org.apache.spark.sql.CarbonContext.sql(CarbonContext.scala:131)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:63)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:311)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:226)
    at org.apache.spark.sql.hive.cli.CarbonSQLCLIDriver$.main(CarbonSQLCLIDriver.scala:40)
    at org.apache.spark.sql.hive.cli.CarbonSQLCLIDriver.main(CarbonSQLCLIDriver.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

--
金铸
--
金铸
