Hi,

In Append mode, the carbon table is supposed to be created beforehand; otherwise the load fails because the table does not exist. In Overwrite mode the carbon table is created (it is dropped first if it already exists) and the data is then loaded. But in your case, with Overwrite mode, the table is created and yet the load still reports that the table was not found. Can you provide a script that reproduces this issue, and also tell us the CarbonData and Spark versions you are using?
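For example, a small self-contained script along these lines would help (this is only a rough sketch, assuming the CarbonContext entry point, a local store path and the "tableName" writer option; please adjust it to the exact code, paths and options you actually run):

    // Sketch only: paste into spark-shell (which provides sc) with the carbondata jar on the classpath.
    import org.apache.spark.sql.{CarbonContext, SaveMode}

    // Store path taken from your log output; change it to match your setup.
    val cc = new CarbonContext(sc, "file:///tmp/carbondata/store")
    import cc.implicits._

    // Two string columns, matching the (a STRING, b STRING) schema in your log.
    val df = sc.parallelize(Seq(("a1", "b1"), ("a2", "b2"))).toDF("a", "b")

    // Overwrite is expected to (re)create default.carbon2 and then load the data;
    // in your report the create succeeds but the load says the table is not found.
    df.write
      .format("carbondata")
      .option("tableName", "carbon2")
      .mode(SaveMode.Overwrite)
      .save()

If a script like this fails in your environment, please attach it together with the versions you use.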
Regards,
Ravindra.

On 25 November 2016 at 17:58, ZhuWilliam <[email protected]> wrote:

> When I change SaveMode.Append to Overwrite, then the error is even weirder:
>
> INFO 25-11 20:19:46,572 - streaming-job-executor-0 Query [
>   CREATE TABLE IF NOT EXISTS DEFAULT.CARBON2
>   (A STRING, B STRING)
>   STORED BY 'ORG.APACHE.CARBONDATA.FORMAT'
>   ]
> INFO 25-11 20:19:46,656 - Parsing command:
>   CREATE TABLE IF NOT EXISTS default.carbon2
>   (a STRING, b STRING)
>   STORED BY 'org.apache.carbondata.format'
>
> INFO 25-11 20:19:46,663 - Parse Completed
> AUDIT 25-11 20:19:46,860 - [allwefantasy][allwefantasy][Thread-100]Creating Table with Database name [default] and Table name [carbon2]
> INFO 25-11 20:19:46,889 - 1: get_tables: db=default pat=.*
> INFO 25-11 20:19:46,889 - ugi=allwefantasy ip=unknown-ip-addr cmd=get_tables: db=default pat=.*
> INFO 25-11 20:19:46,889 - 1: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
> INFO 25-11 20:19:46,891 - ObjectStore, initialize called
> INFO 25-11 20:19:46,897 - Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing
> INFO 25-11 20:19:46,898 - Using direct SQL, underlying DB is MYSQL
> INFO 25-11 20:19:46,898 - Initialized ObjectStore
> INFO 25-11 20:19:46,954 - streaming-job-executor-0 Table block size not specified for default_carbon2. Therefore considering the default value 1024 MB
> INFO 25-11 20:19:46,978 - Table carbon2 for Database default created successfully.
> INFO 25-11 20:19:46,978 - streaming-job-executor-0 Table carbon2 for Database default created successfully.
> AUDIT 25-11 20:19:46,978 - [allwefantasy][allwefantasy][Thread-100]Creating timestamp file for default.carbon2
> INFO 25-11 20:19:46,979 - streaming-job-executor-0 Query [CREATE TABLE DEFAULT.CARBON2 USING CARBONDATA OPTIONS (TABLENAME "DEFAULT.CARBON2", TABLEPATH "FILE:///TMP/CARBONDATA/STORE/DEFAULT/CARBON2") ]
> INFO 25-11 20:19:47,033 - 1: get_table : db=default tbl=carbon2
> INFO 25-11 20:19:47,034 - ugi=allwefantasy ip=unknown-ip-addr cmd=get_table : db=default tbl=carbon2
> WARN 25-11 20:19:47,062 - Couldn't find corresponding Hive SerDe for data source provider carbondata. Persisting data source relation `default`.`carbon2` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.
> INFO 25-11 20:19:47,247 - 1: create_table: Table(tableName:carbon2, dbName:default, owner:allwefantasy, createTime:1480076387, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:col, type:array<string>, comment:from deserializer)], location:null, inputFormat:org.apache.hadoop.mapred.SequenceFileInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.MetadataTypedColumnsetSerDe, parameters:{tableName=default.carbon2, serialization.format=1, tablePath=file:///tmp/carbondata/store/default/carbon2}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{})), partitionKeys:[], parameters:{EXTERNAL=TRUE, spark.sql.sources.provider=carbondata}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE, privileges:PrincipalPrivilegeSet(userPrivileges:{}, groupPrivileges:null, rolePrivileges:null))
> INFO 25-11 20:19:47,247 - ugi=allwefantasy ip=unknown-ip-addr cmd=create_table: Table(tableName:carbon2, dbName:default, owner:allwefantasy, createTime:1480076387, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:col, type:array<string>, comment:from deserializer)], location:null, inputFormat:org.apache.hadoop.mapred.SequenceFileInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.MetadataTypedColumnsetSerDe, parameters:{tableName=default.carbon2, serialization.format=1, tablePath=file:///tmp/carbondata/store/default/carbon2}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{})), partitionKeys:[], parameters:{EXTERNAL=TRUE, spark.sql.sources.provider=carbondata}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE, privileges:PrincipalPrivilegeSet(userPrivileges:{}, groupPrivileges:null, rolePrivileges:null))
> INFO 25-11 20:19:47,257 - Creating directory if it doesn't exist: file:/tmp/user/hive/warehouse/carbon2
> AUDIT 25-11 20:19:47,564 - [allwefantasy][allwefantasy][Thread-100]Table created with Database name [default] and Table name [carbon2]
> org.apache.spark.sql.catalyst.analysis.NoSuchTableException
>     at org.apache.spark.sql.hive.CarbonMetastoreCatalog.lookupRelation1(CarbonMetastoreCatalog.scala:141)
>     at org.apache.spark.sql.hive.CarbonMetastoreCatalog.lookupRelation1(CarbonMetastoreCatalog.scala:127)
>     at org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:1044)
>     at org.apache.carbondata.spark.CarbonDataFrameWriter.loadDataFrame(CarbonDataFrameWriter.scala:132)
>     at org.apache.carbondata.spark.CarbonDataFrameWriter.writeToCarbonFile(CarbonDataFrameWriter.scala:52)
>     at org.apache.carbondata.spark.CarbonDataFrameWriter.saveAsCarbonFile(CarbonDataFrameWriter.scala:37)
>     at org.apache.spark.sql.CarbonSource.createRelation(CarbonDatasourceRelation.scala:110)
>     at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:222)
>     at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:148)
>     at streaming.core.compositor.spark.streaming.output.SQLOutputCompositor$$anonfun$result$1.apply(SQLOutputCompositor.scala:60)
>     at streaming.core.compositor.spark.streaming.output.SQLOutputCompositor$$anonfun$result$1.apply(SQLOutputCompositor.scala:53)
>     at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:661)
>     at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:661)
>     at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:50)
>     at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:50)
>     at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:50)
>     at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:426)
>     at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:49)
>     at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:49)
>     at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:49)
>     at scala.util.Try$.apply(Try.scala:161)
>     at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
>     at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:224)
>     at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:224)
>     at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:224)
>     at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
>     at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:223)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> AUDIT 25-11 20:19:47,584 - [allwefantasy][allwefantasy][Thread-100]Table Not Found: carbon2
> INFO 25-11 20:19:47,588 - Finished job streaming job 1480076380000 ms.0 from job set of time 1480076380000 ms
> INFO 25-11 20:19:47,590 - Total delay: 7.586 s for time 1480076380000 ms (execution: 7.547 s)
>
> --
> View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Using-DataFrame-to-write-carbondata-file-cause-no-table-found-error-tp3203p3212.html
> Sent from the Apache CarbonData Mailing List archive mailing list archive at Nabble.com.

--
Thanks & Regards,
Ravi
