| col_name    | data_type | cardinality (distinct values) |
| dt          | string    | date      |
| pt          | string    | 3         |
| lst         | string    | 1         |
| plat        | string    | 1         |
| sty         | string    | 2         |
| is_pay      | string    | 2         |
| is_vip      | string    | 2         |
| is_mpack    | string    | 2         |
| scene       | string    | 3         |
| status      | string    | 4         |
| nw          | string    | 5         |
| isc         | string    | 5         |
| area        | string    | 9         |
| spttag      | string    | 18        |
| province    | string    | 484       |
| isp         | string    | 706       |
| city        | string    | 1127      |
| tv          | string    | 1577      |
| hwm         | string    | 10000     |
| pip         | string    | 1000000   |
| fo          | string    | 6307095   |
| sh          | string    | 10000000  |
| mid         | string    | 80000000  |
| user_id     | string    | 80000000  |
| play_pv     | bigint    |           |
| spt_cnt     | bigint    |           |
| prg_spt_cnt | bigint    |           |
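Counts like those above can be reproduced with one query per column against the source table. A minimal sketch, assuming the spark-shell session described in the message below, where cc is the CarbonContext (a HiveContext subclass in Spark 1.x) and xxxx_table_tmp is the Hive external ORC source table; trim or extend the column list as needed:

// Sketch: gather per-column distinct counts (cardinality) from the source table.
val columns = Seq("dt", "pt", "lst", "plat", "sty", "is_pay", "is_vip",
  "is_mpack", "scene", "status", "nw", "isc", "area", "spttag", "province",
  "isp", "city", "tv", "hwm", "pip", "fo", "sh", "mid", "user_id")
columns.foreach { c =>
  // COUNT(DISTINCT ...) comes back as a bigint, hence getLong(0).
  val cnt = cc.sql(s"SELECT COUNT(DISTINCT $c) FROM xxxx_table_tmp")
    .collect()(0).getLong(0)
  println(s"$c -> $cnt")
}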
At 2017-03-25 18:52:07, "Liang Chen" <chenliang6...@gmail.com> wrote:
>Hi
>
>Please provide all columns' cardinality info (distinct values).
>
>Regards
>Liang
>
>
>ww...@163.com wrote
>> Hello!
>>
>> 0. The failure
>> When I insert into a carbon table, I encounter a failure. The failure is as
>> follows:
>> Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most
>> recent failure: Lost task 0.3 in stage 2.0 (TID 1007, hd26):
>> ExecutorLostFailure (executor 1 exited caused by one of the running tasks)
>> Reason: Slave lost
>> Driver stacktrace:
>> the stage:
>>
>> Steps:
>> 1. Start spark-shell
>> ./bin/spark-shell \
>> --master yarn-client \
>> --num-executors 5 \ (I tried setting this parameter in the range 10 to
>> 20, but the second job still has only 5 tasks)
>> --executor-cores 5 \
>> --executor-memory 20G \
>> --driver-memory 8G \
>> --queue root.default \
>> --jars /xxx.jar
>>
>> //spark-default.conf spark.default.parallelism=320
>>
>> import org.apache.spark.sql.CarbonContext
>> val cc = new CarbonContext(sc, "hdfs://xxxx/carbonData/CarbonStore")
>>
>> 2. Create the table
>> cc.sql("CREATE TABLE IF NOT EXISTS xxxx_table (dt String,pt String,lst
>> String,plat String,sty String,is_pay String,is_vip String,is_mpack
>> String,scene String,status String,nw String,isc String,area String,spttag
>> String,province String,isp String,city String,tv String,hwm String,pip
>> String,fo String,sh String,mid String,user_id String,play_pv Int,spt_cnt
>> Int,prg_spt_cnt Int) row format delimited fields terminated by '|' STORED
>> BY 'carbondata' TBLPROPERTIES
>> ('DICTIONARY_EXCLUDE'='pip,sh,mid,fo,user_id','DICTIONARY_INCLUDE'='dt,pt,lst,plat,sty,is_pay,is_vip,is_mpack,scene,status,nw,isc,area,spttag,province,isp,city,tv,hwm','NO_INVERTED_INDEX'='lst,plat,hwm,pip,sh,mid','BUCKETNUMBER'='10','BUCKETCOLUMNS'='fo')")
>>
>> // note: the "fo" column is set as BUCKETCOLUMNS in order to join another table
>> // the columns' distinct values are listed in the table above
>>
>> 3. Insert into the table (xxxx_table_tmp is a Hive external ORC table with
>> 2,000,000,000 records)
>> cc.sql("insert into xxxx_table select
>> dt,pt,lst,plat,sty,is_pay,is_vip,is_mpack,scene,status,nw,isc,area,spttag,province,isp,city,tv,hwm,pip,fo,sh,mid,user_id
>> ,play_pv,spt_cnt,prg_spt_cnt from xxxx_table_tmp where dt='2017-01-01'")
>>
>> 4. Spark split the SQL into two jobs; the first finished successfully, but the
>> second failed:
>>
>> 5. The second job's stage:
>>
>> Questions:
>> 1. Why does the second job have only five tasks, while the first job has 994
>> tasks? (note: my Hadoop cluster has 5 datanodes)
>> I guess this caused the failure.
>> 2. In the source code I found DataLoadPartitionCoalescer.class. Does it mean
>> that one datanode has only one partition, so there is only one task per
>> datanode?
>> 3. In the ExampleUtils class, "carbon.table.split.partition.enable" is set as
>> follows, but I cannot find "carbon.table.split.partition.enable" anywhere else
>> in the project.
>> I set "carbon.table.split.partition.enable" to true, but the second job
>> still has only five tasks. How do I use this property?
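As a minimal sketch, this is how the property could be toggled from the shell before the insert is run; the package path org.apache.carbondata.core.util and the effect of "true" are assumptions to verify against the CarbonData release in use (the ExampleUtils snippet quoted next makes the same call with "false"):

import org.apache.carbondata.core.util.CarbonProperties

// CarbonProperties is a process-wide singleton; set the property before
// triggering the insert so the data-load path can pick it up.
CarbonProperties.getInstance()
  .addProperty("carbon.table.split.partition.enable", "true")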
>> ExampleUtils :
>> // whether use table split partition
>> // true -> use table split partition, support multiple partition loading
>> // false -> use node split partition, support data load by host partition
>> CarbonProperties.getInstance().addProperty("carbon.table.split.partition.enable", "false")
>>
>> 4. Inserting into the carbon table takes 3 hours but eventually fails. How can
>> I speed it up?
>> 5. In the spark-shell, I tried setting the num-executors parameter in the range
>> 10 to 20, but the second job still has only 5 tasks.
>> Is the other parameter, executor-memory = 20G, enough?
>>
>> I need your help! Thank you very much!
>>
>> wwyxg@
>
>
>--
>View this message in context:
>http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/insert-into-carbon-table-failed-tp9609p9610.html
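On question 4, a hedged sketch of the data-load tuning knobs: the property names below come from the CarbonData data-loading documentation of that era, and the values are illustrative assumptions to verify against the installed release. If loading is indeed coalesced to roughly one partition per host, as the node-split comment quoted above suggests, then raising per-task parallelism (rather than num-executors) is the lever most likely to matter:

import org.apache.carbondata.core.util.CarbonProperties

val props = CarbonProperties.getInstance()
// Threads used by each load task (the default is small, e.g. 2).
props.addProperty("carbon.number.of.cores.while.loading", "6")
// Records per in-memory sort batch during data load.
props.addProperty("carbon.sort.size", "500000")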