| col_name    | data_type | cardinality (distinct values) |
| dt          | string    | date      |
| pt          | string    | 3         |
| lst         | string    | 1         |
| plat        | string    | 1         |
| sty         | string    | 2         |
| is_pay      | string    | 2         |
| is_vip      | string    | 2         |
| is_mpack    | string    | 2         |
| scene       | string    | 3         |
| status      | string    | 4         |
| nw          | string    | 5         |
| isc         | string    | 5         |
| area        | string    | 9         |
| spttag      | string    | 18        |
| province    | string    | 484       |
| isp         | string    | 706       |
| city        | string    | 1127      |
| tv          | string    | 1577      |
| hwm         | string    | 10000     |
| pip         | string    | 1000000   |
| fo          | string    | 6307095   |
| sh          | string    | 10000000  |
| mid         | string    | 80000000  |
| user_id     | string    | 80000000  |
| play_pv     | bigint    |           |
| spt_cnt     | bigint    |           |
| prg_spt_cnt | bigint    |           |
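Counts like those above can be reproduced with one query per column against the source table. A minimal sketch, assuming the spark-shell session described in the message below, where cc is the CarbonContext (a HiveContext subclass in Spark 1.x) and xxxx_table_tmp is the Hive external ORC source table; trim or extend the column list as needed:

// Sketch: gather per-column distinct counts (cardinality) from the source table.
val columns = Seq("dt", "pt", "lst", "plat", "sty", "is_pay", "is_vip",
  "is_mpack", "scene", "status", "nw", "isc", "area", "spttag", "province",
  "isp", "city", "tv", "hwm", "pip", "fo", "sh", "mid", "user_id")
columns.foreach { c =>
  // COUNT(DISTINCT ...) comes back as a bigint, hence getLong(0).
  val cnt = cc.sql(s"SELECT COUNT(DISTINCT $c) FROM xxxx_table_tmp")
    .collect()(0).getLong(0)
  println(s"$c -> $cnt")
}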
At 2017-03-25 18:52:07, "Liang Chen" <chenliang6...@gmail.com> wrote:
>Hi
>
>Please provide all columns' cardinality info (distinct values).
>
>Regards
>Liang
>
>
>ww...@163.com wrote
>> Hello!
>>
>> 0. The failure
>> When I insert into a carbon table, I encounter a failure. The failure is as
>> follows:
>> Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most
>> recent failure: Lost task 0.3 in stage 2.0 (TID 1007, hd26):
>> ExecutorLostFailure (executor 1 exited caused by one of the running tasks)
>> Reason: Slave lost
>> Driver stacktrace:
>> the stage:
>>
>> Steps:
>> 1. Start spark-shell
>> ./bin/spark-shell \
>> --master yarn-client \
>> --num-executors 5 \ (I tried setting this parameter in the range 10 to
>> 20, but the second job still has only 5 tasks)
>> --executor-cores 5 \
>> --executor-memory 20G \
>> --driver-memory 8G \
>> --queue root.default \
>> --jars /xxx.jar
>>
>> //spark-default.conf spark.default.parallelism=320
>>
>> import org.apache.spark.sql.CarbonContext
>> val cc = new CarbonContext(sc, "hdfs://xxxx/carbonData/CarbonStore")
>>
>> 2. Create the table
>> cc.sql("CREATE TABLE IF NOT EXISTS xxxx_table (dt String,pt String,lst
>> String,plat String,sty String,is_pay String,is_vip String,is_mpack
>> String,scene String,status String,nw String,isc String,area String,spttag
>> String,province String,isp String,city String,tv String,hwm String,pip
>> String,fo String,sh String,mid String,user_id String,play_pv Int,spt_cnt
>> Int,prg_spt_cnt Int) row format delimited fields terminated by '|' STORED
>> BY 'carbondata' TBLPROPERTIES
>> ('DICTIONARY_EXCLUDE'='pip,sh,mid,fo,user_id','DICTIONARY_INCLUDE'='dt,pt,lst,plat,sty,is_pay,is_vip,is_mpack,scene,status,nw,isc,area,spttag,province,isp,city,tv,hwm','NO_INVERTED_INDEX'='lst,plat,hwm,pip,sh,mid','BUCKETNUMBER'='10','BUCKETCOLUMNS'='fo')")
>>
>> // note: the "fo" column is set as BUCKETCOLUMNS in order to join another table
>> // the columns' distinct values are listed in the table above
>>
>> 3. Insert into the table (xxxx_table_tmp is a Hive external ORC table with
>> 2,000,000,000 records)
>> cc.sql("insert into xxxx_table select
>> dt,pt,lst,plat,sty,is_pay,is_vip,is_mpack,scene,status,nw,isc,area,spttag,province,isp,city,tv,hwm,pip,fo,sh,mid,user_id
>> ,play_pv,spt_cnt,prg_spt_cnt from xxxx_table_tmp where dt='2017-01-01'")
>>
>> 4. Spark split the SQL into two jobs; the first finished successfully, but the
>> second failed:
>>
>> 5. The second job's stage:
>>
>> Questions:
>> 1. Why does the second job have only five tasks, while the first job has 994
>> tasks? (note: my Hadoop cluster has 5 datanodes)
>> I guess this caused the failure.
>> 2. In the source code I found DataLoadPartitionCoalescer.class. Does it mean
>> that one datanode has only one partition, so there is only one task per
>> datanode?
>> 3. In the ExampleUtils class, "carbon.table.split.partition.enable" is set as
>> follows, but I cannot find "carbon.table.split.partition.enable" anywhere else
>> in the project.
>> I set "carbon.table.split.partition.enable" to true, but the second job
>> still has only five tasks. How do I use this property?
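As a minimal sketch, this is how the property could be toggled from the shell before the insert is run; the package path org.apache.carbondata.core.util and the effect of "true" are assumptions to verify against the CarbonData release in use (the ExampleUtils snippet quoted next makes the same call with "false"):

import org.apache.carbondata.core.util.CarbonProperties

// CarbonProperties is a process-wide singleton; set the property before
// triggering the insert so the data-load path can pick it up.
CarbonProperties.getInstance()
  .addProperty("carbon.table.split.partition.enable", "true")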
>> ExampleUtils :
>> // whether use table split partition
>> // true -> use table split partition, support multiple partition loading
>> // false -> use node split partition, support data load by host partition
>> CarbonProperties.getInstance().addProperty("carbon.table.split.partition.enable", "false")
>>
>> 4. Inserting into the carbon table takes 3 hours but eventually fails. How can
>> I speed it up?
>> 5. In the spark-shell, I tried setting the num-executors parameter in the range
>> 10 to 20, but the second job still has only 5 tasks.
>> Is the other parameter, executor-memory = 20G, enough?
>>
>> I need your help! Thank you very much!
>>
>> wwyxg@
>
>
>--
>View this message in context:
>http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/insert-into-carbon-table-failed-tp9609p9610.html
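On question 4, a hedged sketch of the data-load tuning knobs: the property names below come from the CarbonData data-loading documentation of that era, and the values are illustrative assumptions to verify against the installed release. If loading is indeed coalesced to roughly one partition per host, as the node-split comment quoted above suggests, then raising per-task parallelism (rather than num-executors) is the lever most likely to matter:

import org.apache.carbondata.core.util.CarbonProperties

val props = CarbonProperties.getInstance()
// Threads used by each load task (the default is small, e.g. 2).
props.addProperty("carbon.number.of.cores.while.loading", "6")
// Records per in-memory sort batch during data load.
props.addProperty("carbon.sort.size", "500000")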