Hi Li,
Yes, I got the partition folder as you said, but under the partition folder 
there are many small files, just like in the following picture.
How can I merge them automatically after the job is done?
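For what it's worth, I wondered whether running a compaction after each load would merge the small files. Something like the following is what I had in mind (the table name is a placeholder, and I am assuming CarbonData's ALTER TABLE ... COMPACT syntax and the auto-load-merge properties also apply to partitioned tables in 1.3.1 — please correct me if not):

```sql
-- Trigger a minor compaction manually after a data load
-- (placeholder table name from my DDL):
ALTER TABLE xx.xx COMPACT 'MINOR';

-- Or, assuming auto compaction can be enabled in carbon.properties:
-- carbon.enable.auto.load.merge=true
-- carbon.compaction.level.threshold=4,3
```

Is this the recommended way, or is there a better setting for partitioned tables?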



Thanks,


ChenXingYu
 
 
------------------ Original ------------------
From:  "Jacky Li"<jacky.li...@qq.com>;
Date:  Tue, Jun 5, 2018 08:43 PM
To:  "dev"<dev@carbondata.apache.org>; 

Subject:  Re: carbondata partitioned by date generate many small files

 
Hi,


There is a testcase in StandardPartitionTableQueryTestCase that uses a date 
column as the partition column. If you run that testcase, the generated 
partition folder looks like the following picture.
 


Are you getting similar folders?


Regards,
Jacky

On Jun 5, 2018, at 2:49 PM, 陈星宇 <chenxingyu...@keruyun.com> wrote:

Hi CarbonData team,


I am using CarbonData 1.3.1 to create a table and import data, and it generates 
many small files, so the Spark job is very slow. I suspect the number of files 
is related to the number of Spark jobs, but if I decrease the number of jobs, the 
job fails with an OutOfMemory error. See my DDL below:


create table xx.xx(
dept_name string,
xx
.
.
.
) PARTITIONED BY (xxx date)
STORED BY 'carbondata' TBLPROPERTIES('SORT_COLUMNS'='xxx,xxx,xxx,xxx,xxx')



Please give some advice.


Thanks,


ChenXingYu
