hi Jacky,

see my file list below. the load generated lots of small files. how can I merge them? (a compaction sketch follows the listing)


8.6 K    25.9 K   /xx/partition_date=2018-05-10/101100620100001_batchno0-0-1529497245506.carbonindex
4.6 K    13.9 K   /xx/partition_date=2018-05-10/101100621100003_batchno0-0-1529497245506.carbonindex
4.6 K    13.9 K   /xx/partition_date=2018-05-10/101100626100003_batchno0-0-1529497245506.carbonindex
4.6 K    13.8 K   /xx/partition_date=2018-05-10/101100636100007_batchno0-0-1529497245506.carbonindex
4.6 K    13.7 K   /xx/partition_date=2018-05-10/101100637100011_batchno0-0-1529497245506.carbonindex
4.6 K    13.7 K   /xx/partition_date=2018-05-10/101100641100005_batchno0-0-1529497245506.carbonindex
4.7 K    14.1 K   /xx/partition_date=2018-05-10/101100648100009_batchno0-0-1529497245506.carbonindex
6.0 K    18.1 K   /xx/partition_date=2018-05-10/101100649100002_batchno0-0-1529497245506.carbonindex
885.6 K  2.6 M    /xx/partition_date=2018-05-10/part-0-100100035100009_batchno0-0-1529495933936.carbondata
6.4 M    19.3 M   /xx/partition_date=2018-05-10/part-0-100100052100013_batchno0-0-1529495933936.carbondata
11.6 M   34.9 M   /xx/partition_date=2018-05-10/part-0-100100077100011_batchno0-0-1529495933936.carbondata
427.4 K  1.3 M    /xx/partition_date=2018-05-10/part-0-100100079100003_batchno0-0-1529495933936.carbondata
5.2 M    15.5 M   /xx/partition_date=2018-05-10/part-0-100100089100010_batchno0-0-1529495933936.carbondata
16.4 M   49.3 M   /xx/partition_date=2018-05-10/part-0-100100123100008_batchno0-0-1529495933936.carbondata
6.0 M    18.1 M   /xx/partition_date=2018-05-10/part-0-100100134100006_batchno0-0-1529495933936.carbondata
9.6 M    28.9 M   /xx/partition_date=2018-05-10/part-0-100100144100006_batchno0-0-1529495933936.carbondata
28.7 M   86.2 M   /xx/partition_date=2018-05-10/part-0-100100145100001_batchno0-0-1529495933936.carbondata
11.7 K   35.0 K   /xx/partition_date=2018-05-10/part-0-100100168100040_batchno0-0-1529495933936.carbondata
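
From the compaction documentation, it looks like segment compaction could merge files like these; below is a sketch I have not verified on 1.3.1 (xx.xx stands for the actual db.table name):

-- merge the latest small segments into one larger segment
ALTER TABLE xx.xx COMPACT 'MINOR';

-- or merge all existing segments into a single larger segment
ALTER TABLE xx.xx COMPACT 'MAJOR';

-- remove the old segment files left behind after a successful compaction
CLEAN FILES FOR TABLE xx.xx;

If this is the right direction, setting carbon.enable.auto.load.merge=true in carbon.properties should trigger minor compaction automatically after each load, so future loads would not leave this many small files.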





chenxingyu 
 
------------------ Original ------------------
From:  "Jacky Li"<jacky.li...@qq.com>;
Date:  Fri, Jun 8, 2018 04:07 PM
To:  "dev"<dev@carbondata.apache.org>; 

Subject:  Re: carbondata partitioned by date generate many small files

 
Hi, 

I couldn’t see the picture you sent; can you send it as text?

Regards,
Jacky

> On Jun 6, 2018, at 9:46 AM, 陈星宇 <chenxingyu...@keruyun.com> wrote:
> 
> hi Li,
> Yes, I got the partition folder as you described, but under the partition folder
> there are many small files, just like in the following picture.
> How can they be merged automatically after the job is done?
> 
> 
> thanks
> 
> ChenXingYu
>  
>  
> ------------------ Original ------------------
> From:  "Jacky Li"<jacky.li...@qq.com>;
> Date:  Tue, Jun 5, 2018 08:43 PM
> To:  "dev"<dev@carbondata.apache.org>;
> Subject:  Re: carbondata partitioned by date generate many small files
>  
> Hi,
> 
> There is a test case in StandardPartitionTableQueryTestCase that uses a date column
> as the partition column; if you run that test case, the generated partition folder
> looks like the following picture.
>  
> 
> Are you getting similar folders?
> 
> Regards,
> Jacky
> 
>> On Jun 5, 2018, at 2:49 PM, 陈星宇 <chenxingyu...@keruyun.com> wrote:
>> 
>> hi carbondata team,
>> 
>> 
>> I am using carbondata 1.3.1 to create a table and import data. It generated many
>> small files and the Spark job is very slow. I suspect the number of files is
>> related to the number of Spark jobs, but if I decrease the number of jobs, the job
>> fails with an OutOfMemory error. See my DDL below:
>> 
>> 
>> create table xx.xx(
>> dept_name string,
>> xx
>> .
>> .
>> .
>> ) PARTITIONED BY (xxx date)
>> STORED BY 'carbondata' TBLPROPERTIES('SORT_COLUMNS'='xxx,xxx,xxx ,xxx,xxx')
>> 
>> 
>> 
>> Please give some advice.
>> 
>> 
>> thanks
>> 
>> 
>> ChenXingYu
>
