Below are a few of the files.

-rw-r--r--  3 dvcc Hadoop_cdp  15.1 M  2020-03-15 19:09  /projects/20191201/10/da5d5747-91cb-4fd4-bd2a-1881cae8b1ba-0_12-253-3275_20200315190853.parquet
-rw-r--r--  3 dvcc Hadoop_cdp  15.2 M  2020-03-15 19:09  /projects/20191201/10/8b111872-f797-4a24-990c-8854b7dcaf48-0_11-253-3274_20200315190853.parquet
-rw-r--r--  3 dvcc Hadoop_cdp  15.2 M  2020-03-15 19:09  /projects/20191201/10/84b6aeb1-6c05-4a80-bf05-29256bbe03a7-0_17-253-3280_20200315190853.parquet
-rw-r--r--  3 dvcc Hadoop_cdp  15.1 M  2020-03-15 19:09  /projects/20191201/10/2fd64689-aa67-4727-ac47-262680aad570-0_14-253-3277_20200315190853.parquet
On Sun, Mar 15, 2020 at 12:16 PM selvaraj periyasamy <[email protected]> wrote:

> Team,
>
> I am using Hudi 0.5.0. While writing a COW table with the code below, many
> small files of about 15 MB are getting created, whereas the total partition
> size is 300 MB+.
>
> val output = transDetailsDF.write.format("org.apache.hudi").
>   option("hoodie.insert.shuffle.parallelism", "2").
>   option("hoodie.upsert.shuffle.parallelism", "2").
>   option("hoodie.datasource.write.table.type", "COPY_ON_WRITE").
>   option(OPERATION_OPT_KEY, "upsert").
>   option(PRECOMBINE_FIELD_OPT_KEY, "transaction_date").
>   option(RECORDKEY_FIELD_OPT_KEY, "record_key").
>   option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
>   option(TABLE_NAME, tableName).
>   option("hoodie.datasource.write.payload.class", "org.apache.hudi.OverwriteWithLatestAvroPayload_Custom").
>   option("hoodie.memory.merge.max.size", "2004857600000").
>   option("hoodie.bloom.index.prune.by.ranges", "false").
>   option("hoodie.cleaner.policy", "KEEP_LATEST_FILE_VERSIONS").
>   option("hoodie.cleaner.commits.retained", 2).
>   option("hoodie.keep.min.commits", 3).
>   option("hoodie.keep.max.commits", 5).
>   option("hoodie.parquet.max.file.size", String.valueOf(128*1024*1024)).
>   option("hoodie.parquet.small.file.limit", String.valueOf(100*1024*1024)).
>   mode(Append).
>   save(basePath)
>
> As per the instructions in
> https://cwiki.apache.org/confluence/display/HUDI/FAQ, I set
> compactionSmallFileSize to 100 MB and limitFileSize to 128 MB.
>
> The Hadoop block size is 256 MB, and I expect 128 MB files to be created.
>
> Am I missing any config here?
>
> Thanks,
> Selva
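
For reference, the two size-related options in the write above resolve to the following byte values. This is just a quick Scala sketch restating the values already used in the quoted code; the comments paraphrase how the Hudi FAQ describes these settings and are an interpretation, not part of the original message.

  // hoodie.parquet.max.file.size: target upper bound for a base parquet file.
  val maxFileSizeBytes    = 128L * 1024 * 1024  // 134217728 bytes (128 MB)

  // hoodie.parquet.small.file.limit: per the Hudi FAQ, files below this size
  // are considered "small" and are targeted to absorb new inserts on later writes.
  val smallFileLimitBytes = 100L * 1024 * 1024  // 104857600 bytes (100 MB)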
