Team,
I am using Hudi 0.5.0. While writing a COW table with the code below, many small
files of about 15 MB each are being created, whereas the total partition size is
300 MB+.
// Imports assumed for the option-key constants used below
import org.apache.hudi.DataSourceWriteOptions._
import org.apache.hudi.config.HoodieWriteConfig.TABLE_NAME
import org.apache.spark.sql.SaveMode.Append

transDetailsDF.write.format("org.apache.hudi").
  option("hoodie.insert.shuffle.parallelism", "2").
  option("hoodie.upsert.shuffle.parallelism", "2").
  option("hoodie.datasource.write.table.type", "COPY_ON_WRITE").
  option(OPERATION_OPT_KEY, "upsert").
  option(PRECOMBINE_FIELD_OPT_KEY, "transaction_date").
  option(RECORDKEY_FIELD_OPT_KEY, "record_key").
  option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
  option(TABLE_NAME, tableName).
  option("hoodie.datasource.write.payload.class", "org.apache.hudi.OverwriteWithLatestAvroPayload_Custom").
  option("hoodie.memory.merge.max.size", "2004857600000").
  option("hoodie.bloom.index.prune.by.ranges", "false").
  option("hoodie.cleaner.policy", "KEEP_LATEST_FILE_VERSIONS").
  option("hoodie.cleaner.commits.retained", "2").
  option("hoodie.keep.min.commits", "3").
  option("hoodie.keep.max.commits", "5").
  option("hoodie.parquet.max.file.size", String.valueOf(128 * 1024 * 1024)).
  option("hoodie.parquet.small.file.limit", String.valueOf(100 * 1024 * 1024)).
  mode(Append).
  save(basePath)
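To make the units explicit, the two size settings above resolve to:

hoodie.parquet.max.file.size    = 128 * 1024 * 1024 = 134217728 bytes (~128 MB target file size)
hoodie.parquet.small.file.limit = 100 * 1024 * 1024 = 104857600 bytes (~100 MB small-file threshold)

The data files being produced are ~15 MB, well below both values.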
As per the instructions in
https://cwiki.apache.org/confluence/display/HUDI/FAQ , I set
compactionSmallFileSize (hoodie.parquet.small.file.limit) to 100 MB and
limitFileSize (hoodie.parquet.max.file.size) to 128 MB. The Hadoop block size is
256 MB, and I expect files of roughly 128 MB to be created.
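For reference, this is roughly how I am checking the sizes of the data files in one
partition (the partition directory below is just an example, and spark is the active
SparkSession):

import org.apache.hadoop.fs.{FileSystem, Path}

// List the parquet data files in one partition and print their sizes in MB
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
fs.listStatus(new Path(s"$basePath/2020/01/01"))
  .filter(_.getPath.getName.endsWith(".parquet"))
  .foreach(f => println(f"${f.getPath.getName}%s : ${f.getLen / (1024.0 * 1024.0)}%.1f MB"))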
Am I missing any config here?
Thanks,
Selva