Dear All

I am using DeltaStreamer to stream data from a Kafka topic and write it
into a Hudi dataset.
In this use case I am not doing any upserts; all writes are inserts, so each
ingestion run creates a new parquet file, and a large number of small files
is building up. How can I merge these files from the DeltaStreamer job using
the available configurations?

I think compactionSmallFileSize may be useful here, but I am not sure
whether it applies to DeltaStreamer. I tried it in DeltaStreamer, but it
didn't work. Please assist with this, and if possible, give an example.

Thanks & Regards
Rahul 
