ankitkpandey opened a new issue #2342: URL: https://github.com/apache/iceberg/issues/2342
Hi, I'm trying to use Spark along with Iceberg to capture differential data, using Spark SQL's MERGE INTO command. But I see around 200 files, each roughly 1 MB in size. Is there a configuration option I can use to reduce the number of files?

Also, my current use case doesn't require time travel or old snapshots, so is there a way to automatically delete them while merging the new data? Maybe just keeping the last snapshot. I have looked extensively through the docs but could only find methods using the Table and Actions APIs.

Any help would be appreciated.
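For reference, a minimal sketch of the kind of MERGE INTO statement described above (the table names `db.target` and the `updates` source, and the `id` join key, are hypothetical placeholders, not from my actual job):

```sql
-- Hypothetical example: `db.target` is the Iceberg table being maintained,
-- `updates` is a view/table holding the differential rows, joined on `id`.
MERGE INTO db.target t
USING updates u
ON t.id = u.id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *
```

Each run of a statement like this produces a new snapshot and a set of data files, which is where the many ~1 MB files show up.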
