bkahloon commented on issue #2172: URL: https://github.com/apache/iceberg/issues/2172#issuecomment-771296624
@openinx just a follow-up question. I ingested the data using the Flink CDC DataStream API and then used Iceberg's FlinkSink to write to the Iceberg table. However, I can't figure out this behaviour: the application reads all the rows in the source DB, but nothing shows up in the Iceberg table until I cancel the job (it's as if the data only gets committed and appears in S3 once I cancel). I checked for backpressure in the job and there was none.

From reading the IcebergFilesCommitter, it seems that Iceberg commits files on checkpoints? (Please correct me if I'm wrong, I didn't go through the entire implementation.) I then enabled checkpoints at an interval of 30 seconds and still got the same result. Will FlinkSink wait until it reaches Iceberg's default 128 MB Parquet file size before it writes out the file?
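For context, here's roughly how the job is wired up. This is a minimal sketch rather than my exact code: the table location, the `id` equality column, and the `buildCdcSource` helper are placeholders, and the 30-second checkpoint interval matches what I described above.

```java
import java.util.Collections;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.data.RowData;
import org.apache.iceberg.flink.TableLoader;
import org.apache.iceberg.flink.sink.FlinkSink;

public class CdcToIcebergJob {

  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // IcebergFilesCommitter commits data files when a checkpoint completes,
    // so checkpointing must be enabled for rows to become visible in the table.
    env.enableCheckpointing(30_000L); // 30-second checkpoint interval

    // Placeholder: in the real job this is the Flink CDC DataStream source.
    DataStream<RowData> cdcStream = buildCdcSource(env);

    // Placeholder table location; the real table lives in S3.
    TableLoader tableLoader = TableLoader.fromHadoopTable("s3://bucket/warehouse/db/table");

    FlinkSink.forRowData(cdcStream)
        .tableLoader(tableLoader)
        .equalityFieldColumns(Collections.singletonList("id")) // placeholder primary key for CDC upserts
        .append();

    env.execute("cdc-to-iceberg");
  }

  private static DataStream<RowData> buildCdcSource(StreamExecutionEnvironment env) {
    // Placeholder for the actual Flink CDC source.
    throw new UnsupportedOperationException("replace with the real CDC source");
  }
}
```

With this setup I'd expect a commit (and new data files in S3) after every successful checkpoint, rather than only when the job is cancelled.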
