RussellSpitzer commented on code in PR #15239:
URL: https://github.com/apache/iceberg/pull/15239#discussion_r2891949434


##########
docs/docs/spark-configuration.md:
##########
@@ -220,6 +220,7 @@ spark.read
 | stream-from-timestamp | (none) | A timestamp in milliseconds to stream from; if before the oldest known ancestor snapshot, the oldest will be used |
 | streaming-max-files-per-micro-batch | INT_MAX | Maximum number of files per microbatch |
 | streaming-max-rows-per-micro-batch  | INT_MAX | "Soft maximum" number of rows per microbatch; always includes all rows in next unprocessed file, excludes additional files if their inclusion would exceed the soft max limit |
+| streaming-checkpoint-use-hadoop | false | Use Hadoop FileSystem for streaming checkpoint operations instead of the table's FileIO implementation |
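For context, a hedged sketch of how these streaming read options (including the flag added in this PR) might be passed to a Spark structured streaming read. Only the option names come from the table above; the table identifier and option values are illustrative placeholders, and this fragment assumes an active `SparkSession` with the Iceberg runtime on the classpath:

```scala
// Sketch only: "db.table" and the option values below are placeholders.
val stream = spark.readStream
  .format("iceberg")
  // Start streaming from this timestamp (milliseconds since epoch);
  // if it predates the oldest known ancestor snapshot, the oldest is used.
  .option("stream-from-timestamp", "1700000000000")
  // Hard cap on files per micro-batch.
  .option("streaming-max-files-per-micro-batch", "100")
  // "Soft maximum" on rows per micro-batch: the next unprocessed file is
  // always fully included, but additional files that would exceed the
  // limit are deferred to a later batch.
  .option("streaming-max-rows-per-micro-batch", "100000")
  // The flag under discussion in this PR: use Hadoop FileSystem for
  // checkpoint operations instead of the table's FileIO implementation.
  .option("streaming-checkpoint-use-hadoop", "true")
  .load("db.table")
```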

Review Comment:
   Yeah, the only reason I think we should use HadoopFileIO is that it minimizes the amount of code we have to maintain here. If you're in favor of a vanilla HadoopFS impl, I think that's fine too.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

