RussellSpitzer commented on code in PR #15239:
URL: https://github.com/apache/iceberg/pull/15239#discussion_r2886420394
##########
docs/docs/spark-configuration.md:
##########
@@ -220,6 +220,7 @@ spark.read
| stream-from-timestamp | (none) | A timestamp in milliseconds to stream from; if before the oldest known ancestor snapshot, the oldest will be used |
| streaming-max-files-per-micro-batch | INT_MAX | Maximum number of files per micro-batch |
| streaming-max-rows-per-micro-batch | INT_MAX | "Soft maximum" number of rows per micro-batch; always includes all rows in the next unprocessed file, excludes additional files if their inclusion would exceed the soft max limit |
+| streaming-checkpoint-use-hadoop | false | Use the Hadoop FileSystem for streaming checkpoint operations instead of the table's FileIO implementation |
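As context for the discussion below, this is a sketch of how the proposed option would be set on a streaming read, assuming the option name and default from the diff above; `spark` is an existing `SparkSession` and `db.table` is a placeholder table name:

```scala
// Sketch only: option names taken from the docs diff in this PR;
// "db.table" is a hypothetical table identifier.
val stream = spark.readStream
  .format("iceberg")
  // Cap the number of files pulled into each micro-batch.
  .option("streaming-max-files-per-micro-batch", "100")
  // Proposed flag: use Hadoop FileSystem for checkpoint operations
  // instead of the table's configured FileIO (default: false).
  .option("streaming-checkpoint-use-hadoop", "true")
  .load("db.table")
```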
Review Comment:
@danielcweeks ^ What do you think about just always using HadoopFS via
HadoopFileIO? I think it's clear from the code I linked to that this is what
Spark requires in order to work, so the user must have it configured correctly
anyway, while the table's FileIO can be anything.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]