RussellSpitzer commented on a change in pull request #3292:
URL: https://github.com/apache/iceberg/pull/3292#discussion_r729979609
##########
File path: spark3/src/main/java/org/apache/iceberg/spark/source/SparkFilesScan.java
##########
@@ -37,9 +37,10 @@
class SparkFilesScan extends SparkBatchScan {
private final String taskSetID;
- private final long splitSize;
- private final int splitLookback;
- private final long splitOpenFileCost;
+ private final Long readSplitSize;
+ private final Long planTargetSize;
+ private final Integer splitLookback;
Review comment:
Ah, no reason; I can change them back.
##########
File path: spark3/src/main/java/org/apache/iceberg/spark/actions/Spark3BinPackStrategy.java
##########
@@ -61,9 +61,15 @@ public Table table() {
SparkSession cloneSession = spark.cloneSession();
cloneSession.conf().set(SQLConf.ADAPTIVE_EXECUTION_ENABLED().key(), false);
+ long targetReadSize = splitSize(inputFileSize(filesToRewrite));
+ // Ideally this would be the row-group size but the row group size is
not guaranteed to be consistent
Review comment:
Sure, my main goal here was to make sure we get pieces from a file that is greater than 50% of the target file size. Whether or not the current row-group size is compatible with that, just dividing by four gives us a chance to get pieces that fit.
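The "divide by four" reasoning above can be sketched as follows. This is a hypothetical illustration, not the actual Iceberg code; the class and method names are assumptions.

```java
// Hypothetical sketch of the read-size heuristic discussed above:
// read files in pieces a quarter of the target file size, so that a
// leftover piece larger than 50% of the target can still be combined
// with others into a full-size output file.
public class ReadSizeHeuristic {
    // Names and signature are assumptions for illustration only.
    static long targetReadSize(long targetFileSize) {
        return targetFileSize / 4;
    }

    public static void main(String[] args) {
        long target = 512L * 1024 * 1024; // e.g. a 512 MB target file size
        System.out.println(targetReadSize(target)); // prints 134217728 (128 MB)
    }
}
```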
##########
File path: spark3/src/main/java/org/apache/iceberg/spark/source/SparkFilesScan.java
##########
@@ -37,9 +37,10 @@
class SparkFilesScan extends SparkBatchScan {
private final String taskSetID;
- private final long splitSize;
- private final int splitLookback;
- private final long splitOpenFileCost;
+ private final Long readSplitSize;
+ private final Long planTargetSize;
+ private final Integer splitLookback;
Review comment:
Ah, I see. I was just matching all the other variables in this section, which are boxed objects; I'll change them back, though.
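Reverting from boxed `Long`/`Integer` back to primitives is a reasonable call. A generic illustration (not the Iceberg code) of the usual boxing pitfalls:

```java
// Generic example of why primitive fields are often safer than boxed
// ones: boxed values can be null (and throw NullPointerException when
// unboxed), and == compares object identity rather than value outside
// the small-integer cache.
public class BoxingPitfalls {
    public static void main(String[] args) {
        Long a = 1000L;
        Long b = 1000L;
        System.out.println(a == b);      // identity, not value: false on OpenJDK
        System.out.println(a.equals(b)); // value comparison: true

        Long maybeNull = null;
        // long c = maybeNull;           // would throw NullPointerException on unboxing
        long c = 7L;                     // a primitive can never be null
        System.out.println(c == 7L);     // plain value comparison: true
    }
}
```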
##########
File path: spark3/src/main/java/org/apache/iceberg/spark/actions/Spark3BinPackStrategy.java
##########
@@ -61,9 +61,15 @@ public Table table() {
SparkSession cloneSession = spark.cloneSession();
cloneSession.conf().set(SQLConf.ADAPTIVE_EXECUTION_ENABLED().key(), false);
+ long targetReadSize = splitSize(inputFileSize(filesToRewrite));
+ // Ideally this would be the row-group size but the row group size is
not guaranteed to be consistent
Review comment:
Since we only have the row-group size set for Parquet, I'm going to check whether that value is set, and otherwise use the plan size divided by 4.
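That fallback could look roughly like this. The property key is the Iceberg table property `write.parquet.row-group-size-bytes`, but the class and helper names here are hypothetical, not the actual implementation:

```java
import java.util.Map;

// Hypothetical sketch of the fallback described above: use the Parquet
// row-group size when the table sets it, otherwise a quarter of the
// planned split size. Only the property key is taken from Iceberg;
// everything else is illustrative.
public class ReadSizeFallback {
    static final String PARQUET_ROW_GROUP_SIZE = "write.parquet.row-group-size-bytes";

    static long targetReadSize(Map<String, String> tableProps, long planSize) {
        String rowGroupSize = tableProps.get(PARQUET_ROW_GROUP_SIZE);
        if (rowGroupSize != null) {
            return Long.parseLong(rowGroupSize); // explicit Parquet row-group size
        }
        return planSize / 4; // no row-group hint: plan size over 4
    }

    public static void main(String[] args) {
        System.out.println(targetReadSize(Map.of(PARQUET_ROW_GROUP_SIZE, "134217728"), 1024L)); // prints 134217728
        System.out.println(targetReadSize(Map.of(), 1024L)); // prints 256
    }
}
```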
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]