aokolnychyi commented on a change in pull request #2248:
URL: https://github.com/apache/iceberg/pull/2248#discussion_r580612852
##########
File path: spark3/src/main/java/org/apache/iceberg/spark/Spark3Util.java
##########
@@ -474,15 +475,31 @@ public static boolean isLocalityEnabled(FileIO io, String location, CaseInsensit
return false;
}
- public static boolean isVectorizationEnabled(Map<String, String> properties, CaseInsensitiveStringMap readOptions) {
+ public static boolean isVectorizationEnabled(FileFormat fileFormat,
+                                              Map<String, String> properties,
+                                              CaseInsensitiveStringMap readOptions) {
String batchReadsSessionConf = SparkSession.active().conf()
.get("spark.sql.iceberg.vectorization.enabled", null);
if (batchReadsSessionConf != null) {
return Boolean.valueOf(batchReadsSessionConf);
}
- return readOptions.getBoolean(SparkReadOptions.VECTORIZATION_ENABLED,
Review comment:
Oh, I am a bit surprised by how we handle nullability.
My main concern is that the session config is usually meant for default values
that make sense for most tables. However, having no way to override the
session config at the job level does not seem right to me. For example, if a
user works with 5 tables in a notebook, I understand they won't want to repeat
the same configs for all tables, so the session conf is probably a good place
for common configs. But what if one of those 5 tables needs a different value
for a config? Requiring a separate notebook does not seem reasonable.
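To make the precedence I have in mind concrete, here is a minimal sketch (not the actual change in this PR): per-job read options override the session conf, which in turn overrides the table property default. Plain `java.util.Map` instances stand in for Spark's `CaseInsensitiveStringMap` and the active session conf, and the key names mirror the ones in the diff; the helper itself is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical precedence sketch: read option > session conf > table property.
// Plain maps stand in for CaseInsensitiveStringMap and SparkSession conf.
public class VectorizationPrecedence {

  static boolean isVectorizationEnabled(Map<String, String> tableProperties,
                                        Map<String, String> sessionConf,
                                        Map<String, String> readOptions) {
    // 1. A read option set on this particular job always wins.
    String readOption = readOptions.get("vectorization-enabled");
    if (readOption != null) {
      return Boolean.parseBoolean(readOption);
    }
    // 2. Otherwise fall back to the session-wide default, if any.
    String sessionValue = sessionConf.get("spark.sql.iceberg.vectorization.enabled");
    if (sessionValue != null) {
      return Boolean.parseBoolean(sessionValue);
    }
    // 3. Finally, use the table property (assumed default "true" here).
    return Boolean.parseBoolean(
        tableProperties.getOrDefault("read.parquet.vectorization.enabled", "true"));
  }

  public static void main(String[] args) {
    Map<String, String> tableProperties = new HashMap<>();
    Map<String, String> sessionConf = new HashMap<>();
    Map<String, String> readOptions = new HashMap<>();

    // Session-wide default disables vectorization for all 5 tables...
    sessionConf.put("spark.sql.iceberg.vectorization.enabled", "false");
    // ...but one job re-enables it for a single table via a read option.
    readOptions.put("vectorization-enabled", "true");

    System.out.println(isVectorizationEnabled(tableProperties, sessionConf, readOptions));
  }
}
```

With this ordering, the notebook user keeps one session conf for the common case and overrides it per table with `.option(...)`, instead of needing a separate notebook.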
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]