aokolnychyi commented on a change in pull request #2248:
URL: https://github.com/apache/iceberg/pull/2248#discussion_r580612852
##########
File path: spark3/src/main/java/org/apache/iceberg/spark/Spark3Util.java
##########
@@ -474,15 +475,31 @@ public static boolean isLocalityEnabled(FileIO io, String location, CaseInsensit
return false;
}
- public static boolean isVectorizationEnabled(Map<String, String> properties, CaseInsensitiveStringMap readOptions) {
+ public static boolean isVectorizationEnabled(FileFormat fileFormat,
+                                              Map<String, String> properties,
+                                              CaseInsensitiveStringMap readOptions) {
String batchReadsSessionConf = SparkSession.active().conf()
.get("spark.sql.iceberg.vectorization.enabled", null);
if (batchReadsSessionConf != null) {
return Boolean.valueOf(batchReadsSessionConf);
}
- return readOptions.getBoolean(SparkReadOptions.VECTORIZATION_ENABLED,
Review comment:
Oh, I am a bit surprised by how we handle nullability.
My main concern is that the session config is usually meant for default values
that make sense for most tables. However, having no way to override the
session config at the job level does not seem right to me. For example, if a
user works with 5 tables in a notebook, I understand they won't want to repeat
the same configs for all tables, so the session conf is probably a good place
for common configs. But what if one of those 5 tables needs a different value
for a config? Requiring a separate notebook does not seem reasonable.
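To make the precedence I have in mind concrete, here is a minimal sketch (not the actual change in this PR): per-job read options override the session conf, which in turn overrides the table property default. Plain `java.util.Map` instances stand in for Spark's `CaseInsensitiveStringMap` and the active session conf, and the key names mirror the ones in the diff; the helper itself is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical precedence sketch: read option > session conf > table property.
// Plain maps stand in for CaseInsensitiveStringMap and SparkSession conf.
public class VectorizationPrecedence {

  static boolean isVectorizationEnabled(Map<String, String> tableProperties,
                                        Map<String, String> sessionConf,
                                        Map<String, String> readOptions) {
    // 1. A read option set on this particular job always wins.
    String readOption = readOptions.get("vectorization-enabled");
    if (readOption != null) {
      return Boolean.parseBoolean(readOption);
    }
    // 2. Otherwise fall back to the session-wide default, if any.
    String sessionValue = sessionConf.get("spark.sql.iceberg.vectorization.enabled");
    if (sessionValue != null) {
      return Boolean.parseBoolean(sessionValue);
    }
    // 3. Finally, use the table property (assumed default "true" here).
    return Boolean.parseBoolean(
        tableProperties.getOrDefault("read.parquet.vectorization.enabled", "true"));
  }

  public static void main(String[] args) {
    Map<String, String> tableProperties = new HashMap<>();
    Map<String, String> sessionConf = new HashMap<>();
    Map<String, String> readOptions = new HashMap<>();

    // Session-wide default disables vectorization for all 5 tables...
    sessionConf.put("spark.sql.iceberg.vectorization.enabled", "false");
    // ...but one job re-enables it for a single table via a read option.
    readOptions.put("vectorization-enabled", "true");

    System.out.println(isVectorizationEnabled(tableProperties, sessionConf, readOptions));
  }
}
```

With this ordering, the notebook user keeps one session conf for the common case and overrides it per table with `.option(...)`, instead of needing a separate notebook.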
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]