shardulm94 commented on a change in pull request #2248:
URL: https://github.com/apache/iceberg/pull/2248#discussion_r580536744



##########
File path: spark3/src/main/java/org/apache/iceberg/spark/Spark3Util.java
##########
@@ -474,15 +475,31 @@ public static boolean isLocalityEnabled(FileIO io, String location, CaseInsensit
     return false;
   }
 
-  public static boolean isVectorizationEnabled(Map<String, String> properties, CaseInsensitiveStringMap readOptions) {
+  public static boolean isVectorizationEnabled(FileFormat fileFormat,
+                                               Map<String, String> properties,
+                                               CaseInsensitiveStringMap readOptions) {
     String batchReadsSessionConf = SparkSession.active().conf()
         .get("spark.sql.iceberg.vectorization.enabled", null);
     if (batchReadsSessionConf != null) {
       return Boolean.valueOf(batchReadsSessionConf);
     }
-    return readOptions.getBoolean(SparkReadOptions.VECTORIZATION_ENABLED,

Review comment:
       I see tradeoffs either way. I agree that the most specific value is 
ideally the read option passed explicitly to the table read. But giving the 
session conf higher precedence is also convenient in production: vectorization 
can be turned off for an application with a pure config change and no code 
changes.
   
   Another option is a boolean `AND` between the session conf and the read 
option. This is used in 
https://github.com/apache/iceberg/blob/91ac42174e4c535ece4e36db2cb587a23babced9/spark2/src/main/java/org/apache/iceberg/spark/source/IcebergSource.java#L182
   It can be a little confusing here if the default of the session conf (true) 
differs from the default of the read option (false), but it is worth 
considering. Or maybe a three-state boolean is more appropriate here, but that 
gets complicated quickly.
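
   To make the comparison concrete, here is a minimal sketch of the three 
options in plain Java, with no Spark or Iceberg dependencies. The class and 
method names are hypothetical and are not part of this PR. `null` stands in for 
"not set", and the defaults are passed in explicitly so the mismatch case 
(session conf default true, read option default false) is easy to see.

```java
// Hypothetical helper for illustration only; not Iceberg code.
// A null argument means the corresponding setting was not provided.
final class VectorizationPrecedence {
  private VectorizationPrecedence() {
  }

  // Option 1: the session conf overrides the read option when it is set.
  static boolean sessionConfFirst(String sessionConf, String readOption, boolean defaultValue) {
    if (sessionConf != null) {
      return Boolean.parseBoolean(sessionConf);
    }
    if (readOption != null) {
      return Boolean.parseBoolean(readOption);
    }
    return defaultValue;
  }

  // Option 2: the read option (the most specific value) overrides the session conf.
  static boolean readOptionFirst(String sessionConf, String readOption, boolean defaultValue) {
    if (readOption != null) {
      return Boolean.parseBoolean(readOption);
    }
    if (sessionConf != null) {
      return Boolean.parseBoolean(sessionConf);
    }
    return defaultValue;
  }

  // Option 3: boolean AND. Each side falls back to its own default when unset.
  static boolean both(String sessionConf, boolean sessionDefault,
                      String readOption, boolean readDefault) {
    boolean session = sessionConf != null ? Boolean.parseBoolean(sessionConf) : sessionDefault;
    boolean read = readOption != null ? Boolean.parseBoolean(readOption) : readDefault;
    return session && read;
  }
}
```

   With `AND`, setting the session conf to false always disables 
vectorization, which keeps the config-only kill switch. The cost is that the 
session conf can never enable it on its own: with the defaults above, 
`both(null, true, null, false)` is false until the read option is set to true.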






