shardulm94 commented on a change in pull request #2248:
URL: https://github.com/apache/iceberg/pull/2248#discussion_r578028187
##########
File path: spark3/src/main/java/org/apache/iceberg/spark/Spark3Util.java
##########
@@ -474,15 +475,28 @@ public static boolean isLocalityEnabled(FileIO io, String location, CaseInsensit
return false;
}
- public static boolean isVectorizationEnabled(Map<String, String> properties, CaseInsensitiveStringMap readOptions) {
+ public static boolean isVectorizationEnabled(FileFormat fileFormat,
+ Map<String, String> properties,
+ CaseInsensitiveStringMap readOptions) {
String batchReadsSessionConf = SparkSession.active().conf()
.get("spark.sql.iceberg.vectorization.enabled", null);
if (batchReadsSessionConf != null) {
return Boolean.valueOf(batchReadsSessionConf);
}
- return readOptions.getBoolean(SparkReadOptions.VECTORIZATION_ENABLED,
- PropertyUtil.propertyAsBoolean(properties,
- TableProperties.PARQUET_VECTORIZATION_ENABLED, TableProperties.PARQUET_VECTORIZATION_ENABLED_DEFAULT));
+
+ switch (fileFormat) {
+ case PARQUET:
+ boolean defaultValue = PropertyUtil.propertyAsBoolean(
+ properties,
+ TableProperties.PARQUET_VECTORIZATION_ENABLED,
+ TableProperties.PARQUET_VECTORIZATION_ENABLED_DEFAULT);
+ return readOptions.getBoolean(SparkReadOptions.VECTORIZATION_ENABLED, defaultValue);
+ case ORC:
+ // TODO: support a table property to enable/disable vectorized reads in ORC
+ return readOptions.getBoolean(SparkReadOptions.VECTORIZATION_ENABLED, true);
Review comment:
We can add one. My initial plan was to remove the Parquet-specific table
property and have a single generic table property for vectorization, but I
forgot about it. At LinkedIn, we pass it as a datasource option because we
have another layer above Iceberg where we set it.
The ORC vectorized reader supports all data types, so I don't see an issue with
vectorization being enabled by default. It won't work with delete files, but I
think we have checks elsewhere for that. Maybe we can keep it disabled by
default for backwards compatibility?
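
A minimal sketch of how a generic property could be resolved, to make the precedence concrete. It is not the code in this PR: the class `VectorizationConfig`, the key `read.vectorization.enabled`, and the `false` default are hypothetical, the other string keys are illustrative stand-ins for the `SparkReadOptions` and `TableProperties` constants, and plain `Map`s replace `CaseInsensitiveStringMap` and the Spark session so the example runs on its own.

```java
import java.util.Map;

// Hypothetical helper, not part of Iceberg: resolves whether vectorized reads
// are enabled using session conf > read option > table property > default.
final class VectorizationConfig {
  enum FileFormat { PARQUET, ORC }

  // Illustrative stand-ins for the existing read option and Parquet property keys.
  static final String READ_OPTION = "vectorization-enabled";
  static final String PARQUET_PROPERTY = "read.parquet.vectorization.enabled";
  // Hypothetical generic key; assumed for this sketch only.
  static final String GENERIC_PROPERTY = "read.vectorization.enabled";
  // Assumed default: disabled, for backwards compatibility.
  static final boolean GENERIC_DEFAULT = false;

  static boolean isVectorizationEnabled(
      FileFormat format,
      String sessionConf,                   // null when the session conf is unset
      Map<String, String> tableProperties,
      Map<String, String> readOptions) {    // case-sensitive here, unlike CaseInsensitiveStringMap
    // 1. The session conf overrides everything else.
    if (sessionConf != null) {
      return Boolean.parseBoolean(sessionConf);
    }
    // 2. A per-read datasource option comes next.
    String readOption = readOptions.get(READ_OPTION);
    if (readOption != null) {
      return Boolean.parseBoolean(readOption);
    }
    // 3. The Parquet-specific property is still honored, for Parquet only.
    if (format == FileFormat.PARQUET && tableProperties.containsKey(PARQUET_PROPERTY)) {
      return Boolean.parseBoolean(tableProperties.get(PARQUET_PROPERTY));
    }
    // 4. Otherwise use the generic property, then the default.
    String generic = tableProperties.get(GENERIC_PROPERTY);
    return generic != null ? Boolean.parseBoolean(generic) : GENERIC_DEFAULT;
  }
}
```

With this ordering, a table that only sets the Parquet-specific property keeps its current behavior, while ORC reads stay disabled unless the generic property, a read option, or the session conf turns them on.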