Re: [PR] Spark: Encapsulate parquet objects for Comet [iceberg]

via GitHub Tue, 18 Nov 2025 14:27:47 -0800


parthchandra commented on code in PR #13786:
URL: https://github.com/apache/iceberg/pull/13786#discussion_r2539815504



##########
spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/SparkSQLProperties.java:
##########
@@ -30,6 +30,15 @@ private SparkSQLProperties() {}
   // Controls which Parquet reader implementation to use
   public static final String PARQUET_READER_TYPE = 
"spark.sql.iceberg.parquet.reader-type";
   public static final ParquetReaderType PARQUET_READER_TYPE_DEFAULT = 
ParquetReaderType.ICEBERG;
+
+  // Controls the fully qualified class name of the vectorized Parquet reader 
factory
+  public static final String PARQUET_VECTORIZED_READER_FACTORY =
+      "spark.sql.iceberg.parquet.vectorized-reader.factory";
+

Review Comment:
   fixed



##########
spark/v4.0/build.gradle:
##########
@@ -272,6 +272,8 @@ 
project(":iceberg-spark:iceberg-spark-runtime-${sparkMajorVersion}_${scalaVersio
     integrationImplementation project(path: ':iceberg-hive-metastore', 
configuration: 'testArtifacts')
     integrationImplementation project(path: 
":iceberg-spark:iceberg-spark-${sparkMajorVersion}_${scalaVersion}", 
configuration: 'testArtifacts')
     integrationImplementation project(path: 
":iceberg-spark:iceberg-spark-extensions-${sparkMajorVersion}_${scalaVersion}", 
configuration: 'testArtifacts')
+    integrationImplementation 
"org.apache.datafusion:comet-spark-spark3.5_2.13:${libs.versions.comet.get()}"

Review Comment:
   Corrected (This is a new line so rebasing didn't address this)



##########
spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/source/SparkBatch.java:
##########
@@ -137,10 +137,24 @@ public PartitionReaderFactory createReaderFactory() {
   }
 
   private ParquetBatchReadConf parquetBatchReadConf(ParquetReaderType 
readerType) {
-    return ImmutableParquetBatchReadConf.builder()
-        .batchSize(readConf.parquetBatchSize())
-        .readerType(readerType)
-        .build();
+    String factoryClassName = readConf.parquetVectorizedReaderFactory();
+
+    // If no explicit factory is set and reader type is COMET, use the default 
Comet factory
+    if (factoryClassName == null && readerType == ParquetReaderType.COMET) {
+      factoryClassName =
+          
org.apache.iceberg.spark.SparkSQLProperties.COMET_VECTORIZED_READER_FACTORY_CLASS;

Review Comment:
   Fails checkstyle - 
   ```
    Task :iceberg-spark:iceberg-spark-4.0_2.13:checkstyleMain FAILED
   [ant:checkstyle] [ERROR] 
/iceberg/spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/source/SparkBatch.java:21:58:
 Using a static member import should be avoided - 
org.apache.iceberg.spark.SparkSQLProperties.COMET_VECTORIZED_READER_FACTORY_CLASS.
 [AvoidStaticImport]
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Spark: Encapsulate parquet objects for Comet [iceberg]

Reply via email to