xiarixiaoyao commented on PR #8082:
URL: https://github.com/apache/hudi/pull/8082#issuecomment-1536060525

   @rahil-c  @danny0405 @yihua 
   I don't think this error is caused by spark.sql.parquet.enableVectorizedReader=true; Spark defaults this value to true.
   
   The Spark change below touches this error path, and we can adapt according to that commit:
   
https://github.com/apache/spark/commit/77694b4673dd2efb5b79d596fbd647af3db5f8a0
   For Spark 3.3.2, adding the following code to Spark32PlusHoodieParquetFileFormat (we may need a new Parquet file format for Spark 3.3.2) solves the problem:
   ```scala
       // Should always be set by FileSourceScanExec creating this.
       // Check conf before checking option, to allow working around an issue by changing conf.
       val returningBatch = sparkSession.sessionState.conf.parquetVectorizedReaderEnabled &&
         options.get(FileFormat.OPTION_RETURNING_BATCH)
           .getOrElse {
             throw new IllegalArgumentException(
               "OPTION_RETURNING_BATCH should always be set for ParquetFileFormat. " +
                 "To workaround this issue, set spark.sql.parquet.enableVectorizedReader=false.")
           }
           .equals("true")
       if (returningBatch) {
         // If the passed option said that we are to return batches, we need to also be able to
         // do this based on config and resultSchema.
         assert(supportBatch(sparkSession, resultSchema))
       }
   ```
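   To make the intent of that snippet concrete, here is a minimal, dependency-free sketch of the same check. The object and method names (`ReturningBatchCheck`, `resolveReturningBatch`) are hypothetical and only illustrate the pattern; in real Spark the conf comes from `sparkSession.sessionState.conf` and the key is `FileFormat.OPTION_RETURNING_BATCH`:
   ```scala
   // Illustrative sketch only -- names are invented, not Hudi's or Spark's actual API.
   object ReturningBatchCheck {
     // Stand-in for FileFormat.OPTION_RETURNING_BATCH.
     val OPTION_RETURNING_BATCH = "returning_batch"
   
     // Conf is checked before the option, so the failure can be worked
     // around by disabling the vectorized reader (the && short-circuits
     // and the option is never read).
     def resolveReturningBatch(vectorizedReaderEnabled: Boolean,
                               options: Map[String, String]): Boolean = {
       vectorizedReaderEnabled &&
         options.getOrElse(OPTION_RETURNING_BATCH,
           throw new IllegalArgumentException(
             "OPTION_RETURNING_BATCH should always be set for ParquetFileFormat. " +
               "To workaround this issue, set spark.sql.parquet.enableVectorizedReader=false."))
           .equals("true")
     }
   }
   ```
   The key point for the Hudi adaptation is the caller contract: whatever creates the file format (in Spark, `FileSourceScanExec`) must now pass `OPTION_RETURNING_BATCH`, otherwise the read fails fast with the exception above instead of silently producing the wrong reader.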
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]