xiarixiaoyao commented on PR #8082: URL: https://github.com/apache/hudi/pull/8082#issuecomment-1536060525
@rahil-c @danny0405 @yihua I don't think this error was caused by `spark.sql.parquet.enableVectorizedReader=true`; Spark defaults that value to true. The Spark change below introduced this error, and we can adapt to it following that commit: https://github.com/apache/spark/commit/77694b4673dd2efb5b79d596fbd647af3db5f8a0

For Spark 3.3.2, adding the following code to `Spark32PlusHoodieParquetFileFormat` (maybe we need a new Parquet file format class for Spark 3.3.2) can solve the problem:

```scala
// Should always be set by FileSourceScanExec creating this.
// Check conf before checking option, to allow working around an issue by changing conf.
val returningBatch = sparkSession.sessionState.conf.parquetVectorizedReaderEnabled &&
  options.get(FileFormat.OPTION_RETURNING_BATCH)
    .getOrElse {
      throw new IllegalArgumentException(
        "OPTION_RETURNING_BATCH should always be set for ParquetFileFormat. " +
          "To workaround this issue, set spark.sql.parquet.enableVectorizedReader=false.")
    }
    .equals("true")

if (returningBatch) {
  // If the passed option said that we are to return batches, we need to also be able to
  // do this based on config and resultSchema.
  assert(supportBatch(sparkSession, resultSchema))
}
```
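For context, the linked Spark commit makes `FileSourceScanExec` pass `OPTION_RETURNING_BATCH` when it builds the reader, which is why any custom `ParquetFileFormat` now has to honor it. A simplified sketch of the caller side (names follow Spark 3.3.x; this is not Hudi code, just an illustration of where the option comes from):

```scala
// Sketch (simplified from Spark 3.3.x FileSourceScanExec): the scan node adds
// OPTION_RETURNING_BATCH to the reader options so the file format knows
// whether the scan expects columnar batches or rows.
val readerOptions = relation.options +
  (FileFormat.OPTION_RETURNING_BATCH -> supportsColumnar.toString)

// The custom Hudi format receives these options in buildReaderWithPartitionValues
// and must read OPTION_RETURNING_BATCH instead of deciding batch support on its own.
relation.fileFormat.buildReaderWithPartitionValues(
  sparkSession = relation.sparkSession,
  dataSchema = relation.dataSchema,
  partitionSchema = relation.partitionSchema,
  requiredSchema = requiredSchema,
  filters = pushedDownFilters,
  options = readerOptions,
  hadoopConf = hadoopConf)
```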
