yihua commented on code in PR #12935:
URL: https://github.com/apache/hudi/pull/12935#discussion_r1987959169
##########
hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/SparkFileFormatInternalRowReaderContext.scala:
##########
@@ -59,17 +60,16 @@ import scala.collection.mutable
* @param filters spark filters that might be pushed down into the reader
* @param requiredFilters filters that are required and should always be used, even in merging situations
*/
-class SparkFileFormatInternalRowReaderContext(parquetFileReader: SparkParquetReader,
- filters: Seq[Filter],
- requiredFilters: Seq[Filter]) extends BaseSparkInternalRowReaderContext {
+class SparkFileFormatInternalRowReaderContext(parquetFileReader: SparkParquetReader, filters: Seq[Filter],
+ requiredFilters: Seq[Filter], tableVersion: HoodieTableVersion) extends BaseSparkInternalRowReaderContext {
lazy val sparkAdapter: SparkAdapter = SparkAdapterSupport.sparkAdapter
private lazy val bootstrapSafeFilters: Seq[Filter] =
filters.filter(filterIsSafeForBootstrap) ++ requiredFilters
private val deserializerMap: mutable.Map[Schema, HoodieAvroDeserializer] =
mutable.Map()
private val serializerMap: mutable.Map[Schema, HoodieAvroSerializer] =
mutable.Map()
private lazy val allFilters = filters ++ requiredFilters
override def supportsParquetRowIndex: Boolean = {
- HoodieSparkUtils.gteqSpark3_5
+ HoodieSparkUtils.gteqSpark3_5 && tableVersion.greaterThanOrEquals(HoodieTableVersion.EIGHT)
Review Comment:
It is true that support for position-based log merges is only available with
table version 8. However, the record positions are stored in the log header
under `HeaderMetadataType.RECORD_POSITIONS`, so the log record reader in the
file group (FG) reader should automatically fall back to key-based merging
whenever the record positions are not available, regardless of table version.
Instead of checking the table version here, could you rely on the log header
and the FG reader so the logic stays generic?
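
To illustrate the idea, here is a rough, self-contained sketch. It is not the actual Hudi code: `MergeStrategySketch`, `HeaderKey`, `MergeMode`, and `chooseMergeMode` are hypothetical names made up for this example, and the header is modeled as a plain map. Only the `RECORD_POSITIONS` key name is taken from the discussion above. The point is that the decision depends on what the log header contains, not on the table version.

```java
import java.util.EnumMap;
import java.util.Map;

public class MergeStrategySketch {
  // Hypothetical stand-in for the log block header keys.
  // Only the RECORD_POSITIONS name comes from the discussion above.
  enum HeaderKey { INSTANT_TIME, RECORD_POSITIONS }

  enum MergeMode { POSITION_BASED, KEY_BASED }

  // Pick the merge mode from the header contents alone:
  // no recorded positions means falling back to key-based merging.
  static MergeMode chooseMergeMode(boolean engineSupportsRowIndex,
                                   Map<HeaderKey, String> logBlockHeader) {
    String positions = logBlockHeader.get(HeaderKey.RECORD_POSITIONS);
    boolean hasPositions = positions != null && !positions.isEmpty();
    return (engineSupportsRowIndex && hasPositions)
        ? MergeMode.POSITION_BASED
        : MergeMode.KEY_BASED;
  }

  public static void main(String[] args) {
    // Header written without record positions (placeholder values).
    Map<HeaderKey, String> withoutPositions = new EnumMap<>(HeaderKey.class);
    withoutPositions.put(HeaderKey.INSTANT_TIME, "001");

    // Header that does carry record positions (placeholder value).
    Map<HeaderKey, String> withPositions = new EnumMap<>(withoutPositions);
    withPositions.put(HeaderKey.RECORD_POSITIONS, "encoded-positions");

    System.out.println(chooseMergeMode(true, withoutPositions));
    System.out.println(chooseMergeMode(true, withPositions));
  }
}
```

With something along these lines, `supportsParquetRowIndex` could keep describing only the engine capability, and the fallback would be decided per log block.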
##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/SparkBroadcastManager.java:
##########
@@ -71,6 +73,7 @@ public class SparkBroadcastManager extends EngineBroadcastManager {
public SparkBroadcastManager(HoodieEngineContext context, HoodieTableMetaClient metaClient) {
this.context = context;
this.metaClient = metaClient;
+ this.tableVersion = metaClient.getTableConfig().getTableVersion();
Review Comment:
Is this for broadcasting only?