jonvex commented on code in PR #10957:
URL: https://github.com/apache/hudi/pull/10957#discussion_r1626143886
##########
hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/SparkFileFormatInternalRowReaderContext.scala:
##########
@@ -46,21 +49,27 @@ import scala.collection.mutable
  *
  * This uses Spark parquet reader to read parquet data files or parquet log blocks.
  *
- * @param readermaps our intention is to build the reader inside of getFileRecordIterator, but since it is called from
- *                   the executor, we will need to port a bunch of the code from ParquetFileFormat for each spark version
- *                   for now, we pass in a map of the different readers we expect to create
+ * @param parquetFileReader A reader that transforms a {@link PartitionedFile} to an iterator of
+ *                          {@link InternalRow}. This is required for reading the base file and
+ *                          not required for reading a file group with only log files.
+ * @param recordKeyColumn column name for the recordkey
+ * @param filters spark filters that might be pushed down into the reader
  */
-class SparkFileFormatInternalRowReaderContext(readerMaps: mutable.Map[Long, PartitionedFile => Iterator[InternalRow]]) extends BaseSparkInternalRowReaderContext {
+class SparkFileFormatInternalRowReaderContext(parquetFileReader: SparkParquetReader,
+                                              recordKeyColumn: String,
+                                              filters: Seq[Filter]) extends BaseSparkInternalRowReaderContext {
   lazy val sparkAdapter = SparkAdapterSupport.sparkAdapter
   val deserializerMap: mutable.Map[Schema, HoodieAvroDeserializer] = mutable.Map()
+  lazy val recordKeyFilters: Seq[Filter] = filters.filter(f => f.references.exists(c => c.equalsIgnoreCase(recordKeyColumn)))
Review Comment:
I don't know
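
For readers following along, the `recordKeyFilters` line under review can be sketched in isolation. This is a hedged sketch only: `SimpleFilter` below is a hypothetical stand-in for Spark's `org.apache.spark.sql.sources.Filter` (which exposes `references: Array[String]`), used so the snippet runs without a Spark dependency.

```scala
// Sketch of the recordKeyFilters selection added in this diff.
// SimpleFilter is a hypothetical stand-in for Spark's sources.Filter,
// which exposes `references: Array[String]`.
object RecordKeyFilterSketch {
  final case class SimpleFilter(references: Array[String])

  // Keep only filters that mention the record key column, case-insensitively,
  // mirroring the diff's:
  //   filters.filter(f => f.references.exists(c => c.equalsIgnoreCase(recordKeyColumn)))
  def recordKeyFilters(filters: Seq[SimpleFilter], recordKeyColumn: String): Seq[SimpleFilter] =
    filters.filter(f => f.references.exists(c => c.equalsIgnoreCase(recordKeyColumn)))

  def main(args: Array[String]): Unit = {
    val fs = Seq(
      SimpleFilter(Array("_hoodie_record_key")), // references the record key
      SimpleFilter(Array("ts"))                  // unrelated column, dropped
    )
    // Case-insensitive match keeps only the record-key filter.
    println(recordKeyFilters(fs, "_HOODIE_RECORD_KEY").size) // prints 1
  }
}
```

The case-insensitive comparison matters because Spark column resolution is case-insensitive by default, so pushed-down filters may not match the configured record key column's exact casing.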
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]