jonvex commented on code in PR #10957:
URL: https://github.com/apache/hudi/pull/10957#discussion_r1626143886
##########
hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/SparkFileFormatInternalRowReaderContext.scala:
##########
@@ -46,21 +49,27 @@ import scala.collection.mutable
  *
  * This uses Spark parquet reader to read parquet data files or parquet log blocks.
  *
- * @param readermaps our intention is to build the reader inside of getFileRecordIterator, but since it is called from
- *                   the executor, we will need to port a bunch of the code from ParquetFileFormat for each spark version
- *                   for now, we pass in a map of the different readers we expect to create
+ * @param parquetFileReader A reader that transforms a {@link PartitionedFile} to an iterator of
+ *                          {@link InternalRow}. This is required for reading the base file and
+ *                          not required for reading a file group with only log files.
+ * @param recordKeyColumn column name for the recordkey
+ * @param filters spark filters that might be pushed down into the reader
  */
-class SparkFileFormatInternalRowReaderContext(readerMaps: mutable.Map[Long, PartitionedFile => Iterator[InternalRow]]) extends BaseSparkInternalRowReaderContext {
+class SparkFileFormatInternalRowReaderContext(parquetFileReader: SparkParquetReader,
+                                              recordKeyColumn: String,
+                                              filters: Seq[Filter]) extends BaseSparkInternalRowReaderContext {
   lazy val sparkAdapter = SparkAdapterSupport.sparkAdapter
   val deserializerMap: mutable.Map[Schema, HoodieAvroDeserializer] = mutable.Map()
+  lazy val recordKeyFilters: Seq[Filter] = filters.filter(f => f.references.exists(c => c.equalsIgnoreCase(recordKeyColumn)))
Review Comment:
I don't know
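
For readers following along, the `recordKeyFilters` line under review can be sketched in isolation. This is a hedged sketch only: `SimpleFilter` below is a hypothetical stand-in for Spark's `org.apache.spark.sql.sources.Filter` (which exposes `references: Array[String]`), used so the snippet runs without a Spark dependency.

```scala
// Sketch of the recordKeyFilters selection added in this diff.
// SimpleFilter is a hypothetical stand-in for Spark's sources.Filter,
// which exposes `references: Array[String]`.
object RecordKeyFilterSketch {
  final case class SimpleFilter(references: Array[String])

  // Keep only filters that mention the record key column, case-insensitively,
  // mirroring the diff's:
  //   filters.filter(f => f.references.exists(c => c.equalsIgnoreCase(recordKeyColumn)))
  def recordKeyFilters(filters: Seq[SimpleFilter], recordKeyColumn: String): Seq[SimpleFilter] =
    filters.filter(f => f.references.exists(c => c.equalsIgnoreCase(recordKeyColumn)))

  def main(args: Array[String]): Unit = {
    val fs = Seq(
      SimpleFilter(Array("_hoodie_record_key")), // references the record key
      SimpleFilter(Array("ts"))                  // unrelated column, dropped
    )
    // Case-insensitive match keeps only the record-key filter.
    println(recordKeyFilters(fs, "_HOODIE_RECORD_KEY").size) // prints 1
  }
}
```

The case-insensitive comparison matters because Spark column resolution is case-insensitive by default, so pushed-down filters may not match the configured record key column's exact casing.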
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]