Re: [PR] [MINOR]Follow up HUDI-8803, optimize vectorized reader by cache 'batchIdxField' [hudi]

via GitHub Wed, 26 Mar 2025 19:45:25 -0700


wangyinsheng commented on code in PR #13023:
URL: https://github.com/apache/hudi/pull/13023#discussion_r2015430063



##########
hudi-spark-datasource/hudi-spark3-common/src/main/java/org/apache/spark/sql/execution/datasources/parquet/Spark3HoodieVectorizedParquetRecordReader.java:
##########
@@ -161,8 +163,10 @@ public Object getCurrentValue() {
 
   private int batchIdxFromSuper() {
     try {
-      Field batchIdxField = VectorizedParquetRecordReader.class.getDeclaredField("batchIdx");
-      batchIdxField.setAccessible(true);
+      if (batchIdxField == null) {
+        batchIdxField = VectorizedParquetRecordReader.class.getDeclaredField("batchIdx");
+        batchIdxField.setAccessible(true);

Review Comment:
   `batchIdx` is only used when `returnColumnarBatch == false`; it is the index
within a batch. It is reset to 0 when `nextBatch` is called, incremented by 1
when `nextKeyValue` is called, and used as the index (`batchIdx - 1`) to get the
corresponding value when `getCurrentValue` is called.
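
   To make the lifecycle above concrete, here is a minimal, self-contained
sketch. It does not use Spark: `ToyBatchReader`, `CachingReader`, `lookups` and
`BatchIdxDemo` are hypothetical names invented for illustration, and the parent
class only imitates the `batchIdx` behaviour described in this comment. Caching
the `Field` in an instance field is also an assumption; the PR may cache it
differently.

```java
import java.lang.reflect.Field;

// Hypothetical stand-in for the parent reader; this is NOT Spark's class.
class ToyBatchReader {
  private final int[][] batches;
  private int[] currentBatch = new int[0];
  private int nextBatchPos = 0;
  private int batchIdx = 0; // index within the current batch

  ToyBatchReader(int[][] batches) {
    this.batches = batches;
  }

  boolean nextBatch() {
    if (nextBatchPos >= batches.length) {
      return false;
    }
    currentBatch = batches[nextBatchPos++];
    batchIdx = 0; // reset to 0 whenever a new batch is loaded
    return true;
  }

  boolean nextKeyValue() {
    // Load batches until one has an unread value, or give up.
    while (batchIdx >= currentBatch.length) {
      if (!nextBatch()) {
        return false;
      }
    }
    batchIdx++; // incremented by 1 per record
    return true;
  }

  int getCurrentValue() {
    return currentBatch[batchIdx - 1]; // batchIdx already points past the value
  }
}

// Subclass that reads the private parent field via reflection, caching the Field.
class CachingReader extends ToyBatchReader {
  private Field batchIdxField; // assumption: cached per instance
  int lookups = 0;             // counts reflective lookups, for the demo only

  CachingReader(int[][] batches) {
    super(batches);
  }

  int batchIdxFromSuper() {
    try {
      if (batchIdxField == null) {
        batchIdxField = ToyBatchReader.class.getDeclaredField("batchIdx");
        batchIdxField.setAccessible(true);
        lookups++;
      }
      return batchIdxField.getInt(this);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("cannot read batchIdx from parent", e);
    }
  }
}

class BatchIdxDemo {
  public static void main(String[] args) {
    CachingReader reader = new CachingReader(new int[][] {{10, 20}, {30}});
    while (reader.nextKeyValue()) {
      System.out.println("batchIdx=" + reader.batchIdxFromSuper()
          + " value=" + reader.getCurrentValue());
    }
    System.out.println("reflective lookups=" + reader.lookups);
  }
}
```

   In this sketch the reflective lookup happens once per reader instead of once
per record, which is the saving the cached field is meant to provide.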



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
