Re: [PR] [MINOR]Follow up HUDI-8803, optimize vectorized reader by cache 'batchIdxField' [hudi]

via GitHub Wed, 26 Mar 2025 21:25:06 -0700


wangyinsheng commented on code in PR #13023:
URL: https://github.com/apache/hudi/pull/13023#discussion_r2015554419



##########
hudi-spark-datasource/hudi-spark3-common/src/main/java/org/apache/spark/sql/execution/datasources/parquet/Spark3HoodieVectorizedParquetRecordReader.java:
##########
@@ -161,8 +163,10 @@ public Object getCurrentValue() {
 
   private int batchIdxFromSuper() {
     try {
-      Field batchIdxField = VectorizedParquetRecordReader.class.getDeclaredField("batchIdx");
-      batchIdxField.setAccessible(true);
+      if (batchIdxField == null) {
+        batchIdxField = VectorizedParquetRecordReader.class.getDeclaredField("batchIdx");
+        batchIdxField.setAccessible(true);

Review Comment:
   The reset of `batchIdx` happens in the parent class. `Spark3HoodieVectorizedParquetRecordReader` maintains its own `columnarBatch` to handle schema changes, and in that case it reuses the parent class's `batchIdx` to get the current value.
   Note that this PR caches the `Field` for `batchIdx`, not its int value, so we no longer need to look up `batchIdxField` from the class on every call.
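   To make the distinction concrete, here is a minimal, self-contained sketch of caching a reflective `Field` handle rather than the value it points to. `ParentReader` and `ChildReader` are hypothetical stand-ins for `VectorizedParquetRecordReader` and `Spark3HoodieVectorizedParquetRecordReader`, and storing the cache in an instance field is an assumption; the PR's actual field declaration is not shown in this hunk.

```java
import java.lang.reflect.Field;

public class CachedFieldDemo {

  // Hypothetical stand-in for VectorizedParquetRecordReader: it owns a
  // private cursor and resets it itself whenever a new batch is loaded.
  static class ParentReader {
    private int batchIdx = 0;

    void nextBatch() {
      batchIdx = 0; // the reset happens here, in the parent class
    }

    void advance() {
      batchIdx++;
    }
  }

  // Hypothetical stand-in for the Hudi subclass.
  static class ChildReader extends ParentReader {
    // Cache the reflective Field handle (looked up once), NOT the int value.
    private Field batchIdxField;

    int batchIdxFromSuper() {
      try {
        if (batchIdxField == null) {
          batchIdxField = ParentReader.class.getDeclaredField("batchIdx");
          batchIdxField.setAccessible(true);
        }
        // Reads the live value on every call, so resets done by the parent are visible.
        return batchIdxField.getInt(this);
      } catch (NoSuchFieldException | IllegalAccessException e) {
        throw new RuntimeException(e);
      }
    }
  }

  public static void main(String[] args) {
    ChildReader reader = new ChildReader();
    reader.advance();
    reader.advance();
    System.out.println(reader.batchIdxFromSuper()); // prints 2
    reader.nextBatch();
    System.out.println(reader.batchIdxFromSuper()); // prints 0
  }
}
```

   Because only the `Field` handle is cached, each call still reads the current `batchIdx`; caching the int itself would return a stale value after the parent resets it.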



-- 
This is an automated message from the Apache Git Service.