voonhous commented on code in PR #8501:
URL: https://github.com/apache/hudi/pull/8501#discussion_r1172362684


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/mor/MergeOnReadTableState.java:
##########
@@ -85,7 +86,7 @@ public int getOperationPos() {
   public int[] getRequiredPositions() {
     final List<String> fieldNames = rowType.getFieldNames();
     return requiredRowType.getFieldNames().stream()
-        .map(fieldNames::indexOf)
+        .map(f -> getIdxOfFieldOrElseThrow(f, fieldNames))
         .mapToInt(i -> i)

Review Comment:
   Here's a concrete example:
   
   A user wants to read a hudi-source with the following schema:
   ```
   hudi-source(
       `id` INT,
       `user_id` INT,
       `name` STRING,
       `partition_col` STRING
   ) partitioned by (`partition_col`)
   ```
   
   When submitting a Flink SQL file to be executed in streaming mode, the user 
declares the table with a DDL in which one column name does not match the source schema:
   
   ~~user_id~~ -> driver_id
   
   ```
   hudi-source(
       `id` INT,
       `driver_id` INT,
       `name` STRING,
       `partition_col` STRING
   ) partitioned by (`partition_col`)
   ```
   
   When this mismatch occurs, `fieldNames.indexOf` returns `-1` for the unknown field, and a very obscure error is thrown once that value is used as an array index:
   
   ```
   Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
       at org.apache.hudi.table.format.cow.ParquetSplitReaderUtil.lambda$genPartColumnarRowReader$0(ParquetSplitReaderUtil.java:119)
       at java.util.stream.IntPipeline$4$1.accept(IntPipeline.java:250)
       at java.util.Spliterators$IntArraySpliterator.forEachRemaining(Spliterators.java:1032)
       at java.util.Spliterator$OfInt.forEachRemaining(Spliterator.java:693)
       at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
       at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
       at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
       at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
   ...
   ```
   
   For a mistake that is pretty easy to fix, such an obscure error shouldn't 
be thrown. So, this PR aims to make such errors more obvious.
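
To make the intent concrete, here is a minimal sketch of what a fail-fast lookup along the lines of `getIdxOfFieldOrElseThrow` could look like. Only the method name and its two arguments come from the diff above; the class name `FieldIndexSketch`, the helper `requiredPositions`, the exception type, and the message wording are assumptions for illustration and may differ from what the PR actually does.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone sketch; not the actual Hudi implementation.
final class FieldIndexSketch {

  // Returns the position of `field` in `fieldNames`, or fails with a message
  // that names the missing field instead of silently returning -1.
  static int getIdxOfFieldOrElseThrow(String field, List<String> fieldNames) {
    int idx = fieldNames.indexOf(field);
    if (idx < 0) {
      // Exception type and wording are assumptions made for this sketch.
      throw new IllegalArgumentException(
          "Field '" + field + "' not found in table schema; available fields: " + fieldNames);
    }
    return idx;
  }

  // Mirrors the shape of getRequiredPositions(): map each required field to its index.
  static int[] requiredPositions(List<String> required, List<String> fieldNames) {
    return required.stream()
        .mapToInt(f -> getIdxOfFieldOrElseThrow(f, fieldNames))
        .toArray();
  }

  public static void main(String[] args) {
    List<String> tableFields = Arrays.asList("id", "user_id", "name", "partition_col");

    // All required fields exist, so their positions are resolved normally.
    System.out.println(Arrays.toString(
        requiredPositions(Arrays.asList("id", "name"), tableFields)));

    // The DDL declares `driver_id`, which the table does not have.
    try {
      requiredPositions(Arrays.asList("id", "driver_id"), tableFields);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

With a check like this, the failure surfaces where the positions are computed and names `driver_id` directly, rather than appearing later as `ArrayIndexOutOfBoundsException: -1` inside the Parquet reader.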


