onlywangyh opened a new issue, #7298:
URL: https://github.com/apache/hudi/issues/7298

   We run the following query in Hive:
   
   select mainwaybillno,
          zonecode,
          accountantcode,
          baroprcode,
          opcode,
          row_number() over(PARTITION BY mainwaybillno, zonecode, opcode ORDER BY barscantm) sn
   from dm_kafka_rdmp_dw.fvp_core_fact_route_op_hudi_op_new_rt
   WHERE opcode IN ('50') and inc_day='20221120'
   limit 10;
   
   The MapReduce job is configured with the following input directory:
   mapreduce.input.fileinputformat.inputdir=hdfs://dw/hive/warehouse/dm/dm_kafka_rdmp_dw/fvp_core_fact_route_op_hudi_op_new/inc_day=20221120/opcode=50
   
   However, HoodieParquetInputFormat also adds a split for a different partition to the job:
   hdfs://dw/hive/warehouse/dm/dm_kafka_rdmp_dw/fvp_core_fact_route_op_hudi_op_new/inc_day=20221120/opcode=5000
   
   The job then fails with the following exception:
   2022-11-21 18:11:33,895 INFO [IPC Server handler 1 on 45077] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from attempt_1668750926041_1011874_m_000110_0: Error: java.lang.RuntimeException: java.lang.IllegalStateException: Invalid input path hdfs://dw/hive/warehouse/dm/dm_kafka_rdmp_dw/fvp_core_fact_route_op_hudi_op_new/inc_day=20221120/opcode=501/.00000006-2d6e-4d26-93ea-1026632abb67_20221119235956333.log.1_44-150-2
       at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:169)
       at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
       at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
       at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
       at java.security.AccessController.doPrivileged(Native Method)
       at javax.security.auth.Subject.doAs(Subject.java:422)
       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
       at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
   Caused by: java.lang.IllegalStateException: Invalid input path hdfs://dw/hive/warehouse/dm/dm_kafka_rdmp_dw/fvp_core_fact_route_op_hudi_op_new/inc_day=20221120/opcode=501/.00000006-2d6e-4d26-93ea-1026632abb67_20221119235956333.log.1_44-150-2
       at org.apache.hadoop.hive.ql.exec.AbstractMapOperator.getNominalPath(AbstractMapOperator.java:119)
       at org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp(MapOperator.java:452)
       at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1106)
       at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:482)
       at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:160)
       ... 8 more
   
   2022-11-21 18:11:33,897 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1668750926041_1011874_m_000110_0: Error: java.lang.RuntimeException: java.lang.IllegalStateException: Invalid input path hdfs://dw/hive/warehouse/dm/dm_kafka_rdmp_dw/fvp_core_fact_route_op_hudi_op_new/inc_day=20221120/opcode=501/.00000006-2d6e-4d26-93ea-1026632abb67_20221119235956333.log.1_44-150-2
       at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:169)
       at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
       at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
       at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
       at java.security.AccessController.doPrivileged(Native Method)
       at javax.security.auth.Subject.doAs(Subject.java:422)
       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
       at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
   Caused by: java.lang.IllegalStateException: Invalid input path hdfs://dw/hive/warehouse/dm/dm_kafka_rdmp_dw/fvp_core_fact_route_op_hudi_op_new/inc_day=20221120/opcode=501/.00000006-2d6e-4d26-93ea-1026632abb67_20221119235956333.log.1_44-150-2
       at org.apache.hadoop.hive.ql.exec.AbstractMapOperator.getNominalPath(AbstractMapOperator.java:119)
       at org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp(MapOperator.java:452)
       at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1106)
       at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:482)
       at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:160)
       ... 8 more
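
One pattern that would produce this symptom is comparing partition paths as plain strings: `opcode=50` is a string prefix of both `opcode=501` and `opcode=5000`, so a prefix test accepts files from sibling partitions. Whether HoodieParquetInputFormat actually does this is an assumption, not something this report confirms. The sketch below is not Hudi code; the function names and the `part-0.parquet` file name are hypothetical, and the paths are copied from the report above.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

BASE = ("hdfs://dw/hive/warehouse/dm/dm_kafka_rdmp_dw/"
        "fvp_core_fact_route_op_hudi_op_new/inc_day=20221120")
# Input directory from mapreduce.input.fileinputformat.inputdir.
INPUT_DIR = BASE + "/opcode=50"
# File from the "Invalid input path" exception.
LOG_FILE = (BASE + "/opcode=501/.00000006-2d6e-4d26-93ea-1026632abb67"
            "_20221119235956333.log.1_44-150-2")


def naive_prefix_match(input_dir: str, file_path: str) -> bool:
    """Hypothetical buggy check: plain string prefix comparison."""
    return file_path.startswith(input_dir)


def component_match(input_dir: str, file_path: str) -> bool:
    """Compare whole path components, so opcode=50 != opcode=501."""
    d, f = urlparse(input_dir), urlparse(file_path)
    if (d.scheme, d.netloc) != (f.scheme, f.netloc):
        return False
    dir_parts = PurePosixPath(d.path).parts
    file_parts = PurePosixPath(f.path).parts
    # The file must sit strictly below the directory.
    return (len(file_parts) > len(dir_parts)
            and file_parts[:len(dir_parts)] == dir_parts)


if __name__ == "__main__":
    print("prefix match:   ", naive_prefix_match(INPUT_DIR, LOG_FILE))
    print("component match:", component_match(INPUT_DIR, LOG_FILE))
```

Under this assumption, the string-prefix check accepts the `opcode=501` log file as belonging to the `opcode=50` input directory, while the component-wise check rejects it and still accepts files that really are under `opcode=50`.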
   
   
   

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.