[ https://issues.apache.org/jira/browse/HUDI-5276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexey Kudinkin updated HUDI-5276:
----------------------------------
Description:
When we run a SQL query in Hive such as:
select mainwaybillno,
       zonecode,
       accountantcode,
       baroprcode,
       opcode,
       row_number() over(PARTITION BY mainwaybillno, zonecode, opcode ORDER BY barscantm) sn
from dm_kafka_rdmp_dw.fvp_core_fact_route_op_hudi_op_new_rt
WHERE opcode IN ('50') and inc_day='20221120'
limit 10;
In the MapReduce job, the input directory is configured as
mapreduce.input.fileinputformat.inputdir=hdfs://dw/hive/warehouse/dm/dm_kafka_rdmp_dw/fvp_core_fact_route_op_hudi_op_new/inc_day=20221120/opcode=50
but a file split from
hdfs://dw/hive/warehouse/dm/dm_kafka_rdmp_dw/fvp_core_fact_route_op_hudi_op_new/inc_day=20221120/opcode=5000
was also added to the job, even though opcode=5000 is a different partition that merely shares the opcode=50 prefix.
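The issue title attributes this to getAllQueryPartitionPaths using a regular-expression match. A minimal sketch, assuming partition paths are compared as plain strings relative to the table base path, shows how a prefix-style regex over-selects sibling partitions such as opcode=501 and opcode=5000, and how comparing on '/' boundaries avoids that. The class and method names are hypothetical; this is not Hudi's actual code or the fix merged for this issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class PartitionPathMatchDemo {

    // Hypothetical over-matching filter: the requested partition path is used as a
    // regex with a trailing ".*", which behaves like a plain string-prefix test, so
    // "inc_day=20221120/opcode=50" also selects opcode=501 and opcode=5000.
    static List<String> matchByPrefix(List<String> partitions, String requested) {
        Pattern pattern = Pattern.compile(Pattern.quote(requested) + ".*");
        List<String> selected = new ArrayList<>();
        for (String partition : partitions) {
            if (pattern.matcher(partition).matches()) {
                selected.add(partition);
            }
        }
        return selected;
    }

    // Hypothetical boundary-aware filter: a partition is selected only if it equals
    // the requested path or lies below it, i.e. the shared prefix must end on a '/'.
    static List<String> matchByPathComponents(List<String> partitions, String requested) {
        String base = requested.endsWith("/")
                ? requested.substring(0, requested.length() - 1)
                : requested;
        List<String> selected = new ArrayList<>();
        for (String partition : partitions) {
            if (partition.equals(base) || partition.startsWith(base + "/")) {
                selected.add(partition);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> partitions = Arrays.asList(
                "inc_day=20221120/opcode=50",
                "inc_day=20221120/opcode=501",
                "inc_day=20221120/opcode=5000");
        String requested = "inc_day=20221120/opcode=50";

        // prints [inc_day=20221120/opcode=50, inc_day=20221120/opcode=501, inc_day=20221120/opcode=5000]
        System.out.println(matchByPrefix(partitions, requested));
        // prints [inc_day=20221120/opcode=50]
        System.out.println(matchByPathComponents(partitions, requested));
    }
}
```

The boundary check keeps nested partitions (anything under opcode=50/) while rejecting siblings whose names only start with the same characters.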
The job then failed with the following exception (raised here for a log file from another such partition, opcode=501):
2022-11-21 18:11:33,895 INFO [IPC Server handler 1 on 45077] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from attempt_1668750926041_1011874_m_000110_0: Error: java.lang.RuntimeException: java.lang.IllegalStateException: Invalid input path hdfs://dw/hive/warehouse/dm/dm_kafka_rdmp_dw/fvp_core_fact_route_op_hudi_op_new/inc_day=20221120/opcode=501/.00000006-2d6e-4d26-93ea-1026632abb67_20221119235956333.log.1_44-150-2
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:169)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.IllegalStateException: Invalid input path hdfs://dw/hive/warehouse/dm/dm_kafka_rdmp_dw/fvp_core_fact_route_op_hudi_op_new/inc_day=20221120/opcode=501/.00000006-2d6e-4d26-93ea-1026632abb67_20221119235956333.log.1_44-150-2
    at org.apache.hadoop.hive.ql.exec.AbstractMapOperator.getNominalPath(AbstractMapOperator.java:119)
    at org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp(MapOperator.java:452)
    at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1106)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:482)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:160)
    ... 8 more
2022-11-21 18:11:33,897 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1668750926041_1011874_m_000110_0: Error: java.lang.RuntimeException: java.lang.IllegalStateException: Invalid input path hdfs://dw/hive/warehouse/dm/dm_kafka_rdmp_dw/fvp_core_fact_route_op_hudi_op_new/inc_day=20221120/opcode=501/.00000006-2d6e-4d26-93ea-1026632abb67_20221119235956333.log.1_44-150-2
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:169)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.IllegalStateException: Invalid input path hdfs://dw/hive/warehouse/dm/dm_kafka_rdmp_dw/fvp_core_fact_route_op_hudi_op_new/inc_day=20221120/opcode=501/.00000006-2d6e-4d26-93ea-1026632abb67_20221119235956333.log.1_44-150-2
    at org.apache.hadoop.hive.ql.exec.AbstractMapOperator.getNominalPath(AbstractMapOperator.java:119)
    at org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp(MapOperator.java:452)
    at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1106)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:482)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:160)
    ... 8 more
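The stack trace shows the exception being raised in Hive's AbstractMapOperator.getNominalPath while the mapper switches to a new input file. A plausible reading is that Hive could not attribute the opcode=501 log file to any of the job's input paths, since only opcode=50 was configured. The sketch below models that check under this assumption; `nominalPath` and its '/'-boundary comparison are hypothetical simplifications, not Hive's implementation, and "example.parquet" is a made-up file name.

```java
import java.util.Arrays;
import java.util.List;

public class NominalPathCheckDemo {

    // Hypothetical, simplified model of the mapper-side check (not Hive's code):
    // find the configured input directory that contains the split's file, comparing
    // on '/' boundaries, and fail with the same message as the log if there is none.
    static String nominalPath(List<String> inputDirs, String splitFile) {
        for (String dir : inputDirs) {
            String base = dir.endsWith("/") ? dir : dir + "/";
            if (splitFile.startsWith(base)) {
                return dir;
            }
        }
        throw new IllegalStateException("Invalid input path " + splitFile);
    }

    public static void main(String[] args) {
        String table = "hdfs://dw/hive/warehouse/dm/dm_kafka_rdmp_dw/fvp_core_fact_route_op_hudi_op_new";
        // Only the requested partition is registered as an input directory.
        List<String> inputDirs = Arrays.asList(table + "/inc_day=20221120/opcode=50");

        // A file inside opcode=50 resolves to its input directory.
        System.out.println(nominalPath(inputDirs, table + "/inc_day=20221120/opcode=50/example.parquet"));

        // The log file from the stack trace lives under opcode=501, which is not
        // below opcode=50, so the check fails with "Invalid input path ...".
        String stray = table + "/inc_day=20221120/opcode=501/"
                + ".00000006-2d6e-4d26-93ea-1026632abb67_20221119235956333.log.1_44-150-2";
        try {
            nominalPath(inputDirs, stray);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under this model, the mapper-side check is behaving correctly; the failure follows from the split for opcode=501 having been scheduled in the first place.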
> Hudi getAllQueryPartitionPaths use regular match caused Invalid input path add
> -------------------------------------------------------------------------------
>
> Key: HUDI-5276
> URL: https://issues.apache.org/jira/browse/HUDI-5276
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: yuehanwang
> Assignee: Alexey Kudinkin
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.13.0
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)