labixiaoxiaopang opened a new issue, #9353: URL: https://github.com/apache/hudi/issues/9353
**Describe the problem you faced**

Hello, I ran `set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;` and then `select count(1) from table_rt;` against the `_rt` view of a MERGE_ON_READ table in Hive, and the query failed.

**Steps to reproduce the behavior:**

1. I use Flink to write to Hudi, but the table has not been compacted yet, so the files on HDFS are only log files.

2. I ran this SQL:

```sql
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
select count(1) from table_rt;
```

It threw an exception. The same trace is repeated for every map task attempt and again in the diagnostics reports, so only one copy is shown:

```
2023-08-03 15:00:37,896 FATAL [IPC Server handler 0 on 11879] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1686040374408_0052_m_000001_0 - exited : java.io.IOException: java.lang.IllegalArgumentException: HoodieRealtimeRecordReader can only work on RealtimeSplit and not with hdfs://dc3-hw-gz5-dtmpl-dev-olap03:9100/tmp/hive/carapp/3c70862d-1dcf-44a3-86eb-bac5ba08478d/hive_2023-08-03_15-00-25_607_127251711347866397-9/-mr-10004/f099b7a4-0c79-4643-9382-57b33f8ed545/emptyFile:263+263
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:379)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:169)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:432)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
Caused by: java.lang.IllegalArgumentException: HoodieRealtimeRecordReader can only work on RealtimeSplit and not with hdfs://dc3-hw-gz5-dtmpl-dev-olap03:9100/tmp/hive/carapp/3c70862d-1dcf-44a3-86eb-bac5ba08478d/hive_2023-08-03_15-00-25_607_127251711347866397-9/-mr-10004/f099b7a4-0c79-4643-9382-57b33f8ed545/emptyFile:263+263
    at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:40)
    at org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat.getRecordReader(HoodieParquetRealtimeInputFormat.java:68)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:376)
    ... 8 more
```
3. I tried turning off Hive vectorization:

```sql
set hive.vectorized.execution.reduce.enabled=false;
set hive.vectorized.execution.enabled=false;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
select count(1) from table_rt;
```

But it didn't help.

4. Later, I sent a few more records to trigger compaction, then ran the same SQL again:

```sql
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
select count(1) from table_rt;
```

This time the count for the `_rt` table was correct.

The Flink SQL is:

```sql
CREATE TABLE kafka_passenger_passenger (
  id INT,
  user_name STRING,
  area_code STRING,
  age INT,
  create_time STRING,
  update_time STRING,
  cdc_db_database_name STRING METADATA FROM 'value.source.database' VIRTUAL,
  cdc_db_table_name STRING METADATA FROM 'value.source.table' VIRTUAL,
  cdc_db_ts_ms TIMESTAMP(3) METADATA FROM 'value.source.timestamp' VIRTUAL,
  kafka_offset BIGINT METADATA FROM 'offset' VIRTUAL,
  kafka_topic STRING METADATA FROM 'topic' VIRTUAL,
  kafka_partition INT METADATA FROM 'partition' VIRTUAL,
  PRIMARY KEY(id) NOT ENFORCED
) WITH (
  'connector' = 'kafka',
  'topic' = 'test_hudi',
  'properties.bootstrap.servers' = '127.0.0.1:9092',
  'properties.group.id' = 'FlinkHudiMORTest01_group',
  'scan.startup.mode' = 'earliest-offset',
  'debezium-json.schema-include' = 'true',
  'format' = 'debezium-json'
);

CREATE TABLE ods_passenger_passenger (
  id INT,
  user_name STRING,
  area_code STRING,
  age INT,
  create_time STRING,
  update_time STRING,
  cdc_partition_date STRING,
  cdc_db_database_name STRING,
  cdc_db_table_name STRING,
  cdc_db_ts_ms TIMESTAMP(3),
  kafka_offset BIGINT,
  kafka_topic STRING,
  kafka_partition INT,
  PRIMARY KEY(id) NOT
ENFORCED
) WITH (
  'connector' = 'hudi',
  'path' = 'hdfs://127.0.0.1:9100/hudi/ods/ods_passenger_passenger_mor/',
  'table.type' = 'MERGE_ON_READ',
  'hive_sync.enabled' = 'true',
  'hive_sync.mode' = 'hms',
  'hive_sync.metastore.uris' = 'thrift://127.0.0.1:9083',
  'hoodie.index.type' = 'BUCKET',
  'hoodie.compaction.trigger.strategy' = 'NUM_OR_TIME',
  'hoodie.compaction.delta_seconds' = '180',
  'hoodie.compaction.delta_commits' = '5',
  'hive_sync.table' = 'ods_passenger_passenger_mor',
  'hive_sync.db' = 'ods',
  'metadata.enabled' = 'true'
);
```

**Expected behavior**

`select count(1)` on the `_rt` table returns the correct row count, even before the first compaction has run.

**Environment Description**

* Hudi version : 0.13.1
* Flink version : 1.14.4
* Hive version : 2.3.5
* Hadoop version : 2.8.5
* Storage (HDFS/S3/GCS..) : HDFS
* Running on Docker? (yes/no) : no
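As a side note on step 4: rather than writing extra records to make the inline trigger fire (`hoodie.compaction.delta_commits` = 5 above), a compaction can also be kicked off out-of-band with the offline compactor that ships in the Hudi Flink bundle. The sketch below is illustrative only; the `$FLINK_HOME` location and bundle jar name are assumptions, so adjust them to your deployment:

```shell
# Sketch: run Hudi's offline compactor against the MOR table path.
# Jar name and $FLINK_HOME are placeholders; match your installation.
$FLINK_HOME/bin/flink run \
  -c org.apache.hudi.sink.compact.HoodieFlinkCompactor \
  $FLINK_HOME/lib/hudi-flink1.14-bundle-0.13.1.jar \
  --path hdfs://127.0.0.1:9100/hudi/ods/ods_passenger_passenger_mor \
  --schedule
```

As I understand the tool, `--schedule` makes the job generate a compaction plan first if none is pending, then execute it; without it, the job only executes an already-scheduled plan.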
