labixiaoxiaopang opened a new issue, #9353:
URL: https://github.com/apache/hudi/issues/9353

   **Describe the problem you faced**
   
   
Hello. When I run the following SQL to count the rows of the `_rt` table, the query fails with an error:
   > ```
   >set hive.input.format= org.apache.hadoop.hive.ql.io.HiveInputFormat;
   >select count(1) from table_rt;
   > ```
   
   
   **Steps to reproduce the behavior:**
   
   1. I use Flink to write to a Hudi MOR table. Compaction has not run yet, so the files on HDFS 
are all log files, like:
   
![image](https://github.com/apache/hudi/assets/24770230/558a1a48-0787-49cc-9f54-fbda5f4ce6e6)
   2. I then execute this SQL:
   > ```
   >set hive.input.format= org.apache.hadoop.hive.ql.io.HiveInputFormat;
   >select count(1) from table_rt;
   > ```
   but it throws the following exception:
   
   > ```
   >2023-08-03 15:00:37,896 FATAL [IPC Server handler 0 on 11879] 
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: 
attempt_1686040374408_0052_m_000001_0 - exited : java.io.IOException: 
java.lang.IllegalArgumentException: HoodieRealtimeRecordReader can only work on 
RealtimeSplit and not with 
hdfs://dc3-hw-gz5-dtmpl-dev-olap03:9100/tmp/hive/carapp/3c70862d-1dcf-44a3-86eb-bac5ba08478d/hive_2023-08-03_15-00-25_607_127251711347866397-9/-mr-10004/f099b7a4-0c79-4643-9382-57b33f8ed545/emptyFile:263+263
   >    at 
org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
   >    at 
org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
   >    at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:379)
   >    at 
org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:169)
   >    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:432)
   >    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
   >    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
   >    at java.security.AccessController.doPrivileged(Native Method)
   >    at javax.security.auth.Subject.doAs(Subject.java:422)
   >    at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
   >    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
   >Caused by: java.lang.IllegalArgumentException: HoodieRealtimeRecordReader 
can only work on RealtimeSplit and not with 
hdfs://dc3-hw-gz5-dtmpl-dev-olap03:9100/tmp/hive/carapp/3c70862d-1dcf-44a3-86eb-bac5ba08478d/hive_2023-08-03_15-00-25_607_127251711347866397-9/-mr-10004/f099b7a4-0c79-4643-9382-57b33f8ed545/emptyFile:263+263
   >    at 
org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:40)
   >    at 
org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat.getRecordReader(HoodieParquetRealtimeInputFormat.java:68)
   >    at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:376)
   >    ... 8 more
   >
   >    ... (the same stack trace is repeated verbatim in the two diagnostics reports for attempt_1686040374408_0052_m_000001_0; elided) ...
   >
   >2023-08-03 15:00:37,899 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: 
attempt_1686040374408_0052_m_000001_0 TaskAttempt Transitioned from RUNNING to 
FAIL_FINISHING_CONTAINER
   >2023-08-03 15:00:37,909 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: 
attempt_1686040374408_0052_m_000001_1 TaskAttempt Transitioned from NEW to 
UNASSIGNED
   >2023-08-03 15:00:37,909 INFO [Thread-54] 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: 1 failures on node 
dc3-hw-gz5-dtmpl-dev-olap03
   >2023-08-03 15:00:37,910 INFO [Thread-54] 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Added 
attempt_1686040374408_0052_m_000001_1 to list of failed maps
   >2023-08-03 15:00:37,997 INFO [IPC Server handler 4 on 11879] 
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
attempt_1686040374408_0052_m_000000_0 is : 0.0
   >2023-08-03 15:00:37,999 FATAL [IPC Server handler 5 on 11879] 
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: 
attempt_1686040374408_0052_m_000000_0 - exited : java.io.IOException: 
java.lang.IllegalArgumentException: HoodieRealtimeRecordReader can only work on 
RealtimeSplit and not with 
hdfs://dc3-hw-gz5-dtmpl-dev-olap03:9100/tmp/hive/carapp/3c70862d-1dcf-44a3-86eb-bac5ba08478d/hive_2023-08-03_15-00-25_607_127251711347866397-9/-mr-10004/f099b7a4-0c79-4643-9382-57b33f8ed545/emptyFile:0+263
   >    at 
org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
   >    at 
org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
   >    at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:379)
   >    at 
org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:169)
   >    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:432)
   >    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
   >    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
   >    at java.security.AccessController.doPrivileged(Native Method)
   >    at javax.security.auth.Subject.doAs(Subject.java:422)
   >    at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
   >    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
   >Caused by: java.lang.IllegalArgumentException: HoodieRealtimeRecordReader 
can only work on RealtimeSplit and not with 
hdfs://dc3-hw-gz5-dtmpl-dev-olap03:9100/tmp/hive/carapp/3c70862d-1dcf-44a3-86eb-bac5ba08478d/hive_2023-08-03_15-00-25_607_127251711347866397-9/-mr-10004/f099b7a4-0c79-4643-9382-57b33f8ed545/emptyFile:0+263
   >    at 
org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:40)
   >    at 
org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat.getRecordReader(HoodieParquetRealtimeInputFormat.java:68)
   >    at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:376)
   >    ... 8 more
   >
   >    ... (the same stack trace is repeated verbatim in the two diagnostics reports for attempt_1686040374408_0052_m_000000_0; elided) ...
   > ```
   
   3. I also tried turning off Hive vectorization:
   > ```
   >set hive.vectorized.execution.reduce.enabled=false;
   >set hive.vectorized.execution.enabled=false;
   >set hive.input.format= org.apache.hadoop.hive.ql.io.HiveInputFormat;
   >select count(1) from table_rt;
   > ```
   But it didn't work.
   
   4. Later, I sent a few more records to trigger compaction, then executed the same SQL again:
   > ```
   >set hive.input.format= org.apache.hadoop.hive.ql.io.HiveInputFormat;
   >select count(1) from table_rt;
   > ```
   This time the count query on the `_rt` table returned the correct result.
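   
   A possibly relevant workaround while the MOR table still has log-only file groups (untested in this environment, so only a sketch): Hudi also ships a combine input format for Hive, `org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat`, which produces realtime splits for `_rt` tables. The table name below is the one from this report:
   > ```sql
   > -- Sketch (unverified here): use Hudi's combine input format instead of
   > -- the plain HiveInputFormat that produced the RealtimeSplit error above.
   > set hive.input.format=org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat;
   > select count(1) from table_rt;
   > ```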
   
   Flink sql is
   > ```
   >CREATE TABLE kafka_passenger_passenger (
   >  id INT,
   >  user_name STRING,
   >  area_code STRING,
   >  age INT,
   >  create_time STRING,
   >  update_time STRING,
   >  cdc_db_database_name STRING METADATA FROM 'value.source.database' VIRTUAL,
   >  cdc_db_table_name STRING METADATA FROM 'value.source.table' VIRTUAL,
   >  cdc_db_ts_ms TIMESTAMP(3) METADATA FROM 'value.source.timestamp' VIRTUAL,
   >  kafka_offset BIGINT METADATA FROM 'offset' VIRTUAL,
   >  kafka_topic STRING METADATA  FROM 'topic' VIRTUAL,
   >  kafka_partition INT METADATA FROM 'partition' VIRTUAL,
   >  PRIMARY KEY(id) NOT ENFORCED
   >) WITH (
   >  'connector' = 'kafka',
   >  'topic' = 'test_hudi',
   >  'properties.bootstrap.servers' = '127.0.0.1:9092',
   >  'properties.group.id' = 'FlinkHudiMORTest01_group',
   >  'scan.startup.mode' = 'earliest-offset',
   >  'debezium-json.schema-include' = 'true',
   >  'format' = 'debezium-json'
   >);
   >CREATE TABLE ods_passenger_passenger (
   >  id INT,
   >  user_name STRING,
   >  area_code STRING,
   >  age INT,
   >  create_time STRING,
   >  update_time STRING,
   >  cdc_partition_date STRING,
   >  cdc_db_database_name STRING,
   >  cdc_db_table_name STRING,
   >  cdc_db_ts_ms TIMESTAMP(3),
   >  kafka_offset BIGINT,
   >  kafka_topic STRING,
   >  kafka_partition INT,
   >  PRIMARY KEY(id) NOT ENFORCED
   > ) 
   > WITH (
   >  'connector' = 'hudi',
   >  'path' = 'hdfs://127.0.0.1:9100/hudi/ods/ods_passenger_passenger_mor/',
   >  'table.type' = 'MERGE_ON_READ',
   >  'hive_sync.enabled' = 'true',
   >  'hive_sync.mode' = 'hms',
   >  'hive_sync.metastore.uris' = 'thrift://127.0.0.1:9083',
   >  'hoodie.index.type' = 'BUCKET',
   >  'hoodie.compaction.trigger.strategy' = 'NUM_OR_TIME', 
   >  'hoodie.compaction.delta_seconds' = '180',
   >  'hoodie.compaction.delta_commits' = '5',
   >  'hive_sync.table' = 'ods_passenger_passenger_mor',
   >  'hive_sync.db' = 'ods',
   >  'metadata.enabled' = 'true'
   >)
   > ```
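   
   One observation, offered as an assumption rather than a confirmed cause: the Flink Hudi connector defines its own compaction option keys without the `hoodie.` prefix (`compaction.trigger.strategy`, `compaction.delta_seconds`, `compaction.delta_commits`), so the prefixed keys in the WITH clause above may not be the ones the connector actually reads. A hypothetical variant of those three lines:
   > ```sql
   > -- Hypothetical (unverified) variant using the Flink connector's
   > -- unprefixed compaction option keys in the WITH clause:
   > 'compaction.trigger.strategy' = 'num_or_time',
   > 'compaction.delta_seconds' = '180',
   > 'compaction.delta_commits' = '5',
   > ```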
   **Expected behavior**
   
   The count query on the `_rt` table returns the correct result.
   
   **Environment Description**
   
   * Hudi version: 0.13.1
   
   * Flink version: 1.14.4
   
   * Hive version: 2.3.5
   
   * Hadoop version: 2.8.5
   
   * Storage (HDFS/S3/GCS..): HDFS
   
   * Running on Docker? (yes/no): no
   
   
   
   

