[I] Hive "Count" queries don't work with bootstrap tables w/Hive3 [hudi]

via GitHub Sat, 29 Nov 2025 22:17:59 -0800


hudi-bot opened a new issue, #15676:
URL: https://github.com/apache/hudi/issues/15676


   Hive `COUNT` queries fail on Hudi bootstrap tables when run with Hive 3.
The failure has been reproduced, with the same error, on all EMR 6.x releases.
The same query works with Hive 2.
   
   For example, the query:
   {code:sql}
   SELECT COUNT(*) FROM HUDI_BOOTSTRAP_TABLE;{code}
   fails with the following error:
   {code:java}
   TaskAttempt 1 failed, info=[Error: Error while running task ( failure ) : attempt_1672881902089_0008_1_00_000000_1:java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: java.lang.RuntimeException: java.io.IOException: cannot find dir = [s3://my-bucket/test-data/hudi/parquet-source-tables/hive_style_partitioned_tb/event_type=two/part-00000-98fb0380-374c-40f5-8a57-89d95270a2c3-c000.parquet] in pathToPartitionInfo: [[s3://my-bucket/hudi-table/test_bootstrap_hive_partitionedrt/event_type=one], [s3://my-bucket/hudi-table/test_bootstrap_hive_partitionedrt/event_type=two]]
       at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
       at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
       at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
       at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
       at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
       at java.security.AccessController.doPrivileged(Native Method)
       at javax.security.auth.Subject.doAs(Subject.java:422)
       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
       at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
       at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
       at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:750)
   Caused by: java.lang.RuntimeException: java.io.IOException: java.lang.RuntimeException: java.io.IOException: cannot find dir = [s3://my-bucket/test-data/hudi/parquet-source-tables/hive_style_partitioned_tb/event_type=two/part-00000-98fb0380-374c-40f5-8a57-89d95270a2c3-c000.parquet] in pathToPartitionInfo: [[s3://my-bucket/hudi-table/test_bootstrap_hive_partitionedrt/event_type=one], [s3://my-bucket/hudi-table/test_bootstrap_hive_partitionedrt/event_type=two]]
       at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:206)
       at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.<init>(TezGroupedSplitsInputFormat.java:145)
       at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:111)
       at org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:157)
       at org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:83)
       at org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:703)
       at org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:662)
       at org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:150)
       at org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:114)
       at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:525)
       at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:171)
       at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
       ... 14 more
   Caused by: java.io.IOException: java.lang.RuntimeException: java.io.IOException: cannot find dir = [s3://my-bucket/test-data/hudi/parquet-source-tables/hive_style_partitioned_tb/event_type=two/part-00000-98fb0380-374c-40f5-8a57-89d95270a2c3-c000.parquet] in pathToPartitionInfo: [[s3://my-bucket/hudi-table/test_bootstrap_hive_partitionedrt/event_type=one], [s3://my-bucket/hudi-table/test_bootstrap_hive_partitionedrt/event_type=two]]
       at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
       at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
       at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:421)
       at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:203)
       ... 25 more
   Caused by: java.lang.RuntimeException: java.io.IOException: cannot find dir = [s3://my-bucket/test-data/hudi/parquet-source-tables/hive_style_partitioned_tb/event_type=two/part-00000-98fb0380-374c-40f5-8a57-89d95270a2c3-c000.parquet] in pathToPartitionInfo: [[s3://my-bucket/hudi-table/test_bootstrap_hive_partitionedrt/event_type=one], [s3://my-bucket/hudi-table/test_bootstrap_hive_partitionedrt/event_type=two]]
       at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.<init>(VectorizedParquetRecordReader.java:156)
       at org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat.getRecordReader(VectorizedParquetInputFormat.java:50)
       at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:87)
       at org.apache.hudi.hadoop.HoodieParquetInputFormat.getRecordReader(HoodieParquetInputFormat.java:203)
       at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:418)
       ... 26 more
   Caused by: java.io.IOException: cannot find dir = [s3://my-bucket/test-data/hudi/parquet-source-tables/hive_style_partitioned_tb/event_type=two/part-00000-98fb0380-374c-40f5-8a57-89d95270a2c3-c000.parquet] in pathToPartitionInfo: [[s3://my-bucket/hudi-table/test_bootstrap_hive_partitionedrt/event_type=one], [s3://my-bucket/hudi-table/test_bootstrap_hive_partitionedrt/event_type=two]]
       at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getFromPathRecursively(HiveFileFormatUtils.java:402)
       at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getFromPathRecursively(HiveFileFormatUtils.java:371)
       at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getFromPathRecursively(HiveFileFormatUtils.java:366)
       at org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatchCtx.getPartitionValues(VectorizedRowBatchCtx.java:272)
       at org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatchCtx.getPartitionValues(VectorizedRowBatchCtx.java:263)
       at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.initPartitionValues(VectorizedParquetRecordReader.java:164)
       at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.<init>(VectorizedParquetRecordReader.java:153)
       ... 30 more
   {code}
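
The innermost cause shows a lookup of the bootstrap *source* file path against partition directories that all live under the Hudi table path. The sketch below illustrates why such a lookup cannot succeed. It assumes the lookup walks up the file's parent directories until one matches a registered partition directory, which is what the name `getFromPathRecursively` in the trace suggests; it is not Hive's actual implementation. The helper `find_partition_dir` and the file name `some-file.parquet` are hypothetical.

```python
from typing import Optional, Set


def find_partition_dir(file_path: str, partition_dirs: Set[str]) -> Optional[str]:
    """Walk up from file_path and return the first ancestor that is a
    registered partition directory, or None if no ancestor matches.

    Hypothetical stand-in for the parent-directory lookup, not Hive code.
    """
    current = file_path.rstrip("/")
    # Stop before consuming the "scheme://" prefix.
    scheme_end = current.find("://") + 3
    while True:
        if current in partition_dirs:
            return current
        cut = current.rfind("/")
        if cut < scheme_end:
            return None  # reached the bucket root without a match
        current = current[:cut]


# Partition directories taken from the error message.
partition_dirs = {
    "s3://my-bucket/hudi-table/test_bootstrap_hive_partitionedrt/event_type=one",
    "s3://my-bucket/hudi-table/test_bootstrap_hive_partitionedrt/event_type=two",
}

# Bootstrap source file from the error message: no ancestor is registered.
source_file = (
    "s3://my-bucket/test-data/hudi/parquet-source-tables/"
    "hive_style_partitioned_tb/event_type=two/"
    "part-00000-98fb0380-374c-40f5-8a57-89d95270a2c3-c000.parquet"
)

# Hypothetical file under the Hudi table path: its parent is registered.
hudi_file = (
    "s3://my-bucket/hudi-table/test_bootstrap_hive_partitionedrt/"
    "event_type=two/some-file.parquet"
)

source_match = find_partition_dir(source_file, partition_dirs)
hudi_match = find_partition_dir(hudi_file, partition_dirs)
```

Under these assumptions, the source file resolves to no partition while a file under the Hudi table path resolves to its `event_type=two` directory, which matches the shape of the `cannot find dir` error.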
   
   ## JIRA info
   
   - Link: https://issues.apache.org/jira/browse/HUDI-5526
   - Type: Bug


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
