Hi Gurudatt,

Can you share more context on the table and the query? Are you using Spark SQL or a Hive query? What is the table type, etc.? Also, if you can provide a small snippet that reproduces the issue, along with the configs you used, it would be helpful for debugging.
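For example, a minimal repro along these lines would help (the table name and partition layout below are only guesses based on the paths in your stack trace; please substitute the actual DDL, queries, and settings you ran):

```sql
-- Hypothetical repro sketch: tbl_test and the 2019/11/09 partition come
-- from the paths in the stack trace; partition column names are placeholders.

-- Session settings in effect (the trace shows splits being built by
-- CombineHiveInputFormat, so the value of this one is especially relevant):
SET hive.input.format;

-- The query that works:
SELECT * FROM tbl_test
WHERE `year` = '2019' AND `month` = '11' AND `day` = '09';

-- The aggregate that fails on a partition with more than one parquet file:
SELECT COUNT(*) FROM tbl_test
WHERE `year` = '2019' AND `month` = '11' AND `day` = '09';
```

Along with that, the output of `SHOW CREATE TABLE tbl_test` and the Hudi write configs used for the update would tell us the table type.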
Thanks,
Sudha

On Sun, Nov 17, 2019 at 11:09 PM Gurudatt Kulkarni <[email protected]> wrote:

> Hi All,
>
> I am facing an issue where an aggregate query fails on partitions that
> have more than one parquet file, but a `select *` query displays all
> results properly. Here's the stack trace of the error I am getting. I
> checked the HDFS directory for the particular file and it exists, but
> somehow Hive is not able to find it after the update.
>
> java.io.IOException: cannot find dir = hdfs://hadoop-host:8020/tmp/hoodie/tables/tbl_test/2019/11/09/6f864d6d-40a6-4eb7-9ee8-6133a16aa9e5-0_59-22-256_20191115185447.parquet in pathToPartitionInfo: [hdfs:/tmp/hoodie/tables/tbl_test/2019/11/09]
>     at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:368)
>     at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:330)
>     at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat$CombineHiveInputSplit.<init>(CombineHiveInputFormat.java:166)
>     at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getCombineSplits(CombineHiveInputFormat.java:460)
>     at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:547)
>     at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>     at scala.Option.getOrElse(Option.scala:120)
>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>     at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>     at scala.Option.getOrElse(Option.scala:120)
>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>     at org.apache.spark.ShuffleDependency.<init>(Dependency.scala:91)
>     at org.apache.spark.rdd.ShuffledRDD.getDependencies(ShuffledRDD.scala:80)
>     at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:226)
>     at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:224)
>     at scala.Option.getOrElse(Option.scala:120)
>     at org.apache.spark.rdd.RDD.dependencies(RDD.scala:224)
>     at org.apache.spark.scheduler.DAGScheduler.visit$1(DAGScheduler.scala:386)
>     at org.apache.spark.scheduler.DAGScheduler.getParentStages(DAGScheduler.scala:398)
>     at org.apache.spark.scheduler.DAGScheduler.getParentStagesAndId(DAGScheduler.scala:299)
>     at org.apache.spark.scheduler.DAGScheduler.newResultStage(DAGScheduler.scala:334)
>     at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:837)
>     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1635)
>     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1627)
>     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1616)
>     at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>
> 19/11/18 12:19:16 ERROR impl.RemoteSparkJobStatus: Failed to run job 2
> java.util.concurrent.ExecutionException: Exception thrown by job
>     at org.apache.spark.JavaFutureActionWrapper.getImpl(FutureAction.scala:311)
>     at org.apache.spark.JavaFutureActionWrapper.get(FutureAction.scala:316)
>     at org.apache.hadoop.hive.ql.exec.spark.status.impl.RemoteSparkJobStatus$GetJobInfoJob.call(RemoteSparkJobStatus.java:195)
>     at org.apache.hadoop.hive.ql.exec.spark.status.impl.RemoteSparkJobStatus$GetJobInfoJob.call(RemoteSparkJobStatus.java:171)
>     at org.apache.hive.spark.client.RemoteDriver$DriverProtocol.handle(RemoteDriver.java:327)
>     at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hive.spark.client.rpc.RpcDispatcher.handleCall(RpcDispatcher.java:120)
>     at org.apache.hive.spark.client.rpc.RpcDispatcher.channelRead0(RpcDispatcher.java:79)
>     at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>     at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>     at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244)
>     at io.netty.handler.codec.ByteToMessageCodec.channelRead(ByteToMessageCodec.java:103)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>     at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>     at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
>     at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
>     at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>     at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>     at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>     at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>     at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>     at java.lang.Thread.run(Thread.java:748)
>
> Regards,
> Gurudatt
