mintao created KUDU-3169:
----------------------------
Summary: kudu java client throws scanner expired error while
processing large scan on High-load cluster
Key: KUDU-3169
URL: https://issues.apache.org/jira/browse/KUDU-3169
Project: Kudu
Issue Type: Bug
Components: client, java
Affects Versions: 1.8.0
Reporter: mintao
user submits a spark task to scan a kudu table with large amount records,
after just few minutes the job failed after 4 attempts, each attempt failed
with error :
{code:java}
org.apache.kudu.client.NonRecoverableException: Scanner
4e34e6f821be42b889022ec681e235cc not found (it may have expired)
org.apache.kudu.client.NonRecoverableException: Scanner
4e34e6f821be42b889022ec681e235cc not found (it may have expired) at
org.apache.kudu.client.KuduException.transformException(KuduException.java:110)
at
org.apache.kudu.client.KuduClient.joinAndHandleException(KuduClient.java:402)
at org.apache.kudu.client.KuduScanner.nextRows(KuduScanner.java:57) at
org.apache.kudu.spark.kudu.RowIterator.hasNext(KuduRDD.scala:153) at
scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) at
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
Source) at
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at
org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) at
org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:187)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:109) at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345) at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748) Suppressed:
org.apache.kudu.client.KuduException$OriginalException: Original asynchronous
stack trace at
org.apache.kudu.client.RpcProxy.dispatchTSError(RpcProxy.java:341) at
org.apache.kudu.client.RpcProxy.responseReceived(RpcProxy.java:263) at
org.apache.kudu.client.RpcProxy.access$000(RpcProxy.java:59) at
org.apache.kudu.client.RpcProxy$1.call(RpcProxy.java:152) at
org.apache.kudu.client.RpcProxy$1.call(RpcProxy.java:148) at
org.apache.kudu.client.Connection.messageReceived(Connection.java:391) at
org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.apache.kudu.client.Connection.handleUpstream(Connection.java:243) at
org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at
org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at
org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutHandler.messageReceived(ReadTimeoutHandler.java:184)
at
org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at
org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at
org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at
org.apache.kudu.shaded.org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at
org.apache.kudu.shaded.org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:70)
at
org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at
org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at
org.apache.kudu.shaded.org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at
org.apache.kudu.shaded.org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
at
org.apache.kudu.shaded.org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
at
org.apache.kudu.shaded.org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
at
org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at
org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at
org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
at
org.apache.kudu.shaded.org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at
org.apache.kudu.shaded.org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at
org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
at
org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
at
org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
at
org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at
org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at
org.apache.kudu.shaded.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at
org.apache.kudu.shaded.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
... 3 more{code}
Each task ran just for about 19 seconds then throws scanner not found error
while tserver uses a default scanner_ttl_ms (60s).In tserver log, We found the
scanner that memtioned in client log expired after spark job failed, and
another tserver receives the scan request with that scannerId specifies.
it seems AsyncKuduScanner in kudu java client will choose a random server when
retrying scanNextRows, even though the AsyncKuduScanner already has a scannerId.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)