mintao created KUDU-3169:
----------------------------

             Summary: kudu java client throws scanner expired error while processing large scan on High-load cluster
                 Key: KUDU-3169
                 URL: https://issues.apache.org/jira/browse/KUDU-3169
             Project: Kudu
          Issue Type: Bug
          Components: client, java
    Affects Versions: 1.8.0
            Reporter: mintao


A user submitted a Spark job to scan a Kudu table containing a large number of 
records. After only a few minutes the job failed after 4 attempts, each of 
which failed with the error:
{code:java}
org.apache.kudu.client.NonRecoverableException: Scanner 4e34e6f821be42b889022ec681e235cc not found (it may have expired)
org.apache.kudu.client.NonRecoverableException: Scanner 4e34e6f821be42b889022ec681e235cc not found (it may have expired)
    at org.apache.kudu.client.KuduException.transformException(KuduException.java:110)
    at org.apache.kudu.client.KuduClient.joinAndHandleException(KuduClient.java:402)
    at org.apache.kudu.client.KuduScanner.nextRows(KuduScanner.java:57)
    at org.apache.kudu.spark.kudu.RowIterator.hasNext(KuduRDD.scala:153)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
    at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:187)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
    Suppressed: org.apache.kudu.client.KuduException$OriginalException: Original asynchronous stack trace
        at org.apache.kudu.client.RpcProxy.dispatchTSError(RpcProxy.java:341)
        at org.apache.kudu.client.RpcProxy.responseReceived(RpcProxy.java:263)
        at org.apache.kudu.client.RpcProxy.access$000(RpcProxy.java:59)
        at org.apache.kudu.client.RpcProxy$1.call(RpcProxy.java:152)
        at org.apache.kudu.client.RpcProxy$1.call(RpcProxy.java:148)
        at org.apache.kudu.client.Connection.messageReceived(Connection.java:391)
        at org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
        at org.apache.kudu.client.Connection.handleUpstream(Connection.java:243)
        at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
        at org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutHandler.messageReceived(ReadTimeoutHandler.java:184)
        at org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
        at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
        at org.apache.kudu.shaded.org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
        at org.apache.kudu.shaded.org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:70)
        at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
        at org.apache.kudu.shaded.org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
        at org.apache.kudu.shaded.org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
        at org.apache.kudu.shaded.org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
        at org.apache.kudu.shaded.org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
        at org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
        at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
        at org.apache.kudu.shaded.org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
        at org.apache.kudu.shaded.org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
        at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
        at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
        at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
        at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
        at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
        at org.apache.kudu.shaded.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        at org.apache.kudu.shaded.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
        ... 3 more
{code}
Each task ran for only about 19 seconds before throwing the scanner-not-found 
error, even though the tserver uses the default scanner_ttl_ms (60s). In the 
tserver log, we found that the scanner mentioned in the client log expired only 
after the Spark job had failed, and that a different tserver received the scan 
request specifying that scannerId.

It seems that AsyncKuduScanner in the Kudu Java client chooses a random server 
when retrying scanNextRows, even though the AsyncKuduScanner already has a 
scannerId.
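
The suspected failure mode can be illustrated with a small self-contained model. This is only a sketch and does not use the Kudu client API: the TabletServer and ClientScanner classes, their methods, and the "ts-a"/"ts-b" names are hypothetical and exist purely for illustration. It assumes that scanner state lives only on the tablet server that opened the scanner, so a continuation request sent to any other server fails with "not found" no matter how much of the TTL remains.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model for illustration only; none of these types are Kudu classes.
class ScannerRoutingSketch {

    /** Toy tablet server: a scanner is known only to the server that opened it. */
    static final class TabletServer {
        final String uuid;
        private final Set<String> openScanners = new HashSet<>();
        private int nextId = 0;

        TabletServer(String uuid) {
            this.uuid = uuid;
        }

        /** Opens a scanner on this server and returns its ID. */
        String openScanner() {
            String id = uuid + "-scanner-" + (nextId++);
            openScanners.add(id);
            return id;
        }

        /** Continues a scan; fails if this server never opened the scanner. */
        String scanNextRows(String scannerId) {
            if (!openScanners.contains(scannerId)) {
                throw new IllegalStateException(
                        "Scanner " + scannerId + " not found (it may have expired)");
            }
            return "rows from " + uuid;
        }
    }

    /** Toy client-side scanner that remembers which server owns its scanner ID. */
    static final class ClientScanner {
        private final TabletServer owner;
        private final String scannerId;

        ClientScanner(TabletServer chosen) {
            this.owner = chosen;
            this.scannerId = chosen.openScanner();
        }

        /** Behaviour suspected in this report: the retry goes to whichever server is picked. */
        String retryOn(TabletServer picked) {
            return picked.scanNextRows(scannerId);
        }

        /** Pinned alternative: the retry always goes back to the owning server. */
        String retryPinned() {
            return owner.scanNextRows(scannerId);
        }
    }

    public static void main(String[] args) {
        TabletServer a = new TabletServer("ts-a");
        TabletServer b = new TabletServer("ts-b");
        ClientScanner scanner = new ClientScanner(a);

        // Retrying against the owner succeeds.
        System.out.println(scanner.retryPinned());

        // Retrying against a different server fails, even though nothing has expired.
        try {
            scanner.retryOn(b);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model the second call fails immediately with the same "not found (it may have expired)" wording seen in the client log, which would be consistent with tasks failing after about 19 seconds while the 60s TTL had not yet elapsed on the original tserver.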


