[https://issues.apache.org/jira/browse/HBASE-22769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17047757#comment-17047757]
Jan Teichmann commented on HBASE-22769:
---------------------------------------
Same problem: I can load and show data in the {{hbaseDF}} created as in the
comment above, but when I try to filter that DataFrame I get:
{code:java}
org.apache.hadoop.hbase.DoNotRetryIOException: org.apache.hadoop.hbase.DoNotRetryIOException: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.spark.SparkSQLPushDownFilter
	at org.apache.hadoop.hbase.protobuf.ProtobufUtil.toFilter(ProtobufUtil.java:1679)
	at org.apache.hadoop.hbase.protobuf.ProtobufUtil.toScan(ProtobufUtil.java:1163)
	at org.apache.hadoop.hbase.regionserver.RSRpcServices.newRegionScanner(RSRpcServices.java:2682)
	at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3013)
	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:36613)
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2380)
	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:297)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:277)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.spark.SparkSQLPushDownFilter
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at org.apache.hadoop.hbase.util.DynamicClassLoader.tryRefreshClass(DynamicClassLoader.java:173)
	at org.apache.hadoop.hbase.util.DynamicClassLoader.loadClass(DynamicClassLoader.java:140)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at org.apache.hadoop.hbase.protobuf.ProtobufUtil.toFilter(ProtobufUtil.java:1670)
	... 8 more
{code}
This is on AWS emr-5.27.0 with Spark 2.4.4, HBase 1.4.10, and Scala 2.11.12,
running spark-shell with the following packages:
{code}
spark-shell --packages \
  org.apache.hbase.connectors.spark:hbase-spark:1.0.0,org.apache.hbase:hbase-client:2.1.0,org.apache.hbase:hbase-common:2.1.0,org.apache.hbase:hbase-server:2.1.0,org.apache.hbase:hbase:2.1.0
{code}
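For what it's worth, the {{ClassNotFoundException}} is raised on the RegionServer side: the server must deserialize {{SparkSQLPushDownFilter}} to apply the pushed-down filter, and the hbase-spark jar is only on the Spark classpath, not on the RegionServers. Two hedged workarounds (untested here on EMR): either place the hbase-spark jar on the RegionServer classpath, or disable filter pushdown on the client so the filter never reaches the server. A minimal sketch of the latter, assuming the connector's {{hbase.spark.pushdown.columnfilter}} option (default {{true}}) and reusing whatever options produced {{hbaseDF}} above:
{code}
// Sketch only: the option name is taken from the connector's HBaseSparkConf;
// catalogOptions is a placeholder for whatever options were used to build
// hbaseDF in the earlier comment.
val hbaseDF = spark.read
  .format("org.apache.hadoop.hbase.spark")
  .options(catalogOptions /* same options as above */)
  .option("hbase.spark.pushdown.columnfilter", "false")
  .load()

// The filter is now evaluated in Spark instead of being pushed to HBase,
// so the RegionServers never need to load SparkSQLPushDownFilter.
hbaseDF.filter($"col0" === "someKey").show()
{code}
The cost is that full rows are scanned and shipped to Spark before filtering, but it sidesteps the server-side class loading entirely.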
> Runtime Error on join (with filter) when using hbase-spark connector
> --------------------------------------------------------------------
>
> Key: HBASE-22769
> URL: https://issues.apache.org/jira/browse/HBASE-22769
> Project: HBase
> Issue Type: Bug
> Components: hbase-connectors
> Affects Versions: connector-1.0.0
> Environment: Built using maven scala plugin on intellij IDEA with
> Maven 3.3.9. Ran on Azure HDInsight Spark cluster using Yarn.
> Spark version: 2.4.0
> Scala version: 2.11.12
> hbase-spark version: 1.0.0
> Reporter: Noah Banholzer
> Priority: Blocker
>
> I am attempting to do a left outer join (though any join with a push-down
> filter causes this issue) between a Spark Structured Streaming DataFrame and
> a DataFrame read from HBase. I get the following stack trace when running a
> simple Spark app that reads from a streaming source and attempts the left
> outer join:
> {code:java}
> 19/07/30 18:30:25 INFO DAGScheduler: ShuffleMapStage 1 (start at SparkAppTest.scala:88) failed in 3.575 s due to Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 10, wn5-edpspa.hnyo2upsdeau1bffc34wwrkgwc.ex.internal.cloudapp.net, executor 2):
> org.apache.hadoop.hbase.DoNotRetryIOException: org.apache.hadoop.hbase.DoNotRetryIOException: java.lang.reflect.InvocationTargetException
> 	at org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.toFilter(ProtobufUtil.java:1609)
> 	at org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.toScan(ProtobufUtil.java:1154)
> 	at org.apache.hadoop.hbase.regionserver.RSRpcServices.newRegionScanner(RSRpcServices.java:2967)
> 	at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3301)
> 	at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42002)
> 	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
> 	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
> 	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
> 	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> Caused by: java.lang.reflect.InvocationTargetException
> 	at sun.reflect.GeneratedMethodAccessor15461.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.toFilter(ProtobufUtil.java:1605)
> 	... 8 more
> Caused by: java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/spark/datasources/JavaBytesEncoder$
> 	at org.apache.hadoop.hbase.spark.datasources.JavaBytesEncoder.create(JavaBytesEncoder.scala)
> 	at org.apache.hadoop.hbase.spark.SparkSQLPushDownFilter.parseFrom(SparkSQLPushDownFilter.java:196)
> 	... 12 more
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> 	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> 	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> 	at org.apache.hadoop.hbase.ipc.RemoteWithExtrasException.instantiateException(RemoteWithExtrasException.java:100)
> 	at org.apache.hadoop.hbase.ipc.RemoteWithExtrasException.unwrapRemoteException(RemoteWithExtrasException.java:90)
> 	at org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.makeIOExceptionOfException(ProtobufUtil.java:359)
> 	at org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.handleRemoteException(ProtobufUtil.java:347)
> 	at org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:344)
> 	at org.apache.hadoop.hbase.client.ScannerCallable.rpcCall(ScannerCallable.java:242)
> 	at org.apache.hadoop.hbase.client.ScannerCallable.rpcCall(ScannerCallable.java:58)
> 	at org.apache.hadoop.hbase.client.RegionServerCallable.call(RegionServerCallable.java:127)
> 	at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithoutRetries(RpcRetryingCallerImpl.java:192)
> 	at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:387)
> 	at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:361)
> 	at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:107)
> 	at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> {code}
>
> It appears to be attempting to reference a file called
> "JavaBytesEncoder$.class", resulting in a NoClassDefFoundError. Interestingly,
> when I unzipped the jar I found that both "JavaBytesEncoder.class" and
> "JavaBytesEncoder$.class" exist, but the latter is simply an empty file. This
> might just be a case of me misunderstanding how Java links classes at build
> time, however.
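> For reference, the {{*$}} class file appears to be scalac's compiled companion object rather than a broken build artifact: a Scala {{object}} compiles to a pair of class files, and Java-level code reaches the singleton through its {{MODULE$}} field. A toy sketch (names hypothetical):
> {code}
> // scalac emits two class files for this object:
> //   Greeter.class  - static forwarder methods
> //   Greeter$.class - the singleton holding the implementation
> object Greeter {
>   def hello: String = "hi"
> }
> // Java callers would use: Greeter$.MODULE$.hello()
> {code}
> So the presence of {{JavaBytesEncoder$.class}} is expected; a genuinely empty {{$}} file, though, would indeed break class loading.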
--
This message was sent by Atlassian Jira
(v8.3.4#803005)