[
https://issues.apache.org/jira/browse/HUDI-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205173#comment-17205173
]
Vinoth Chandar commented on HUDI-1289:
--------------------------------------
[~vbalaji] do you remember why we had to shade hbase? was it done proactively?
> Using hbase index in spark hangs in Hudi 0.6.0
> ----------------------------------------------
>
> Key: HUDI-1289
> URL: https://issues.apache.org/jira/browse/HUDI-1289
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Ryan Pifer
> Priority: Major
> Fix For: 0.6.1
>
>
> In Hudi 0.6.0 I can see that there was a change to shade the hbase
> dependencies in hudi-spark-bundle jar. When using HBASE index with only
> hudi-spark-bundle jar specified in spark session there are several issues:
>
> 1. Dependencies are not resolved correctly: HBase's default status listener
> class is still referenced by its pre-relocation class name
> {code:java}
> Caused by: java.lang.RuntimeException: java.lang.RuntimeException: class
> org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener not
> org.apache.hudi.org.apache.hadoop.hbase.client.ClusterStatusListener$Listener
> at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2427) at
> org.apache.hudi.org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.<init>(ConnectionManager.java:656)
> ... 39 moreCaused by: java.lang.RuntimeException: class
> org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener not
> org.apache.hudi.org.apache.hadoop.hbase.client.ClusterStatusListener$Listener
> at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2421) ...
> 40 more{code}
>
> [https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ClusterStatusListener.java#L72-L73]
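>
> The failure mode can be illustrated with a small stand-alone sketch (the
> nested types below are stand-ins, not the real HBase classes): Hadoop's
> {{Configuration.getClass(name, defaultValue, xface)}} checks that the
> resolved class is assignable to the expected interface. After shading, the
> interface is relocated under {{org.apache.hudi.*}}, but the default listener
> is still looked up by its original, unshaded class-name string, so the
> assignability check fails with exactly the "class X not Y" message above.
> {code:java}
> // Stand-in for the relocated ClusterStatusListener$Listener interface
> interface Listener {}
>
> // Stand-in for the unshaded default MulticastListener (note: it does
> // NOT implement the relocated interface, mirroring the package mismatch)
> class MulticastListener {}
>
> public class RelocationMismatchDemo {
>   public static void main(String[] args) {
>     Class<?> resolved = MulticastListener.class; // looked up by old name
>     Class<?> expected = Listener.class;          // relocated interface
>     if (!expected.isAssignableFrom(resolved)) {
>       // Mirrors the RuntimeException in the stack trace
>       System.out.println("class " + resolved.getName()
>           + " not " + expected.getName());
>     }
>   }
> }
> {code}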
>
> This can be fixed by overriding the status listener class in the HBase
> configuration used by Hudi:
> {code:java}
> hbaseConfig.set("hbase.status.listener.class",
> "org.apache.hudi.org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener"){code}
> [https://github.com/apache/hudi/blob/master/hudi-client/src/main/java/org/apache/hudi/index/hbase/HBaseIndex.java#L134]
>
> 2. After modifying the above, executors hang when trying to connect to HBase
> and fail after about 45 minutes:
> {code:java}
> Caused by:
> org.apache.hudi.org.apache.hadoop.hbase.client.RetriesExhaustedException:
> Failed after attempts=36, exceptions:Thu Sep 17 23:59:42 UTC 2020, null,
> java.net.SocketTimeoutException: callTimeout=60000, callDuration=68536: row
> 'hudiindex,12345678,99999999999999' on table 'hbase:meta' at
> region=hbase:meta,,1.1588230740,
> hostname=ip-10-81-236-56.ec2.internal,16020,1600130997457, seqNum=0
> at
> org.apache.hudi.org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:276)
> at
> org.apache.hudi.org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:210)
> at
> org.apache.hudi.org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:60)
> at
> org.apache.hudi.org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:210)
> at
> org.apache.hudi.org.apache.hadoop.hbase.client.ClientSmallReversedScanner.loadCache(ClientSmallReversedScanner.java:212)
> at
> org.apache.hudi.org.apache.hadoop.hbase.client.ClientSmallReversedScanner.next(ClientSmallReversedScanner.java:186)
> at
> org.apache.hudi.org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1275)
> at
> org.apache.hudi.org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1181)
> at
> org.apache.hudi.org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1165)
> at
> org.apache.hudi.org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1122)
> at
> org.apache.hudi.org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getRegionLocation(ConnectionManager.java:957)
> at
> org.apache.hudi.org.apache.hadoop.hbase.client.HRegionLocator.getRegionLocation(HRegionLocator.java:83)
> at
> org.apache.hudi.org.apache.hadoop.hbase.client.RegionServerCallable.prepare(RegionServerCallable.java:75)
> at
> org.apache.hudi.org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:134)
> ... 35 more{code}
>
> When investigating the executor logs, I found the following:
> {code:java}
>
> 20/09/18 21:35:48 TRACE TransportClient: Sending RPC to
> ip-10-31-253-39.ec2.internal/10.31.253.39:46825
> 20/09/18 21:35:48 TRACE TransportClient: Sending request RPC
> 7802669247197305083 to ip-10-31-253-39.ec2.internal/10.31.253.39:46825 took 0
> ms
> 20/09/18 21:35:48 TRACE MessageDecoder: Received message RpcResponse:
> RpcResponse{requestId=7802669247197305083,
> body=NettyManagedBuffer{buf=SimpleLeakAwareByteBuf(PooledUnsafeDirectByteBuf(ridx:
> 21, widx: 102, cap: 128))}}
> 20/09/18 21:35:53 TRACE ZooKeeperRegistry: Looking up meta region location in
> ZK,
> connection=org.apache.hudi.org.apache.hadoop.hbase.client.ZooKeeperRegistry@268d3bae
> 20/09/18 21:35:53 TRACE ZKUtil: hconnection-0x4f596c31-0x10000036821007a,
> quorum=ip-10-31-253-39.ec2.internal:2181, baseZNode=/hbase Retrieved 51
> byte(s) of data from znode /hbase/meta-region-server;
> data=PBUF\x0A)\x0A\x1Dip-10-16-254...
> 20/09/18 21:35:53 TRACE ZooKeeperRegistry: Looked up meta region location,
> connection=org.apache.hudi.org.apache.hadoop.hbase.client.ZooKeeperRegistry@268d3bae;
> servers = ip-10-16-254-233.ec2.internal,16020,1600298383776
> 20/09/18 21:35:53 TRACE MetaCache: Merged cached locations:
> [region=hbase:meta,,1.1588230740,
> hostname=ip-10-16-254-233.ec2.internal,16020,1600298383776, seqNum=0]
> 20/09/18 21:35:53 DEBUG RpcClientImpl: Use SIMPLE authentication for service
> ClientService, sasl=false
> 20/09/18 21:35:53 DEBUG RpcClientImpl: Connecting to
> ip-10-16-254-233.ec2.internal/10.16.254.233:16020
> 20/09/18 21:35:53 TRACE RpcClientImpl: IPC Client (669392975) connection to
> ip-10-16-254-233.ec2.internal/10.16.254.233:16020 from hadoop: starting,
> connections 1
> 20/09/18 21:35:53 TRACE RpcClientImpl: IPC Client (669392975) connection to
> ip-10-16-254-233.ec2.internal/10.16.254.233:16020 from hadoop: marking at
> should close, reason: null
> 20/09/18 21:35:53 TRACE RpcClientImpl: IPC Client (669392975) connection to
> ip-10-16-254-233.ec2.internal/10.16.254.233:16020 from hadoop: closing ipc
> connection to ip-10-16-254-233.ec2.internal/10.16.254.233:16020
> 20/09/18 21:35:53 TRACE RpcClientImpl: IPC Client (669392975) connection to
> ip-10-16-254-233.ec2.internal/10.16.254.233:16020 from hadoop: ipc connection
> to ip-10-16-254-233.ec2.internal/10.16.254.233:16020 closed
> 20/09/18 21:35:53 TRACE RpcClientImpl: IPC Client (669392975) connection to
> ip-10-16-254-233.ec2.internal/10.16.254.233:16020 from hadoop: stopped,
> connections 0
> 20/09/18 21:35:53 INFO RpcRetryingCaller: MESSAGE: Call to
> ip-10-16-254-233.ec2.internal/10.16.254.233:16020 failed on local exception:
> org.apache.hudi.org.apache.hadoop.hbase.exceptions.ConnectionClosingException:
> Connection to ip-10-16-254-233.ec2.internal/10.16.254.233:16020 is closing.
> Call id=418, waitTime=2
> 20/09/18 21:35:53 INFO RpcRetryingCaller:
> STACKTRACE[Ljava.lang.StackTraceElement;@20efcd07
> 20/09/18 21:35:53 INFO RpcRetryingCaller: CAUSE
> org.apache.hudi.org.apache.hadoop.hbase.exceptions.ConnectionClosingException:
> Connection to ip-10-16-254-233.ec2.internal/10.16.254.233:16020 is closing.
> Call id=418, waitTime=2
> at
> org.apache.hudi.org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.cleanupCalls(RpcClientImpl.java:1089)
> at
> org.apache.hudi.org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.close(RpcClientImpl.java:865)
> at
> org.apache.hudi.org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.run(RpcClientImpl.java:582)
> 20/09/18 21:35:53 INFO RpcRetryingCaller: Call exception, tries=10,
> retries=35, started=38363 ms ago, cancelled=false, msg=row
> 'huditest,12345678,99999999999999' on table 'hbase:meta' at
> region=hbase:meta,,1.1588230740,
> hostname=ip-10-16-254-233.ec2.internal,16020,1600298383776, seqNum=0
> 20/09/18 21:35:53 TRACE MetaCache: Removed region=hbase:meta,,1.1588230740,
> hostname=ip-10-16-254-233.ec2.internal,16020,1600298383776, seqNum=0 from
> cache
>
>
> {code}
>
> Even after adding the HBase jars to the session, it continues to hang. I
> was able to resolve the hanging issue by building the Hudi spark bundle jar
> without shading the HBase-related dependencies and adding them explicitly
> when launching my spark shell, so it appears to be a problem with relocation.
>
> Example of using the HBase index successfully:
> {code:java}
> spark-shell --jars
> /usr/lib/hudi/cli/lib/hbase-client-1.2.3.jar,/usr/lib/hudi/cli/lib/hbase-common-1.2.3.jar,/usr/lib/hudi/cli/lib/hbase-protocol-1.2.3.jar,/usr/lib/hudi/cli/lib/htrace-core-3.1.0-incubating.jar,/usr/lib/hudi/cli/lib/metrics-core-2.2.0.jar,hudi-spark-bundle_2.11-0.6.0-amzn-0.jar,/usr/lib/spark/external/lib/spark-avro.jar
> --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" --conf
> "spark.sql.hive.convertMetastoreParquet=false"
>
> {code}
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)