[ 
https://issues.apache.org/jira/browse/HUDI-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205913#comment-17205913
 ] 

Vinoth Chandar commented on HUDI-1289:
--------------------------------------

Great! Given how HBase and Guava are notorious for class-mismatch hell, I'd 
prefer that we shade these if it's doable (at the cost of having to hard-code 
the listener class). 



If shading does not work, then we can go with the working combination that you 
have tested without shading. By shading, I mean relocating the package.
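
For reference, relocation with the Maven shade plugin looks roughly like this 
(a sketch only; the exact patterns and transformers in the Hudi bundle pom may 
differ):

{code:xml}
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <relocations>
      <!-- Rewrites org.apache.hadoop.hbase.* references inside the bundle's
           bytecode to the org.apache.hudi.-prefixed package. Class names that
           arrive at runtime via configuration values (e.g.
           hbase.status.listener.class) are not rewritten, which is the
           mismatch described in the issue below. -->
      <relocation>
        <pattern>org.apache.hadoop.hbase.</pattern>
        <shadedPattern>org.apache.hudi.org.apache.hadoop.hbase.</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
</plugin>
{code}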

> Using hbase index in spark hangs in Hudi 0.6.0
> ----------------------------------------------
>
>                 Key: HUDI-1289
>                 URL: https://issues.apache.org/jira/browse/HUDI-1289
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Ryan Pifer
>            Priority: Major
>             Fix For: 0.6.1
>
>
> In Hudi 0.6.0 there was a change to shade the HBase dependencies in the 
> hudi-spark-bundle jar. When using the HBase index with only the 
> hudi-spark-bundle jar specified in the Spark session, there are several issues:
>  
>  1. Dependencies are not being resolved correctly:
> HBase's default status listener class is defined by its pre-relocation class 
> name, so it fails the subtype check against the relocated interface:
> {code:java}
> Caused by: java.lang.RuntimeException: java.lang.RuntimeException: class 
> org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener not 
> org.apache.hudi.org.apache.hadoop.hbase.client.ClusterStatusListener$Listener 
> at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2427) at 
> org.apache.hudi.org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.<init>(ConnectionManager.java:656)
>  ... 39 moreCaused by: java.lang.RuntimeException: class 
> org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener not 
> org.apache.hudi.org.apache.hadoop.hbase.client.ClusterStatusListener$Listener 
> at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2421) ... 
> 40 more{code}
>  
> [https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ClusterStatusListener.java#L72-L73]
>  
> This can be fixed by overriding the status listener class in the HBase 
> configuration used by Hudi:
> {code:java}
> hbaseConfig.set("hbase.status.listener.class", 
> "org.apache.hudi.org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener"){code}
> [https://github.com/apache/hudi/blob/master/hudi-client/src/main/java/org/apache/hudi/index/hbase/HBaseIndex.java#L134]
>  
> 2. After modifying the above, executors hang when trying to connect to HBase 
> and fail after about 45 minutes:
> {code:java}
> Caused by: 
> org.apache.hudi.org.apache.hadoop.hbase.client.RetriesExhaustedException: 
> Failed after attempts=36, exceptions:Thu Sep 17 23:59:42 UTC 2020, null, 
> java.net.SocketTimeoutException: callTimeout=60000, callDuration=68536: row 
> 'hudiindex,12345678,99999999999999' on table 'hbase:meta' at 
> region=hbase:meta,,1.1588230740, 
> hostname=ip-10-81-236-56.ec2.internal,16020,1600130997457, seqNum=0
>  at 
> org.apache.hudi.org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:276)
>  at 
> org.apache.hudi.org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:210)
>  at 
> org.apache.hudi.org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:60)
>  at 
> org.apache.hudi.org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:210)
>  at 
> org.apache.hudi.org.apache.hadoop.hbase.client.ClientSmallReversedScanner.loadCache(ClientSmallReversedScanner.java:212)
>  at 
> org.apache.hudi.org.apache.hadoop.hbase.client.ClientSmallReversedScanner.next(ClientSmallReversedScanner.java:186)
>  at 
> org.apache.hudi.org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1275)
>  at 
> org.apache.hudi.org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1181)
>  at 
> org.apache.hudi.org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1165)
>  at 
> org.apache.hudi.org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1122)
>  at 
> org.apache.hudi.org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getRegionLocation(ConnectionManager.java:957)
>  at 
> org.apache.hudi.org.apache.hadoop.hbase.client.HRegionLocator.getRegionLocation(HRegionLocator.java:83)
>  at 
> org.apache.hudi.org.apache.hadoop.hbase.client.RegionServerCallable.prepare(RegionServerCallable.java:75)
>  at 
> org.apache.hudi.org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:134)
>  ... 35 more{code}
>  
> When investigating the executor logs, I found the following:
> {code:java}
>  
> 20/09/18 21:35:48 TRACE TransportClient: Sending RPC to 
> ip-10-31-253-39.ec2.internal/10.31.253.39:46825
> 20/09/18 21:35:48 TRACE TransportClient: Sending request RPC 
> 7802669247197305083 to ip-10-31-253-39.ec2.internal/10.31.253.39:46825 took 0 
> ms
> 20/09/18 21:35:48 TRACE MessageDecoder: Received message RpcResponse: 
> RpcResponse{requestId=7802669247197305083, 
> body=NettyManagedBuffer{buf=SimpleLeakAwareByteBuf(PooledUnsafeDirectByteBuf(ridx:
>  21, widx: 102, cap: 128))}}
> 20/09/18 21:35:53 TRACE ZooKeeperRegistry: Looking up meta region location in 
> ZK, 
> connection=org.apache.hudi.org.apache.hadoop.hbase.client.ZooKeeperRegistry@268d3bae
> 20/09/18 21:35:53 TRACE ZKUtil: hconnection-0x4f596c31-0x10000036821007a, 
> quorum=ip-10-31-253-39.ec2.internal:2181, baseZNode=/hbase Retrieved 51 
> byte(s) of data from znode /hbase/meta-region-server; 
> data=PBUF\x0A)\x0A\x1Dip-10-16-254...
> 20/09/18 21:35:53 TRACE ZooKeeperRegistry: Looked up meta region location, 
> connection=org.apache.hudi.org.apache.hadoop.hbase.client.ZooKeeperRegistry@268d3bae;
>  servers = ip-10-16-254-233.ec2.internal,16020,1600298383776 
> 20/09/18 21:35:53 TRACE MetaCache: Merged cached locations: 
> [region=hbase:meta,,1.1588230740, 
> hostname=ip-10-16-254-233.ec2.internal,16020,1600298383776, seqNum=0]
> 20/09/18 21:35:53 DEBUG RpcClientImpl: Use SIMPLE authentication for service 
> ClientService, sasl=false
> 20/09/18 21:35:53 DEBUG RpcClientImpl: Connecting to 
> ip-10-16-254-233.ec2.internal/10.16.254.233:16020
> 20/09/18 21:35:53 TRACE RpcClientImpl: IPC Client (669392975) connection to 
> ip-10-16-254-233.ec2.internal/10.16.254.233:16020 from hadoop: starting, 
> connections 1
> 20/09/18 21:35:53 TRACE RpcClientImpl: IPC Client (669392975) connection to 
> ip-10-16-254-233.ec2.internal/10.16.254.233:16020 from hadoop: marking at 
> should close, reason: null
> 20/09/18 21:35:53 TRACE RpcClientImpl: IPC Client (669392975) connection to 
> ip-10-16-254-233.ec2.internal/10.16.254.233:16020 from hadoop: closing ipc 
> connection to ip-10-16-254-233.ec2.internal/10.16.254.233:16020
> 20/09/18 21:35:53 TRACE RpcClientImpl: IPC Client (669392975) connection to 
> ip-10-16-254-233.ec2.internal/10.16.254.233:16020 from hadoop: ipc connection 
> to ip-10-16-254-233.ec2.internal/10.16.254.233:16020 closed
> 20/09/18 21:35:53 TRACE RpcClientImpl: IPC Client (669392975) connection to 
> ip-10-16-254-233.ec2.internal/10.16.254.233:16020 from hadoop: stopped, 
> connections 0
> 20/09/18 21:35:53 INFO RpcRetryingCaller: MESSAGE: Call to 
> ip-10-16-254-233.ec2.internal/10.16.254.233:16020 failed on local exception: 
> org.apache.hudi.org.apache.hadoop.hbase.exceptions.ConnectionClosingException:
>  Connection to ip-10-16-254-233.ec2.internal/10.16.254.233:16020 is closing. 
> Call id=418, waitTime=2
> 20/09/18 21:35:53 INFO RpcRetryingCaller: 
> STACKTRACE[Ljava.lang.StackTraceElement;@20efcd07
> 20/09/18 21:35:53 INFO RpcRetryingCaller: CAUSE
> org.apache.hudi.org.apache.hadoop.hbase.exceptions.ConnectionClosingException:
>  Connection to ip-10-16-254-233.ec2.internal/10.16.254.233:16020 is closing. 
> Call id=418, waitTime=2
>  at 
> org.apache.hudi.org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.cleanupCalls(RpcClientImpl.java:1089)
>  at 
> org.apache.hudi.org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.close(RpcClientImpl.java:865)
>  at 
> org.apache.hudi.org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.run(RpcClientImpl.java:582)
> 20/09/18 21:35:53 INFO RpcRetryingCaller: Call exception, tries=10, 
> retries=35, started=38363 ms ago, cancelled=false, msg=row 
> 'huditest,12345678,99999999999999' on table 'hbase:meta' at 
> region=hbase:meta,,1.1588230740, 
> hostname=ip-10-16-254-233.ec2.internal,16020,1600298383776, seqNum=0
> 20/09/18 21:35:53 TRACE MetaCache: Removed region=hbase:meta,,1.1588230740, 
> hostname=ip-10-16-254-233.ec2.internal,16020,1600298383776, seqNum=0 from 
> cache
>  
>  
> {code}
>  
> Even after adding the HBase jars to the session, it continues to hang. I was 
> able to resolve the hanging issue by building the Hudi Spark bundle jar 
> without shading the HBase-related dependencies and adding them explicitly 
> when launching my Spark shell, so it appears to be a problem with the 
> relocation.
>  
> Example of using the HBase index successfully:
> {code:java}
> spark-shell --jars 
> /usr/lib/hudi/cli/lib/hbase-client-1.2.3.jar,/usr/lib/hudi/cli/lib/hbase-common-1.2.3.jar,usr/lib/hudi/cli/lib/hbase-protocol-1.2.3.jar,/usr/lib/hudi/cli/lib/htrace-core-3.1.0-incubating.jar,/usr/lib/hudi/cli/lib/metrics-core-2.2.0.jar,hudi-spark-bundle_2.11-0.6.0-amzn-0.jar,/usr/lib/spark/external/lib/spark-avro.jar
>  --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" --conf 
> "spark.sql.hive.convertMetastoreParquet=false"
>  
> {code}
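> One way to check whether a given bundle jar actually relocated the HBase 
> classes (the jar name below is illustrative) is to list its contents:
> {code}
> # Relocated classes appear under org/apache/hudi/org/apache/hadoop/hbase/;
> # unshaded ones appear under org/apache/hadoop/hbase/.
> jar tf hudi-spark-bundle_2.11-0.6.0.jar | grep 'hbase/client/ClusterStatusListener'
> {code}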
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
