[
https://issues.apache.org/jira/browse/HBASE-13011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14319592#comment-14319592
]
zhangduo commented on HBASE-13011:
----------------------------------
TestHMasterRPCException is flaky. I think the problem is the test itself.
The test tries to connect to HMaster several times until it gets a
ServerNotRunningYetException.
But we do not set any guard to prevent HMaster from transitioning to the
running state, so it can happen that by the time we successfully connect to
HMaster, it is already in the running state (especially on heavily loaded
machines)...
Also, I cannot view the log files of the other failed tests; maybe something is
wrong with Jenkins? I ran these tests locally and they all passed.
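
To illustrate the kind of guard meant above, here is a minimal sketch. It does not use
HBase code: ToyServer, NotRunningYetException, and observesNotRunning are hypothetical
stand-ins for HMaster, ServerNotRunningYetException, and the test's connect attempt.
The assumption is that the test can hold the server in its starting state with a latch
and release it only after the expected exception has been observed.

```java
import java.util.concurrent.CountDownLatch;

class StartupGuardSketch {

  /** Hypothetical stand-in for ServerNotRunningYetException. */
  static class NotRunningYetException extends Exception {
    NotRunningYetException(String msg) {
      super(msg);
    }
  }

  /** Toy server (not HMaster): it stays in the starting state until the guard is released. */
  static class ToyServer {
    private final CountDownLatch guard;
    private volatile boolean running = false;

    ToyServer(CountDownLatch guard) {
      this.guard = guard;
    }

    /** Simulates the startup thread, which blocks on the guard before switching to running. */
    Thread startAsync() {
      Thread t = new Thread(() -> {
        try {
          guard.await();
          running = true;
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }, "toy-server-startup");
      t.start();
      return t;
    }

    /** Simulates an RPC that is rejected while the server is still starting. */
    String call() throws NotRunningYetException {
      if (!running) {
        throw new NotRunningYetException("server is not running yet");
      }
      return "ok";
    }

    boolean isRunning() {
      return running;
    }
  }

  /** Returns true if a call made right now is rejected because the server is not running. */
  static boolean observesNotRunning(ToyServer server) {
    try {
      server.call();
      return false;
    } catch (NotRunningYetException e) {
      return true;
    }
  }

  public static void main(String[] args) throws Exception {
    CountDownLatch guard = new CountDownLatch(1);
    ToyServer server = new ToyServer(guard);
    Thread startup = server.startAsync();

    // The guard is still held, so the server cannot have reached the running state yet.
    boolean rejected = observesNotRunning(server);

    // Release the guard only after the assertion point, then wait for startup to finish.
    guard.countDown();
    startup.join(5000);

    System.out.println("rejected while starting: " + rejected);
    System.out.println("running after release: " + server.isRunning());
  }
}
```

With the guard in place, the outcome no longer depends on how fast the machine is:
the startup thread cannot pass the latch until the test has made its call.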
> TestLoadIncrementalHFiles is flakey when using AsyncRpcClient as client
> implementation
> --------------------------------------------------------------------------------------
>
> Key: HBASE-13011
> URL: https://issues.apache.org/jira/browse/HBASE-13011
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.0.0, 1.1.0
> Reporter: zhangduo
> Assignee: zhangduo
> Fix For: 2.0.0, 1.1.0
>
> Attachments: HBASE-13011.patch, HBASE-13011_1.patch,
> HBASE-13011_2.patch
>
>
> The test sometimes fails because of a timeout.
> https://builds.apache.org/job/PreCommit-HBASE-Build/12769/testReport/junit/org.apache.hadoop.hbase.mapreduce/TestLoadIncrementalHFiles/testSimpleLoad/
> Digging into it, I found this:
> {noformat}
> 2015-02-11 02:01:47,304 INFO [LoadIncrementalHFiles-1]
> mapreduce.LoadIncrementalHFiles(563): Trying to load
> hfile=hdfs://localhost:59736/user/jenkins/test-data/d964a632-8db5-4f3a-966f-89746947294b/testSimpleLoad/myfam/hfile_1
> first=ddd last=ooo
> 2015-02-11 02:01:47,308 INFO [LoadIncrementalHFiles-0]
> mapreduce.LoadIncrementalHFiles(563): Trying to load
> hfile=hdfs://localhost:59736/user/jenkins/test-data/d964a632-8db5-4f3a-966f-89746947294b/testSimpleLoad/myfam/hfile_0
> first=aaaa last=cccc
> 2015-02-11 02:01:47,317 DEBUG [LoadIncrementalHFiles-2]
> mapreduce.LoadIncrementalHFiles$3(664): Going to connect to server
> region=bulkNS:mytable_testSimpleLoad,,1423620104753.fdcbd21e43683c753bae40f1d890daa6.,
> hostname=asf910.gq1.ygridcore.net,41003,1423620099272, seqNum=2 for row
> with hfile group
> [{[B@7173d25a,hdfs://localhost:59736/user/jenkins/test-data/d964a632-8db5-4f3a-966f-89746947294b/testSimpleLoad/myfam/hfile_0}]
> 2015-02-11 02:01:47,320 DEBUG [LoadIncrementalHFiles-3]
> mapreduce.LoadIncrementalHFiles$3(664): Going to connect to server
> region=bulkNS:mytable_testSimpleLoad,ddd,1423620104753.ec757ff718ce8ab99f4f6bcca389d67f.,
> hostname=asf910.gq1.ygridcore.net,41003,1423620099272, seqNum=2 for row ddd
> with hfile group
> [{[B@7173d25a,hdfs://localhost:59736/user/jenkins/test-data/d964a632-8db5-4f3a-966f-89746947294b/testSimpleLoad/myfam/hfile_1}]
> {noformat}
> There are two files to commit, but after this:
> {noformat}
> 2015-02-11 02:01:47,327 INFO
> [B.defaultRpcServer.handler=3,queue=0,port=41003] regionserver.HStore(690):
> Validating hfile at
> hdfs://localhost:59736/user/jenkins/test-data/d964a632-8db5-4f3a-966f-89746947294b/testSimpleLoad/myfam/hfile_0
> for inclusion in store myfam region
> bulkNS:mytable_testSimpleLoad,,1423620104753.fdcbd21e43683c753bae40f1d890daa6.
> 2015-02-11 02:01:47,330 INFO
> [B.defaultRpcServer.handler=1,queue=0,port=41003] regionserver.HStore(690):
> Validating hfile at
> hdfs://localhost:59736/user/jenkins/test-data/d964a632-8db5-4f3a-966f-89746947294b/testSimpleLoad/myfam/hfile_1
> for inclusion in store myfam region
> bulkNS:mytable_testSimpleLoad,ddd,1423620104753.ec757ff718ce8ab99f4f6bcca389d67f.
> 2015-02-11 02:01:47,330 INFO
> [B.defaultRpcServer.handler=4,queue=0,port=41003] regionserver.HStore(690):
> Validating hfile at
> hdfs://localhost:59736/user/jenkins/test-data/d964a632-8db5-4f3a-966f-89746947294b/testSimpleLoad/myfam/hfile_1
> for inclusion in store myfam region
> bulkNS:mytable_testSimpleLoad,ddd,1423620104753.ec757ff718ce8ab99f4f6bcca389d67f.
> {noformat}
> We can see that hfile_1 has been committed twice; the second call fails
> and causes the test to time out.
> I'm not sure whether this is an issue in AsyncRpcClient, but if I use
> RpcClientImpl, the test always passes.