[ https://issues.apache.org/jira/browse/HBASE-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922541#comment-13922541 ]
Biju Nair commented on HBASE-10643:
-----------------------------------
1) Did not encounter this issue when testing recently with bucket cache using
the direct buffer ioengine and MaxDirectMemorySize of 24g and 32g?
- Yes
2) Can you share your JVM version particulars?
- java -version
java version "1.7.0_45"
OpenJDK Runtime Environment (rhel-2.4.3.3.el6-x86_64 u45-b15)
OpenJDK 64-Bit Server VM (build 24.45-b08, mixed mode)
3) Any command line options you might have put in hbase-env.sh?
- From hbase-env.sh in the 24 GB test:
export HBASE_REGIONSERVER_OPTS="-Xmn512m -XX:CMSInitiatingOccupancyFraction=70 -Xms5000m -Xmx5000m -XX:MaxDirectMemorySize=22000m"
4) Any HBase site file settings pertaining to ZooKeeper?
- No
5) Running bucketCache in file mode doesn't have this issue.
> Failure in RS when using large size bucketcache
> -----------------------------------------------
>
> Key: HBASE-10643
> URL: https://issues.apache.org/jira/browse/HBASE-10643
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 0.98.0, 0.96.0
> Reporter: Biju Nair
> Labels: bucketCache, regionserver
>
> When the RS is brought up with -XX:MaxDirectMemorySize of 22GB or higher, it
> fails after a successful start. From the RS logs, it looks like the
> bucketCache memory allocation takes long enough that ZK considers the RS
> dead. One option to fix the problem would be to allocate the bucketCache
> before registering with ZK.
> 2014-02-28 18:54:42,967 WARN [regionserver60020.compactionChecker]
> util.Sleeper: We slept 33496ms instead of 10000ms, this is likely due to a
> long garbage collecting pause and it's usually bad, see
> http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> 2014-02-28 18:54:42,967 WARN [regionserver60020.periodicFlusher]
> util.Sleeper: We slept 33496ms instead of 10000ms, this is likely due to a
> long garbage collecting pause and it's usually bad, see
> http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> 2014-02-28 18:54:42,967 WARN [JvmPauseMonitor] util.JvmPauseMonitor:
> Detected pause in JVM or host machine (eg GC): pause of approximately 23988ms
> GC pool 'ParNew' had collection(s): count=1 time=24432ms
> 2014-02-28 18:54:43,006 FATAL [regionserver60020] regionserver.HRegionServer:
> ABORTING region server bbg-master2.bbg-test.hdp,60020,1393628951236:
> org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected;
> currently processing bbg-master2.bbg-test.hdp,60020,1393628951236 as dead
> server
> at org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:341)
> at org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:254)
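
The hypothesis in the description, that a large off-heap allocation can outlast the ZK session timeout, can be checked in isolation. The following is a minimal Java sketch, not HBase code: it allocates direct memory in fixed-size chunks and compares the elapsed time with a session timeout. The class name, the method names, the 4 MB chunk size, the 64 MB default total, and the 30000 ms timeout are all assumptions chosen for illustration; substitute the values from your own cluster. It measures allocation time only and is not expected to reproduce the ParNew pause shown in the log above.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of HBase.
class DirectAllocTiming {

    // Allocates totalBytes of direct (off-heap) memory in pieces of at most
    // chunkBytes each, and returns the buffers so they stay reachable.
    public static List<ByteBuffer> allocateChunks(long totalBytes, int chunkBytes) {
        if (totalBytes < 0 || chunkBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        int count = (int) ((totalBytes + chunkBytes - 1) / chunkBytes);
        List<ByteBuffer> chunks = new ArrayList<ByteBuffer>(count);
        long remaining = totalBytes;
        for (int i = 0; i < count; i++) {
            int size = (int) Math.min((long) chunkBytes, remaining);
            chunks.add(ByteBuffer.allocateDirect(size));
            remaining -= size;
        }
        return chunks;
    }

    // True when a pause of elapsedMs would reach or exceed the session timeout.
    public static boolean exceedsTimeout(long elapsedMs, long sessionTimeoutMs) {
        return elapsedMs >= sessionTimeoutMs;
    }

    public static void main(String[] args) {
        // Total size in MB; defaults to a small value that fits the default direct memory limit.
        long totalMb = args.length > 0 ? Long.parseLong(args[0]) : 64L;
        int chunkBytes = 4 * 1024 * 1024;   // assumed chunk size
        long sessionTimeoutMs = 30000L;     // placeholder; use your effective ZK session timeout

        long start = System.nanoTime();
        List<ByteBuffer> chunks = allocateChunks(totalMb * 1024L * 1024L, chunkBytes);
        long elapsedMs = (System.nanoTime() - start) / 1000000L;

        System.out.println("Allocated " + chunks.size() + " chunks (" + totalMb
                + " MB) in " + elapsedMs + " ms");
        System.out.println("Exceeds session timeout of " + sessionTimeoutMs + " ms: "
                + exceedsTimeout(elapsedMs, sessionTimeoutMs));
    }
}
```

To try a size close to the one in this report, pass the total in megabytes and raise the direct memory limit to match, for example `java -XX:MaxDirectMemorySize=22000m DirectAllocTiming 21000`. If the measured time stays well below the timeout, the pause in the RS is more likely coming from somewhere other than the allocation itself.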