[jira] [Commented] (HBASE-10643) Failure in RS when using large size bucketcache

Biju Nair (JIRA) Thu, 06 Mar 2014 06:09:07 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922541#comment-13922541
 ]


Biju Nair commented on HBASE-10643:
-----------------------------------

1) Did not encounter this issue when testing recently with bucket cache using 
the direct buffer ioengine and MaxDirectMemorySize of 24g and 32g?
  - Yes
2) Can you share your JVM version particulars?
  - java -version
    java version "1.7.0_45"
    OpenJDK Runtime Environment (rhel-2.4.3.3.el6-x86_64 u45-b15)
    OpenJDK 64-Bit Server VM (build 24.45-b08, mixed mode)
3) command line options you might have put in hbase-env.sh
  - This is from 24 GB test - hbase-env.sh
  - export HBASE_REGIONSERVER_OPTS="-Xmn512m -XX:CMSInitiatingOccupancyFraction=70 -Xms5000m -Xmx5000m -XX:MaxDirectMemorySize=22000m"
4) any of the HBase site file settings pertaining to zookeeper?
  - No
5) Running bucketCache in file mode doesn't have this issue.
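
For anyone trying to reproduce this, the JVM particulars asked for in 2) and 3) can also be read from inside a running JVM. The following is a minimal sketch, not HBase code: the class name JvmSettingsReport and its helpers are hypothetical, and it assumes the flags were passed on the command line (as with HBASE_REGIONSERVER_OPTS) so that they show up in the runtime's input arguments.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper for illustration; not part of HBase.
final class JvmSettingsReport {
    /** Parses a JVM size such as "22000m", "24g" or "512k" into bytes. */
    static long parseSize(String value) {
        String v = value.trim().toLowerCase();
        char unit = v.charAt(v.length() - 1);
        long multiplier;
        switch (unit) {
            case 'k': multiplier = 1024L; break;
            case 'm': multiplier = 1024L * 1024; break;
            case 'g': multiplier = 1024L * 1024 * 1024; break;
            default: return Long.parseLong(v); // no suffix: plain byte count
        }
        return Long.parseLong(v.substring(0, v.length() - 1)) * multiplier;
    }

    /** Returns the text after prefix for the first matching argument, or null. */
    static String findOption(List<String> args, String prefix) {
        for (String arg : args) {
            if (arg.startsWith(prefix)) {
                return arg.substring(prefix.length());
            }
        }
        return null;
    }

    public static void main(String[] unused) {
        // Flags the JVM was started with, e.g. -Xmx5000m -XX:MaxDirectMemorySize=22000m
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("java.version    = " + System.getProperty("java.version"));
        System.out.println("java.vm.name    = " + System.getProperty("java.vm.name"));
        System.out.println("java.vm.version = " + System.getProperty("java.vm.version"));

        String heap = findOption(jvmArgs, "-Xmx");
        String direct = findOption(jvmArgs, "-XX:MaxDirectMemorySize=");
        System.out.println("-Xmx                    = " + heap);
        System.out.println("-XX:MaxDirectMemorySize = " + direct);
        if (heap != null && direct != null) {
            long totalMb = (parseSize(heap) + parseSize(direct)) / (1024L * 1024);
            System.out.println("heap + direct (MB)      = " + totalMb);
        }
    }
}
```

Started with the options from 3), the last line would report the sum of the 5000m heap and the 22000m direct memory limit, which is the figure to compare against the physical memory on the host.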

> Failure in RS when using large size bucketcache
> -----------------------------------------------
>
>                 Key: HBASE-10643
>                 URL: https://issues.apache.org/jira/browse/HBASE-10643
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.98.0, 0.96.0
>            Reporter: Biju Nair
>              Labels: bucketCache, regionserver
>
> When the RS is brought up with -XX:MaxDirectMemorySize of 22GB or higher, it 
> fails after a successful start. From the RS logs, it looks like the 
> bucketCache memory allocation takes long enough that ZK considers the RS 
> dead. One option to fix the problem would be to allocate the bucketCache 
> before registering with ZK. 
> 2014-02-28 18:54:42,967 WARN  [regionserver60020.compactionChecker] util.Sleeper: We slept 33496ms instead of 10000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> 2014-02-28 18:54:42,967 WARN  [regionserver60020.periodicFlusher] util.Sleeper: We slept 33496ms instead of 10000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> 2014-02-28 18:54:42,967 WARN  [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 23988ms
> GC pool 'ParNew' had collection(s): count=1 time=24432ms
> 2014-02-28 18:54:43,006 FATAL [regionserver60020] regionserver.HRegionServer: ABORTING region server bbg-master2.bbg-test.hdp,60020,1393628951236: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently processing bbg-master2.bbg-test.hdp,60020,1393628951236 as dead server
>         at org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:341)
>         at org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:254)
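
The hypothesis in the description, that allocating a large amount of direct memory takes long enough to matter, can be checked outside HBase. Below is a minimal sketch of such a probe. It is not the BucketCache allocation code: the class name DirectAllocTimer is hypothetical and the 4 MB chunk size is an assumption chosen for illustration. It allocates direct buffers in fixed-size chunks and reports both the total time and the longest single allocation.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone probe; not part of HBase.
final class DirectAllocTimer {
    // Assumed chunk size, for illustration only.
    static final int CHUNK_BYTES = 4 * 1024 * 1024;

    /** Number of chunks needed to cover totalBytes, rounding up. */
    static int chunkCount(long totalBytes, int chunkBytes) {
        return (int) ((totalBytes + chunkBytes - 1) / chunkBytes);
    }

    /**
     * Allocates direct buffers until totalBytes is covered.
     * Returns {total elapsed nanos, longest single allocation nanos}.
     */
    static long[] timeAllocation(long totalBytes, int chunkBytes, List<ByteBuffer> sink) {
        long start = System.nanoTime();
        long longest = 0;
        int chunks = chunkCount(totalBytes, chunkBytes);
        for (int i = 0; i < chunks; i++) {
            long before = System.nanoTime();
            // Keep a reference so the buffer is not released during the run.
            sink.add(ByteBuffer.allocateDirect(chunkBytes));
            longest = Math.max(longest, System.nanoTime() - before);
        }
        return new long[] { System.nanoTime() - start, longest };
    }

    public static void main(String[] args) {
        // First argument: amount to allocate in MB (defaults to a small value).
        long totalMb = args.length > 0 ? Long.parseLong(args[0]) : 32;
        List<ByteBuffer> sink = new ArrayList<ByteBuffer>();
        long[] result = timeAllocation(totalMb * 1024 * 1024, CHUNK_BYTES, sink);
        System.out.println("chunks=" + sink.size()
            + " totalMs=" + result[0] / 1000000
            + " longestChunkMs=" + result[1] / 1000000);
    }
}
```

To approximate the failing setup, it could be run with the same heap and direct memory flags as the region server, for example `java -Xmn512m -Xms5000m -Xmx5000m -XX:MaxDirectMemorySize=22000m DirectAllocTimer 20000`. A large longestChunkMs would point at a stall during allocation, which is what the ParNew pause in the log above suggests, but the probe alone does not establish the cause.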



--
This message was sent by Atlassian JIRA
(v6.2#6252)
