[
https://issues.apache.org/jira/browse/HBASE-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
stack updated HBASE-2774:
-------------------------
Attachment: sync-wait3.txt
This patch of Ryan's seems to fix the issue. No longer do I see a spike in
load.
The patch uses volatile longs in place of AtomicLong -- we don't need the
AtomicLong functionality -- and then does a wait/notify instead of spinning.
The spin counter is no longer used... should we remove it, or log extreme counts?
Let me try Todd's suggestion next.
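For illustration, here is a minimal sketch of the shape of that change -- a plain
volatile long read point guarded by wait/notify instead of a busy-spin on an
AtomicLong. Class and member names below are made up for the example; this is
not the sync-wait3.txt patch itself.
{code}
// Sketch only -- not the actual HBASE-2774 patch. Shows the general
// technique: a volatile long read point plus wait/notify in place of
// a CPU-burning spin loop on an AtomicLong.
public class ReadPointSketch {
  private volatile long readPoint = 0;      // plain volatile; no CAS needed
  private final Object readWaiters = new Object();

  // Writer side: advance the read point and wake any blocked readers.
  public void advanceTo(long newReadPoint) {
    synchronized (readWaiters) {
      if (newReadPoint > readPoint) {
        readPoint = newReadPoint;
      }
      readWaiters.notifyAll();
    }
  }

  // Reader side: instead of `while (readPoint < target) {}` eating CPU,
  // block on the monitor until the read point catches up.
  public void waitForReadPoint(long target) throws InterruptedException {
    synchronized (readWaiters) {
      while (readPoint < target) {       // loop guards against spurious wakeups
        readWaiters.wait();
      }
    }
  }
}
{code}
The key point is that a waiter sleeps on the monitor and is woken when the read
point advances, instead of burning a core re-checking the counter.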
> Spin in ReadWriteConsistencyControl eating CPU (load > 40) and no progress
> running YCSB on clean cluster startup
> ----------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-2774
> URL: https://issues.apache.org/jira/browse/HBASE-2774
> Project: HBase
> Issue Type: Bug
> Reporter: stack
> Attachments: sync-wait3.txt
>
>
> When I try to do a YCSB load, RSs will spin up massive load but make no
> progress. Seems to happen to each RS in turn until they do their first
> flush. They stay in the high-load mode for maybe 5-10 minutes or so and then
> fall out of the bad condition.
> Here is my ugly YCSB command (haven't gotten around to tidying it up yet):
> {code}
> $ java -cp
> build/ycsb.jar:/home/hadoop/current/conf/:/home/hadoop/current/hbase-0.21.0-SNAPSHOT.jar:/home/hadoop/current/lib/hadoop-core-0.20.3-append-r956776.jar:/home/hadoop/current/lib/zookeeper-3.3.1.jar:/home/hadoop/current/lib/commons-logging-1.1.1.jar:/home/hadoop/current/lib/log4j-1.2.15.jar
> com.yahoo.ycsb.Client -load -db com.yahoo.ycsb.db.HBaseClient -P
> workloads/5050 -p columnfamily=values -s -threads 100 -p recordcount=10000000
> {code}
> Cluster is 5 regionservers NOT running hadoop-core-0.20.3-append-r956776 but
> rather an old head of branch-0.20 hadoop.
> It seems that it's easy to repro if you start fresh. It might happen later in
> loading, but it seems as though after the first flush we're OK.
> It comes on almost immediately. The server that is taking the upload sees
> its load climb gradually into the 40s and stay there. Later it
> falls when the condition clears.
> Here is the content of my YCSB workload file:
> {code}
> recordcount=100000000
> operationcount=100000000
> workload=com.yahoo.ycsb.workloads.CoreWorkload
> readallfields=true
> readproportion=0.5
> updateproportion=0.5
> scanproportion=0
> insertproportion=0
> requestdistribution=zipfian
> {code}
> Here is my hbase-site.xml:
> {code}
> <property>
>   <name>hbase.regions.slop</name>
>   <value>0.01</value>
>   <description>Rebalance if regionserver has average + (average * slop)
>   regions. Default is 30% slop.
>   </description>
> </property>
> <property>
>   <name>hbase.zookeeper.quorum</name>
>   <value>XXXXXXXXX</value>
> </property>
> <property>
>   <name>hbase.regionserver.hlog.blocksize</name>
>   <value>67108864</value>
>   <description>Block size for HLog files. To minimize potential data loss,
>   the size should be (avg key length) * (avg value length) * flushlogentries.
>   Default 1MB.
>   </description>
> </property>
> <property>
>   <name>hbase.hstore.blockingStoreFiles</name>
>   <value>25</value>
> </property>
> <property>
>   <name>hbase.rootdir</name>
>   <value>hdfs://svXXXXXX:9000/hbase</value>
>   <description>The directory shared by region servers.</description>
> </property>
> <property>
>   <name>hbase.cluster.distributed</name>
>   <value>true</value>
> </property>
> <property>
>   <name>zookeeper.znode.parent</name>
>   <value>/stack</value>
>   <description>The path in ZooKeeper for this cluster.</description>
> </property>
> <property>
>   <name>hfile.block.cache.size</name>
>   <value>0.2</value>
>   <description>The size of the block cache used by HFile/StoreFile.
>   Set to 0 to disable.
>   </description>
> </property>
> <property>
>   <name>hbase.hregion.memstore.block.multiplier</name>
>   <value>8</value>
>   <description>Block updates if memcache has hbase.hregion.block.memcache
>   times hbase.hregion.flush.size bytes. Useful for preventing runaway
>   memcache during spikes in update traffic. Without an upper bound,
>   memcache fills such that when it flushes, the resultant flush files take
>   a long time to compact or split, or worse, we OOME.
>   </description>
> </property>
> <property>
>   <name>zookeeper.session.timeout</name>
>   <value>60000</value>
> </property>
> <property>
>   <name>hbase.regionserver.handler.count</name>
>   <value>60</value>
>   <description>Count of RPC Server instances spun up on RegionServers.
>   The same property is used by the HMaster for the count of master handlers.
>   Default is 10.
>   </description>
> </property>
> <property>
>   <name>hbase.regions.percheckin</name>
>   <value>20</value>
> </property>
> <property>
>   <name>hbase.regionserver.maxlogs</name>
>   <value>128</value>
> </property>
> <property>
>   <name>hbase.regionserver.logroll.multiplier</name>
>   <value>2.95</value>
> </property>
> {code}