If the table was already created, changing hbase.hregion.max.filesize and hbase.hregion.memstore.flush.size won't be taken into account; those settings only provide the defaults for new tables. You can also set them per-table from the shell, see the "alter" command.
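A rough example of the per-table route (a sketch only: the exact shell syntax varies between HBase versions, 'mytable' is a placeholder, and in 0.20.x the table has to be disabled before altering):

  disable 'mytable'
  alter 'mytable', {METHOD => 'table_att', MAX_FILESIZE => '1073741824', MEMSTORE_FLUSHSIZE => '100663296'}
  enable 'mytable'

MAX_FILESIZE and MEMSTORE_FLUSHSIZE are the per-table counterparts of hbase.hregion.max.filesize and hbase.hregion.memstore.flush.size.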
Also, did you restart HBase? Did you push the configs to all the nodes? Did you disable writing to the WAL? If not, because durability is still important to you but you want to upload as fast as you can, I would recommend changing these too:

hbase.regionserver.hlog.blocksize 134217728
hbase.regionserver.maxlogs 128

I forgot you had quite largish values, so that must affect the log rolling a _lot_.

Finally, did you LZO the table? From experience, it will only do good:
http://wiki.apache.org/hadoop/UsingLzoCompression

And finally (for real this time), how are you uploading to HBase? How many clients? Are you even using the write buffer?
http://hadoop.apache.org/hbase/docs/r0.20.4/api/org/apache/hadoop/hbase/client/HTable.html#setAutoFlush(boolean)

J-D

On Fri, May 28, 2010 at 12:28 PM, Jacob Isaac <ja...@ebrary.com> wrote:
> Did a run yesterday, posted the relevant parameters below.
> Did not see any difference in throughput or total run time (~9 hrs).
>
> I am consistently getting about 5k rows/sec, each row around ~4-5k,
> using a 17-node HBase on a 20-node HDFS cluster.
>
> How does it compare? Can I juice it more?
>
> ~Jacob
>
> <property>
>   <name>hbase.regionserver.handler.count</name>
>   <value>60</value>
> </property>
>
> <property>
>   <name>hbase.hregion.max.filesize</name>
>   <value>1073741824</value>
> </property>
>
> <property>
>   <name>hbase.hregion.memstore.flush.size</name>
>   <value>100663296</value>
> </property>
>
> <property>
>   <name>hbase.hstore.blockingStoreFiles</name>
>   <value>15</value>
> </property>
>
> <property>
>   <name>hbase.hstore.compactionThreshold</name>
>   <value>4</value>
> </property>
>
> <property>
>   <name>hbase.hregion.memstore.block.multiplier</name>
>   <value>8</value>
> </property>
>
> On Fri, May 28, 2010 at 10:15 AM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
>> Like I said in my first email, it helps for random reading when lots
>> of RAM is available to HBase. But it won't help the write throughput.
>>
>> J-D
>>
>> On Fri, May 28, 2010 at 10:12 AM, Vidhyashankar Venkataraman <vidhy...@yahoo-inc.com> wrote:
>> > I am not sure if I understood this right, but does changing
>> > hfile.block.cache.size also help?
>> >
>> > On 5/27/10 3:27 PM, "Jean-Daniel Cryans" <jdcry...@apache.org> wrote:
>> >
>> > Well we do have a couple of other configs for high write throughput:
>> >
>> > <property>
>> >   <name>hbase.hstore.blockingStoreFiles</name>
>> >   <value>15</value>
>> > </property>
>> > <property>
>> >   <name>hbase.hregion.memstore.block.multiplier</name>
>> >   <value>8</value>
>> > </property>
>> > <property>
>> >   <name>hbase.regionserver.handler.count</name>
>> >   <value>60</value>
>> > </property>
>> > <property>
>> >   <name>hbase.regions.percheckin</name>
>> >   <value>100</value>
>> > </property>
>> >
>> > The last one is for restarts. Uploading very fast, you will more likely
>> > hit all the upper limits (blocking store files and memstore) and this
>> > will lower your throughput. Those configs relax that. Also, for speedier
>> > uploads we disable writing to the WAL:
>> > http://hadoop.apache.org/hbase/docs/r0.20.4/api/org/apache/hadoop/hbase/client/Put.html#setWriteToWAL(boolean)
>> > If the job fails or any machine fails, you'll have to restart it or
>> > figure out the holes, and you absolutely need to force flushes when the
>> > MR is done.
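To tie together the write-buffer and WAL calls linked in this thread, here is a minimal client-side sketch against the 0.20.x API; the table name, column family, row count, and buffer size are placeholders, not values from the thread:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BufferedUpload {
  public static void main(String[] args) throws Exception {
    HBaseConfiguration conf = new HBaseConfiguration();
    HTable table = new HTable(conf, "mytable");

    // Buffer puts client-side instead of doing one RPC per row.
    table.setAutoFlush(false);
    table.setWriteBufferSize(12 * 1024 * 1024); // example size; tune for ~4-5k rows

    for (int i = 0; i < 100000; i++) {
      Put put = new Put(Bytes.toBytes("row-" + i));
      // Skip the WAL for raw upload speed: anything still in the memstores is
      // lost if a region server dies, so flushes must be forced once the job ends.
      put.setWriteToWAL(false);
      put.add(Bytes.toBytes("f1"), Bytes.toBytes("q"), Bytes.toBytes("value-" + i));
      table.put(put);
    }

    // Push whatever is left in the client-side write buffer.
    table.flushCommits();
  }
}

When the MR job is done, a forced flush (for example "flush 'mytable'" from the shell) persists what is still sitting in the memstores.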
>> >
>> > J-D
>> >
>> > On Thu, May 27, 2010 at 2:57 PM, Jacob Isaac <ja...@ebrary.com> wrote:
>> >> Thanks J-D
>> >>
>> >> Currently we are trying to find/optimize our load/write times - although
>> >> in prod we expect it to be a 25/75 (writes/reads) ratio.
>> >> We are using the long table model with only one column - row size is
>> >> typically ~4-5k.
>> >>
>> >> As to your suggestion on not using even 50% of disk space - I agree and
>> >> was planning to use only ~30-40% (1.5T of 4T) for HDFS,
>> >> and as I reported earlier:
>> >> 4000 regions @ 256M per region (with 3 replications) on 20 nodes == 150G
>> >> per node == 10% utilization.
>> >>
>> >> While using 1GB as maxfilesize, did you have to adjust other params such
>> >> as hbase.hstore.compactionThreshold and hbase.hregion.memstore.flush.size?
>> >> There is an interesting observation by Jonathan Gray documented/reported
>> >> in HBASE-2375 -
>> >> wondering whether that issue gets compounded when using 1G as the
>> >> hbase.hregion.max.filesize.
>> >>
>> >> Thx
>> >> Jacob
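On the LZO question raised earlier in the thread: compression is a per-column-family attribute, so enabling it on an existing table looks roughly like the following from the shell (table and family names are placeholders, the exact syntax depends on the HBase version, and LZO must be installed on every node first per the wiki link above):

  disable 'mytable'
  alter 'mytable', {NAME => 'f1', COMPRESSION => 'LZO'}
  enable 'mytable'
  major_compact 'mytable'

Existing store files are only rewritten with compression as they go through compaction, hence the major_compact at the end.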