Kevin,

Thank you for your response. This is not a question about how to correctly configure an HBase cluster for write-heavy workloads. This is an internal HBase issue: something is wrong in the default logic of the compaction selection algorithm in 0.94-0.98. It seems that nobody has ever tested importing data with a very high hbase.hstore.blockingStoreFiles value (200 in our case).
-Vladimir Rodionov

On Wed, Dec 3, 2014 at 6:38 AM, Kevin O'dell <[email protected]> wrote:
> Vladimir,
>
> I know you said, "do not ask me why", but I am going to have to ask you
> why. The fact that you are doing this (this being blocking store files > 200)
> tells me there is something, or multiple somethings, wrong with your
> cluster setup. A couple of things come to mind:
>
> * During this heavy write period, could we use bulk loads? If so, this
> should solve almost all of your problems.
>
> * A 1GB region size is WAY too small. If you are pushing the volume of
> data you are talking about, I would recommend 10-20GB region sizes; this
> should also help keep your region count smaller, which will result in
> more optimal writes.
>
> * Your cluster may be undersized. If you are setting the blocking that
> high, you may be pushing too much data for your cluster overall.
>
> Would you be so kind as to pass me a few pieces of information?
>
> 1.) Cluster size
> 2.) Average region count per RS
> 3.) Heap size, memstore global settings, and block cache settings
> 4.) An RS log on pastebin and a time frame of "high writes"
>
> I can probably make some solid suggestions for you based on the above
> data.
>
> On Wed, Dec 3, 2014 at 1:04 AM, Vladimir Rodionov <[email protected]>
> wrote:
>
> > This is what we observed in our environment(s).
> >
> > The issue exists in CDH4.5, 5.1, HDP2.1, and MapR4.
> >
> > If someone sets the number of blocking store files way above the
> > default value, say to 200, to avoid write stalls during intensive data
> > loading (do not ask me why we do this), then one of the regions grows
> > indefinitely and takes more than 99% of the overall table.
> >
> > It can't be split because it still has orphaned reference files. Some
> > of the reference files are able to avoid compactions for a long time,
> > obviously.
> >
> > The split policy is IncreasingToUpperBound and the max region size is
> > 1G. I do my tests on CDH4.5 mostly, but all other distros seem to have
> > the same issue.
> >
> > My attempt to forcefully add reference files to the compaction list in
> > Store.requestCompaction() when a region exceeds the recommended maximum
> > size did not work out well - some weird results in our test cases (but
> > the HBase tests are OK: small, medium, and large).
> >
> > What is so special about these reference files? Any ideas on what can
> > be done here to fix the issue?
> >
> > -Vladimir Rodionov
>
> --
> Kevin O'Dell
> Systems Engineer, Cloudera
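[To make the "files avoid compactions" behavior concrete: the sketch below is a toy model of the ratio-based minor-compaction selection used around 0.94 (RatioBasedCompactionPolicy), not the real implementation - real HBase also applies min/max file counts, off-peak ratios, and later the ExploringCompactionPolicy. It only illustrates the mechanism by which a large old file can keep being skipped while a stream of small flushes is compacted around it.]

```python
# Toy model of HBase's ratio-based minor-compaction selection.
# Hypothetical simplification for illustration only.

RATIO = 1.2      # hbase.hstore.compaction.ratio (default)
MAX_FILES = 10   # hbase.hstore.compaction.max (default)

def select_for_compaction(sizes):
    """sizes: store-file sizes (MB), ordered oldest -> newest.

    Walking from the oldest file, a file is excluded while it is
    larger than RATIO * (total size of all newer files); the
    selection is then capped at MAX_FILES."""
    start = 0
    while start < len(sizes) and sizes[start] > RATIO * sum(sizes[start + 1:]):
        start += 1
    return sizes[start:start + MAX_FILES]

# One large old file plus a stream of small flushes: the large file is
# never re-selected, so a similarly sized reference file can dodge
# minor compactions for a long time.
files = [900] + [10] * 20
print(select_for_compaction(files))  # ten 10 MB files; 900 is skipped
```

With a very high hbase.hstore.blockingStoreFiles, minor compactions can keep cycling through small new files indefinitely, which is consistent with the reference files never being rewritten and the region staying unsplittable.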

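[For readers following the thread: Kevin's suggestions map onto standard hbase-site.xml properties. The snippet below is a sketch with illustrative values, not a recommendation for every cluster; tune to your own workload.]

```xml
<!-- Sketch only: illustrative values for the settings discussed above. -->
<property>
  <!-- Larger regions (10 GB) instead of the 1 GB used in the tests -->
  <name>hbase.hregion.max.filesize</name>
  <value>10737418240</value>
</property>
<property>
  <!-- Keep the blocking store-file count near its default
       rather than raising it to 200 -->
  <name>hbase.hstore.blockingStoreFiles</name>
  <value>10</value>
</property>
```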