For #2 below, I suggest more validation against 0.90.5; 0.90.1 is pretty old.
Cheers

On Sun, Jan 8, 2012 at 3:05 PM, Mikael Sitruk <[email protected]> wrote:

> Ted hi
>
> 1. Thanks for pointing to HBASE-3051, Compaction at the granularity of a
> column-family; it seems promising.
>
> 2. Regarding manual management of compaction - it is exactly what I tried
> to do when I made all these findings. *In short there is no way to disable
> major compaction from running automatically* (point #1 in original email),
> should a JIRA be opened?
>
> 3. I have opened the following ones:
> HBASE-5146 - Hbase Shell - allow setting config properties
> HBASE-5147 - Compaction/Major compaction operation from shell/API/JMX
> HBASE-5148 - Compaction property at the server level are not propagated at
> the table level
> HBASE-5149 - getConfiguration() implementation is misleading
>
> Regards,
> Mikael.S
>
> On Sun, Jan 8, 2012 at 11:07 PM, Ted Yu <[email protected]> wrote:
>
> > HBASE-3051, Compaction at the granularity of a column-family, is marked
> > implemented by HBASE-3796
> > <https://issues.apache.org/jira/browse/HBASE-3796> which is in 0.92
> > (0.92 RC3 is coming out soon)
> >
> > Please see http://hbase.apache.org/book/regions.arch.html, 8.7.5.5 which
> > refers to
> > http://hbase.apache.org/book/important_configurations.html#managed.compactions
> >
> > Cheers
> >
> > On Sun, Jan 8, 2012 at 12:55 PM, Mikael Sitruk <[email protected]> wrote:
> >
> > > Well I'm very interested to dig further. I can also tell that the
> > > number of logs is getting very high very fast and of course a flush is
> > > triggered, adding more store files. Very fast the high number of store
> > > files triggers compaction and delays the flushing (default delay is
> > > 90000 ms). The files are small in size; major compaction is not needed
> > > but minor yes.
> > > Nevertheless the code ignores the disabled automatic compaction and
> > > promotes files to major compaction.
> > > I think I need to play with the log file size, the compaction
> > > threshold and the max number of store files. Do you have some
> > > recommendations?
> > > Btw the compaction takes about 1min 40 sec for a store size of 900MB
> > > +/-. Is it normal?
> > > One thing that does not help in this story is that I have 2 column
> > > families and each RS manages 100 of regions, each cf growing at a
> > > different speed.
> > > Is there a version of hbase handling such a case better (not flushing
> > > both cf if not needed to)?
> > >
> > > I will review the release notes of the versions you suggested and open
> > > the issues/enhancements we discussed.
> > >
> > > Thanks
> > > Cheers.
> > > On Jan 8, 2012 10:22 PM, "Ted Yu" <[email protected]> wrote:
> > >
> > > > Your request in first paragraph below deserves a JIRA.
> > > >
> > > > For 2.b I agree a bug should be filed.
> > > >
> > > > For major compaction, adding more logs on region server side should
> > > > help you understand the situation better - assuming you have
> > > > interest to dig further.
> > > > Please upgrade to 0.90.5, or you can wait for 0.90.6 release which
> > > > is slated for Jan. 19th.
> > > >
> > > > After upgrade, the logs and code would be more pertinent to the tip
> > > > of 0.90 branch.
> > > >
> > > > Thanks for summarizing your findings.
> > > >
> > > > On Sun, Jan 8, 2012 at 12:04 PM, Mikael Sitruk <[email protected]> wrote:
> > > >
> > > > > In fact I think that for 2.a the current implementation is
> > > > > misleading.
> > > > > Creating a connection and getting the configuration from the
> > > > > connection should return the configuration of the cluster.
> > > > > Requesting the configuration used to build an object should
> > > > > return the configuration set on the object.
> > > > > Additionally there should be a new method like
> > > > > getConfigurations(), or getClusterConfigurations(), returning a
> > > > > map of serverinfo and configuration. Another option is to add on
> > > > > the HRegionServer and HMaster a method getConfiguration()
> > > > > returning the configuration object used by the RegionServer or
> > > > > Master.
> > > > >
> > > > > Regarding 2.b yes I tried but it did not return the setting from
> > > > > the cluster configuration (again server has non default
> > > > > configuration, table was not configured with specific values then
> > > > > cluster configuration should apply on the table object). So I see
> > > > > it as problematic.
> > > > >
> > > > > Mikael.s
> > > > > On Jan 8, 2012 7:54 PM, <[email protected]> wrote:
> > > > >
> > > > > > About 2b, have you tried getting the major compaction setting
> > > > > > from column descriptor ?
> > > > > >
> > > > > > For 2a, what you requested would result in new methods of
> > > > > > HBaseConfiguration class to be added. Currently the
> > > > > > configuration on client class path would be used.
> > > > > >
> > > > > > Cheers
> > > > > >
> > > > > > On Jan 8, 2012, at 9:28 AM, Mikael Sitruk <[email protected]> wrote:
> > > > > >
> > > > > > > Ted hi
> > > > > > > First thanks for answering; regarding the JIRAs, I will file
> > > > > > > them.
> > > > > > > Second, it seems that I did not explain myself correctly
> > > > > > > regarding 2.a. - Like you, I do not expect that a
> > > > > > > configuration set on my client will be propagated to the
> > > > > > > cluster, but I do expect that if I set a configuration on a
> > > > > > > server then doing connection.getConfiguration() from a client
> > > > > > > I will get the configuration from the cluster.
> > > > > > > Currently the configuration returned is from the client
> > > > > > > config.
> > > > > > > So the problem is that you have no way to check the
> > > > > > > configuration of a cluster.
> > > > > > > I would expect to have some API to return the cluster config
> > > > > > > and even getting a map <serverInfo, config> so it can be easy
> > > > > > > to check cluster problems using code.
> > > > > > >
> > > > > > > 2.b. I know this code, and I tried to validate it. I set in
> > > > > > > the server config the "hbase.hregion.majorcompaction" to "0",
> > > > > > > then started the server (cluster). Since from the UI or from
> > > > > > > JMX this parameter is not visible at the cluster level, I
> > > > > > > tried to get the value from the client (to see that the
> > > > > > > cluster is using it)
> > > > > > >
> > > > > > > *HTableDescriptor hTableDescriptor =
> > > > > > > conn.getHTableDescriptor(Bytes.toBytes("my table"));*
> > > > > > >
> > > > > > > *hTableDescriptor.getValue("hbase.hregion.majorcompaction")*
> > > > > > > but I still got 24h (and not the value set in the config)!
> > > > > > > That was my problem from the beginning! ==> Using the config
> > > > > > > (on the server side) will not propagate into the table/column
> > > > > > > family
> > > > > > >
> > > > > > > Mikael.S
> > > > > > >
> > > > > > > On Sun, Jan 8, 2012 at 7:13 PM, Ted Yu <[email protected]> wrote:
> > > > > > >
> > > > > > >> I am not expert in major compaction feature.
> > > > > > >> Let me try to answer questions in #2.
> > > > > > >>
> > > > > > >> 2.a
> > > > > > >>> If I set the property via the configuration shouldn’t all
> > > > > > >>> the cluster be aware of?
> > > > > > >>
> > > > > > >> There're multiple clients connecting to one cluster. I
> > > > > > >> wouldn't expect values in the configuration (m_hbConfig) to
> > > > > > >> propagate onto the cluster.
> > > > > > >>
> > > > > > >> 2.b
> > > > > > >> Store.getNextMajorCompactTime() shows that
> > > > > > >> "hbase.hregion.majorcompaction" can be specified per column
> > > > > > >> family:
> > > > > > >>
> > > > > > >>   long getNextMajorCompactTime() {
> > > > > > >>     // default = 24hrs
> > > > > > >>     long ret = conf.getLong(HConstants.MAJOR_COMPACTION_PERIOD,
> > > > > > >>         1000*60*60*24);
> > > > > > >>     if (family.getValue(HConstants.MAJOR_COMPACTION_PERIOD) != null) {
> > > > > > >>
> > > > > > >> 2.d
> > > > > > >>> d. I tried also to setup the parameter via hbase shell but
> > > > > > >>> setting such properties is not supported. (do you plan to
> > > > > > >>> add such support via the shell?)
> > > > > > >>
> > > > > > >> This is a good idea. Please open a JIRA.
> > > > > > >>
> > > > > > >> For #5, HBASE-3965 is an improvement and doesn't have a patch
> > > > > > >> yet.
> > > > > > >>
> > > > > > >> Allow me to quote Alan Kay: 'The best way to predict the
> > > > > > >> future is to invent it.'
> > > > > > >>
> > > > > > >> Once we have a patch, we can always backport it to 0.92 after
> > > > > > >> some people have verified the improvement.
> > > > > > >>
> > > > > > >>> 6. In case a compaction (major) is running it seems there is
> > > > > > >>> no way to stop-it. Do you plan to add such feature?
> > > > > > >>
> > > > > > >> Again, logging a JIRA would provide a good starting point for
> > > > > > >> discussion.
> > > > > > >>
> > > > > > >> Thanks for the verification work and suggestions, Mikael.
> > > > > > >>
> > > > > > >> On Sun, Jan 8, 2012 at 7:27 AM, Mikael Sitruk <[email protected]> wrote:
> > > > > > >>
> > > > > > >>> I forgot to mention, I'm using HBase 0.90.1
> > > > > > >>>
> > > > > > >>> Regards,
> > > > > > >>> Mikael.S
> > > > > > >>>
> > > > > > >>> On Sun, Jan 8, 2012 at 5:25 PM, Mikael Sitruk <[email protected]> wrote:
> > > > > > >>>
> > > > > > >>>> Hi
> > > > > > >>>>
> > > > > > >>>> I have some concerns regarding major compactions below...
> > > > > > >>>>
> > > > > > >>>> 1. According to best practices from the mailing list and
> > > > > > >>>> from the book, automatic major compaction should be
> > > > > > >>>> disabled. This can be done by setting the property
> > > > > > >>>> ‘hbase.hregion.majorcompaction’ to ‘0’. Nevertheless even
> > > > > > >>>> after having done this I STILL see “major compaction”
> > > > > > >>>> messages in logs. Therefore it is unclear how I can manage
> > > > > > >>>> major compactions. (The system has heavy insert - uniformly
> > > > > > >>>> on the cluster, and major compaction affects the
> > > > > > >>>> performance of the system).
> > > > > > >>>> If I'm not wrong it seems from the code that: even if not
> > > > > > >>>> requested and even if the indicator is set to '0' (no
> > > > > > >>>> automatic major compaction), major compaction can be
> > > > > > >>>> triggered by the code in case all store files are
> > > > > > >>>> candidates for a compaction (from Store.compact(final
> > > > > > >>>> boolean forceMajor)).
> > > > > > >>>> Shouldn't the code add a condition that automatic major
> > > > > > >>>> compaction is disabled??
> > > > > > >>>>
> > > > > > >>>> 2. I tried to check the parameter
> > > > > > >>>> ‘hbase.hregion.majorcompaction’ at runtime using several
> > > > > > >>>> approaches - to validate that the server indeed loaded the
> > > > > > >>>> parameter.
> > > > > > >>>>
> > > > > > >>>> a. Using a connection created from local config
> > > > > > >>>>
> > > > > > >>>> *conn = (HConnection) HConnectionManager.getConnection(m_hbConfig);*
> > > > > > >>>>
> > > > > > >>>> *conn.getConfiguration().getString("hbase.hregion.majorcompaction")*
> > > > > > >>>>
> > > > > > >>>> returns the parameter from local config and not from
> > > > > > >>>> cluster. Is it a bug?
> > > > > > >>>> If I set the property via the configuration shouldn’t all
> > > > > > >>>> the cluster be aware of? (supposing that the connection
> > > > > > >>>> indeed connected to the cluster)
> > > > > > >>>>
> > > > > > >>>> b. fetching the property from the table descriptor
> > > > > > >>>>
> > > > > > >>>> *HTableDescriptor hTableDescriptor =
> > > > > > >>>> conn.getHTableDescriptor(Bytes.toBytes("my table"));*
> > > > > > >>>>
> > > > > > >>>> *hTableDescriptor.getValue("hbase.hregion.majorcompaction")*
> > > > > > >>>>
> > > > > > >>>> This will return the default parameter value (1 day), not
> > > > > > >>>> the parameter from the configuration (on the cluster). It
> > > > > > >>>> seems to be a bug, isn’t it? (the parameter from the
> > > > > > >>>> config should be the default if not set at the table
> > > > > > >>>> level)
> > > > > > >>>>
> > > > > > >>>> c. The only way I could set the parameter to 0 and really
> > > > > > >>>> see it is via the Admin API, updating the table descriptor
> > > > > > >>>> or the column descriptor. Now I could see the parameter on
> > > > > > >>>> the web UI. So is it the only way to set the parameter
> > > > > > >>>> correctly? If setting the parameter via the configuration
> > > > > > >>>> file, shouldn’t the webUI show this on any table created?
> > > > > > >>>>
> > > > > > >>>> d. I tried also to setup the parameter via hbase shell but
> > > > > > >>>> setting such properties is not supported. (do you plan to
> > > > > > >>>> add such support via the shell?)
> > > > > > >>>>
> > > > > > >>>> e. Generally is it possible to get via API the
> > > > > > >>>> configuration used by the servers? (at cluster/server
> > > > > > >>>> level)
> > > > > > >>>>
> > > > > > >>>> 3. I ran both major compaction requests from the shell or
> > > > > > >>>> from API but since both are async there is no progress
> > > > > > >>>> indication. Neither the JMX nor the Web will help here
> > > > > > >>>> since you don’t know if a compaction task is running.
> > > > > > >>>> Tailing the logs is not an efficient way to do this
> > > > > > >>>> either. The point is that I would like to automate the
> > > > > > >>>> process and avoid a compaction storm. So I want to do that
> > > > > > >>>> region by region, but if I don’t know when a compaction
> > > > > > >>>> started/ended I can’t automate it.
> > > > > > >>>>
> > > > > > >>>> 4. In case there are no compaction files in queue (but
> > > > > > >>>> still you have more than 1 storefile per store e.g. minor
> > > > > > >>>> compaction just finished) then invoking major_compact will
> > > > > > >>>> indeed decrease the number of store files, but the
> > > > > > >>>> compaction queue will remain at 0 during the compaction
> > > > > > >>>> task (shouldn’t the compaction queue increase by the
> > > > > > >>>> number of files to compact and be reduced when the task
> > > > > > >>>> ended?)
> > > > > > >>>>
> > > > > > >>>> 5. I saw already HBASE-3965 for getting status of major
> > > > > > >>>> compaction, nevertheless it has been removed from 0.92, is
> > > > > > >>>> it possible to put it back? Even sooner than 0.92?
> > > > > > >>>>
> > > > > > >>>> 6. In case a compaction (major) is running it seems there
> > > > > > >>>> is no way to stop-it. Do you plan to add such feature?
> > > > > > >>>>
> > > > > > >>>> 7. Do you plan to add functionality via JMX
> > > > > > >>>> (starting/stopping compaction, splitting....)
> > > > > > >>>>
> > > > > > >>>> 8. Finally there were some requests for allowing custom
> > > > > > >>>> compaction, part of this was given via the RegionObserver
> > > > > > >>>> in HBASE-2001, nevertheless do you consider adding support
> > > > > > >>>> for custom compaction (providing real pluggable compaction
> > > > > > >>>> strategy not just observer)?
> > > > > > >>>>
> > > > > > >>>> Regards,
> > > > > > >>>> Mikael.S
> > > > > > >>>
> > > > > > >>> --
> > > > > > >>> Mikael.S
> > > > > > >
> > > > > > > --
> > > > > > > Mikael.S
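
For readers following point 2.b, here is a minimal, self-contained sketch of the lookup order that the quoted fragment of Store.getNextMajorCompactTime() suggests: start from the region server's own configuration (defaulting to 24 hours), then let a value set on the column family take over. This is not the HBase source. The class MajorCompactionPeriod, the method effectivePeriodMs and the plain maps standing in for the server configuration and the column descriptor are all hypothetical, and the override step is an assumption, because the body of the `if` is cut off in the quoted code.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical stand-in for the lookup discussed in 2.b; not HBase code.
class MajorCompactionPeriod {
    static final String KEY = "hbase.hregion.majorcompaction";
    // Same default as the quoted fragment: 24 hours in milliseconds.
    static final long DEFAULT_PERIOD_MS = 1000L * 60 * 60 * 24;

    // serverConf models the region server's configuration file,
    // familyValues models values set explicitly on a column family.
    static long effectivePeriodMs(Map<String, String> serverConf,
                                  Map<String, String> familyValues) {
        long ret = DEFAULT_PERIOD_MS;
        String serverValue = serverConf.get(KEY);
        if (serverValue != null) {
            ret = Long.parseLong(serverValue);
        }
        // Assumption: a per-family value, when present, replaces the server value.
        String familyValue = familyValues.get(KEY);
        if (familyValue != null) {
            ret = Long.parseLong(familyValue);
        }
        return ret;
    }

    public static void main(String[] args) {
        Map<String, String> none = Collections.emptyMap();
        Map<String, String> serverZero = Collections.singletonMap(KEY, "0");
        Map<String, String> familyHour = Collections.singletonMap(KEY, "3600000");

        System.out.println(effectivePeriodMs(none, none));
        System.out.println(effectivePeriodMs(serverZero, none));
        System.out.println(effectivePeriodMs(serverZero, familyHour));
    }
}
```

Under this model the server-side "0" is only ever seen inside the region server. A descriptor fetched by a client carries just the values set explicitly on the table or family, which would be consistent with the behaviour reported in 2.b and with HBASE-5148.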
