Re: HBASE-4089 & HBASE-4147 - on the topic of ops output

Fuad Efendi Sun, 31 Jul 2011 15:10:48 -0700

What is it all about? HBase sucks. Too many problems for newcomers,
and a few weeks of warm-up just to get started! Is it really
supported by Microsoft employees?!

And, SEO of course:
===================


-- 
Fuad Efendi
416-993-2060
Tokenizer Inc., Canada
Data Mining, Search Engines
http://www.tokenizer.ca

On 11-07-29 7:49 PM, "Otis Gospodnetic" <[email protected]> wrote:

>Hi,
>
>I'm for publishing all performance metrics in JMX (in addition to
>exposing it wherever else you guys decide).  That's because JMX is
>probably the easiest for our SPM for HBase [1] to get to HBase
>performance metrics and I suspect we are not alone.
>
>Otis
>[1] http://sematext.com/spm/hbase-performance-monitoring/index.html
>----
>Sematext :: http://sematext.com/ :: Solr - Lucene - Hadoop - HBase
>Hadoop ecosystem search :: http://search-hadoop.com/
>
>
>
>>________________________________
>>From: Andrew Purtell <[email protected]>
>>To: Doug Meil <[email protected]>; "[email protected]"
>><[email protected]>
>>Sent: Friday, July 29, 2011 4:34 PM
>>Subject: Re: HBASE-4089 & HBASE-4147 - on the topic of ops output
>>
>>> I'd rather see this output being able to be captured by something like
>>>the sink that Todd suggested, rather than focusing on shell access.
>>
>>
>>I don't agree.
>>
>>
>>Look at what we have existing and proposed:
>>
>>    - Java API access to server and region load information, which the
>>shell uses
>>
>>    - A proposal to dump some stats into log files, which then have to be
>>scraped
>>
>>    - A proposal (by the FB guys) to export some JSON via an HTTP servlet
>>
>>This is not good design, this is a bunch of random shit stuck together.
>>
>>Note that what Todd proposed does not preclude adding Java client API
>>support for retrieving it.
>>
>>At a minimum all of this information must be accessible via the Java
>>client API, to enable programmatic monitoring and analysis use cases.
>>I'll add the shell support if nobody else cares about it, that is a
>>relatively small detail, but one I think is important.
>>
>>Best regards,
>>
>>
>>    - Andy
>>
>>
>>Problems worthy of attack prove their worth by hitting back. - Piet Hein
>>(via Tom White)
>>
>>
>>>________________________________
>>>From: Doug Meil <[email protected]>
>>>To: "[email protected]" <[email protected]>;
>>>"[email protected]" <[email protected]>
>>>Sent: Friday, July 29, 2011 11:39 AM
>>>Subject: Re: HBASE-4089 & HBASE-4147 - on the topic of ops output
>>>
>>>
>>>I'd rather see this output captured by something like the sink that
>>>Todd suggested, rather than focusing on shell access.  HServerLoad is a
>>>super-summary at the RS level, and both the items in 4089 and 4147 are
>>>proposed to be "summarized" but still have reasonable detail (e.g., even
>>>with a table/CF summary there could be dozens of entries given a
>>>reasonably complex system).
>>>
>>>
>>>
>>>
>>>On 7/29/11 1:15 PM, "Andrew Purtell" <[email protected]> wrote:
>>>
>>>>There is also the matter of HServerLoad and how that is used by the
>>>>shell and master UI to report on cluster status.
>>>>
>>>>I'd like the shell to be able to let the user explore all of these
>>>>different reports interactively.
>>>>
>>>>At the very least, they should all be handled the same way.
>>>>
>>>>And then there is Riley's work over at FB on a slow query log. How does
>>>>that fit in? 
>>>> 
>>>>Best regards,
>>>>
>>>>
>>>>   - Andy
>>>>
>>>>Problems worthy of attack prove their worth by hitting back. - Piet
>>>>Hein (via Tom White)
>>>>
>>>>
>>>>>________________________________
>>>>>From: Todd Lipcon <[email protected]>
>>>>>To: [email protected]
>>>>>Sent: Friday, July 29, 2011 9:58 AM
>>>>>Subject: Re: HBASE-4089 & HBASE-4147 - on the topic of ops output
>>>>>
>>>>>What I'd prefer is something like:
>>>>>
>>>>>interface BlockCacheReportSink {
>>>>>  void reportStats(BlockCacheReport report);
>>>>>}
>>>>>
>>>>>class LoggingBlockCacheReportSink implements BlockCacheReportSink {
>>>>>  public void reportStats(BlockCacheReport report) {
>>>>>    // log it with whatever formatting you want
>>>>>  }
>>>>>}
>>>>>
>>>>>then a configuration which could default to the logging
>>>>>implementation, but orgs could easily substitute their own
>>>>>implementation. For example, I could see wanting to do an
>>>>>implementation where it keeps local RRD graphs of some stats, or
>>>>>pushes them to a central management server.
>>>>>
>>>>>The assumption is that BlockCacheReport is a fairly straightforward
>>>>>"struct" with the non-formatted information available.
>>>>>
>>>>>-Todd
>>>>>
>>>>>On Fri, Jul 29, 2011 at 4:15 AM, Doug Meil
>>>>><[email protected]> wrote:
>>>>>
>>>>>>
>>>>>> Hi Folks-
>>>>>>
>>>>>> You probably already saw my email yesterday on this...
>>>>>>  https://issues.apache.org/jira/browse/HBASE-4089 (block cache report)
>>>>>>
>>>>>> ...and I just created this one...
>>>>>>  https://issues.apache.org/jira/browse/HBASE-4147 (StoreFile query
>>>>>> report)
>>>>>>
>>>>>> What I'd like to run past the dev-list is this:  if HBase had
>>>>>> periodic summary usage statistics, where should they go?  What I'd
>>>>>> like to throw out for discussion is that I'm suggesting that they
>>>>>> should simply go to the log files and users can slice and dice
>>>>>> this on their own.  No UI (i.e., JSPs), no JMX, etc.
>>>>>>
>>>>>>
>>>>>> The summary of the output is this:
>>>>>> BlockCacheReport:  on configured interval, print out summary of
>>>>>> blockcache (at table/CF level) to log file. This one is
>>>>>> point-in-time, not delta.
>>>>>>
>>>>>> StoreFile read report:  on configured interval, print out summary of
>>>>>> StoreFile accesses and how much time was spent reading each
>>>>>> StoreFile to log file.
>>>>>>
>>>>>> Thoughts?
>>>>>>
>>>>>> Doug
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>-- 
>>>>>Todd Lipcon
>>>>>Software Engineer, Cloudera
>>>>>
>>>>>
>>>
>>>
>>>
>>>
>>
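
The pluggable sink Todd describes in the quoted thread, with a
configuration that defaults to the logging implementation, could be
sketched roughly as follows. This is a minimal illustration, not HBase
code: the fields of BlockCacheReport, the InMemoryBlockCacheReportSink
class, the BlockCacheReportSinks helper, and the configuration key
"blockcache.report.sink.class" are all assumptions made up for the
example. Only the BlockCacheReportSink and LoggingBlockCacheReportSink
names come from the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Properties;
import java.util.logging.Logger;

/** Hypothetical plain "struct" holding unformatted block cache stats. */
final class BlockCacheReport {
    final String table;
    final String columnFamily;
    final long blockCount;
    final long sizeBytes;

    BlockCacheReport(String table, String columnFamily, long blockCount, long sizeBytes) {
        this.table = table;
        this.columnFamily = columnFamily;
        this.blockCount = blockCount;
        this.sizeBytes = sizeBytes;
    }
}

/** Sink interface as proposed in the thread. */
interface BlockCacheReportSink {
    void reportStats(BlockCacheReport report);
}

/** Default sink: formats the report and writes it to a logger. */
class LoggingBlockCacheReportSink implements BlockCacheReportSink {
    private static final Logger LOG = Logger.getLogger("BlockCacheReport");

    /** Formatting lives in the sink, not in the report itself. */
    static String format(BlockCacheReport r) {
        return String.format(Locale.ROOT, "BlockCacheReport table=%s cf=%s blocks=%d bytes=%d",
                r.table, r.columnFamily, r.blockCount, r.sizeBytes);
    }

    @Override
    public void reportStats(BlockCacheReport report) {
        LOG.info(format(report));
    }
}

/** Hypothetical substitute sink: keeps reports in memory for later analysis. */
class InMemoryBlockCacheReportSink implements BlockCacheReportSink {
    final List<BlockCacheReport> reports = new ArrayList<>();

    @Override
    public void reportStats(BlockCacheReport report) {
        reports.add(report);
    }
}

/** Hypothetical helper that picks the sink implementation from configuration. */
final class BlockCacheReportSinks {
    /** Assumed configuration key; not an actual HBase setting. */
    static final String SINK_KEY = "blockcache.report.sink.class";

    static BlockCacheReportSink fromConfig(Properties conf) {
        // Fall back to the logging sink when nothing is configured.
        String className = conf.getProperty(SINK_KEY, LoggingBlockCacheReportSink.class.getName());
        try {
            return Class.forName(className)
                    .asSubclass(BlockCacheReportSink.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot create sink " + className, e);
        }
    }
}
```

With an empty configuration, fromConfig returns the logging sink. Setting
the assumed key to another class name swaps in that implementation
without touching the code that produces the reports.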

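Otis's suggestion in the quoted thread, publishing the same metrics over
JMX, could look something like the sketch below. It uses only the
standard javax.management API. The BlockCacheJmx class, the attribute
names, and the readLong helper are assumptions for illustration and are
not part of HBase.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/** Hypothetical holder for a JMX view of block cache stats. */
final class BlockCacheJmx {

    /** Management interface; the MXBean suffix marks it as a JMX MXBean. */
    public interface BlockCacheStatsMXBean {
        long getBlockCount();
        long getSizeBytes();
    }

    /** Point-in-time stats, updated by the server and read through JMX. */
    static final class BlockCacheStats implements BlockCacheStatsMXBean {
        private final AtomicLong blockCount = new AtomicLong();
        private final AtomicLong sizeBytes = new AtomicLong();

        void update(long blocks, long bytes) {
            blockCount.set(blocks);
            sizeBytes.set(bytes);
        }

        @Override
        public long getBlockCount() {
            return blockCount.get();
        }

        @Override
        public long getSizeBytes() {
            return sizeBytes.get();
        }
    }

    /** Registers the stats object with the platform MBean server. */
    static ObjectName register(BlockCacheStats stats, String name) {
        try {
            ObjectName objectName = new ObjectName(name);
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            server.registerMBean(stats, objectName);
            return objectName;
        } catch (JMException e) {
            throw new IllegalStateException("Cannot register " + name, e);
        }
    }

    /** Reads a numeric attribute the way a monitoring agent would. */
    static long readLong(ObjectName name, String attribute) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            return ((Number) server.getAttribute(name, attribute)).longValue();
        } catch (JMException e) {
            throw new IllegalStateException("Cannot read " + attribute, e);
        }
    }
}
```

A sink implementation could call update on each report, so the logging
path and the JMX path are fed from the same data.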