Arkady,
Sorry for the delay in responding to your message.
I think the proposal would handle your log-reporting problem quite
well. My current plan for iterators that process a lot of rows (as
opposed to querying for, or updating, just a few) is to have the user
implement a MapReduce-like class or method. The class is submitted
to the system, and its method is invoked for every tuple that passes
the query criteria.
Imagine the user implements an interface like this:
  class UserIterator implements HDataIterator {
    void process(String row, String col, String attr) {}
  }
and submits it using a query interface like this:
boolean submitIterator(HDataIterator it, HQuery query);
where HQuery describes the rows for which HDataIterator.process()
will be invoked. The user class can store state across invocations,
making it quite easy to compute sums/averages for log summaries. I
think this is a better way to do aggregates than a fancy query
language, at least until the system and requirements are better
understood.
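For example, a log-summary iterator might look something like the
sketch below (the column name and usage are illustrative only, not a
settled design):

  class LatencyAverager implements HDataIterator {
    // State persists across process() invocations, so the aggregate
    // accumulates as the system walks the matching rows.
    private long sum = 0;
    private long count = 0;

    public void process(String row, String col, String attr) {
      if (col.equals("latency_ms")) {       // hypothetical column
        sum += Long.parseLong(attr);
        count++;
      }
    }

    public double average() {
      return (count == 0) ? 0.0 : ((double) sum) / count;
    }
  }

The client would keep a reference to the iterator, call
submitIterator(it, query), and read average() once the scan
completes.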
BTW, for a completely different take on log processing, you might
look at the following quite interesting paper:
  PADX: Querying large-scale ad hoc data with XQuery.
  M. Fernandez, K. Fisher, Y. Mandelbaum. Submitted to PLAN-X, 2006.
  http://www.cs.princeton.edu/~yitzhakm/publications/padx_planx.pdf
Best,
--Mike
On 5/19/06, Arkady Borkovsky <[EMAIL PROTECTED]> wrote:
Some potential applications of HBase-like functionality are
batch-oriented: select (a large) subset of records, process a subset of
their columns (often a small subset), and add either new columns or
new versions of existing columns.
Several applications like this may be running on the same "table"
simultaneously.
Think, for example, about query log processing, where the key is a
search query, and the daily stats are represented as columns (several
columns per day), plus columns for data aggregated by week, by month,
etc.
How well does the proposal address this kind of use?
-- ab
On May 15, 2006, at 9:54 PM, Michael Cafarella wrote:
> Hi everyone,
>
> My previous mail mentioned a bunch of design ideas that were mainly
> lifted from Jeff Dean's BigTable talk. BigTable seems like a useful
> way to do large-scale row storage, and their decisions largely seem
> like the right ones.
>
> However, BigTable still leaves some things on the table. Items to
> improve include a query language and multi-row locking, among
> others.
>
> Dean said explicitly in his talk that they wanted to avoid multi-row
> locking because it's complicated, error-prone, and maybe not necessary.
> He's right on at least the first two, and maybe the third.
>
> Multiple row locks are useful when you're making a change to
> several rows that should be atomic; you want all the changes
> or none of the changes. They're also useful in traditional databases
> when you want to perform an expensive read operation (like a
> multiway join) and need to make sure the results don't get
> modified while you're reading.
>
> Distributed lock acquisition is very hard to do. It's bug-prone
> and often has very weird performance ramifications. It's
> difficult to get working, difficult to tune, difficult to everything.
>
> Here are a few ideas on what to do:
> 1) Suck it up and have the client acquire locks on multiple
> HRegionServers simultaneously. All clients would have to
> agree to acquire locks according to some global ordering to
> avoid deadlock. HRegions would not be allowed to migrate
> to a new server if locked.
>
> If this is a rare circumstance, a better approach would be
> to have a dedicated "lock acquirer" through which clients
> make requests. It doesn't help the theoretical problem here,
> but it would make debugging an awful lot easier.
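>
> To make (1) concrete, here is a sketch of ordered acquisition;
> lockRow() is a hypothetical HRegionServer method, and ascending
> row-key order is just one possible global ordering:
>
>   import java.util.Map;
>   import java.util.SortedMap;
>
>   class OrderedLocker {
>     // Every client acquires locks in ascending row-key order, so
>     // no cycle of clients waiting on each other can form, and
>     // deadlock is impossible.
>     void lockRows(SortedMap<String, HRegionServer> rowToServer) {
>       for (Map.Entry<String, HRegionServer> e : rowToServer.entrySet()) {
>         e.getValue().lockRow(e.getKey());  // hypothetical call
>       }
>     }
>   }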
>
> 2) In the case of long-lasting read operations, we can
> use versioning to guarantee consistency. If each row is
> annotated with an edit timestamp, and we know that there
> is sufficient version history available, the long-lasting job
> can run over a specific version only.
>
> Edits can continue to be made to the database while the
> read-only job is ongoing. The operation is performed over
> the database as of the time the task was submitted.
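>
> Sketched in code (openScanner() with a timestamp argument is
> hypothetical, as are the HTable and HScanner names):
>
>   class SnapshotRead {
>     // Pin a long-lasting read to the timestamp at which the job
>     // was submitted. Later edits carry later timestamps and are
>     // simply not visible to the scan, so no locks are needed.
>     void runLongScan(HTable table, HQuery query) {
>       long snapshotTs = System.currentTimeMillis();
>       HScanner scanner = table.openScanner(query, snapshotTs);
>       // each value returned is the newest version <= snapshotTs,
>       // assuming enough version history has been retained
>     }
>   }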
>
> 3) In the case of multiple row updates, we may be able to
> use different edit semantics to avoid locking. For example,
> consider that we want to add a single column/value pair to
> multiple rows. We want this to happen atomically, so that
> all of the rows get the value or none of them do.
>
> If it's just an add, then we don't need to lock the rows at
> all; the add will always succeed, even if other writes
> intervene. Traditionally there's been no distinction among
> kinds of data "updates", so they all require locking. If we
> can get a client to adjust the update semantics slightly,
> then the locking can be much more relaxed.
>
> I'd say that "add" or "append" semantics are likely to be
> at least as common as "edit" semantics.
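>
> As a sketch (addColumn() is a hypothetical method), the
> multi-row add then needs no locks at all:
>
>   class MultiRowAdd {
>     // Adding a brand-new column/value pair commutes with
>     // concurrent writes to other columns, so each add succeeds
>     // unconditionally and no row locks are required.
>     void addToAll(HTable table, String[] rows, String col, String val) {
>       for (String row : rows) {
>         table.addColumn(row, col, val);  // hypothetical call
>       }
>     }
>   }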
>
> Can you think of the family of edit semantics you'd like
> to see offered here?
>
> Also, how useful do you think a general-purpose query language
> would be for HBase? It would be fairly straightforward to implement,
> for example, a poor man's version of SQL that has different locking
> and update behavior (and which chucks out the more exotic elements).
> This might be compiled into a piece of code that is executed
> immediately, or it might be transformed into a long-lasting mapreduce
> job.
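>
> Purely as an illustration (HQL, HQueryJob, and the query syntax
> are all placeholders, not a settled design):
>
>   class QuerySketch {
>     void example() {
>       // compile a tiny SQL-ish query; run it immediately, or
>       // hand it off as a long-running mapreduce job
>       HQueryJob job = HQL.compile(
>           "select url, fetchdate from crawl where fetchdate > '2006-05-01'");
>       job.run();
>     }
>   }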
>
> I have a few ideas for such a language, but I'm worried it's getting
> a little far afield from what we're interested in for Hadoop.
>
> --Mike