Re: Status of Huawei's 2' Indexing?

Rajeshbabu Chintaguntla Mon, 16 Mar 2015 13:11:16 -0700

Hi Rose,

Sorry for the late reply.


bq. Is there work on this that I don’t see?
You can try this [1] to check things out against version 0.98.3 (sorry, it
is not the latest). We thought of making it independent of HBase and are
trying to do that whenever we find time (only a few kernel changes are left
in bulkload, to prepare and load data into the data table and all indexes
together in a single job).

bq. Did I miss the mailing list thread where the architectural
differences were discussed?
You can find the discussion that happened at that time here [2].

By the time I started working on this in HBase, a lot had already been done
in Phoenix indexing that I didn't know about, such as 1) failover handling,
2) data type support, 3) maintaining standard index metadata separately in
catalog tables, 4) expression-based filters in Phoenix, and many more, all
of which are missing in hindex. So we thought of integrating the same
solution into Phoenix first, and were able to do so with minimal changes. To
avoid the complexities with colocation, we raised an improvement in Phoenix
[3]; I hope it simplifies many things.


[1] https://github.com/Huawei-Hadoop/hindex/tree/hbase-0.98
[2] http://search-hadoop.com/m/L1qeI1U99nd1&subj=Re+Design+review+Secondary+index+support+through+coprocess
[3] https://issues.apache.org/jira/browse/PHOENIX-1734

Thanks,
Rajeshbabu.

On Tue, Mar 17, 2015 at 12:52 AM, Michael Segel <[email protected]>
wrote:

> You miss the point.
> Your index is going to be orthogonal to your base table.
> Again, how do you handle joins?
>
> In terms of indexing… you have two ways of building your index.
> 1) In a separate M/R job.
> 2) As each row is inserted, the coprocessor inserts the data into the
> secondary indexes.
>
> More to your point…
>
> Yes there is a delta between when you write your row to the base table and
> when you write your row to your inverted index table.
> The short answer is that time is relative and it doesn’t matter.  Again,
> you’re going to have to think about that issue for a while before it sinks
> in. You’re not dealing with an RTOS problem… so it’s not real time but
> subjective real time.
>
> In terms of writing to two tables… what do you think your relational
> database is doing? ;-)
>
> I suggest you think more about the problem and the more you think about
> the problem, you’ll understand that there are tradeoffs and when you walk
> through the problem you’ll come to the conclusion that you want your index
> table(s) to be orthogonal to the base table.
>
>
> > On Mar 16, 2015, at 12:54 PM, lars hofhansl <[email protected]> wrote:
> >
> > Dude... Relax... Let's keep it cordial, please.
> >
> > To the topic:
> > Any CS 101 student can implement an eventually consistent index on top
> of HBase.
> > The part that is always missed is: How do you keep it consistent? There
> you have essentially two choices: (1) every update to an indexed table
> becomes a distributed transaction or (2) you keep region server local
> indexes.
> > There is nothing wrong with #2. It's good for not-so-selective indexes.
> > There is also nothing wrong with #1. This one is good for highly
> selective indexes (PK, etc)
> >
> > Indexes and joins do not have to be conflated. And maybe your use case
> is fine with eventually consistent indexes. In that case just write your
> stuff into two tables and be done with it.
> >
> > -- Lars
> >
> >      From: Michael Segel <[email protected]>
> > To: [email protected]
> > Sent: Monday, March 16, 2015 8:14 AM
> > Subject: Re: Status of Huawei's 2' Indexing?
> >
> > You’ll have to excuse Andy.
> >
> > He’s a bit slow.  HBASE-13044 should have been done 2 years ago. And it
> was trivial. Just got done last month….
> >
> > But I digress… The long story short…
> >
> > HBASE-9203 was brain dead from inception.  Huawei’s idea was to index on
> the region which had two problems.
> > 1) Complexity in that they wanted to keep the index on the same region
> server
> > 2) Joins become impossible.  Well, actually not impossible, but
> incredibly slow when compared to the alternative.
> >
> > You really should go back to the email chain.
> > Their defense (including Salesforce who was going to push this approach)
> fell apart when you asked the simple question on how do you handle joins?
> >
> > That’s their OOPS moment. Once you start to understand that, then
> allowing the index to be orthogonal to the base table, things started to
> come together.
> >
> > In short, you have a query either against a single table or, if you’re
> doing a join, against several. You then get the indexes and, assuming that
> you’re only using the AND predicate, it’s a simple intersection of the
> index result sets. (Since the result sets are ordered, it’s relatively
> trivial to walk through and find the intersections of N lists in a single
> pass.)
> >
> >
> > Now you have your result set of base table row keys and you can work
> with that data. (Either returning the records to the client, or as input to
> a map/reduce job.)
> >
> > That’s the 30K view.  There’s more to it, but once Salesforce got the
> basic idea, they ran with it. It was really that simple concept that the
> index would be orthogonal to the base table that got them moving in the
> right direction.
> >
> >
> > To Joseph’s point, indexing isn’t necessarily an RDBMS feature. However,
> it seems that some of the Committers are suffering from rectal induced
> hypoxia. HBASE-12853 was created not just to help solve the issue of ‘hot
> spotting’ but also to get the Committers to focus on bringing the solutions
> that they glom on in the client, back to the server side of things.
> >
> > Unfortunately the last great attempt at fixing things on the server side
> was the bastardization of coprocessors which again, suffers from the lack
> of thought.  This isn’t to say that allowing users to extend the server
> side functionality is wrong. (Because it isn’t.) But that the
> implementation done in HBase is a tad lacking in thought.
> >
> > So in terms of indexing…
> > Longer term picture, there has to be some fixes on the server side of
> things to allow one to associate an index (allowing for different types) to
> a base table, yet the implementation of using the index would end up
> becoming a client.  And by client, it would be an external query engine
> processor that could/should sit on the cluster.
> >
> > But hey! What do I know?
> > I gave up trying to have an intelligent/civilized conversation with
> Andrew because he just couldn’t grasp the basics.  ;-)
> >
> >
> >
> >
> >
> >
> >
> >> On Mar 13, 2015, at 4:14 PM, Andrew Purtell <[email protected]>
> wrote:
> >>
> >> When I made that remark I was thinking of a recent discussion we had at
> a
> >> joint Phoenix and HBase developer meetup. The difference of opinion was
> >> certainly civilized. (smile) I'm not aware of any specific written
> >> discussion, it may or may not exist. I'm pretty sure a revival of
> HBASE-9203
> >> would attract some controversy, but let me be clearer this time than I
> was
> >> before that this is just my opinion, FWIW.
> >>
> >>
> >> On Thu, Mar 12, 2015 at 3:58 PM, Rose, Joseph <
> >> [email protected]> wrote:
> >>
> >>> I saw that it was added to their project. I’m really not keen on
> bringing
> >>> in all the RDBMS apparatus on top of hbase, so I decided to follow
> other
> >>> avenues first (like trying to patch 0.98, for better or worse.)
> >>>
> >>> That Phoenix article seems like a good breakdown of the various
> indexing
> >>> architectures.
> >>>
> >>> HBASE-9203 (the ticket that deals with 2’ indexes) is pretty civilized
> (as
> >>> are most of them, it seems) so I didn’t know there were these
> differences
> >>> of opinion. Did I miss the mailing list thread where the architectural
> >>> differences were discussed?
> >>>
> >>>
> >>> -j
> >
> > The opinions expressed here are mine, while they may reflect a cognitive
> thought, that is purely accidental.
> > Use at your own risk.
> > Michael Segel
> > michael_segel (AT) hotmail.com
> >
> >
> >
> >
> >
> >
>
> The opinions expressed here are mine, while they may reflect a cognitive
> thought, that is purely accidental.
> Use at your own risk.
> Michael Segel
> michael_segel (AT) hotmail.com
>
>
>
>
>
>
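
A minimal sketch of the single-pass intersection described in the quoted
message above: the ordered result sets returned by each index are walked
together, and only the base table row keys present in all of them are kept.
It assumes every index lookup yields row keys in ascending order with no
duplicates. The function name intersect_sorted and the sample keys are
hypothetical and are not part of HBase, Phoenix, or hindex.

```python
from typing import List, Sequence


def intersect_sorted(result_sets: Sequence[Sequence[bytes]]) -> List[bytes]:
    """Intersect N ascending, duplicate-free lists of row keys in one pass."""
    if not result_sets:
        return []

    # One cursor per index result set; each cursor only ever moves forward.
    positions = [0] * len(result_sets)
    matches: List[bytes] = []

    # Stop as soon as any result set is exhausted: nothing more can match.
    while all(pos < len(rs) for pos, rs in zip(positions, result_sets)):
        heads = [rs[pos] for pos, rs in zip(positions, result_sets)]
        largest = max(heads)
        if all(head == largest for head in heads):
            # Every index agrees on this row key, so it satisfies the AND.
            matches.append(largest)
            positions = [pos + 1 for pos in positions]
        else:
            # Keys smaller than the largest head cannot be in every list.
            for i, head in enumerate(heads):
                if head < largest:
                    positions[i] += 1
    return matches


# Hypothetical row keys returned by three separate index lookups.
by_city = [b"row1", b"row3", b"row5", b"row7"]
by_status = [b"row3", b"row4", b"row5"]
by_type = [b"row2", b"row3", b"row5", b"row9"]
common = intersect_sorted([by_city, by_status, by_type])
```

The resulting list of base table row keys is what would then be fetched for
the client or handed to a map/reduce job. Each cursor advances at most once
per element, so the work is linear in the total size of the result sets.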
