Re: HBase table design question

Something Something Tue, 27 Oct 2009 11:48:52 -0700

Thanks, Jean-Daniel, for the reply.  Greatly appreciate it.

So is this the recommended way of implementing a Parent-Child relationship in
HBase?  For example, a User visits zero to many WebPages, or a Customer
buys one to many Items.  In such cases, would we create a "Customer" HTable with
a "buys" family and keep adding "ItemIds" for every "CustomerId"?  It sounds a
bit awkward for some reason, but if that's the recommended way then that's how
I will implement it.  Please let me know what the best way to implement
Parent-Child relationships in HBase is.
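
To make the layout I am asking about concrete, here is a minimal sketch. It
is an in-memory Python model, not the HBase client API.  The helper names
(new_table, put, get_family, record_purchase), the "items" table, and the
"boughtBy" family are all hypothetical, chosen only for illustration; the
reverse table simply mirrors the Visited / VisitedBy idea suggested earlier
in this thread.

```python
from collections import defaultdict

# In-memory stand-in for an HBase table: row key -> family -> qualifier -> value.
# This is NOT the HBase client API, only a model of the data layout.
def new_table():
    return defaultdict(lambda: defaultdict(dict))

def put(table, row, family, qualifier, value):
    table[row][family][qualifier] = value

def get_family(table, row, family):
    # Return a copy of every qualifier/value pair in one family of one row.
    # Using .get() avoids creating empty rows as a side effect of reading.
    return dict(table.get(row, {}).get(family, {}))

customers = new_table()  # keyed by CustomerId, family "buys"
items = new_table()      # hypothetical reverse table, family "boughtBy"

def record_purchase(customer_id, item_id, quantity):
    # Parent -> children: one column per item in the customer's "buys" family.
    put(customers, customer_id, "buys", item_id, quantity)
    # Reverse mapping, so "who bought this item?" is also a single-row read.
    put(items, item_id, "boughtBy", customer_id, quantity)

record_purchase("cust1", "item42", 2)
record_purchase("cust1", "item7", 1)
record_purchase("cust2", "item42", 5)

print(get_family(customers, "cust1", "buys"))
print(get_family(items, "item42", "boughtBy"))
```

In this model the children of a parent are just the qualifiers of one row,
so listing them is one read of one family rather than a join.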


Thanks.




________________________________
From: Jean-Daniel Cryans <[email protected]>
To: [email protected]
Sent: Tue, October 27, 2009 11:06:04 AM
Subject: Re: HBase table design question

I think your question was just forgotten.

So your value will not be overwritten; the two values will simply be stored
under 2 different timestamps, and only the latest one will be retrieved if you
do not specify a timestamp on your Get. By default 3 versions of that cell
will be kept, but you can change this with the family attributes.
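
The versioning behaviour described above can be sketched with a toy
in-memory model.  This is Python written for illustration only, not the
HBase client API: the VersionedCell class and its methods are hypothetical,
and the model trims old versions immediately on every put, which is a
simplification.

```python
class VersionedCell:
    """Toy model of a single cell that keeps up to max_versions values."""

    def __init__(self, max_versions=3):
        if max_versions < 1:
            raise ValueError("max_versions must be at least 1")
        self.max_versions = max_versions
        self._versions = {}  # timestamp -> value

    def put(self, timestamp, value):
        # A put at a new timestamp adds a version instead of overwriting.
        self._versions[timestamp] = value
        # Drop the oldest versions once the limit is exceeded.
        for ts in sorted(self._versions)[:-self.max_versions]:
            del self._versions[ts]

    def get(self, timestamp=None):
        if not self._versions:
            return None
        if timestamp is None:
            # No timestamp given: return the most recent version.
            return self._versions[max(self._versions)]
        # Timestamp given: return that exact version, if it is still kept.
        return self._versions.get(timestamp)

cell = VersionedCell()   # 3 versions, matching the default mentioned above
cell.put(1, 100)
cell.put(2, 200)
print(cell.get())             # 200, the latest version
print(cell.get(timestamp=1))  # 100, the older version is still there
```

If every visit must be kept indefinitely, distinct qualifiers (such as
"pageId100" and "pageId200" from the earlier question) avoid relying on the
version limit at all.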

J-D

On Tue, Oct 27, 2009 at 10:17 AM, Something Something
<[email protected]> wrote:
> No responses to this question :(  Is my question that stupid, I wonder!
>
>
>
>
> ________________________________
> From: Something Something <[email protected]>
> To: [email protected]
> Sent: Wed, October 21, 2009 12:16:19 PM
> Subject: Re: HBase table design question
>
> Thanks, Jonathan, for the reply.  One quick question...
>
> So in the User table when I perform the put operation:
>
> .put("visited", "pageId", 100);
>
> .put("visited", "pageId", 200);
>
> The 100 gets overwritten with 200.  Correct?  So should I use... something 
> like this...
>
> .put("visited", "pageId100", 100);
> .put("visited", "pageId200", 200);
>
> I guess, I am still missing something... sorry.. Please explain.  Thanks.
>
>
>
>
> ________________________________
> From: Jonathan Gray <[email protected]>
> To: [email protected]
> Sent: Wed, October 21, 2009 10:25:52 AM
> Subject: Re: HBase table design question
>
> You're generally on the right track.  In many cases, rather than using
> secondary indexes as you would in the relational world, you would have
> multiple tables in HBase with different keys.
>
> You may not need a table for each query, but that depends on your
> performance requirements and the specific details of the data patterns
> (how sparse or dense certain things will be).
>
> I would start with a User table and a WebPage table, keyed by their ids.
>
> The User table could have a Visited family.  The WebPage table could have a 
> VisitedBy family.
>
> Your queries could be run like this:
>
> 1) Get(table=User, row=userid, family=Visited, qualifier=WebPageID)
>   There are a couple different ways you could model the data here. You could 
> either put in a new version of the same qualifier for each visit, or you 
> could make the qualifier a composite key like WebPageID+VisitStamp, so they 
> would then be grouped together.
>
> 2) Get(table=User, row=userid, family=Visited)
>   All qualifiers would represent all pages visited.
>
> 3) Get(table=WebPage, row=pageid, family=VisitedBy)
>   All qualifiers would represent all users who visited.  You could store 
> multiple visits by the same user in different ways, as above.
>
>
> As for using Hive to run these queries, that is not something I would
> recommend.  For one, Hive integration with HBase is not complete (as far as I
> know).  Second, Hive's emphasis is on batch/offline MapReduce jobs.  Running
> the above 3 queries can be done with the HBase API directly, and efficiently.
> There's no need for SQL or anything like it.
>
> Hope that helps.
>
> JG
>
> Something Something wrote:
>> Hello,
>>
>> Trying to figure out what's the recommended way of designing tables in
>> HBase.  Let's say I need a table to gather statistics regarding users'
>> visits to different web pages.
>>
>> In the relational database world, we could have a table with following 
>> columns:
>>
>> Primary Key (system generated)
>> UserId (foreign key)
>> WebPageId (foreign key)
>> VisitedDateTime & so on....
>>
>> Basically, this table would allow us to answer (amongst many others) the 
>> following questions...
>>
>> 1)  How many times did a User visit a certain Page?
>> 2)  Which web pages did a particular user visit?
>> 3)  Which users visited a particular web page?  etc etc.
>>
>> What's the best way to model this in HTable?
>> Since every HTable is really a distributed hashmap, does that mean I need to 
>> create 3 different HTables (HashMaps) to answer these 3 questions?
>>
>> 1) One table with (UserId + WebPageId) as the compound key? (To answer #1)
>> 2) One table with UserId as the key? (To answer #2)
>> 3) One table with WebPageId as the key? (To answer #3)
>>
>> Along with HTable should I use Hive to run queries such as #1 above?
>> Any help in this regard will be greatly appreciated.  Thanks.
>>
>>
>>
>
>
>
