Hello,

I'm trying to figure out the recommended way to design tables in 
HBase.  Let's say I need a table to gather statistics about users' visits 
to different web pages.

In the relational database world, we could have a table with the following columns:

Primary Key (system generated)
UserId (foreign key)
WebPageId (foreign key)
VisitedDateTime 
and so on.

Basically, this table would allow us to answer (amongst many others) the 
following questions...

1)  How many times did a user visit a certain page?
2)  Which web pages did a particular user visit?
3)  Which users visited a particular web page?  etc.
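
To make the three questions concrete, here is a small sketch in plain Python over an in-memory list of visit rows. The Visit shape, the function names and the sample data are all made up for illustration; this is not real database code, just the logic I would want to be able to express.

```python
from collections import namedtuple

# Hypothetical row shape mirroring the relational columns listed above.
Visit = namedtuple("Visit", ["user_id", "page_id", "visited_at"])

# Made-up sample data, for illustration only.
visits = [
    Visit("u1", "p1", "2010-01-01T10:00"),
    Visit("u1", "p1", "2010-01-02T11:00"),
    Visit("u1", "p2", "2010-01-02T12:00"),
    Visit("u2", "p1", "2010-01-03T09:00"),
]

def count_visits(rows, user_id, page_id):
    """Question 1: how many times did this user visit this page?"""
    return sum(1 for r in rows if r.user_id == user_id and r.page_id == page_id)

def pages_for_user(rows, user_id):
    """Question 2: which pages did this user visit?"""
    return {r.page_id for r in rows if r.user_id == user_id}

def users_for_page(rows, page_id):
    """Question 3: which users visited this page?"""
    return {r.user_id for r in rows if r.page_id == page_id}
```

In a relational database each of these would be a simple query with a WHERE clause on one table.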

What's the best way to model this in HBase?  

Since every HBase table is really a distributed hashmap, does that mean I need to 
create 3 different tables (hashmaps) to answer these 3 questions?

1) One table with (UserId + WebPageId) as the compound key? (To answer #1)
2) One table with UserId as the key? (To answer #2)
3) One table with WebPageId as the key? (To answer #3)
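
A rough sketch of what I have in mind, using plain Python dicts as stand-ins for the three tables. This is not the HBase client API; the dict names, the record_visit helper, the "|" separator in the compound key and the sample ids are all assumptions made up for illustration.

```python
from collections import defaultdict

# Plain dicts standing in for three tables; NOT the HBase client API.
visit_counts = defaultdict(int)    # key: "userId|webPageId" -> visit count (question 1)
pages_by_user = defaultdict(set)   # key: userId -> set of webPageIds       (question 2)
users_by_page = defaultdict(set)   # key: webPageId -> set of userIds       (question 3)

def record_visit(user_id, page_id):
    """Write a single visit into all three stand-in tables."""
    visit_counts[f"{user_id}|{page_id}"] += 1  # "|" separator is an arbitrary choice
    pages_by_user[user_id].add(page_id)
    users_by_page[page_id].add(user_id)

# Made-up sample visits, for illustration only.
for user_id, page_id in [("u1", "p1"), ("u1", "p1"), ("u1", "p2"), ("u2", "p1")]:
    record_visit(user_id, page_id)
```

With this layout each question becomes a single lookup by key, but every visit has to be written three times, which is the part I am unsure about.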

Along with HBase, should I use Hive to run queries such as #1 above?  

Any help in this regard will be greatly appreciated.  Thanks.


      
