Hello,
I'm trying to figure out the recommended way to design tables in
HBase. Let's say I need a table to gather statistics about users' visits
to different web pages.
In the relational database world, we could have a table with the following columns:
Primary Key (system generated)
UserId (foreign key)
WebPageId (foreign key)
VisitedDateTime
and so on.
Basically, this table would let us answer (among many others) the
following questions:
1) How many times a User visited a certain Page?
2) Which web pages did a particular user visit?
3) Which users visited a particular web page?
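To make the relational version concrete, here is a rough sketch using Python's built-in sqlite3 module. The table name "Visits", the "Id" column name, and the sample rows are made up for illustration; only UserId, WebPageId and VisitedDateTime come from the column list above.

```python
import sqlite3

# In-memory database; "Visits" and "Id" are hypothetical names for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Visits (
        Id              INTEGER PRIMARY KEY AUTOINCREMENT,  -- system generated
        UserId          INTEGER NOT NULL,
        WebPageId       INTEGER NOT NULL,
        VisitedDateTime TEXT    NOT NULL
    )
    """
)

# Made-up sample data: (UserId, WebPageId, VisitedDateTime)
sample_rows = [
    (1, 10, "2010-01-01 09:00:00"),
    (1, 10, "2010-01-01 09:05:00"),
    (1, 20, "2010-01-02 12:00:00"),
    (2, 10, "2010-01-03 18:30:00"),
]
conn.executemany(
    "INSERT INTO Visits (UserId, WebPageId, VisitedDateTime) VALUES (?, ?, ?)",
    sample_rows,
)


def count_visits(user_id, page_id):
    """Question 1: how many times did a user visit a certain page?"""
    row = conn.execute(
        "SELECT COUNT(*) FROM Visits WHERE UserId = ? AND WebPageId = ?",
        (user_id, page_id),
    ).fetchone()
    return row[0]


def pages_for_user(user_id):
    """Question 2: which web pages did a particular user visit?"""
    rows = conn.execute(
        "SELECT DISTINCT WebPageId FROM Visits WHERE UserId = ? ORDER BY WebPageId",
        (user_id,),
    )
    return [r[0] for r in rows]


def users_for_page(page_id):
    """Question 3: which users visited a particular web page?"""
    rows = conn.execute(
        "SELECT DISTINCT UserId FROM Visits WHERE WebPageId = ? ORDER BY UserId",
        (page_id,),
    )
    return [r[0] for r in rows]
```

All three questions are answered from the same table here, just with different WHERE clauses. That is the part I'm not sure how to carry over.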
What's the best way to model this in HBase?
Since every HBase table is really a distributed hashmap, does that mean I need to
create 3 different tables (HashMaps) to answer these 3 questions?
1) One table with (UserId + WebPageId) as the compound key? (To answer #1)
2) One table with UserId as the key? (To answer #2)
3) One table with WebPageId as the key? (To answer #3)
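Here is a rough sketch of what I have in mind for the three tables above, using plain Python dicts as stand-ins. This is not the HBase client API; the names, the ":" separator in the compound key, and the sample IDs are just assumptions for illustration.

```python
from collections import defaultdict

# Plain dicts standing in for the three proposed tables (not real HBase calls).
visits_by_user_page = defaultdict(int)  # key "UserId:WebPageId" -> visit count
pages_by_user = defaultdict(set)        # key UserId    -> set of WebPageIds
users_by_page = defaultdict(set)        # key WebPageId -> set of UserIds


def compound_key(user_id, page_id):
    # Assumed key format: the ":" separator is an arbitrary choice for this sketch.
    return f"{user_id}:{page_id}"


def record_visit(user_id, page_id):
    """Each visit is written to all three maps, one write per 'table'."""
    visits_by_user_page[compound_key(user_id, page_id)] += 1
    pages_by_user[user_id].add(page_id)
    users_by_page[page_id].add(user_id)


def count_visits(user_id, page_id):
    """Question 1: a single lookup on the compound key."""
    return visits_by_user_page.get(compound_key(user_id, page_id), 0)


def pages_for_user(user_id):
    """Question 2: a single lookup by UserId."""
    return sorted(pages_by_user.get(user_id, set()))


def users_for_page(page_id):
    """Question 3: a single lookup by WebPageId."""
    return sorted(users_by_page.get(page_id, set()))


# Made-up sample visits
record_visit("u1", "p1")
record_visit("u1", "p1")
record_visit("u1", "p2")
record_visit("u2", "p1")
```

In this sketch every visit turns into three writes, and each question becomes a single key lookup. My question is whether this is how it should be done in HBase, or whether there is a better layout.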
Along with HBase, should I use Hive to run queries such as #1 above?
Any help in this regard will be greatly appreciated. Thanks.