I am no expert, but I am doing something very similar, i.e. tracking user
sessions in HBase.

I use two tables:

Table 1:  'Users'
Column Family 1: WebPages
Columns: PageIds
RowId=UserId

For a given UserId, you can retrieve the pages visited, the number of
visits (watch out for versions), and the first/last visit dates.  I use the
PageId as the column qualifier and store the count and date info in the cell.
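
To make the layout concrete, here is a rough sketch in plain Python.  It does
not touch HBase at all; nested dicts stand in for row -> column -> cell.  The
names (users_table, record_visit, the count/first/last fields) are made up
for illustration and are not my actual code.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for the 'Users' table:
#   row key          = UserId
#   column qualifier = PageId (inside the 'WebPages' family)
#   cell value       = visit count plus first/last visit timestamps
users_table = defaultdict(dict)


def record_visit(user_id, page_id, ts):
    """Update the (user, page) cell for one visit at timestamp ts."""
    cell = users_table[user_id].get(page_id)
    if cell is None:
        users_table[user_id][page_id] = {"count": 1, "first": ts, "last": ts}
    else:
        cell["count"] += 1
        cell["first"] = min(cell["first"], ts)
        cell["last"] = max(cell["last"], ts)


def pages_visited(user_id):
    """Which pages did this user visit? One row read."""
    return sorted(users_table.get(user_id, {}))


def visit_count(user_id, page_id):
    """How many times did this user visit this page? One cell read."""
    cell = users_table.get(user_id, {}).get(page_id)
    return cell["count"] if cell else 0
```

With this shape, a single row keyed by UserId covers both "which pages did
this user visit" and "how many times did this user visit a given page".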

Table 2: 'Pages'
Column Family 1: Visits
Columns: UserIds
RowId=PageId

For a given PageId, retrieve the UserIds.  Depending on the volume of the
web site and exactly what types of queries you have, you might want to store
the UserIds as columns within the Visits column family, then iterate over
them.
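
The same idea for the second table, again as a plain Python sketch rather
than real HBase code.  The names (pages_table, record_page_visit) and the
choice to store the last-visit timestamp as the cell value are assumptions
for illustration only.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for the 'Pages' table:
#   row key          = PageId
#   column qualifier = UserId (inside the 'Visits' family)
#   cell value       = last-visit timestamp (an assumption for this sketch)
pages_table = defaultdict(dict)


def record_page_visit(page_id, user_id, ts):
    """Mark that user_id visited page_id at timestamp ts."""
    previous = pages_table[page_id].get(user_id)
    if previous is None or ts > previous:
        pages_table[page_id][user_id] = ts


def users_for_page(page_id):
    """Which users visited this page? Read one row, iterate its columns."""
    return sorted(pages_table.get(page_id, {}))
```

Each visit gets written to both tables, so the data is duplicated on purpose
to make both lookups a single-row read.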

How much data is kept is governed by the VERSIONS and TTL settings on the
column family.
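
A simplified model of how those two settings limit what you get back, in
plain Python.  This is only a sketch of the idea (keep at most N versions of
a cell, drop anything older than the TTL); the function name and the exact
boundary handling are my own assumptions, not HBase's implementation.

```python
def visible_versions(versions, max_versions, ttl_seconds, now):
    """Return the cell versions that survive a VERSIONS cap and a TTL.

    versions: list of (timestamp_seconds, value) pairs for one cell.
    Newest versions are kept first; anything older than ttl_seconds
    relative to `now` is treated as expired.
    """
    fresh = [(ts, value) for ts, value in versions if now - ts < ttl_seconds]
    fresh.sort(key=lambda pair: pair[0], reverse=True)
    return fresh[:max_versions]


if __name__ == "__main__":
    cell = [(100, "a"), (200, "b"), (300, "c"), (10, "old")]
    # Only the two newest versions are kept.
    print(visible_versions(cell, max_versions=2, ttl_seconds=500, now=400))
    # A short TTL expires everything except the most recent write.
    print(visible_versions(cell, max_versions=2, ttl_seconds=150, now=400))
```

This is why I said to watch out for versions: if you were to derive a visit
count by counting cell versions, the cap would silently truncate it, so an
explicit counter in the cell is safer.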

My two cents.

Good luck.

On Wed, Oct 21, 2009 at 10:03 AM, Something Something <
[email protected]> wrote:

> Hello,
>
> Trying to figure out what's the recommended way of designing tables under
> HBase.  Let's say I need a table to gather statistics regarding user's
> visits to different web pages.
>
> In the relational database world, we could have a table with following
> columns:
>
> Primary Key (system generated)
> UserId (foreign key)
> WebPageId (foreign key)
> VisitedDateTime
> & so on....
>
> Basically, this table would allow us to answer (amongst many others) the
> following questions...
>
> 1)  How many times a User visited a certain Page?
> 2)  Which web pages did a particular user visit?
> 3)  Which users visited a particular web page?  etc etc.
>
> What's the best way to model this in HTable?
>
> Since every HTable is really a distributed hashmap, does that mean I need
> to create 3 different HTables (HashMaps) to answer these 3 questions?
>
> 1) One table with (UserId + WebPageId) as the compound key? (To answer #1)
> 2) One table with UserId as the key? (To answer #2)
> 3) One table with WebPageId as the key? (To answer #3)
>
> Along with HTable should I use Hive to run queries such as #1 above?
>
> Any help in this regard will be greatly appreciated.  Thanks.
>
>
>
