I am no expert, but I am doing something very similar, i.e. tracking user sessions within HBase.
2 tables:

Table 1: 'Users'
  Column Family 1: WebPages
  Columns: Page Names
  RowId = UserId

For a given userid, you could retrieve the pages visited, the number of times (watch out for versions), and the first/last date visited. I use the PageId as the column and use the cells for the count and date info.

Table 2: 'Pages'
  Column Family 1: Visits
  Columns: UserIds
  RowId = PageId

For a given PageId, retrieve the userIds. Depending upon the volume of the web site and exactly what types of queries you have, you might want to store the userIds as columns within the Visits. Then iterate over the userIds.

How much data is kept is governed by the number of VERSIONS and TTL settings.

My two cents. Good luck.

On Wed, Oct 21, 2009 at 10:03 AM, Something Something <[email protected]> wrote:

> Hello,
>
> Trying to figure out what's the recommended way of designing tables under
> HBase. Let's say I need a table to gather statistics regarding user's
> visits to different web pages.
>
> In the relational database world, we could have a table with following
> columns:
>
> Primary Key (system generated)
> UserId (foreign key)
> WebPageId (foreign key)
> VisitedDateTime
> & so on....
>
> Basically, this table would allow us to answer (amongst many others) the
> following questions...
>
> 1) How many times a User visited a certain Page?
> 2) Which web pages did a particular user visit?
> 3) Which users visited a particular web page? etc etc.
>
> What's the best way to model this in HTable?
>
> Since every HTable is really a distributed hashmap, does that mean I need
> to create 3 different HTables (HashMaps) to answer these 3 questions?
>
> 1) One table with (UserId + WebPageId) as the compound key? (To answer #1)
> 2) One table with UserId as the key? (To answer #2)
> 3) One table with WebPageId as the key? (To answer #3)
>
> Along with HTable should I use Hive to run queries such as #1 above?
>
> Any help in this regard will be greatly appreciated. Thanks.
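
To make the two-table layout from the reply concrete, here is a minimal in-memory sketch in Python. It is not HBase client code: nested dictionaries stand in for the 'Users' and 'Pages' tables, and the function names (record_visit, visit_count, pages_visited_by, users_who_visited) and the cell layout (count/first/last) are assumptions made for illustration only. VERSIONS and TTL are not modeled.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the two HBase tables.
# Layout: table[row_key][column_family][qualifier] -> cell value
users = defaultdict(lambda: {"WebPages": {}})  # RowId = UserId
pages = defaultdict(lambda: {"Visits": {}})    # RowId = PageId


def record_visit(user_id, page_id, when):
    """Write one visit to both tables, as the two-table design requires."""
    cell = users[user_id]["WebPages"].setdefault(
        page_id, {"count": 0, "first": when, "last": when}
    )
    cell["count"] += 1
    cell["first"] = min(cell["first"], when)
    cell["last"] = max(cell["last"], when)
    # In 'Pages' only the qualifier (the UserId) matters; the value is a placeholder.
    pages[page_id]["Visits"][user_id] = b""


def visit_count(user_id, page_id):
    """Question 1: how many times did this user visit this page?"""
    cell = users.get(user_id, {}).get("WebPages", {}).get(page_id)
    return cell["count"] if cell else 0


def pages_visited_by(user_id):
    """Question 2: which pages did this user visit? (one row read from 'Users')"""
    return sorted(users.get(user_id, {}).get("WebPages", {}))


def users_who_visited(page_id):
    """Question 3: which users visited this page? (one row read from 'Pages')"""
    return sorted(pages.get(page_id, {}).get("Visits", {}))
```

Under these assumptions, questions 1 and 2 are both answered from a single row of 'Users', and question 3 from a single row of 'Pages', so a third table keyed on (UserId + WebPageId) is not needed in this sketch. The cost is that every visit is written twice, once per table.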
