RE: HBase Sample Schemas

Goel, Ankur Thu, 27 Mar 2008 23:34:08 -0700

 
> ....by adding a column.
Sorry, I meant colon ":"

-----Original Message-----
From: Goel, Ankur [mailto:[EMAIL PROTECTED] 
Sent: Friday, March 28, 2008 12:01 PM
To: [email protected]
Subject: RE: HBase Sample Schemas


The tables below are RDBMS tables with column names simply converted to
column families by adding a column.
I'd like to share ideas on how best these tables can be modified (or
merged ??) to take advantage of column oriented design.
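The correction at the top of this thread notes that the conversion appends a colon, not a column. Below is a minimal Java sketch of that mechanical renaming. It assumes plain string column names, and the class and method names are hypothetical. It shows only the naming convention (`family:` with an optional qualifier after the colon), not any HBase client API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns RDBMS column names into HBase-style column
// family names by appending the ':' that separates a family from its
// (optional) qualifier. Not part of any HBase API.
public class FamilyNames {
    public static String toFamilyName(String rdbmsColumn) {
        // A name that already ends in ':' is returned unchanged.
        return rdbmsColumn.endsWith(":") ? rdbmsColumn : rdbmsColumn + ":";
    }

    public static List<String> toFamilyNames(List<String> rdbmsColumns) {
        List<String> families = new ArrayList<>();
        for (String column : rdbmsColumns) {
            families.add(toFamilyName(column));
        }
        return families;
    }

    public static void main(String[] args) {
        System.out.println(toFamilyNames(List.of("url", "site", "status")));
        // prints [url:, site:, status:]
    }
}
```

This one-to-one renaming is exactly what the schemas below do, which is why they still look like RDBMS tables.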

-----Original Message-----
From: Edward J. Yoon [mailto:[EMAIL PROTECTED]
Sent: Friday, March 28, 2008 11:48 AM
To: [email protected]
Subject: Re: HBase Sample Schemas

I don't think this is a good example.

Look at the difference between two physical schemas for the same
logical (relational) data model: one uses relationship tables on an
RDBMS, the other uses a list of column qualifiers on BigTable.
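To make the contrast concrete, here is a hedged Java sketch of one logical relation ("page P links to pages Q1..Qn") stored both ways. The relationship-table name, the `outlink:` family and the record type are hypothetical illustrations, not anything defined in this thread. A row is modeled as a sorted map from `family:qualifier` to value. No HBase client calls are used.

```java
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical illustration: the same links stored as an RDBMS
// relationship table versus a list of column qualifiers in one row.
public class LinkLayouts {
    // RDBMS layout: one row per (from, to) pair in a relationship table.
    public record OutlinkRow(String fromUrl, String toUrl) {}

    // Column-oriented layout: one row per page; each link becomes a column
    // "outlink:<target url>", so reading a page's links needs no join.
    public static SortedMap<String, String> toColumnRow(String fromUrl, List<OutlinkRow> rows) {
        SortedMap<String, String> columns = new TreeMap<>();
        for (OutlinkRow r : rows) {
            if (r.fromUrl().equals(fromUrl)) {
                // The cell value could hold anchor text; it is left empty here.
                columns.put("outlink:" + r.toUrl(), "");
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        List<OutlinkRow> table = List.of(
            new OutlinkRow("http://a.example/", "http://b.example/"),
            new OutlinkRow("http://a.example/", "http://c.example/"),
            new OutlinkRow("http://b.example/", "http://c.example/"));
        System.out.println(toColumnRow("http://a.example/", table).keySet());
        // prints [outlink:http://b.example/, outlink:http://c.example/]
    }
}
```

In the second layout the number of columns per row varies with the data, which a fixed RDBMS table cannot express without the extra relationship table.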

On Fri, Mar 28, 2008 at 2:28 PM, Goel, Ankur <[EMAIL PROTECTED]> wrote:
> Hi Bryan,
>
> Here is the sample schema I have (it looks closer to an RDBMS, I know):
>
> TABLE:           seed_list
>
> DESCRIPTION: Used to store seed URLs (both old and newly discovered).
>              Initially populated with some seed URLs. The crawl controller
>              picks up the seeds from this table that have status=0 (Not
>              Visited) or status=2 (Visited, but ready for re-crawl) and
>              feeds these seeds in batches to the different crawl engines
>              it knows about.
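The controller's selection rule above can be sketched in Java as follows. This is a minimal illustration that assumes an in-memory list of seeds. The `Seed` record, the batch size and the helper names are hypothetical, since the thread does not say how batches are sized or how they are assigned to engines. Status 1 is not described in the thread and is simply skipped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the seed_list selection rule: keep rows with
// status=0 (not visited) or status=2 (visited, ready for re-crawl), then
// split them into fixed-size batches for the crawl engines.
public class SeedSelector {
    static final int NOT_VISITED = 0;
    static final int READY_FOR_RECRAWL = 2;

    public record Seed(String url, int status) {}

    public static List<Seed> selectCrawlable(List<Seed> seeds) {
        List<Seed> out = new ArrayList<>();
        for (Seed s : seeds) {
            if (s.status() == NOT_VISITED || s.status() == READY_FOR_RECRAWL) {
                out.add(s);
            }
        }
        return out;
    }

    // Splits the selected seeds into batches of at most batchSize entries.
    public static List<List<Seed>> batches(List<Seed> seeds, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        List<List<Seed>> out = new ArrayList<>();
        for (int i = 0; i < seeds.size(); i += batchSize) {
            out.add(new ArrayList<>(seeds.subList(i, Math.min(i + batchSize, seeds.size()))));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Seed> table = List.of(
            new Seed("http://a.example/", 0),
            new Seed("http://b.example/", 1),   // status 1 is skipped
            new Seed("http://c.example/", 2));
        System.out.println(batches(selectCrawlable(table), 1).size()); // prints 2
    }
}
```

Against a real table the status test would run as a scan filter rather than over an in-memory list. The sketch only pins down the predicate.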
>
> SCHEMA:      Column families below
>
>        {"referer_id:", "100"},  // the integer here is Max_Length
>        {"url:", "1500"},
>        {"site:", "500"},
>        {"last_crawl_date:", "1000"},
>        {"next_crawl_date:", "1000"},
>        {"create_date:", "100"},
>        {"status:", "100"},
>        {"strike:", "100"},
>        {"language:", "150"},
>        {"topic:", "500"},
>        {"depth:", "100000"}
>
> Common attributes are [max versions: 1,  compression: NONE, in memory:
> false, block cache enabled: true, max length: 100, bloom filter: none]
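The family list above reads like Java `String[][]` pairs, where the second element is the maximum value length. Below is a hedged Java sketch that keeps those pairs as data and checks whether a value fits its family's limit. It assumes the limit is compared against the value's length in bytes. It illustrates the schema only and is not HBase's own validation code. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: the seed_list families as {name, maxLength} pairs,
// plus a check that a value fits its family's max length (assumed to be
// measured in bytes).
public class SeedListSchema {
    public static final String[][] FAMILIES = {
        {"referer_id:", "100"},
        {"url:", "1500"},
        {"site:", "500"},
        {"last_crawl_date:", "1000"},
        {"next_crawl_date:", "1000"},
        {"create_date:", "100"},
        {"status:", "100"},
        {"strike:", "100"},
        {"language:", "150"},
        {"topic:", "500"},
        {"depth:", "100000"}
    };

    // Parses the pairs into family -> max length, keeping schema order.
    public static Map<String, Integer> maxLengths(String[][] families) {
        Map<String, Integer> out = new LinkedHashMap<>();
        for (String[] f : families) {
            out.put(f[0], Integer.parseInt(f[1]));
        }
        return out;
    }

    public static boolean fits(Map<String, Integer> limits, String family, byte[] value) {
        Integer max = limits.get(family);
        if (max == null) throw new IllegalArgumentException("unknown family: " + family);
        return value.length <= max;
    }

    public static void main(String[] args) {
        Map<String, Integer> limits = maxLengths(FAMILIES);
        byte[] url = "http://a.example/".getBytes(StandardCharsets.UTF_8);
        System.out.println(fits(limits, "url:", url)); // prints true
    }
}
```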
>
>
> TABLE:   web_content
>
> DESCRIPTION: Used to store information retrieved after crawling a URL.
>              Each crawl engine provides information about the URLs it
>              crawled. This information is then stored in this table
>              depending on the profile settings (what should be stored?).
> SCHEMA:  Column families below
>
>        {"url:", "1500"},
>        {"site:", "500"},
>        {"content_type:", "100"},
>        {"title:", "1000"},
>        {"content:", Integer.MAX_VALUE + ""},
>        {"parsed_text:", Integer.MAX_VALUE + ""},
>        {"crawl_date:", "1000"},
>        {"last_modified_date:", "100"},
>        {"http_headers:", "10000"},
>        {"content_length:", "11"},
>        {"outlinks_count:", "100000"}
>
> Common attributes are [max versions: 1, compression: BLOCK, in memory:
> false, block cache enabled: true, max length: 100, bloom filter: none]
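The web_content description says that what gets stored depends on "profile settings". Below is a minimal Java sketch of that step. It assumes a profile is simply a set of enabled family names and that a crawl engine reports its fields as a `family:` to value map. Both formats are hypothetical, because the thread does not describe them.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: keep only the families a profile enables before
// writing a crawled page into web_content.
public class ProfileFilter {
    public static Map<String, String> applyProfile(Map<String, String> crawled, Set<String> profile) {
        Map<String, String> row = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : crawled.entrySet()) {
            if (profile.contains(e.getKey())) {
                row.put(e.getKey(), e.getValue());
            }
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, String> crawled = new LinkedHashMap<>();
        crawled.put("url:", "http://a.example/");
        crawled.put("title:", "Example");
        crawled.put("content:", "<html>...</html>");
        // A profile that keeps url and title but skips the raw content.
        Set<String> profile = Set.of("url:", "title:");
        System.out.println(applyProfile(crawled, profile));
        // prints {url:=http://a.example/, title:=Example}
    }
}
```

Because families are declared up front in HBase, a profile like this can only choose among the families listed in the schema. It cannot add new ones at write time.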
>
> Please feel free to suggest modifications or enhancements for a
> column-oriented design.
>
> Thanks
> -Ankur
>
>
> -----Original Message-----
> From: Bryan Duxbury [mailto:[EMAIL PROTECTED]
> Sent: Friday, March 28, 2008 10:33 AM
> To: [email protected]
> Subject: HBase Sample Schemas
>
> All,
>
> One of the more common types of questions we get from people new to
> HBase is about the differences in schema between HBase and
> relational databases. So that we can generate some good examples of 
> RDBMS schemas and their counterparts as they might be represented in 
> HBase, could you guys post some small (1-5 entities) schemas that you 
> might be interested in using and a few sentences about how you'd like 
> to consume them? We can then discuss possible options and see how 
> things might look. This will also help Stack, Jim, and myself to 
> notice interesting access patterns we might want to support.
>
> Thanks in advance,
>
> Bryan
>



--
B. Regards,
Edward J. Yoon
