> ....by adding a column.

Sorry, I meant colon ":"

-----Original Message-----
From: Goel, Ankur [mailto:[EMAIL PROTECTED]
Sent: Friday, March 28, 2008 12:01 PM
To: [email protected]
Subject: RE: HBase Sample Schemas

The tables below are RDBMS tables with column names simply converted to
column families by adding a column. I'd like to share ideas on how best
these tables can be modified (or merged ??) to take advantage of
column-oriented design.

-----Original Message-----
From: Edward J. Yoon [mailto:[EMAIL PROTECTED]
Sent: Friday, March 28, 2008 11:48 AM
To: [email protected]
Subject: Re: HBase Sample Schemas

I don't think this is a good example.

Find the difference between the two physical schemas for the same
logical data model: one using relationship tables on an RDBMS, the
other using a list of column qualifiers on BigTable.

On Fri, Mar 28, 2008 at 2:28 PM, Goel, Ankur <[EMAIL PROTECTED]> wrote:
> Hi Bryan,
> Here is the sample schema I have (looks closer to RDBMS, I know)
>
> TABLE: seed_list
>
> DESCRIPTION: Used to store seed urls (both old and newly discovered).
>              Initially populated with some seed URLs. The crawl controller
>              picks up the seeds from this table that have status=0 (Not Visited)
>              or status=2 (Visited, but ready for re-crawl) and feeds these seeds
>              in batch to different crawl engines that it knows about.
>
> SCHEMA: Column families below
>
> {"referer_id:", "100"}, // Integer here is Max_Length
> {"url:","1500"},
> {"site:","500"},
> {"last_crawl_date:", "1000"},
> {"next_crawl_date:", "1000"},
> {"create_date:","100"},
> {"status:","100"},
> {"strike:", "100"},
> {"language:","150"},
> {"topic:","500"},
> {"depth:","100000"}
>
> Common attributes are [max versions: 1, compression: NONE, in memory:
> false, block cache enabled: true, max length: 100, bloom filter: none]
>
>
> TABLE: web_content
>
> DESCRIPTION: Used to store information retrieved after crawling a URL.
>              Each crawl engine provides information about the URL it crawled.
>              This information is then stored in this table depending upon
>              the profile settings (what should be stored?)
>
> SCHEMA: Column families below
>
> {"url:", "1500"},
> {"site:","500"},
> {"content_type:","100"},
> {"title:", "1000"},
> {"content:", Integer.MAX_VALUE + ""},
> {"parsed_text:",Integer.MAX_VALUE + ""},
> {"crawl_date:", "1000"},
> {"last_modified_date:","100"},
> {"http_headers:","10000"},
> {"content_length:","11"},
> {"outlinks_count:","100000"}
>
> Common attributes are [max versions: 1, compression: BLOCK, in memory:
> false, block cache enabled: true, max length: 100, bloom filter: none]
>
> Please feel free to suggest modifications/enhancements for column
> oriented Design.
>
> Thanks
> -Ankur
>
>
> -----Original Message-----
> From: Bryan Duxbury [mailto:[EMAIL PROTECTED]
> Sent: Friday, March 28, 2008 10:33 AM
> To: [email protected]
> Subject: HBase Sample Schemas
>
> All,
>
> One of the more common types of questions we get from people new to
> HBase is about the differences in the schema between HBase and
> relational databases. So that we can generate some good examples of
> RDBMS schemas and their counterparts as they might be represented in
> HBase, could you guys post some small (1-5 entities) schemas that you
> might be interested in using and a few sentences about how you'd like
> to consume them? We can then discuss possible options and see how
> things might look. This will also help Stack, Jim, and myself to
> notice interesting access patterns we might want to support.
>
> Thanks in advance,
>
> Bryan
>

--
B. Regards,
Edward J. Yoon
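
For readers who want the contrast between "relationship tables" and "a list
of column qualifiers" made concrete, here is a rough sketch in plain Java.
It is only a toy in-memory model, not the HBase API, and nothing in it comes
from the thread itself: the class name CrawlSchemaSketch, the family names
"info" and "outlink", the example URLs, and the choice of the page URL as
row key are all assumptions made for illustration. The idea it tries to show
is that a one-to-many relation (a page and its outlinks), which an RDBMS
would put in a separate table, might instead live in one row as many
qualifiers under a single column family.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy model of a BigTable-style table: row key -> sorted "family:qualifier" -> value. */
final class CrawlSchemaSketch {
    // Assumption: the page URL is the row key; cells are sorted by "family:qualifier".
    private final Map<String, TreeMap<String, String>> rows = new TreeMap<>();

    /** Stores one cell; qualifiers need not be declared up front, only families would be. */
    void put(String rowKey, String family, String qualifier, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .put(family + ":" + qualifier, value);
    }

    /** Returns one cell value, or null if the row or cell is absent. */
    String get(String rowKey, String family, String qualifier) {
        TreeMap<String, String> row = rows.get(rowKey);
        return row == null ? null : row.get(family + ":" + qualifier);
    }

    /** Returns every cell of one family in a row, keyed by qualifier. */
    Map<String, String> family(String rowKey, String family) {
        Map<String, String> out = new TreeMap<>();
        TreeMap<String, String> row = rows.get(rowKey);
        if (row == null) {
            return out;
        }
        String prefix = family + ":";
        // Keys sharing a prefix are contiguous in sorted order, so scan from the prefix.
        for (Map.Entry<String, String> e : row.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break;
            }
            out.put(e.getKey().substring(prefix.length()), e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        CrawlSchemaSketch table = new CrawlSchemaSketch();
        String page = "http://a.example/";
        // Scalar attributes: one qualifier each under a hypothetical "info" family.
        table.put(page, "info", "status", "0");
        table.put(page, "info", "language", "en");
        // One-to-many relation: each outlink URL is its own qualifier.
        table.put(page, "outlink", "http://b.example/", "1");
        table.put(page, "outlink", "http://c.example/", "1");
        System.out.println(table.family(page, "outlink").keySet());
    }
}
```

In this sketch an outlink count would not need its own column; it could be
derived from the number of qualifiers in the "outlink" family. Whether that
fits the crawler's real access patterns is a separate question.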
