Thanks Ankur! Those are very helpful - finding example schemas has been a really sore point for me as well in trying to learn all this.
I was wondering if you had an example that defines a bloom filter for a column, and an example of how to query a bloom filter once it's set up (a shell or REST example, if possible).

Thanks again!

-- Jim R. Wilson (jimbojw)

On Fri, Mar 28, 2008 at 1:33 AM, Goel, Ankur <[EMAIL PROTECTED]> wrote:
>
> > ....by adding a column.
> Sorry, I meant colon ":"
>
> -----Original Message-----
> From: Goel, Ankur [mailto:[EMAIL PROTECTED]
> Sent: Friday, March 28, 2008 12:01 PM
> To: [email protected]
> Subject: RE: HBase Sample Schemas
>
> The tables below are RDBMS tables with column names simply converted to
> column families by adding a column.
> I'd like to share ideas on how best these tables can be modified (or
> merged ??) to take advantage of column oriented design.
>
> -----Original Message-----
> From: Edward J. Yoon [mailto:[EMAIL PROTECTED]
> Sent: Friday, March 28, 2008 11:48 AM
> To: [email protected]
> Subject: Re: HBase Sample Schemas
>
> I don't think this is a good example.
>
> Find the difference between the two physical schemas for the same
> logical data modeling of a relational database, using relationship
> tables on RDBMS and a list of column qualifiers on BigTable.
>
> On Fri, Mar 28, 2008 at 2:28 PM, Goel, Ankur <[EMAIL PROTECTED]>
> wrote:
> > Hi Bryan,
> > Here is the sample schema I have (looks closer to RDBMS, I know)
> >
> > TABLE: seed_list
> >
> > DESCRIPTION: Used to store seed urls (both old and newly discovered).
> > Initially populated with some seed URLs. The crawl controller
> > picks up the seeds from this table that have status=0 (Not Visited)
> > or status=2 (Visited, but ready for re-crawl) and feeds these seeds
> > in batch to different crawl engines that it knows about.
> >
> > SCHEMA: Column families below
> >
> > {"referer_id:", "100"}, // Integer here is Max_Length
> > {"url:","1500"},
> > {"site:","500"},
> > {"last_crawl_date:", "1000"},
> > {"next_crawl_date:", "1000"},
> > {"create_date:","100"},
> > {"status:","100"},
> > {"strike:", "100"},
> > {"language:","150"},
> > {"topic:","500"},
> > {"depth:","100000"}
> >
> > Common attributes are [max versions: 1, compression: NONE, in memory:
> > false, block cache enabled: true, max length: 100, bloom filter: none]
> >
> >
> > TABLE: web_content
> >
> > DESCRIPTION: Used to store information retrieved after crawling a URL.
> > Each crawl engine provides information about the URL it crawled.
> > This information is then stored in this table depending upon
> > the profile settings (what should be stored?)
> >
> > SCHEMA: Column families below
> >
> > {"url:", "1500"},
> > {"site:","500"},
> > {"content_type:","100"},
> > {"title:", "1000"},
> > {"content:", Integer.MAX_VALUE + ""},
> > {"parsed_text:",Integer.MAX_VALUE + ""},
> > {"crawl_date:", "1000"},
> > {"last_modified_date:","100"},
> > {"http_headers:","10000"},
> > {"content_length:","11"},
> > {"outlinks_count:","100000"}
> >
> > Common attributes are [max versions: 1, compression: BLOCK, in memory:
> > false, block cache enabled: true, max length: 100, bloom filter: none]
> >
> > Please feel free to suggest modifications/enhancements for column
> > oriented Design.
> >
> > Thanks
> > -Ankur
> >
> >
> > -----Original Message-----
> > From: Bryan Duxbury [mailto:[EMAIL PROTECTED]
> > Sent: Friday, March 28, 2008 10:33 AM
> > To: [email protected]
> > Subject: HBase Sample Schemas
> >
> > All,
> >
> > One of the more common types of questions we get from people new to
> > HBase is about the differences in the schema between HBase and
> > relational databases. So that we can generate some good examples of
> > RDBMS schemas and their counterparts as they might be represented in
> > HBase, could you guys post some small (1-5 entities) schemas that you
> > might be interested in using and a few sentences about how you'd like
> > to consume them? We can then discuss possible options and see how
> > things might look. This will also help Stack, Jim, and myself to
> > notice interesting access patterns we might want to support.
> >
> > Thanks in advance,
> >
> > Bryan
> >
> >
>
> --
> B. Regards,
> Edward J. Yoon
>
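
For reference, the seed selection described for the seed_list table above (pick rows with status=0 or status=2 and hand them out in batches) could be sketched in plain Java roughly as follows. This is only an illustration under stated assumptions: it does not use the HBase client API, rows are modeled as in-memory maps, and the names SeedListSketch, row, and pickSeeds are hypothetical, not taken from the thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; this is not HBase client code.
class SeedListSketch {
    // Column families and max lengths copied from the quoted seed_list schema.
    static final String[][] SEED_LIST_FAMILIES = {
        {"referer_id:", "100"},
        {"url:", "1500"},
        {"site:", "500"},
        {"last_crawl_date:", "1000"},
        {"next_crawl_date:", "1000"},
        {"create_date:", "100"},
        {"status:", "100"},
        {"strike:", "100"},
        {"language:", "150"},
        {"topic:", "500"},
        {"depth:", "100000"}
    };

    static final int NOT_VISITED = 0;        // status=0 in the description
    static final int READY_FOR_RECRAWL = 2;  // status=2 in the description

    // Builds a toy row holding only the two columns this sketch reads.
    static Map<String, String> row(String url, int status) {
        Map<String, String> r = new LinkedHashMap<String, String>();
        r.put("url:", url);
        r.put("status:", Integer.toString(status));
        return r;
    }

    // Returns up to batchSize rows whose status is 0 or 2, in input order.
    static List<Map<String, String>> pickSeeds(List<Map<String, String>> rows, int batchSize) {
        List<Map<String, String>> batch = new ArrayList<Map<String, String>>();
        for (Map<String, String> r : rows) {
            if (batch.size() >= batchSize) {
                break;
            }
            String status = r.get("status:");
            if (status == null) {
                continue; // skip rows that have no status cell
            }
            int s = Integer.parseInt(status);
            if (s == NOT_VISITED || s == READY_FOR_RECRAWL) {
                batch.add(r);
            }
        }
        return batch;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = new ArrayList<Map<String, String>>();
        rows.add(row("http://a.example/", 0));
        rows.add(row("http://b.example/", 1)); // any status other than 0 or 2 is skipped
        rows.add(row("http://c.example/", 2));
        for (Map<String, String> seed : pickSeeds(rows, 10)) {
            System.out.println(seed.get("url:"));
        }
    }
}
```

Note that the sketch walks every row to check its status. Whether that is acceptable on a real table depends on the access pattern, which is the kind of question the original request was asking people to describe.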
