I see. I'm trying to figure out if my use case is valid; any help is appreciated :)

I was thinking about using a bloom-filtered column for user-based
blacklisting. So I'd have a table, concatenate, say, the user_id with a
URL or domain that they're blacklisting, and store that in a column.
Then, when I want to test whether a user has blacklisted a URL, I'd
concatenate the user_id/domain as before and check the bloom filter for
that entry. I guess in this case I'd just make the concatenation the
primary key in a special table just for blacklistification?

Thanks for helping me understand this stuff. I have a solid grasp of
what bloom filters are at the data-structure level; I'm trying to
understand how they can be used/queried within the context of HBase.

Thanks in advance!

-- Jim

On Fri, Mar 28, 2008 at 9:42 AM, Bryan Duxbury <[EMAIL PROTECTED]> wrote:
> Bloom filters in HBase, as they are currently designed, aren't a
> construct that users have to interact with directly. All retrieval
> operations take advantage of a bloom filter if it is configured.
>
> -Bryan
>
>
> On Mar 28, 2008, at 6:28 AM, Jim R. Wilson wrote:
>
> > Thanks Ankur!
> >
> > Those are very helpful - finding example schemas has been a really
> > sore point for me as well in trying to learn all this.
> >
> > I was wondering if you had an example that defined a bloom filter for
> > a column, and an example on how to query a bloom filter once it's set
> > up (shell example or rest example if possible).
> >
> > Thanks again!
> >
> > -- Jim R. Wilson (jimbojw)
> >
> > On Fri, Mar 28, 2008 at 1:33 AM, Goel, Ankur
> > <[EMAIL PROTECTED]> wrote:
> >>
> >>> ....by adding a column.
> >> Sorry, I meant colon ":"
> >>
> >>
> >> -----Original Message-----
> >> From: Goel, Ankur [mailto:[EMAIL PROTECTED]
> >> Sent: Friday, March 28, 2008 12:01 PM
> >> To: [email protected]
> >> Subject: RE: HBase Sample Schemas
> >>
> >> The tables below are RDBMS tables with column names simply
> >> converted to column families by adding a column.
> >> I'd like to share ideas on how best these tables can be modified (or
> >> merged ??) to take advantage of column oriented design.
> >>
> >> -----Original Message-----
> >> From: Edward J. Yoon [mailto:[EMAIL PROTECTED]
> >> Sent: Friday, March 28, 2008 11:48 AM
> >> To: [email protected]
> >> Subject: Re: HBase Sample Schemas
> >>
> >> I don't think this is a good example.
> >>
> >> Find the difference between the two physical schemas for same
> >> logical data modeling of relational database using an relationship
> >> tables on RDBMS and a list of column qualifiers on BigTable.
> >>
> >> On Fri, Mar 28, 2008 at 2:28 PM, Goel, Ankur
> >> <[EMAIL PROTECTED]> wrote:
> >>> Hi Bryan,
> >>> Here is the sample schema I have (looks closer to RDBMS, I know)
> >>>
> >>> TABLE: seed_list
> >>>
> >>> DESCRIPTION: Used to store seed urls (both old and newly
> >>> discovered). Initially populated with some seed URLs. The crawl
> >>> controller picks up the seeds from this table that have status=0
> >>> (Not Visited) or status=2 (Visited, but ready for re-crawl) and
> >>> feeds these seeds in batch to different crawl engines that it
> >>> knows about.
> >>>
> >>> SCHEMA: Column families below
> >>>
> >>> {"referer_id:", "100"}, // Integer here is Max_Length
> >>> {"url:","1500"},
> >>> {"site:","500"},
> >>> {"last_crawl_date:", "1000"},
> >>> {"next_crawl_date:", "1000"},
> >>> {"create_date:","100"},
> >>> {"status:","100"},
> >>> {"strike:", "100"},
> >>> {"language:","150"},
> >>> {"topic:","500"},
> >>> {"depth:","100000"}
> >>>
> >>> Common attributes are [max versions: 1, compression: NONE, in
> >>> memory: false, block cache enabled: true, max length: 100, bloom
> >>> filter: none]
> >>>
> >>>
> >>> TABLE: web_content
> >>>
> >>> DESCRIPTION: Used to store information retrieved after crawling a
> >>> URL. Each crawl engine provides information about the URL it
> >>> crawled.
> >>> This information is then stored in this table depending upon
> >>> the profile settings (what should be stored?)
> >>>
> >>> SCHEMA: Column families below
> >>>
> >>> {"url:", "1500"},
> >>> {"site:","500"},
> >>> {"content_type:","100"},
> >>> {"title:", "1000"},
> >>> {"content:", Integer.MAX_VALUE + ""},
> >>> {"parsed_text:",Integer.MAX_VALUE + ""},
> >>> {"crawl_date:", "1000"},
> >>> {"last_modified_date:","100"},
> >>> {"http_headers:","10000"},
> >>> {"content_length:","11"},
> >>> {"outlinks_count:","100000"}
> >>>
> >>> Common attributes are [max versions: 1, compression: BLOCK, in
> >>> memory: false, block cache enabled: true, max length: 100, bloom
> >>> filter: none]
> >>>
> >>> Please feel free to suggest modifications/enhancements for column
> >>> oriented Design.
> >>>
> >>> Thanks
> >>> -Ankur
> >>>
> >>>
> >>> -----Original Message-----
> >>> From: Bryan Duxbury [mailto:[EMAIL PROTECTED]
> >>> Sent: Friday, March 28, 2008 10:33 AM
> >>> To: [email protected]
> >>> Subject: HBase Sample Schemas
> >>>
> >>> All,
> >>>
> >>> One of the more common types of questions we get from people new to
> >>> HBase are about the differences in the schema between HBase and
> >>> relational databases. So that we can generate some good examples of
> >>> RDBMS schemas and their counterparts as they might be represented in
> >>> HBase, could you guys post some small (1-5 entities) schemas that
> >>> you might be interested in using and a few sentences about how you'd
> >>> like to consume them? We can then discuss possible options and see
> >>> how things might look. This will also help Stack, Jim, and myself to
> >>> notice interesting access patterns we might want to support.
> >>>
> >>> Thanks in advance,
> >>>
> >>> Bryan
> >>>
> >>
> >>
> >> --
> >> B. Regards,
> >> Edward J. Yoon
> >>
> >
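
For reference, the blacklist idea from the top of this message (concatenate user_id and domain into one key, then test that key for membership) can be sketched outside HBase in plain Python. This is only an illustration of the data-structure logic, not HBase code: BloomFilter, Blacklist, blacklist_key, the "/" separator, and the sizes are all hypothetical names and values chosen for the sketch. The in-memory set stands in for the table keyed by the concatenation, and the filter is consulted inside the lookup rather than by the caller, in line with Bryan's note that retrieval operations use a configured bloom filter on their own.

```python
import hashlib


class BloomFilter:
    """Toy bloom filter: num_hashes salted SHA-256 probes into a bit array."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive one bit position per hash by salting the key with the index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


def blacklist_key(user_id, domain):
    # Hypothetical key layout: the user_id/domain concatenation from the email.
    return f"{user_id}/{domain}"


class Blacklist:
    """In-memory stand-in for a table whose row key is the concatenation."""

    def __init__(self):
        self.rows = set()
        self.bloom = BloomFilter()

    def block(self, user_id, domain):
        key = blacklist_key(user_id, domain)
        self.rows.add(key)
        self.bloom.add(key)

    def is_blocked(self, user_id, domain):
        key = blacklist_key(user_id, domain)
        if not self.bloom.might_contain(key):
            return False  # filter rules the key out, skip the real lookup
        return key in self.rows  # confirm, since the filter can false-positive


blacklist = Blacklist()
blacklist.block("42", "example.com")
```

The point of the design is that a bloom filter only ever answers "definitely not" or "maybe", so a positive answer still needs the real lookup; the filter's job is to make the common "not blacklisted" case cheap.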
