Bloom filters in HBase, as they are currently designed, aren't a construct that users have to interact with directly. All retrieval operations take advantage of a bloom filter if it is configured.
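
To illustrate why a lookup can use a bloom filter without the caller ever touching it, here is a minimal sketch of the data structure in plain Java. It is not HBase's implementation; the class name, the hash scheme, and the sizes are assumptions made for illustration only.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Toy Bloom filter (hypothetical): may give false positives, never false negatives. */
class BloomFilterSketch {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    BloomFilterSketch(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    /** Derives the i-th bit position from two simple hashes (double hashing). */
    private int position(byte[] key, int i) {
        int h1 = 0;
        int h2 = 0;
        for (byte b : key) {
            h1 = 31 * h1 + b;
            h2 = 17 * h2 + b + 1;
        }
        return Math.floorMod(h1 + i * h2, size);
    }

    /** Sets every bit position derived from the key. */
    void add(String key) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < hashCount; i++) {
            bits.set(position(k, i));
        }
    }

    /** false means "definitely absent"; true means "possibly present". */
    boolean mightContain(String key) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(position(k, i))) {
                return false;
            }
        }
        return true;
    }
}
```

A store would consult `mightContain` before reading a file and skip the read when it returns false, which is why the filter stays invisible to the client issuing the get.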

-Bryan

On Mar 28, 2008, at 6:28 AM, Jim R. Wilson wrote:

Thanks Ankur!

Those are very helpful - finding example schemas has been a really
sore point for me as well in trying to learn all this.

I was wondering if you had an example that defined a bloom filter for
a column, and an example on how to query a bloom filter once it's set
up (shell example or rest example if possible).

Thanks again!

-- Jim R. Wilson (jimbojw)

On Fri, Mar 28, 2008 at 1:33 AM, Goel, Ankur <[EMAIL PROTECTED]> wrote:

....by adding a column.
 Sorry, I meant colon ":"


 -----Original Message-----
 From: Goel, Ankur [mailto:[EMAIL PROTECTED]
 Sent: Friday, March 28, 2008 12:01 PM
 To: [email protected]


Subject: RE: HBase Sample Schemas

The tables below are RDBMS tables with column names simply converted to
 column families by adding a column.
 I'd like to share ideas on how best these tables can be modified (or
 merged ??) to take advantage of column oriented design.

 -----Original Message-----
 From: Edward J. Yoon [mailto:[EMAIL PROTECTED]
 Sent: Friday, March 28, 2008 11:48 AM
 To: [email protected]
 Subject: Re: HBase Sample Schemas

 I don't think this is a good example.

 Find the difference between the two physical schemas for the same
 logical data model: relationship tables in an RDBMS versus a list of
 column qualifiers in BigTable.
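
 One way to picture that difference is the sketch below. It uses plain Java collections rather than any HBase API, and the "outlinks:" family and the example URLs are hypothetical names chosen for illustration. A relationship table holds one row per link, while the wide-row form holds one row per source with one column qualifier per link.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class QualifierSketch {
    /** Relational style: one (source, target) row per link in a relationship table. */
    static List<String[]> relationshipTable() {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"http://a.example/", "http://b.example/"});
        rows.add(new String[] {"http://a.example/", "http://c.example/"});
        return rows;
    }

    /** BigTable style: one row per source; each link becomes a column "outlinks:<target>". */
    static Map<String, Map<String, String>> toWideRows(List<String[]> links) {
        Map<String, Map<String, String>> table = new TreeMap<>();
        for (String[] link : links) {
            // Row key is the source URL; the qualifier carries the target, the value is empty.
            table.computeIfAbsent(link[0], k -> new TreeMap<>())
                 .put("outlinks:" + link[1], "");
        }
        return table;
    }
}
```

 Reading all links of a page is then a single-row read instead of a join.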

On Fri, Mar 28, 2008 at 2:28 PM, Goel, Ankur <[EMAIL PROTECTED]> wrote:
Hi Bryan,
        Here is the sample schema I have (looks closer to RDBMS, I know)

TABLE:           seed_list

DESCRIPTION: Used to store seed urls (both old and newly discovered).
             Initially populated with some seed URLs. The crawl controller
             picks up the seeds from this table that have status=0 (Not
             Visited) or status=2 (Visited, but ready for re-crawl) and
             feeds these seeds in batch to different crawl engines that it
             knows about.
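
The selection step described above could be sketched roughly as follows. This is an in-memory illustration only, with no HBase calls; the `Seed` class, the method names, and the batch size are assumptions, and only the two status values come from the description.

```java
import java.util.ArrayList;
import java.util.List;

class SeedSelectionSketch {
    static final int NOT_VISITED = 0;        // status=0 in the description
    static final int READY_FOR_RECRAWL = 2;  // status=2 in the description

    /** Hypothetical in-memory view of one seed_list row. */
    static final class Seed {
        final String url;
        final int status;

        Seed(String url, int status) {
            this.url = url;
            this.status = status;
        }
    }

    /** Keeps only seeds whose status is 0 or 2. */
    static List<Seed> eligible(List<Seed> seeds) {
        List<Seed> out = new ArrayList<>();
        for (Seed s : seeds) {
            if (s.status == NOT_VISITED || s.status == READY_FOR_RECRAWL) {
                out.add(s);
            }
        }
        return out;
    }

    /** Splits the eligible seeds into batches of at most batchSize for the crawl engines. */
    static List<List<Seed>> batches(List<Seed> seeds, int batchSize) {
        List<Seed> picked = eligible(seeds);
        List<List<Seed>> out = new ArrayList<>();
        for (int i = 0; i < picked.size(); i += batchSize) {
            int end = Math.min(i + batchSize, picked.size());
            out.add(new ArrayList<>(picked.subList(i, end)));
        }
        return out;
    }
}
```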

SCHEMA:      Column families below

         {"referer_id:", "100"}, // Integer here is Max_Length
         {"url:", "1500"},
         {"site:", "500"},
         {"last_crawl_date:", "1000"},
         {"next_crawl_date:", "1000"},
         {"create_date:", "100"},
         {"status:", "100"},
         {"strike:", "100"},
         {"language:", "150"},
         {"topic:", "500"},
         {"depth:", "100000"}

Common attributes are [max versions: 1, compression: NONE, in memory: false, block cache enabled: true, max length: 100, bloom filter: none]
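
As a small sketch of how the name and max-length pairs above might be consumed, the following plain Java turns them into a map from family name to max length. The class and method names are hypothetical and nothing here calls HBase. The check for a trailing colon follows the convention used in the listing.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SeedListSchemaSketch {
    /** Family name and max-length pairs, copied from the seed_list listing. */
    static final String[][] FAMILIES = {
        {"referer_id:", "100"},
        {"url:", "1500"},
        {"site:", "500"},
        {"last_crawl_date:", "1000"},
        {"next_crawl_date:", "1000"},
        {"create_date:", "100"},
        {"status:", "100"},
        {"strike:", "100"},
        {"language:", "150"},
        {"topic:", "500"},
        {"depth:", "100000"}
    };

    /** Converts the pairs into family name -> max length, preserving listing order. */
    static Map<String, Integer> maxLengths(String[][] families) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (String[] f : families) {
            if (!f[0].endsWith(":")) {
                throw new IllegalArgumentException("family name must end with ':' but was " + f[0]);
            }
            result.put(f[0], Integer.parseInt(f[1]));
        }
        return result;
    }
}
```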


TABLE:   web_content

DESCRIPTION: Used to store information retrieved after crawling a URL.
             Each crawl engine provides information about the URL it
             crawled. This information is then stored in this table
             depending upon the profile settings (what should be stored?)
SCHEMA:  Column families below

         {"url:", "1500"},
         {"site:", "500"},
         {"content_type:", "100"},
         {"title:", "1000"},
         {"content:", Integer.MAX_VALUE + ""},
         {"parsed_text:", Integer.MAX_VALUE + ""},
         {"crawl_date:", "1000"},
         {"last_modified_date:", "100"},
         {"http_headers:", "10000"},
         {"content_length:", "11"},
         {"outlinks_count:", "100000"}

Common attributes are [max versions: 1, compression: BLOCK, in memory: false, block cache enabled: true, max length: 100, bloom filter: none]

Please feel free to suggest modifications/enhancements for a
column-oriented design.

Thanks
-Ankur


-----Original Message-----
From: Bryan Duxbury [mailto:[EMAIL PROTECTED]
Sent: Friday, March 28, 2008 10:33 AM
To: [email protected]
Subject: HBase Sample Schemas

All,

One of the more common types of questions we get from people new to
HBase is about the differences in schema between HBase and
relational databases. So that we can generate some good examples of
RDBMS schemas and their counterparts as they might be represented in
HBase, could you guys post some small (1-5 entities) schemas that you
might be interested in using and a few sentences about how you'd like
to consume them? We can then discuss possible options and see how
things might look. This will also help Stack, Jim, and myself to
notice interesting access patterns we might want to support.

Thanks in advance,

Bryan




 --
 B. Regards,
 Edward J. Yoon

