Re: Feed Aggregator Schema

Andrei Savu Mon, 17 Aug 2009 01:30:50 -0700

Thanks for your answer, Peter.

I will give it a try using this approach and I will let you know how it works.
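
For reference, here is a minimal sketch of how I understand the key scheme described below. The function names (event_row_key, day_marker_key) are hypothetical, and the zeroed time-of-day in the per-day marker is my assumption, since the original only gives the pattern. Plain strings stand in for HBase row keys; no HBase client is involved.

```python
from datetime import date, datetime
from typing import Optional


def event_row_key(list_name: str, ts: datetime, instance_id: int,
                  seq: Optional[int] = None) -> str:
    """Build eventListName/yyyyMMddHHmmssSSS-000[-111] (hypothetical helper)."""
    # 14 digits down to the second, plus 3 digits of milliseconds.
    stamp = ts.strftime("%Y%m%d%H%M%S") + f"{ts.microsecond // 1000:03d}"
    key = f"{list_name}/{stamp}-{instance_id:03d}"
    if seq is not None:
        # Optional suffix for events in the same millisecond on one instance.
        key += f"-{seq:03d}"
    return key


def day_marker_key(list_name: str, day: date) -> str:
    """Per-day marker row; assumes the time part is all zeros."""
    return f"{list_name}/{day.strftime('%Y%m%d')}000000000"


if __name__ == "__main__":
    keys = [
        event_row_key("posts", datetime(2009, 8, 17, 10, 26, 5, 123000), 7),
        event_row_key("posts", datetime(2009, 8, 17, 10, 26, 5, 123000), 7, seq=1),
        day_marker_key("posts", date(2009, 8, 17)),
    ]
    # Lexicographic order matches time order; the marker sorts first.
    for k in sorted(keys):
        print(k)
```

Because every field is zero-padded to a fixed width, sorting the keys as strings matches chronological order, and the marker row sorts ahead of every event of its day.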


On Mon, Aug 17, 2009 at 10:26 AM, Peter Rietzler <[email protected]> wrote:
>
> Hi
>
> In our project we are handling event lists where we have similar
> requirements. We do ordering by choosing our row keys wisely. We use the
> following key for our events (they should be ordered by time in ascending
> order):
>
> eventListName/yyyyMMddHHmmssSSS-000[-111]
>
> where eventListName is the name of the event list, 000 is a three digit
> instance id to disambiguate between different running instances of the
> application, and -111 is optional to disambiguate events that occurred in the
> same millisecond on one instance.
>
> We additionally insert an artificial row for each day with the id
>
> eventListName/yyyyMMddHHmmssSSS
>
> This allows us to start scanning at the beginning of each day without
> searching through the event list.
>
> You need to be aware of the fact that if you have a very high load of
> inserts, one HBase region server is always busy inserting while the
> others are idle ... if that's a problem for you, you have to find different
> keys for your purpose.
>
> You could also use an HBase index table but I have no experience with it and
> I remember an email on the mailing list that this would double all requests
> because the API would first look up the index table and then the original
> table ??? (please correct me if this is not right ...)
>
> Kind regards,
> Peter
>
>
>
> Andrei Savu wrote:
>>
>> Hello,
>>
>> I am working on a project involving monitoring a large number of
>> rss/atom feeds. I want to use hbase for data storage and I have some
>> problems designing the schema. For the first iteration I want to be
>> able to generate an aggregated feed (last 100 posts from all feeds in
>> reverse chronological order).
>>
>> Currently I am using two tables:
>>
>> Feeds: column families Content and Meta : raw feed stored in Content:raw
>> Urls: column families Content and Meta : raw post version stored in
>> Content:raw and the rest of the data found in RSS stored in Meta
>>
>> I need some sort of index table for the aggregated feed. How should I
>> build that? Is hbase a good choice for this kind of application?
>>
>> In other words: Is it possible (in hbase) to design a schema that
>> could efficiently answer queries like the one listed below?
>>
>> SELECT data FROM Urls ORDER BY date DESC LIMIT 100
>>
>> Thanks.
>>
>> --
>> Savu Andrei
>>
>> Website: http://www.andreisavu.ro/
>>
>>
>
> --
> View this message in context: 
> http://www.nabble.com/Feed-Aggregator-Schema-tp24974071p25002264.html
> Sent from the HBase User mailing list archive at Nabble.com.
>
>
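
Since the aggregated feed needs the newest posts first, while the keys above sort in ascending time order, one possible variation is to store a reversed timestamp in the key so that a plain ascending scan returns the latest entries first. This is not what Peter described; it is a swapped-in variant, and the names below (MAX_MILLIS, reversed_key, newest_first) are hypothetical. A sorted Python list stands in for the table scan.

```python
from datetime import datetime, timedelta, timezone
from typing import List

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MAX_MILLIS = 10**13 - 1  # assumed ceiling so the reversed value fits 13 digits


def reversed_key(prefix: str, ts: datetime, instance_id: int) -> str:
    """Key whose string order is the reverse of time order (hypothetical)."""
    # Integer milliseconds since the epoch, avoiding float rounding.
    millis = (ts - EPOCH) // timedelta(milliseconds=1)
    return f"{prefix}/{MAX_MILLIS - millis:013d}-{instance_id:03d}"


def newest_first(keys: List[str], limit: int) -> List[str]:
    """Stand-in for an ascending scan that stops after `limit` rows."""
    return sorted(keys)[:limit]


if __name__ == "__main__":
    older = reversed_key("all", datetime(2009, 8, 16, tzinfo=timezone.utc), 0)
    newer = reversed_key("all", datetime(2009, 8, 17, tzinfo=timezone.utc), 0)
    # The newer post sorts first, so "LIMIT 100" becomes "first 100 rows".
    print(newest_first([older, newer], 100))
```

The same caveat about insert load applies here: all new rows land at one end of the key space, so a single region would still take most of the writes.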



-- 
Savu Andrei

Website: http://www.andreisavu.ro/
