Hello,

I am working on a project that monitors a large number of RSS/Atom
feeds. I want to use HBase for data storage, and I am having trouble
designing the schema. For the first iteration I want to be able to
generate an aggregated feed (the last 100 posts from all feeds, in
reverse chronological order).

Currently I am using two tables:

Feeds: column families Content and Meta. The raw feed is stored in
Content:raw.
Urls: column families Content and Meta. The raw version of each post is
stored in Content:raw, and the rest of the data found in the RSS is
stored in Meta.

I need some sort of index table for the aggregated feed. How should I
build that? Is HBase a good choice for this kind of application?

In other words: is it possible (in HBase) to design a schema that
could efficiently answer queries like the one listed below?

SELECT data FROM Urls ORDER BY date DESC LIMIT 100
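
To illustrate the kind of index table I am asking about, below is a
minimal sketch in plain Python, with no HBase client involved. It
assumes row keys are kept in lexicographic byte order and builds a
hypothetical key from a reversed timestamp plus the post URL, so that
reading rows from the start of the table yields the newest posts first.
The names index_row_key and latest_posts, the key layout, and the toy
data are assumptions for illustration only, not an existing API.

```python
import struct

MAX_LONG = 2**63 - 1  # largest signed 64-bit value


def index_row_key(timestamp_ms: int, url: str) -> bytes:
    """Hypothetical index key: reversed timestamp (8 bytes, big-endian) + URL."""
    # Newer posts get a smaller reversed value, so they sort first.
    reversed_ts = MAX_LONG - timestamp_ms
    return struct.pack(">Q", reversed_ts) + url.encode("utf-8")


def latest_posts(index: dict, limit: int = 100) -> list:
    """Simulate a scan from the first row: keys in byte order, first `limit` rows."""
    return [index[key] for key in sorted(index)[:limit]]


# Toy data (hypothetical): publication time in milliseconds and post URL.
posts = [
    (1000, "http://a.example/1"),
    (3000, "http://b.example/7"),
    (2000, "http://a.example/2"),
]
index = {index_row_key(ts, url): url for ts, url in posts}
```

Under these assumptions, latest_posts(index, 100) plays the role of
ORDER BY date DESC LIMIT 100 without sorting at query time; whether this
maps well onto HBase is exactly what I would like to know.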

Thanks.

--
Savu Andrei

Website: http://www.andreisavu.ro/
