Hello,

I am working on a project that involves monitoring a large number of RSS/Atom feeds. I want to use HBase for data storage, and I am having some problems designing the schema. For the first iteration I want to be able to generate an aggregated feed (the last 100 posts from all feeds, in reverse chronological order).

Currently I am using two tables:

Feeds: column families Content and Meta. The raw feed is stored in Content:raw.
Urls: column families Content and Meta. The raw post version is stored in Content:raw, and the rest of the data found in the RSS is stored in Meta.

I need some sort of index table for the aggregated feed. How should I build that?

Is HBase a good choice for this kind of application? In other words: is it possible (in HBase) to design a schema that could efficiently answer queries like the one listed below?

SELECT data FROM Urls ORDER BY date DESC LIMIT 100

Thanks.

--
Savu Andrei
Website: http://www.andreisavu.ro/
