Thanks for your answer, Peter. I will give this approach a try and let you know how it works.
On Mon, Aug 17, 2009 at 10:26 AM, Peter Rietzler <[email protected]> wrote:
>
> Hi
>
> In our project we are handling event lists where we have similar
> requirements. We do ordering by choosing our row keys wisely. We use the
> following key for our events (they should be ordered by time in ascending
> order):
>
> eventListName/yyyyMMddHHmmssSSS-000[-111]
>
> where eventListName is the name of the event list, 000 is a three-digit
> instance id that disambiguates between different running instances of the
> application, and the optional -111 disambiguates events that occurred in
> the same millisecond on one instance.
>
> We additionally insert an artificial row for each day with the id
>
> eventListName/yyyyMMddHHmmssSSS
>
> This allows us to start scanning at the beginning of each day without
> searching through the event list.
>
> Be aware that if you have a very high load of inserts, one HBase region
> server will always be busy inserting while the others are idle (new keys
> always sort after all existing ones, so they all land in the same region).
> If that's a problem for you, you have to find different keys for your
> purpose.
>
> You could also use an HBase index table, but I have no experience with it,
> and I remember an email on the mailing list saying that this would double
> all requests because the API would first look up the index table and then
> the original table (please correct me if this is not right ...).
>
> Kind regards,
> Peter
>
>
> Andrei Savu wrote:
>>
>> Hello,
>>
>> I am working on a project involving monitoring a large number of
>> RSS/Atom feeds. I want to use HBase for data storage and I have some
>> problems designing the schema. For the first iteration I want to be
>> able to generate an aggregated feed (the last 100 posts from all feeds
>> in reverse chronological order).
>>
>> Currently I am using two tables:
>>
>> Feeds: column families Content and Meta; the raw feed is stored in
>> Content:raw
>> Urls: column families Content and Meta; the raw version of each post is
>> stored in Content:raw and the rest of the data found in the RSS is
>> stored in Meta
>>
>> I need some sort of index table for the aggregated feed. How should I
>> build that? Is HBase a good choice for this kind of application?
>>
>> In other words: is it possible (in HBase) to design a schema that
>> could efficiently answer queries like the one below?
>>
>> SELECT data FROM Urls ORDER BY date DESC LIMIT 100
>>
>> Thanks.
>>
>> --
>> Savu Andrei
>>
>> Website: http://www.andreisavu.ro/
>>
>
> --
> View this message in context:
> http://www.nabble.com/Feed-Aggregator-Schema-tp24974071p25002264.html
> Sent from the HBase User mailing list archive at Nabble.com.
>

--
Savu Andrei

Website: http://www.andreisavu.ro/
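A minimal sketch of the row-key scheme Peter describes (eventListName/yyyyMMddHHmmssSSS-000[-111] plus one artificial row per day), in Java since that is HBase's client language. It only builds key strings and calls no HBase API. Rendering timestamps in UTC, placing the day row at midnight, and printing the optional -111 suffix as three zero-padded digits are assumptions, not details from the thread. Because every field is fixed-width and zero-padded, the byte order of the keys matches chronological order, which is what lets a scan return events in ascending time.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class EventKeys {
    // Fixed-width timestamp; assumption: keys are rendered in UTC.
    private static final DateTimeFormatter TS =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmssSSS", Locale.ROOT).withZone(ZoneOffset.UTC);

    private static final long DAY_MILLIS = 86_400_000L;

    /**
     * Builds eventListName/yyyyMMddHHmmssSSS-000[-111].
     * instanceId: three-digit id of the writing application instance (0..999).
     * seq: counter for events in the same millisecond on that instance,
     *      or -1 to omit the optional suffix (assumed three zero-padded digits).
     */
    static String eventKey(String listName, long epochMillis, int instanceId, int seq) {
        StringBuilder key = new StringBuilder()
                .append(listName).append('/')
                .append(TS.format(Instant.ofEpochMilli(epochMillis)))
                .append('-').append(String.format(Locale.ROOT, "%03d", instanceId));
        if (seq >= 0) {
            key.append('-').append(String.format(Locale.ROOT, "%03d", seq));
        }
        return key.toString();
    }

    /** Artificial per-day row eventListName/yyyyMMddHHmmssSSS, assumed to be at midnight UTC. */
    static String dayMarkerKey(String listName, long epochMillis) {
        long dayStart = Math.floorDiv(epochMillis, DAY_MILLIS) * DAY_MILLIS;
        // This key is a prefix of (or smaller than) every event key of the same day,
        // so it sorts before that day's first event.
        return listName + "/" + TS.format(Instant.ofEpochMilli(dayStart));
    }

    public static void main(String[] args) {
        System.out.println(dayMarkerKey("feeds", 1_500L));   // feeds/19700101000000000
        System.out.println(eventKey("feeds", 1_500L, 7, -1)); // feeds/19700101000001500-007
        System.out.println(eventKey("feeds", 1_500L, 7, 1));  // feeds/19700101000001500-007-001
    }
}
```

Scanning from a day row's key therefore starts exactly at that day's events.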
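Andrei's query asks for the newest posts first (ORDER BY date DESC LIMIT 100), whereas keys built from an ascending timestamp sort oldest first. A common HBase variant, not something proposed in this thread, is an index table whose row key starts with a reversed timestamp (Long.MAX_VALUE minus the publish time in milliseconds), so that a plain scan from the first row yields the latest posts and can stop after 100 rows. This sketch is plain Java: the TreeMap stands in for HBase's sorted row keys, and the key layout and method names are hypothetical. Note that such keys still send all new writes to one region, as Peter warns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class RecentPostsIndex {
    // Long.MAX_VALUE has 19 digits; zero-padding to that width keeps string order
    // equal to numeric order, so newer posts (larger millis) get smaller keys.
    static String reverseTimeKey(long publishedMillis, String postUrl) {
        return String.format(Locale.ROOT, "%019d", Long.MAX_VALUE - publishedMillis) + "/" + postUrl;
    }

    // Hypothetical stand-in for "scan from the first row, stop after `limit` rows":
    // TreeMap iterates its keys in ascending order, like an HBase table.
    static List<String> latest(TreeMap<String, String> index, int limit) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : index.entrySet()) {
            if (out.size() >= limit) {
                break;
            }
            out.add(e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        TreeMap<String, String> index = new TreeMap<>();
        index.put(reverseTimeKey(1000L, "http://example.com/old"), "http://example.com/old");
        index.put(reverseTimeKey(3000L, "http://example.com/new"), "http://example.com/new");
        index.put(reverseTimeKey(2000L, "http://example.com/mid"), "http://example.com/mid");
        System.out.println(latest(index, 2)); // [http://example.com/new, http://example.com/mid]
    }
}
```

In a real table, each index row would only point at the corresponding row in Urls, so fetching the post data costs a second lookup, which is the doubling of requests Peter mentions for index tables.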
