Hi Piyush,

I think you just want to fetch the 20 most recent updates for a user, don't you? If so, you can simply store the updates as versions of a cell and let HBase keep only 20 versions, IMO. How does that sound?
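To make the idea concrete, here is a rough sketch in plain Python. It is not the HBase client API, only a toy in-memory model of how a cell capped at N versions behaves; the class name VersionedTable and its put/get methods are made up for illustration. In a real table the cap would come from the column family's maximum-versions setting rather than from application code.

```python
from collections import defaultdict


class VersionedTable:
    """Toy model (hypothetical, not HBase): each cell keeps only its newest N versions."""

    def __init__(self, max_versions=20):
        self.max_versions = max_versions
        # (row, column) -> list of (timestamp, value), newest first
        self._cells = defaultdict(list)

    def put(self, row, column, timestamp, value):
        versions = self._cells[(row, column)]
        # A write at an existing timestamp replaces that version.
        versions[:] = [(ts, v) for ts, v in versions if ts != timestamp]
        versions.append((timestamp, value))
        # Keep newest first, then drop everything beyond the cap.
        versions.sort(key=lambda tv: tv[0], reverse=True)
        del versions[self.max_versions:]

    def get(self, row, column, limit=None):
        # Return up to `limit` versions, newest first (all retained if limit is None).
        versions = self._cells.get((row, column), [])
        return list(versions[:limit])


# Write 30 updates for one user; only the newest 20 survive.
table = VersionedTable(max_versions=20)
for t in range(1, 31):
    table.put("userid1", "update", t, f"some update{t}")
    table.put("userid1", "sender", t, f"sender{t}")

latest_updates = table.get("userid1", "update")
latest_senders = table.get("userid1", "sender")
```

Because "update" and "sender" are written with the same timestamp, the two result lists line up entry by entry, so a single row lookup by userid returns the top 20 of each without a secondary index.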
sincerely,
Evan

2009/7/13 Piyush Goel <[email protected]>:
> Hi,
>
> I am trying to design a high scale key value storage system. The hbase
> table for the same is outlined below:
>
> {
>   "userid1" : {
>     "update" : {
>       t3 : "some update1",
>       t2 : "some update2",
>       t1 : "some update3"
>     },
>     "sender" : {
>       t3 : "sender3"
>       t2 : "sender2"
>       t1 : "sender1"
>     },
>
>   "userid2" : {
>     "update" : {
>       t9 : "some update9",
>       t6 : "some update534",
>       t1 : "some update343"
>     },
>     "sender" : {
>       t9 : "sender3"
>       t6 : "sender2"
>       t1 : "sender1"
>     },
>
>
> }
>
> The system is going to have around 15-20M users with around 3-4M put write
> operations per day (which rules out mysql automatically). The max number of
> entries in "update" and "sender" columns will be around 1000 (around 1
> weeks updates)
>
> My queries would be like "For a given userid, return top 20 updates,
> senders based on timestamp". Is there a way to make a secondary index on
> "userid, timestamp" which can help speed up my "get" calls? Or how can I
> change my schema design to minimize response time for get calls ?
>
>
> thanks,
> piyush
