Hi,
I am trying to design a high-scale key-value storage system. The HBase table
for it is outlined below:
{
  "userid1" : {
    "update" : {
      t3 : "some update1",
      t2 : "some update2",
      t1 : "some update3"
    },
    "sender" : {
      t3 : "sender3",
      t2 : "sender2",
      t1 : "sender1"
    }
  },
  "userid2" : {
    "update" : {
      t9 : "some update9",
      t6 : "some update534",
      t1 : "some update343"
    },
    "sender" : {
      t9 : "sender3",
      t6 : "sender2",
      t1 : "sender1"
    }
  }
}
The system is going to have around 15-20M users, with around 3-4M put (write)
operations per day (which rules out MySQL automatically). The maximum number of
entries in each of the "update" and "sender" columns will be around 1000 (about
one week's worth of updates).
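As a rough sanity check on these figures, the sketch below converts the daily
put volume into an average per-second rate and computes an upper bound on the
number of stored cells. It assumes writes are spread evenly over the day (peak
load is not stated above and may be much higher) and that every user reaches
the 1000-entry limit in both columns. The function names are only illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def avg_writes_per_second(puts_per_day):
    """Average put rate, assuming an even spread over 24 hours."""
    return puts_per_day / SECONDS_PER_DAY


def max_cells(users, entries_per_column, columns=2):
    """Upper bound on stored cells if every user hits the per-column limit."""
    return users * entries_per_column * columns


low = avg_writes_per_second(3_000_000)
high = avg_writes_per_second(4_000_000)
upper = max_cells(20_000_000, 1000)

print(f"average puts/sec: {low:.1f} to {high:.1f}")
print(f"upper bound on cells: {upper:,}")
```

The cell count is a worst case; the actual number depends on how many users
are active in a given week.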
My queries would be like "For a given userid, return the 20 most recent updates
and their senders, ordered by timestamp". Is there a way to build a secondary
index on "userid, timestamp" that would speed up my "get" calls? Or how could I
change my schema design to minimize the response time of get calls?
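To make the access pattern concrete, here is a minimal Python sketch that uses
plain dictionaries as a stand-in for the table; it does not call any HBase API.
`latest_updates` runs the query against the nested layout outlined above.
`composite_key` and `scan_prefix` illustrate one possible reading of an index
on "userid, timestamp": a row key made of the userid plus a reversed
timestamp, so that keys sort newest first. The key format, the `MAX_TS` bound,
and all function names are assumptions made for illustration only.

```python
MAX_TS = 2**63 - 1  # assumed upper bound for a 64-bit timestamp


def latest_updates(table, userid, n=20):
    """Return up to n (timestamp, update, sender) tuples, newest first."""
    row = table.get(userid, {})
    updates = row.get("update", {})
    senders = row.get("sender", {})
    newest_first = sorted(updates, reverse=True)[:n]
    return [(ts, updates[ts], senders.get(ts)) for ts in newest_first]


def composite_key(userid, ts):
    """Hypothetical row key: userid plus zero-padded reversed timestamp."""
    return f"{userid}:{MAX_TS - ts:019d}"


def scan_prefix(rows, userid, n=20):
    """Emulate a prefix scan: the first n keys in sorted order are the newest."""
    prefix = f"{userid}:"
    keys = sorted(k for k in rows if k.startswith(prefix))
    return [rows[k] for k in keys[:n]]


# Nested layout, with integers standing in for the timestamps t1..t3.
table = {
    "userid1": {
        "update": {3: "some update1", 2: "some update2", 1: "some update3"},
        "sender": {3: "sender3", 2: "sender2", 1: "sender1"},
    }
}
result = latest_updates(table, "userid1", n=2)

# The same data flattened to one row per (userid, timestamp).
flat = {}
for ts, update, sender in latest_updates(table, "userid1"):
    flat[composite_key("userid1", ts)] = (ts, update, sender)
flat_result = scan_prefix(flat, "userid1", n=2)

print(result)
print(flat_result)
```

Both calls return the same two newest entries. In the flattened variant the
ordering comes from the key itself, so a reader only needs the first n rows
that share the userid prefix.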
Regards,
Piyush Goel
Software Engineer
Yahoo! Software Development India Pvt. Ltd.
Bangalore, India