Storing globally sorted data

2014-05-16 Thread Kevin Burton
Let's say I have an external job (MR, Pig, etc.) sorting a Cassandra table by some complicated mechanism. We want to store the sorted records BACK into Cassandra so that clients can read the records in sorted order. What I was just thinking of doing was storing the records as pages. So page 0 would have

Re: Storing globally sorted data

2014-05-16 Thread DuyHai Doan
What you show is basically the idea of bucketing data. One bucket = one physical partition. Within each bucket, there is a fixed number of columns (1000 in your example). This strategy works fine and avoids overly large partitions. The only drawback I would see is the need to fetch data over buckets
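
To make the bucketing idea concrete, here is a minimal Python sketch of the arithmetic involved, not code from the thread. It assumes records are numbered by a global sort position starting at 0 and that each bucket holds 1000 columns, as in the example. The names `locate` and `slices_for_range` are hypothetical helpers introduced for illustration; no Cassandra driver calls are shown.

```python
BUCKET_SIZE = 1000  # columns per bucket (assumed, taken from the example above)


def locate(rank, bucket_size=BUCKET_SIZE):
    """Map a global sort position to (bucket id, offset within the bucket)."""
    return divmod(rank, bucket_size)


def slices_for_range(start, count, bucket_size=BUCKET_SIZE):
    """Return (bucket, first_offset, last_offset) slices that together cover
    the global positions [start, start + count). Offsets are inclusive."""
    slices = []
    pos, end = start, start + count
    while pos < end:
        bucket, offset = divmod(pos, bucket_size)
        # Stop at the end of this bucket or the end of the request,
        # whichever comes first.
        last = min(bucket_size, offset + (end - pos)) - 1
        slices.append((bucket, offset, last))
        pos += last - offset + 1
    return slices
```

A read of 20 records starting at position 990 yields two slices, one per bucket, which is the cross-bucket fetch mentioned above: each slice would become a separate partition read.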