Let's say I have an external job (MR, Pig, etc.) sorting a Cassandra table
by some complicated mechanism.
We want to store the sorted records BACK into Cassandra so that clients can
read the records in sorted order.
What I was just thinking of doing was storing the records as pages.
So page 0 would have
What you show is basically the idea of bucketing data. One bucket = one
physical partition. Within each bucket, there is a fixed number of columns
(1000 in your example).
This strategy works fine and avoids overly large partitions. The only drawback
I would see is the need to fetch data across buckets
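
To make the cross-bucket cost concrete, here is a minimal Python sketch of the arithmetic only, with no Cassandra driver involved. It assumes records are numbered by a 0-based rank in the sorted order and that each bucket holds exactly 1000 columns; the names BUCKET_SIZE, locate and buckets_for_range are made up for illustration and are not part of any Cassandra API.

```python
BUCKET_SIZE = 1000  # assumed columns per bucket, matching the example above


def locate(rank, bucket_size=BUCKET_SIZE):
    """Map a 0-based rank in the sorted order to (bucket, offset in bucket)."""
    return divmod(rank, bucket_size)


def buckets_for_range(start, count, bucket_size=BUCKET_SIZE):
    """Split the rank range [start, start + count) into per-bucket slices.

    Returns a list of (bucket, first_offset, last_offset) tuples with
    inclusive offsets. Each tuple corresponds to one partition read.
    """
    if count <= 0:
        return []
    slices = []
    end = start + count  # exclusive upper bound
    rank = start
    while rank < end:
        bucket, offset = divmod(rank, bucket_size)
        # Stop at the end of this bucket or the end of the request,
        # whichever comes first.
        last = min(bucket_size, offset + (end - rank)) - 1
        slices.append((bucket, offset, last))
        rank += last - offset + 1
    return slices


if __name__ == "__main__":
    print(locate(2500))
    print(buckets_for_range(950, 100))
```

A read of 100 records starting at rank 950 touches two buckets, so under these assumptions the client would issue two partition reads and concatenate the results itself.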