Hi,

 

I have the following table structure in Cassandra 1.2.1:

 

*       TimeStamp
*       MACAddress
*       Data Transfer
*       LocationID

 

*       PRIMARY KEY (TimeStamp, MacAddress)   // composite key, partitioned on TimeStamp

 

 

There are close to 500K different MAC addresses and 10K timestamps, so there
are about 5 billion records in total. Each record is about 50 bytes, so the
total size of the data is about 250 GB. All of this data is stored on a
4-node cluster with no replication.
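
As a sanity check on these numbers, here is a small Python sketch of the arithmetic. The variable names are only for illustration, and it assumes decimal units (1 GB = 10^9 bytes), no per-row storage overhead, and an even spread of data across the nodes, which the real token distribution would only approximate.

```python
# Back-of-the-envelope sizing for the data set described above.
# Assumptions: decimal units (1 GB = 10**9 bytes), no per-row overhead.
NUM_MACS = 500_000        # distinct MAC addresses
NUM_TIMESTAMPS = 10_000   # distinct timestamps
RECORD_BYTES = 50         # approximate size of one record
NUM_NODES = 4             # cluster size, no replication

total_records = NUM_MACS * NUM_TIMESTAMPS
total_bytes = total_records * RECORD_BYTES

# Assumption: data is spread evenly across the nodes.
bytes_per_node = total_bytes / NUM_NODES

print(f"records:    {total_records:,}")
print(f"total size: {total_bytes / 10**9:.0f} GB")
print(f"per node:   {bytes_per_node / 10**9:.1f} GB")
```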

 

Here are my requirements:

 

*       I want to retrieve all the records for a particular timestamp very
quickly (say < 10 seconds). With the numbers above, that means about 500K
entries would be retrieved, which is around 25 MB of data. I think this is
only possible if the table is partitioned on TimeStamp.
*       I want to fetch all the data for a particular MAC address in real
time (say < 5 seconds). With the numbers above, that means about 10K entries
(~0.5 MB) per MACAddress.
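
The result sizes for the two queries can be worked out the same way. This is only a rough sketch: the helper name `result_size` is made up for illustration, and the throughput figures simply divide the payload by the target latency, ignoring any read or network overhead.

```python
# Rough result-set sizes for the two required queries.
RECORD_BYTES = 50
NUM_MACS = 500_000
NUM_TIMESTAMPS = 10_000

def result_size(rows, record_bytes=RECORD_BYTES):
    """Payload size in bytes for a result of `rows` records."""
    return rows * record_bytes

# Query 1: all records for one timestamp (one row per MAC address).
by_timestamp_bytes = result_size(NUM_MACS)
# Query 2: all records for one MAC address (one row per timestamp).
by_mac_bytes = result_size(NUM_TIMESTAMPS)

# Sustained throughput (MB/s) needed to meet the 10 s and 5 s targets.
by_timestamp_rate = by_timestamp_bytes / 10 / 10**6
by_mac_rate = by_mac_bytes / 5 / 10**6

print(f"by timestamp: {by_timestamp_bytes / 10**6:.1f} MB, {by_timestamp_rate:.1f} MB/s")
print(f"by MAC:       {by_mac_bytes / 10**6:.1f} MB, {by_mac_rate:.1f} MB/s")
```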

 

While the first query returns within an acceptable time, the second query
takes a long time to return. It could be improved by adding an index on
MacAddress. However, I found that if the primary key is
(TimeStamp, MacAddress), it is not possible to create an index on MacAddress.
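
To illustrate why I believe the second query is slow, here is a toy Python model. It is not Cassandra code: a plain dict stands in for the partitions, the function names are hypothetical, and the data set is scaled down. It only shows that, with TimeStamp as the partition key, the rows for one MAC address are scattered over every timestamp partition.

```python
from collections import defaultdict

# Toy model: partition key (timestamp) -> {clustering key (mac) -> row values}
partitions = defaultdict(dict)

def insert(timestamp, mac, data_transfer, location_id):
    partitions[timestamp][mac] = (data_transfer, location_id)

def query_by_timestamp(timestamp):
    """Read one partition; returns (rows, partitions touched)."""
    part = partitions.get(timestamp, {})
    rows = [(timestamp, mac, *values) for mac, values in part.items()]
    return rows, 1

def query_by_mac(mac):
    """Without a MAC-keyed structure, every partition must be visited."""
    rows, touched = [], 0
    for timestamp, part in partitions.items():
        touched += 1
        if mac in part:
            rows.append((timestamp, mac, *part[mac]))
    return rows, touched

# Scaled-down data: 100 timestamps x 50 MAC addresses.
for ts in range(100):
    for m in range(50):
        insert(ts, f"mac-{m}", data_transfer=m * ts, location_id=m % 5)

ts_rows, ts_touched = query_by_timestamp(3)
mac_rows, mac_touched = query_by_mac("mac-7")
print(len(ts_rows), "rows from", ts_touched, "partition")
print(len(mac_rows), "rows from", mac_touched, "partitions")
```

At the real scale, the same pattern would mean visiting all 10K timestamp partitions to collect the 10K rows for a single MAC address.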

 

Keeping the above requirements in mind, is there anything I could do in
Cassandra to satisfy both of them?

 

Thanks

Pushkar
