Hi,

 Cassandra Explorer has been tested with 7 million rows of BAM data, and it
gives timeout errors under such a load. The main reason for this is
calculating the total number of rows, which is needed to show how many
entries are available and to enable full numbered pagination. In Cassandra,
calculating the total number of rows is an anti-pattern, but it is key
information that applications use heavily to verify inserted data. Almost
all the available tools use a cap such as 10,000 rows instead of fetching
the total record count.

I have tried fetching records in batches (10,000 each and 100,000 each on
different occasions), but completing a count of around 7 million rows takes
considerable time. When I googled, the advice was that calculating the
total row count is not a good idea, since fetching all the records in a
cluster can take a very long time, and it was recommended to offload this
to something such as a MapReduce job.
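
To make the batching approach concrete, here is a minimal Python sketch of
counting rows in fixed-size batches. The `fetch_batch` function and
`TOTAL_ROWS` are hypothetical stand-ins for illustration only; a real
implementation against Cassandra would page by `token(key)` ranges or use
the driver's fetch size / paging state rather than numeric offsets.

```python
TOTAL_ROWS = 1_000_000   # simulated table size (assumption, not real data)
BATCH_SIZE = 100_000

def fetch_batch(start, size):
    """Hypothetical batch fetch; a real version would issue a CQL query
    paged by token ranges instead of numeric offsets."""
    end = min(start + size, TOTAL_ROWS)
    return range(start, end)  # stand-in for the rows returned

def count_all_rows(batch_size=BATCH_SIZE):
    """Count rows by repeatedly fetching batches until one comes back empty."""
    total, start = 0, 0
    while True:
        batch = fetch_batch(start, batch_size)
        n = len(batch)
        if n == 0:
            break
        total += n
        start += n
    return total

print(count_all_rows())  # 1000000
```

The count is correct regardless of batch size, but the total time grows
linearly with the number of batches, which is exactly the problem described
above.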

Fetching 100,000 records took 2.73 seconds, so altogether it takes around
191 seconds (70 batches) to complete the count.
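
The 191-second figure follows directly from the per-batch timing; a quick
back-of-the-envelope check:

```python
import math

rows = 7_000_000          # total rows in the BAM data set
batch = 100_000           # rows fetched per batch
secs_per_batch = 2.73     # measured time for one 100,000-row fetch

batches = math.ceil(rows / batch)        # 70 batches
total_seconds = batches * secs_per_batch # ~191.1 seconds

print(batches, round(total_seconds, 1))  # 70 191.1
```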

What would be the best way to overcome this?

Thanks

-- 
*Shelan Perera*

Software Engineer
*WSO2, Inc. : wso2.com*
lean.enterprise.middleware.

*Home Page*  :    shelan.org
*Blog*             : blog.shelan.org
*Linked-in*      : http://www.linkedin.com/pub/shelan-perera/a/194/465
*Twitter*         : https://twitter.com/#!/shelan

*Mobile*          : +94 772 604 402
_______________________________________________
Dev mailing list
[email protected]
http://wso2.org/cgi-bin/mailman/listinfo/dev
