Hi,

Cassandra Explorer has been tested with 7 million rows of BAM data, and it gives timeout errors under such a load. The main reason is calculating the total number of rows, which is needed to show how many entries are available and to enable full numbered pagination. In Cassandra, calculating the total number of rows is an anti-pattern, but it is key information that the application uses heavily to verify inserted data. Almost all the available tools cap the count at a limit such as 10,000 rows rather than computing the total.
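The cap approach used by those tools can be sketched as follows. This is a minimal illustration, not Cassandra Explorer's actual code: `fetch_page(last_key, limit)` is a hypothetical stand-in for a ranged row query (e.g. a key-range or token-range slice) that returns up to `limit` row keys after `last_key`. Counting stops at the cap, so the UI can display "10,000+" instead of scanning the whole cluster:

```python
def capped_row_count(fetch_page, cap=10_000, page_size=1_000):
    """Count rows up to `cap`.

    Returns (count, exact): exact=False means the scan stopped at the
    cap, so the real total is at least `cap`.
    """
    count = 0
    last_key = None
    while count < cap:
        # fetch_page is a hypothetical stand-in for a Cassandra ranged
        # row query; it returns row keys strictly after last_key.
        page = fetch_page(last_key, page_size)
        if not page:
            return count, True       # scanned everything: exact count
        count += len(page)
        last_key = page[-1]          # resume after the last key seen
    return cap, False                # hit the cap: report as "cap+"

# Rough cost of a *full* count at the batch timings quoted in this
# thread (2.73 s per 100,000 rows, 7 million rows total):
pages = 7_000_000 // 100_000         # 70 pages
print(round(pages * 2.73))           # → 191 (seconds)
```

The trade-off is that the exact total is only available when the data set is smaller than the cap; anything larger is reported as a lower bound, which is usually enough for pagination UIs.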
I have tried fetching records in batches (10,000 and 100,000 on different occasions), but reaching a total like 7 million takes considerable time. From what I found online, calculating the total row count is not recommended, as it can take a very long time to fetch all the records in a cluster; the advice is to offload it to something such as a MapReduce job. Fetching 100,000 records took 2.73 seconds, so altogether it takes around 191 seconds to complete.

What would be the best way to overcome this?

Thanks

--
*Shelan Perera*

Software Engineer
*WSO2, Inc. : wso2.com*
lean.enterprise.middleware.

*Home Page* : shelan.org
*Blog* : blog.shelan.org
*LinkedIn* : http://www.linkedin.com/pub/shelan-perera/a/194/465
*Twitter* : https://twitter.com/#!/shelan
*Mobile* : +94 772 604 402
_______________________________________________
Dev mailing list
[email protected]
http://wso2.org/cgi-bin/mailman/listinfo/dev
