Hi,

I have not committed the changes yet. I could get it working with a delay
(which resolves the Axis2 timeout issue). I also need to add a related fix
to search.
Will commit the changes ASAP.

Thanks

On Fri, Aug 31, 2012 at 2:00 AM, Tharindu Mathew <[email protected]> wrote:

> Is the batch-wise processing implemented, or is all the data still fetched?
>
>
> On Fri, Aug 24, 2012 at 6:20 PM, Anjana Fernando <[email protected]> wrote:
>
>> Hi,
>>
>> IMO, we simply should not be counting the records. So, by default, you
>> will not get the record count. But we can add it as an option in the
>> Cassandra explorer, to get the full row count of a CF when the user
>> requests it, i.e. by pressing a button. With this, full numbered
>> pagination will not be supported by default; instead, the user can
>> advance through batches of records with a "next" button. It would also be
>> great if we could integrate CQL support into the explorer, so the user
>> can filter the data in a specific way and get only the matching records.
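A minimal sketch of that "next"-button paging, keyed on the last row seen
(Python; `fetch_rows`, `next_page`, and the toy data are hypothetical stand-ins,
not the explorer's actual API — in practice a Cassandra range query would
play the role of `fetch_rows`):

```python
import bisect

# Toy data standing in for a column family: 250 rows with sortable keys.
ROWS = {f"key{i:07d}": {"value": i} for i in range(250)}
SORTED_KEYS = sorted(ROWS)

def fetch_rows(start_key, limit):
    """Return up to `limit` (key, row) pairs after start_key (exclusive).

    Stand-in for a Cassandra range query. Note that Thrift-style range
    queries treat the start key as inclusive, so real code would re-request
    the last seen key and drop the first result.
    """
    pos = bisect.bisect_right(SORTED_KEYS, start_key) if start_key else 0
    return [(k, ROWS[k]) for k in SORTED_KEYS[pos:pos + limit]]

def next_page(cursor, page_size=100):
    """One "next" click: fetch a page and return (rows, new_cursor)."""
    rows = fetch_rows(cursor, page_size)
    return rows, (rows[-1][0] if rows else None)  # None => no more pages
```

No total count is ever computed: each click only touches one page, and the
cursor (last key of the previous page) is all the state the UI has to keep.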
>>
>> Cheers,
>> Anjana.
>>
>> On Fri, Aug 24, 2012 at 6:01 PM, Shelan Perera <[email protected]> wrote:
>>
>>> Hi,
>>>
>>>  Cassandra Explorer has been tested with 7 million row entries of BAM
>>> data, and it gives timeout errors under such a load. The main reason is
>>> calculating the total number of rows, which is needed to show how many
>>> row entries are available and to enable full numbered pagination. In
>>> Cassandra, calculating the total number of rows is an anti-pattern, but
>>> it is key information that applications use heavily to verify inserted
>>> data. Almost all the available tools cap the count at a limit such as
>>> 10,000 rows rather than counting the total records.
>>>
>>> I have tried fetching records in batches (10,000 and 100,000 per batch
>>> on different occasions), but reaching a total like 7 million takes
>>> considerable time. From what I found online, calculating the total row
>>> count is not considered a good idea, as it can take a very long time to
>>> fetch all the records in a cluster; the recommendation is to offload it
>>> to something such as a MapReduce job.
>>>
>>> Fetching 100,000 records took 2.73 seconds, so altogether it takes
>>> around 191 seconds to complete.
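For the record, the arithmetic behind that figure (a quick Python check
using only the numbers quoted above):

```python
import math

# Back-of-envelope check: 7 million rows fetched in 100,000-row batches
# at ~2.73 seconds per batch.
total_rows = 7_000_000
batch_size = 100_000
secs_per_batch = 2.73

batches = math.ceil(total_rows / batch_size)  # 70 batches
total_secs = batches * secs_per_batch         # ~191 seconds
```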
>>>
>>> What would be the best way to overcome this?
>>>
>>> Thanks
>>>
>>> --
>>> *Shelan Perera*
>>>
>>> Software Engineer
>>> *WSO2, Inc. : wso2.com*
>>> lean.enterprise.middleware.
>>>
>>> *Home Page*  :    shelan.org
>>> *Blog*             : blog.shelan.org
>>> *LinkedIn*     : http://www.linkedin.com/pub/shelan-perera/a/194/465
>>> *Twitter*        : https://twitter.com/#!/shelan
>>>
>>> *Mobile*          : +94 772 604 402
>>>
>>>
>>>
>>> _______________________________________________
>>> Dev mailing list
>>> [email protected]
>>> http://wso2.org/cgi-bin/mailman/listinfo/dev
>>>
>>>
>>
>>
>> --
>> *Anjana Fernando*
>> Associate Technical Lead
>> WSO2 Inc. | http://wso2.com
>> lean . enterprise . middleware
>>
>>
>>
>
>
> --
> Regards,
>
> Tharindu
>
> blog: http://mackiemathew.com/
> M: +94777759908
>
>

