Re: Possibility of going OOM using get_count

2011-09-25 Thread Boris Yen
Hi Aaron, thanks for the explanation. I know performance will vary when the offset is very large, as mentioned in CASSANDRA-261. Even if users implement the offset on the client side, they suffer the same issues. I just think it would be nice if

Re: Possibility of going OOM using get_count

2011-09-24 Thread aaron morton
The changes in get_count() are designed to stop counts for very large rows running out of memory as they try to hold millions of columns in memory. So if you ask to count all the cols in a row with 1M cols, it will (by default) read the first 1024 columns, and then the next 1024 using the last

Re: Possibility of going OOM using get_count

2011-09-23 Thread Boris Yen
On Fri, Sep 23, 2011 at 12:28 PM, aaron morton aa...@thelastpickle.com wrote: Offsets have been discussed previously. IIRC the main concerns were either: There is no way to reliably count to the start of the offset, i.e. we do not lock the row In the new get_count function, Cassandra does the

Re: Possibility of going OOM using get_count

2011-09-22 Thread Boris Yen
I was wondering if it is possible to use an approach similar to CASSANDRA-2894 (https://issues.apache.org/jira/browse/CASSANDRA-2894) to have the slice predicate support the offset concept? With an offset, it would be much easier to implement paging on the client side. Boris On Mon, Sep 19, 2011 at

Re: Possibility of going OOM using get_count

2011-09-22 Thread aaron morton
Offsets have been discussed previously. IIRC the main concerns were either that there is no way to reliably count to the start of the offset, i.e. we do not lock the row, or performance related, as there is no reliable way to skip 10,000 columns other than counting 10,000 columns. With a start
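
To illustrate the performance concern, here is a minimal Python sketch, not Cassandra code. The function `slice_with_offset` and the in-memory column list are hypothetical stand-ins; the point is only that honouring an offset means walking and discarding every column before it.

```python
def slice_with_offset(columns, offset, count):
    """Return (page, columns_read) for a hypothetical offset-based slice.

    `columns` is any iterable of column names in sorted order. There is
    no index by position, so the first `offset` columns must be read
    and thrown away before the page starts.
    """
    page = []
    columns_read = 0
    for name in columns:
        columns_read += 1
        if columns_read > offset:
            page.append(name)
            if len(page) == count:
                break
    return page, columns_read


# Hypothetical row with 20,000 zero-padded column names.
names = [f"col{i:05d}" for i in range(20000)]
page, columns_read = slice_with_offset(names, offset=10000, count=10)
```

In this sketch a 10-column page at offset 10,000 touches 10,010 columns, which is the cost described above.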

Re: Possibility of going OOM using get_count

2011-09-19 Thread Tharindu Mathew
On Mon, Sep 19, 2011 at 12:40 PM, Benoit Perroud ben...@noisette.ch wrote: The workaround for 0.7 is calling get_slice and count on client side. It's heavier, sure, but you will then be able to set start column accordingly. I was afraid of that :( Will follow that method. Thanks.

Re: Possibility of going OOM using get_count

2011-09-19 Thread aaron morton
get_count() supports the same predicate as get_slice. So you can implement the paging yourself. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 19/09/2011, at 8:45 PM, Tharindu Mathew wrote: On Mon, Sep 19, 2011 at 12:40

Re: Possibility of going OOM using get_count

2011-09-19 Thread Jonathan Ellis
Unfortunately no, because you don't know what the actual last-column-counted was. On Mon, Sep 19, 2011 at 4:25 AM, aaron morton aa...@thelastpickle.com wrote: get_count() supports the same predicate as get_slice. So you can implement the paging yourself. Cheers - Aaron Morton

Re: Possibility of going OOM using get_count

2011-09-19 Thread Tharindu Mathew
Yes Aaron, that self-implemented paging is what I'm trying. Jonathan, the last column read in the previous result fetched is the starting column of the next iteration. The end column remains constant. This is using slice ranges. Afaiu, that should work. Regards, Tharindu Sent from my iPhone
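
The paging described above could be sketched in Python roughly as follows. This is an assumption-laden illustration, not the Thrift API: `get_slice` here is a hypothetical in-memory stand-in that returns column names from a sorted list, with an inclusive start and finish and an empty string meaning "unbounded". Because the start column is inclusive, every page after the first repeats one column, which the loop drops before counting.

```python
from bisect import bisect_left, bisect_right


def get_slice(names, start, finish, count):
    """Hypothetical stand-in for a slice call over one row.

    `names` is a sorted list of column names. Returns up to `count`
    names in [start, finish], both inclusive; "" means unbounded.
    """
    lo = bisect_left(names, start) if start else 0
    hi = bisect_right(names, finish) if finish else len(names)
    return names[lo:hi][:count]


def count_columns(names, start="", finish="", page_size=1024):
    """Count columns page by page so only one page is held at a time."""
    if page_size < 2:
        raise ValueError("page_size must be at least 2")
    total = 0
    first_page = True
    while True:
        page = get_slice(names, start, finish, page_size)
        if not first_page:
            page = page[1:]  # start is inclusive: skip the column already counted
        total += len(page)
        expected = page_size if first_page else page_size - 1
        if len(page) < expected:
            return total  # short page means the range is exhausted
        start = page[-1]  # last column read becomes the next start
        first_page = False


# Hypothetical row with 2,500 zero-padded column names.
row = [f"col{i:05d}" for i in range(2500)]
total = count_columns(row)
```

The finish column stays constant across iterations, as in the description. Without row locking, columns inserted or deleted between pages can still change the result.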

Possibility of going OOM using get_count

2011-09-18 Thread Tharindu Mathew
Hi everyone, I noticed this line in the API docs: "The method is not O(1). It takes all the columns from disk to calculate the answer. The only benefit of the method is that you do not need to pull all the columns over Thrift interface to count them." Does this mean if a row has a large number of

Re: Possibility of going OOM using get_count

2011-09-18 Thread aaron morton
yes. - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 19/09/2011, at 7:16 AM, Tharindu Mathew wrote: Hi everyone, I noticed this line in the API docs, The method is not O(1). It takes all the columns from disk to calculate the

Re: Possibility of going OOM using get_count

2011-09-18 Thread Jake Luciani
This is fixed in 1.0 https://issues.apache.org/jira/browse/CASSANDRA-2894 On Sun, Sep 18, 2011 at 2:16 PM, Tharindu Mathew mcclou...@gmail.com wrote: Hi everyone, I noticed this line in the API docs, The method is not O(1). It takes all the columns from disk to calculate the answer. The

Re: Possibility of going OOM using get_count

2011-09-18 Thread Tharindu Mathew
Thanks Aaron and Jake for the replies. Any chance of a possible workaround to use for Cassandra 0.7? On Mon, Sep 19, 2011 at 3:48 AM, aaron morton aa...@thelastpickle.com wrote: Cool Thanks, A - Aaron Morton Freelance Cassandra Developer @aaronmorton