Re: OperationTimedOut in selerct count statement in cqlsh

Robert Wille Wed, 22 Apr 2015 07:31:37 -0700

Use a counter table to maintain the count so you don’t have to compute it. When 
you do something that affects the count, it’s generally easy to issue an 
asynchronous query to update the counter in parallel with the actual work. It 
definitely complicates the code, especially if you have a lot of places where 
you do things that affect the count, but it generally doesn’t cost much, if 
anything, in terms of performance.


Due to Cassandra’s eventually consistent model and lack of atomicity, you need 
to write your code to deal gracefully with the possibility of the counter being 
inaccurate. How hard that is really depends a lot on your data model.
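
A minimal sketch of the counter-table idea, assuming the DataStax Python driver 
and a hypothetical table created with 
"CREATE TABLE row_counts (table_name text PRIMARY KEY, row_count counter)". 
The table, column, and function names are illustrative only and are not from 
this thread; the session argument is expected to be a driver Session.

```python
# Hypothetical counter table (illustrative names, not from the thread):
#   CREATE TABLE row_counts (table_name text PRIMARY KEY, row_count counter);

INCREMENT_CQL = (
    "UPDATE row_counts SET row_count = row_count + %s WHERE table_name = %s"
)
READ_CQL = "SELECT row_count FROM row_counts WHERE table_name = %s"


def insert_with_count(session, insert_cql, params, table_name="t"):
    """Issue the insert and the counter update in parallel, then wait for both.

    The two writes are not atomic: if one fails, the counter can drift
    from the real row count, so callers must tolerate an inexact value.
    """
    futures = [
        session.execute_async(insert_cql, params),
        session.execute_async(INCREMENT_CQL, (1, table_name)),
    ]
    for future in futures:
        future.result()  # raises if that request failed


def read_count(session, table_name="t"):
    """Read the maintained count instead of running SELECT count(*)."""
    row = session.execute(READ_CQL, (table_name,)).one()
    return row.row_count if row is not None else 0
```

Reading the counter is a single-partition lookup, so it avoids the full scan 
that a count query needs, at the price of a possibly inexact value.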

Robert

On Apr 22, 2015, at 8:07 AM, Mich Talebzadeh <m...@peridale.co.uk> wrote:

Thanks, Robert, for the explanation.

Please correct me if I am wrong.

I am currently running a single-node Cassandra cluster. There is a primary key 
on the object_id column in both the RDBMS and Cassandra.

As you correctly pointed out, the RDBMS does not need to touch the base table. 
It can just scan the primary key B-tree index to count the rows:


       |ROOT:EMIT Operator (VA = 2)
       |
       |   |SCALAR AGGREGATE Operator (VA = 1)
       |   |  Evaluate Ungrouped COUNT AGGREGATE.
       |   |
       |   |   |SCAN Operator (VA = 0)
       |   |   |  FROM TABLE
       |   |   |  t
       |   |   |  Using Clustered Index.
       |   |   |  Index : t_ui
       |   |   |  Forward Scan.
       |   |   |  Positioning at index start.
       |   |   |  Index contains all needed columns. Base table will not be read.
       |   |   |  Using I/O Size 64 Kbytes for index leaf pages.
       |   |   |  With LRU Buffer Replacement Strategy for index leaf pages.


Total estimated I/O cost for statement 1 (at line 1): 144996.


-----------
      300000


Whereas Cassandra has to retrieve every row and count them, just without 
sending the rows back to the client?

What are the other alternatives to make it faster if any?


Cheers,


Mich Talebzadeh

http://talebzadehmich.wordpress.com/

Author of the books "A Practitioner’s Guide to Upgrading to Sybase ASE 15", 
ISBN 978-0-9563693-0-7.
co-author "Sybase Transact SQL Guidelines Best Practices", ISBN 
978-0-9759693-0-4
Publications due shortly:
Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and 
Coherence Cache
Oracle and Sybase, Concepts and Contrasts, ISBN: 978-0-9563693-1-4, volume one 
out shortly

NOTE: The information in this email is proprietary and confidential. This 
message is for the designated recipient only, if you are not the intended 
recipient, you should destroy it immediately. Any information in this message 
shall not be understood as given or endorsed by Peridale Ltd, its subsidiaries 
or their employees, unless expressly so stated. It is the responsibility of the 
recipient to ensure that this email is virus free, therefore neither Peridale 
Ltd, its subsidiaries nor their employees accept any responsibility.

From: Robert Wille [mailto:rwi...@fold3.com]
Sent: 22 April 2015 15:00
To: user@cassandra.apache.org
Subject: Re: OperationTimedOut in selerct count statement in cqlsh

I should have been clearer. What I meant was that it’s about the same amount 
of work for the cluster to do a “select count(l)” as it is to do a “select l” 
(unlike in the RDBMS world, where count(l) can use the primary key index). The 
reason is that the coordinator has to retrieve all the rows from all the nodes 
and count them. The only thing you’re saving is that the rows don’t have to be 
sent to the client.
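
To make the comparison concrete, here is a rough sketch of doing the count on 
the client side instead, assuming the DataStax Python driver, whose result 
sets fetch further pages transparently as they are iterated. The function name 
is hypothetical. The cluster still reads every row either way; the only 
difference is whether the rows also travel to the client.

```python
def count_rows_client_side(session, select_cql):
    """Count rows by iterating the full result of a plain SELECT.

    Assumes `session` is a driver Session; iterating its result set
    pulls the rows page by page rather than all at once.
    """
    return sum(1 for _ in session.execute(select_cql))
```

For example, count_rows_client_side(session, "SELECT object_id FROM t") would 
walk the whole table, which is the same scan a count has to perform.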

I heard from another Cassandra user that they found “select l” to be faster 
than “select count(l)”. I don’t know why that would be, but I’ve seen stranger 
things.

Robert

On Apr 22, 2015, at 7:49 AM, Mich Talebzadeh <m...@peridale.co.uk> wrote:


Thanks Robert,

In an RDBMS, select count(1) basically returns the number of rows.

1> select count(1) from t
2> go

-----------
      300000

(1 row affected)

Is count(1) fundamentally different in Cassandra?

Does count(1) mean returning (in my case) 1 three hundred thousand times?

Cheers,


Mich Talebzadeh

From: Robert Wille [mailto:rwi...@fold3.com]
Sent: 22 April 2015 14:44
To: user@cassandra.apache.org
Subject: Re: OperationTimedOut in selerct count statement in cqlsh

Keep in mind that "select count(l)" and "select l" amount to essentially the 
same thing.

On Apr 22, 2015, at 3:41 AM, Tommy Stendahl <tommy.stend...@ericsson.com> wrote:



Hi,

Check out CASSANDRA-8899; my guess is that you have to increase the timeout in 
cqlsh.
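
If the count is run from application code rather than cqlsh, the equivalent 
knob is the client-side request timeout. A small sketch, assuming the DataStax 
Python driver, where Session.execute accepts a timeout in seconds; the 
function name and the 120-second value are only illustrative.

```python
def count_with_timeout(session, table="t", timeout_seconds=120.0):
    """Run SELECT count(*) with a longer client-side timeout.

    Table names cannot be bound as query parameters, so `table` must
    come from trusted code, never from user input.
    """
    result = session.execute(
        "SELECT count(*) FROM " + table, timeout=timeout_seconds
    )
    return result.one()[0]
```

This only stops the client from giving up early. The server-side read 
timeouts still apply, and the query remains a full scan.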

/Tommy
On 2015-04-22 11:15, Mich Talebzadeh wrote:
Hi,

I have a table of 300,000 rows.

When I try to do a simple

cqlsh:ase> select count(1) from t;
OperationTimedOut: errors={}, last_host=127.0.0.1

Appreciate any feedback

Thanks,

Mich