You're not mistaken; I just thought you were after partition keys and didn't
read the question that carefully. AFAIK, you're out of luck if you need to
count rows distinguished by clustering key. Well, other than doing a full
table scan of course, which I'm assuming is not practical.

On Mon, 11 Apr 2016 at 16:52 Jack Krupansky <jack.krupan...@gmail.com>
wrote:

> Unless I'm mistaken, nodetool tablestats gives you the number of
> partitions (partition keys), not the number of primary keys. IOW, the term
> "keys" is ambiguous. That's why I phrased the original question as a count
> of (CQL) rows, to distinguish from the pre-CQL3 concept of a partition
> being treated as a single row.
>
> -- Jack Krupansky
>
> On Mon, Apr 11, 2016 at 11:46 AM, Emīls Šolmanis <emils.solma...@gmail.com
> > wrote:
>
>> Wouldn't the "number of keys" part of *nodetool cfstats* run on every
>> node, summed and divided by replication factor give you a decent
>> approximation? Or are you really after a completely precise number?
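The approximation described here can be sketched roughly as below. The output format parsed and the way per-node dumps are collected are assumptions (a real setup would run `nodetool tablestats` against each node); the result is only as good as Cassandra's own per-node estimate.

```python
import re

# Rough sketch: sum the "Number of keys (estimate)" line from each
# node's `nodetool tablestats <ks.table>` output, then divide by the
# replication factor to approximate the distinct partition count.
def partition_estimate(tablestats_outputs, replication_factor):
    total = 0
    for output in tablestats_outputs:
        m = re.search(r"Number of keys \(estimate\):\s*(\d+)", output)
        if m:
            total += int(m.group(1))
    return total // replication_factor

# Made-up per-node dumps for a hypothetical table with RF = 3:
node_dumps = [
    "Table: events\n\t\tNumber of keys (estimate): 1200\n",
    "Table: events\n\t\tNumber of keys (estimate): 1180\n",
    "Table: events\n\t\tNumber of keys (estimate): 1220\n",
]
print(partition_estimate(node_dumps, 3))  # -> 1200
```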
>>
>> On Mon, 11 Apr 2016 at 16:18 Jack Krupansky <jack.krupan...@gmail.com>
>> wrote:
>>
>>> Agreed that anything requiring a full table scan, short of batch
>>> analytics, is an antipattern, although the goal is not to do a full scan
>>> per se, but just to get the row count. It still surprises people that
>>> Cassandra cannot quickly compute COUNT(*). The easy answer: use DSE Search
>>> and run a Solr query for q=*:*, which will return the total row count very
>>> quickly. I presume that Stratio will handle this fine as well.
>>>
>>>
>>> -- Jack Krupansky
>>>
>>> On Mon, Apr 11, 2016 at 11:10 AM, <sean_r_dur...@homedepot.com> wrote:
>>>
>>>> Cassandra is not good for table-scan type queries (which count(*)
>>>> typically is). While there are some attempts to do that (as noted
>>>> below), this is a path I avoid.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Sean Durity
>>>>
>>>>
>>>>
>>>> *From:* Max C [mailto:mc_cassan...@core43.com]
>>>> *Sent:* Saturday, April 09, 2016 6:19 PM
>>>> *To:* user@cassandra.apache.org
>>>> *Subject:* Re: 1, 2, 3...
>>>>
>>>>
>>>>
>>>> Looks like this guy (Brian Hess) wrote a script to split the token
>>>> range and run count(*) on each subrange:
>>>>
>>>>
>>>>
>>>> https://github.com/brianmhess/cassandra-count
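The core idea behind that script can be sketched as dividing the full token space into contiguous subranges, each of which gets its own bounded COUNT(*) query. This sketch assumes the default Murmur3Partitioner, whose token space is [-2^63, 2^63 - 1].

```python
# Murmur3Partitioner token bounds (assumption: default partitioner).
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1

def split_token_range(n):
    """Split the full token space into n contiguous subranges."""
    step = (MAX_TOKEN - MIN_TOKEN) // n
    ranges = []
    start = MIN_TOKEN
    for i in range(n):
        # Last subrange absorbs any rounding remainder.
        end = MAX_TOKEN if i == n - 1 else start + step
        ranges.append((start, end))
        start = end
    return ranges

ranges = split_token_range(4)
# Adjacent subranges share a boundary, matching the usual
# "token(pk) > start AND token(pk) <= end" query pattern.
```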
>>>>
>>>>
>>>>
>>>> - Max
>>>>
>>>>
>>>>
>>>> On Apr 8, 2016, at 10:56 pm, Jeff Jirsa <jeff.ji...@crowdstrike.com>
>>>> wrote:
>>>>
>>>>
>>>>
>>>> SELECT COUNT(*) probably works (with internal paging) on many datasets,
>>>> given enough time and assuming you don't have any partitions that will
>>>> kill you.
>>>>
>>>>
>>>>
>>>> No, it doesn’t count extra replicas / duplicates.
>>>>
>>>>
>>>>
>>>> The old way to do this (before paging / fetch size) was to use manual
>>>> paging based on tokens/clustering keys:
>>>>
>>>>
>>>>
>>>> https://docs.datastax.com/en/cql/3.1/cql/cql_using/paging_c.html –
>>>> SELECT’s WHERE clause can use token(), which is what you’d want to use to
>>>> page through the whole token space.
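A sketch of the query pattern this describes, building one COUNT(*) statement per token subrange; the keyspace, table, and partition-key column names here are placeholders, not anything from the thread.

```python
# Build one bounded COUNT(*) statement per token subrange, using
# the token() function in the WHERE clause as described above.
def count_queries(keyspace, table, pk_column, token_ranges):
    return [
        f"SELECT COUNT(*) FROM {keyspace}.{table} "
        f"WHERE token({pk_column}) > {start} AND token({pk_column}) <= {end}"
        for start, end in token_ranges
    ]

# Example with made-up names and two small subranges:
for q in count_queries("ks", "events", "id", [(0, 100), (100, 200)]):
    print(q)
```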
>>>>
>>>>
>>>>
>>>> You could, in theory, issue thousands of queries in parallel, all for
>>>> different token ranges, and then sum the results. That’s what something
>>>> like spark would be doing. If you want to determine rows per node, limit
>>>> the token range to that owned by the node (easier with 1 token than vnodes,
>>>> with vnodes repeat num_tokens times).
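The parallel-and-sum approach can be sketched like this. `count_fn` is a stand-in for an actual driver call (for example via the DataStax Python driver, not shown here); everything about it is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor

def total_rows(count_fn, token_ranges, parallelism=32):
    """Issue one count per token subrange in parallel and sum them.

    count_fn: callable taking a (start, end) token range and returning
    the row count for that range -- a placeholder for a real query.
    """
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        return sum(pool.map(count_fn, token_ranges))

# Stub example: pretend each of three subranges holds 10 rows.
print(total_rows(lambda r: 10, [(0, 1), (1, 2), (2, 3)]))  # -> 30
```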
>>>>
>>>>
>>>>
>>>
>>>
>
