Re: Full table scan with cassandra

2017-08-17 Thread Alex Kotelnikov
So it is also terribly slow. It does not work with materialized views (quick hack for that below) or with UDTs, which would take more time to fix. So I used it to retrieve the only column of a built-in type, the key. To make the task more time-consuming I extended the dataset a bit, to ~2.5M records. All of

Re: Full table scan with cassandra

2017-08-17 Thread Dmitry Saprykin
Hi Alex, How do you generate your subrange set for running queries? It may happen that some of your ranges cross data ownership range borders (check by running 'nodetool describering [keyspace_name]'). Those range queries will be highly inefficient in that case, and that could explain your
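
A minimal sketch of the point about ownership borders, assuming the default Murmur3Partitioner token space (-2^63 to 2^63-1). The `ring_tokens` list stands in for the boundaries reported by 'nodetool describering', and the `split_ranges` helper and the `my_table` name are hypothetical, not taken from the thread. It cuts the ring at every ownership boundary first and only then subdivides, so no subrange straddles a border:

```python
MIN_TOKEN = -2**63       # Murmur3Partitioner lower bound
MAX_TOKEN = 2**63 - 1    # Murmur3Partitioner upper bound


def split_ranges(ring_tokens, splits_per_range):
    """Return (start, end] token subranges that never cross an ownership boundary.

    ring_tokens: tokens that delimit ownership ranges (hypothetical input,
                 e.g. collected from 'nodetool describering').
    splits_per_range: how many subranges to cut each ownership range into.
    """
    inner = sorted(t for t in set(ring_tokens) if MIN_TOKEN < t < MAX_TOKEN)
    bounds = [MIN_TOKEN] + inner + [MAX_TOKEN]
    subranges = []
    for lo, hi in zip(bounds, bounds[1:]):
        width = hi - lo
        # Never ask for more pieces than there are tokens in the range.
        pieces = max(1, min(splits_per_range, width))
        for i in range(pieces):
            start = lo + width * i // pieces
            end = lo + width * (i + 1) // pieces
            subranges.append((start, end))
    return subranges


def range_query(start, end):
    """Build the CQL for one subrange; 'my_table' is a placeholder name."""
    return (
        "SELECT user_id FROM my_table "
        f"WHERE token(user_id) > {start} AND token(user_id) <= {end}"
    )


if __name__ == "__main__":
    for start, end in split_ranges([-100, 0, 100], 2):
        print(range_query(start, end))
```

Each query then touches a single ownership range, which is the condition described above for an efficient range scan.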

Re: Full table scan with cassandra

2017-08-17 Thread Jeff Jirsa
Brian Hess has perhaps the best open source code example of the right way to do this: https://github.com/brianmhess/cassandra-loader/blob/master/src/main/java/com/datastax/loader/CqlDelimUnload.java On Thu, Aug 17, 2017 at 10:00 AM, Alex Kotelnikov < alex.kotelni...@diginetica.com> wrote: >

Re: Full table scan with cassandra

2017-08-17 Thread Alex Kotelnikov
yup, user_id is the primary key. First of all, can you share how to "go to a node directly"? Also, such an approach will retrieve all the data RF times; the coordinator should have enough metadata to avoid that. Shouldn't sending requests to multiple coordinators provide a certain concurrency? On 17 August

Re: Full table scan with cassandra

2017-08-17 Thread Dor Laor
On Thu, Aug 17, 2017 at 9:36 AM, Alex Kotelnikov < alex.kotelni...@diginetica.com> wrote: > Dor, > > I believe I tried it in many ways and the result is quite disappointing. > I've run my scans on 3 different clusters, one of which was running on VMs > and I was able to scale it up and down (3-5-7

Re: Full table scan with cassandra

2017-08-17 Thread Alex Kotelnikov
Dor, I believe I tried it in many ways and the result is quite disappointing. I've run my scans on 3 different clusters, one of which was running on VMs, and I was able to scale it up and down (3-5-7 VMs, 8 to 24 cores) to see how this affects the performance. I also generated the flow from Spark

Re: Full table scan with cassandra

2017-08-16 Thread Dor Laor
Hi Alex, You probably didn't get the parallelism right. A serial scan has a parallelism of one. If the parallelism isn't large enough, performance will be slow. If the parallelism is too large, Cassandra and the disk will thrash and there will be too many context switches. So you need to find your cluster's sweet spot. We
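
As a rough illustration of tuning parallelism, the sketch below runs subrange fetches through a bounded thread pool so the number of in-flight range queries can be varied until the cluster's sweet spot is found. The `scan_all` helper and the `fetch_range` callback are hypothetical; in a real scan `fetch_range` would execute one token-range query and return the number of rows it read.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_all(ranges, fetch_range, parallelism):
    """Fetch every (start, end) subrange with at most `parallelism` in flight.

    fetch_range(start, end) is a caller-supplied function (an assumption of
    this sketch) that reads one subrange and returns its row count.
    Returns the total number of rows read.
    """
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        return sum(pool.map(lambda r: fetch_range(*r), ranges))


if __name__ == "__main__":
    # Stand-in fetcher: pretends each subrange holds (end - start) rows.
    fake_fetch = lambda start, end: end - start
    print(scan_all([(0, 10), (10, 25)], fake_fetch, parallelism=2))
```

With `parallelism=1` this degenerates to the serial scan described above; raising the value and timing each run is one simple way to look for the point where throughput stops improving.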

Re: Full table scan with cassandra

2017-08-16 Thread Ben Bromhead
Apache Cassandra is not great in terms of performance at the moment for batch analytics workloads that require a full table scan. I would look at FiloDB for all the benefits and familiarity of Cassandra with better streaming and analytics performance: https://github.com/filodb/FiloDB There are

Full table scan with cassandra

2017-08-16 Thread Alex Kotelnikov
Hey, we are trying Cassandra as an alternative for storing a huge stream of data coming from our customers. Storing works quite well, and I have started to validate how retrieval performs. We have two types of retrieval: fetching specific records and bulk retrieval for general analysis. Fetching a single record