Re: Migrate from DSE (Datastax) to Apache Cassandra

2017-08-17 Thread Ioannis Zafiropoulos
Ok, found a solution for this problem. I deleted the system keyspace directory and restarted COSS, and it was rebuilt: rm -rf /var/lib/cassandra/data/system. A bit drastic, but I'll also test it on a multi-node cluster. On Thu, Aug 17, 2017 at 3:57 PM, Ioannis Zafiropoulos

Re: Full table scan with cassandra

2017-08-17 Thread Alex Kotelnikov
So it is also terribly slow. It does not work with materialized views (a quick hack for that is below) or UDTs; the latter requires more time to fix. So I used it to retrieve the only built-in-type column, the key. To make the task more time-consuming I extended the dataset a bit, to ~2.5M records. All of

Re: Adding a new node with the double of disk space

2017-08-17 Thread Jeff Jirsa
If you really double the hardware in every way, it's PROBABLY reasonable to double num_tokens. It won't be quite the same as doubling all-the-things, because you still have a single JVM, and you'll still have to deal with GC as you're now reading twice as much and generating twice as much garbage,

Re: Full table scan with cassandra

2017-08-17 Thread Dmitry Saprykin
Hi Alex, How do you generate your subrange set for running queries? It may happen that some of your ranges intersect data-ownership range borders (check by running 'nodetool describering [keyspace_name]'). Those range queries will be highly inefficient in that case, and that could explain your
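Dmitry's point about ownership borders can be sketched in code. A minimal illustration, assuming the Murmur3 token space of -2^63 .. 2^63-1 and border tokens as printed by `nodetool describering`; the function name is mine, not a driver API:

```python
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1

def split_at_ownership_borders(subrange, borders):
    """Split one (start, end] token subrange at every ownership border it
    crosses, so each resulting piece lies entirely within a single replica
    range. `borders` are the ring boundary tokens from describering."""
    start, end = subrange
    cuts = sorted(b for b in borders if start < b < end)
    pieces, lo = [], start
    for c in cuts:
        pieces.append((lo, c))
        lo = c
    pieces.append((lo, end))
    return pieces
```

A subrange that straddles a border gets cut into pieces that each live on one replica set, so no single query forces the coordinator to stitch together two ownership ranges.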

Re: Adding a new node with the double of disk space

2017-08-17 Thread Kevin O'Connor
Are you saying if a node had double the hardware capacity in every way it would be a bad idea to up num_tokens? I thought that was the whole idea of that setting though? On Thu, Aug 17, 2017 at 9:52 AM, Carlos Rolo wrote: > No. > > If you would double all the hardware on that

Re: Migrate from DSE (Datastax) to Apache Cassandra

2017-08-17 Thread Ioannis Zafiropoulos
Thanks Felipe and Erick. Yes, your comment helped a lot; I was able to resolve that with: ALTER KEYSPACE dse_system WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'}; Another problem I had was with CentOS release 6.7 (Final): I was getting "glibc 2.14 not found". Based on this

Re: Full table scan with cassandra

2017-08-17 Thread Jeff Jirsa
Brian Hess has perhaps the best open source code example of the right way to do this: https://github.com/brianmhess/cassandra-loader/blob/master/src/main/java/com/datastax/loader/CqlDelimUnload.java On Thu, Aug 17, 2017 at 10:00 AM, Alex Kotelnikov < alex.kotelni...@diginetica.com> wrote: >
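The structure of that kind of unloader can be sketched independently of the driver. A rough Python skeleton, where the `fetch` callback stands in for a real driver call such as `SELECT ... WHERE token(pk) > ? AND token(pk) <= ?`; nothing here talks to a cluster:

```python
from concurrent.futures import ThreadPoolExecutor

MIN_TOKEN = -2**63   # Murmur3Partitioner token space
MAX_TOKEN = 2**63 - 1

def token_subranges(splits):
    """Evenly split the full Murmur3 token range into `splits` (start, end] pieces."""
    width = (MAX_TOKEN - MIN_TOKEN) // splits
    edges = [MIN_TOKEN + i * width for i in range(splits)] + [MAX_TOKEN]
    return list(zip(edges[:-1], edges[1:]))

def parallel_scan(fetch, splits=8, workers=4):
    """Run `fetch(start, end)` for each subrange on a thread pool and
    concatenate the rows, the same shape of parallelism the loader uses."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda r: fetch(*r), token_subranges(splits))
    return [row for chunk in results for row in chunk]
```

Each subrange query can then be routed to a replica that owns it, which is the part that makes range scans scale instead of funneling everything through one coordinator.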

Re: write time corrupted and not sure how

2017-08-17 Thread Jeff Jirsa
There are certainly cases where corruption has happened in cassandra (rare, thankfully), but like I mentioned, I'm not aware of any that only corrupted timestamps. It wouldn't surprise me to see a really broken clock, and it wouldn't surprise me to see bit flips on bad hardware (even hardware with
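The bit-flip hypothesis is easy to test mechanically once you have a plausible "expected" timestamp to compare against (say, the writetime of a neighbouring row written at the same time). A small sketch, my own helper, just XOR and bit positions:

```python
def flipped_bits(observed, expected):
    """Positions of the bits that differ between two 64-bit timestamps.
    A result with exactly one position is consistent with a single bit flip."""
    diff = (observed ^ expected) & (2**64 - 1)
    return [i for i in range(64) if (diff >> i) & 1]
```

If exactly one high-order bit differs, bad RAM or a bad disk path becomes a much stronger suspect than a broken clock.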

Re: write time corrupted and not sure how

2017-08-17 Thread Greg Saylor
Thanks for your help, I wrote a script to cycle through these early records and try to update them (some columns were missing, but could be gleaned from another db), then do the update, re-read, and if it's not correct, figure out the write time and re-issue the update with that timestamp + 1.
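The repair loop Greg describes can be sketched as plain logic with the driver calls stubbed out; the `read_row`/`write_row` callables and field names below are illustrative, not a real API:

```python
def repair_timestamp(observed_writetime_us):
    """Pick the USING TIMESTAMP for the corrective update: one microsecond
    above the (possibly corrupted) writetime on the cell, so the new value wins."""
    return observed_writetime_us + 1

def fix_row(read_row, write_row, row_id, corrected):
    """Update, re-read, and if the plain update lost to a cell whose
    timestamp is in the future, re-issue it with an explicit timestamp."""
    write_row(row_id, corrected)                   # plain update first
    row = read_row(row_id)
    if row["value"] != corrected:                  # old (corrupted) timestamp still winning
        ts = repair_timestamp(row["writetime"])
        write_row(row_id, corrected, using_timestamp=ts)
```

The `+ 1` matters because Cassandra resolves conflicting cells by timestamp: an update stamped with the current time silently loses to a cell whose corrupted writetime sits far in the future.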

Re: Full table scan with cassandra

2017-08-17 Thread Alex Kotelnikov
Yup, user_id is the primary key. First of all, can you share how to "go to a node directly"? Also, such an approach will retrieve all the data RF times; the coordinator should have enough metadata to avoid that. Shouldn't requesting multiple coordinators provide a certain concurrency? On 17 August

Re: Full table scan with cassandra

2017-08-17 Thread Dor Laor
On Thu, Aug 17, 2017 at 9:36 AM, Alex Kotelnikov < alex.kotelni...@diginetica.com> wrote: > Dor, > > I believe, I tried it in many ways and the result is quite disappointing. > I've run my scans on 3 different clusters, one of which was using on VMs > and I was able to scale it up and down (3-5-7

Re: Adding a new node with the double of disk space

2017-08-17 Thread Carlos Rolo
No. Even if you doubled all the hardware on that node vs. the others, it would still be a bad idea. Keep the cluster uniform, vnodes-wise. Regards, Carlos Juzarte Rolo Cassandra Consultant / Datastax Certified Architect / Cassandra MVP Pythian - Love your data rolo@pythian | Twitter: @cjrolo |

Adding a new node with the double of disk space

2017-08-17 Thread Cogumelos Maravilha
Hi all, I need to add a new node to my cluster, but this time the new node will have double the disk space compared to the other nodes. I'm using the default vnodes (num_tokens: 256). To fully use the disk space on the new node, do I just have to configure num_tokens: 512? Thanks in advance.
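Whether num_tokens: 512 yields double the data is easy to check with a rough simulation: place random vnode tokens on the ring and measure each node's share. This ignores replication and racks, so it is only a sketch of the ownership math, not how Cassandra computes it:

```python
import random

def ownership_fractions(tokens_per_node, seed=42):
    """Scatter each node's random vnode tokens on the Murmur3 ring and
    measure the fraction of the ring each node owns (the arc ending at
    each of its tokens)."""
    rng = random.Random(seed)
    ring = []  # (token, node) pairs
    for node, n in tokens_per_node.items():
        ring += [(rng.randrange(-2**63, 2**63), node) for _ in range(n)]
    ring.sort()
    total = 2**64
    owned = {node: 0 for node in tokens_per_node}
    prev = ring[-1][0] - total  # wrap around from the last token
    for tok, node in ring:
        owned[node] += tok - prev
        prev = tok
    return {node: owned[node] / total for node in owned}
```

With three 256-token nodes and one 512-token node, the big node lands near 2/5 of the ring, i.e. roughly twice each peer's share, which is exactly the behaviour (and the operational caveat about a non-uniform cluster) discussed in the replies.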

Re: Full table scan with cassandra

2017-08-17 Thread Alex Kotelnikov
Dor, I believe I tried it in many ways and the result is quite disappointing. I've run my scans on 3 different clusters, one of which was running on VMs so I was able to scale it up and down (3-5-7 VMs, 8 to 24 cores) to see how this affects the performance. I also generated the flow from Spark

Re: write time corrupted and not sure how

2017-08-17 Thread Jeff Jirsa
It's a long, so you can't grab it with readInt: it's 8 bytes instead of 4. You can delete it by issuing a delete with an explicit timestamp at least 1 higher than the timestamp on the cell: DELETE FROM table USING TIMESTAMP ? WHERE https://cassandra.apache.org/doc/latest/cql/dml.html#delete
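The readInt pitfall is easy to demonstrate with plain byte packing. A sketch using Python's struct, assuming big-endian signed 64-bit as in the CQL binary protocol's bigint encoding:

```python
import struct

def encode_writetime(us):
    """Serialize a cell timestamp (microseconds) as a big-endian signed long."""
    return struct.pack(">q", us)

def read_long(buf):
    """Correct: consume all 8 bytes."""
    return struct.unpack(">q", buf[:8])[0]

def read_int(buf):
    """The mistake: consume only 4 bytes, i.e. just the high half of the value."""
    return struct.unpack(">i", buf[:4])[0]
```

read_long round-trips a 2017-era writetime such as 1502957436214912 exactly, while read_int returns only the high 32 bits, a meaningless small number.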

write time corrupted and not sure how

2017-08-17 Thread Greg Saylor
Hello, We have a Cassandra database that is about 5 years old and has gone through multiple upgrades. Today I noticed a very odd thing (current timestamp would be around 1502957436214912): cqlsh:siq_prod> select id,account_id,sweep_id from items where id=34681132; id |
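A quick way to sanity-check a writetime value is to decode it as microseconds since the Unix epoch (a small helper of my own, not part of any driver):

```python
from datetime import datetime, timezone

def writetime_to_datetime(writetime_us):
    """writetime() values are microseconds since the Unix epoch; a healthy
    value from this thread's era should decode to a 2017 date."""
    return datetime.fromtimestamp(writetime_us / 1_000_000, tz=timezone.utc)
```

1502957436214912 decodes to 2017-08-17 UTC, so a recently written row whose writetime decodes to a wildly different era points at a corrupted timestamp rather than a clock problem.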