Re: Cassandra 3.1.1 with respect to HeapSpace

2016-01-14 Thread Jean Tremblay
How can I restart? It blocks with the error listed below. Are my memory settings good for my configuration? On 14 Jan 2016, at 18:30, Jake Luciani wrote: Yes you can restart without data loss. Can you please include info about how much data you have

Re: Cassandra 3.1.1 with respect to HeapSpace

2016-01-14 Thread Jean Tremblay
Ok, I will open a ticket. How could I restart my cluster without losing everything? Would there be a better memory configuration to select for my nodes? Currently I use MAX_HEAP_SIZE="6G" HEAP_NEWSIZE="496M" for a 16GB RAM node. Thanks Jean On 14 Jan 2016, at 18:19, Tyler Hobbs
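These settings live in conf/cassandra-env.sh. For comparison, a minimal sketch of what that file's own sizing logic would pick on a 16GB node (illustrative, not a tuned recommendation):

    # cassandra-env.sh defaults roughly to:
    #   MAX_HEAP_SIZE = max(min(1/2 * RAM, 1G), min(1/4 * RAM, 8G))  -> 4G on 16GB RAM
    #   HEAP_NEWSIZE  = min(100M * cores, 1/4 * MAX_HEAP_SIZE)
    MAX_HEAP_SIZE="4G"
    HEAP_NEWSIZE="800M"    # assumes 8 cores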

Re: Cassandra 3.1.1 with respect to HeapSpace

2016-01-14 Thread Jake Luciani
Yes you can restart without data loss. Can you please include info about how much data you have loaded per node and perhaps what your schema looks like? Thanks On Thu, Jan 14, 2016 at 12:24 PM, Jean Tremblay < jean.tremb...@zen-innovations.com> wrote: > > Ok, I will open a ticket. > > How

Re: Slow performance after upgrading from 2.0.9 to 2.1.11

2016-01-14 Thread Zhiyan Shao
Praveen, if you search "Read is slower in 2.1.6 than 2.0.14" in this forum, you can find another thread I sent a while ago. The perf test I did indicated that reads are slower in 2.1.6 than in 2.0.14, so we stayed with 2.0.14. On Tue, Jan 12, 2016 at 9:35 AM, Peddi, Praveen wrote:

Re: Cassandra is consuming a lot of disk space

2016-01-14 Thread Rahul Ramesh
Hi Jan, I checked it. There are no old keyspaces or tables. Thanks for your pointer, I started looking inside the directories. I see a lot of snapshot directories inside the table directories. These directories are consuming the space. However, these snapshots are not shown when I issue listsnapshots
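A quick way to confirm where that space lives, assuming the default data directory (a sketch, not commands from the thread):

    du -sh /var/lib/cassandra/data/*/*/snapshots   # per-table snapshot usage
    nodetool listsnapshots                         # what Cassandra itself reports
    nodetool clearsnapshot                         # removes all snapshots on this node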

Re: Cassandra is consuming a lot of disk space

2016-01-14 Thread Rahul Ramesh
One update. I cleared the snapshot using nodetool clearsnapshot command. Disk space is recovered now. Because of this issue, I have mounted one more drive to the server and there are some data files there. How can I migrate the data so that I can decommission the drive? Will it work if I just

Re: what consistency level should I set when using IF NOT EXIST or UPDATE IF statements ?

2016-01-14 Thread Hiroyuki Yamada
Thanks DuyHai! That's clear and helpful. (And I realized that we need to call setSerialConsistency for SERIAL and setConsistency for the others.) Thanks, Hiro On Tue, Jan 12, 2016 at 9:34 PM, DuyHai Doan wrote: > There are 2 levels of consistency levels you can define on
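In cqlsh the two levels map to two separate commands; a sketch with an illustrative table and values:

    -- applies to the commit phase of the write:
    CONSISTENCY QUORUM;
    -- applies to the Paxos phase of the IF condition:
    SERIAL CONSISTENCY SERIAL;
    UPDATE ks.users SET email = 'new' WHERE id = 1 IF email = 'old';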

Re: Cassandra is consuming a lot of disk space

2016-01-14 Thread Jan Kesten
Hi Rahul, it should work as you would expect - simply copy over the sstables from your extra disk to the original one. To minimize downtime of the node you can do something like this: - rsync the files while the node is still running (sstables are immutable) to copy most of the data - edit
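A sketch of that procedure with illustrative paths (adjust to your data_file_directories; the nodetool drain step is an addition here for a clean flush):

    # bulk copy while the node is still running (sstables are immutable)
    rsync -a /mnt/extra/cassandra/data/ /var/lib/cassandra/data/
    # flush and stop the node, then catch anything written since the first pass
    nodetool drain
    sudo service cassandra stop
    rsync -a /mnt/extra/cassandra/data/ /var/lib/cassandra/data/
    # remove the extra path from data_file_directories in cassandra.yaml, then
    sudo service cassandra start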

Modeling approach to widely used textual information

2016-01-14 Thread I PVP
Hi everyone, I am new to Cassandra and moving an existing MySQL application to Cassandra. As a general rule, what is the recommended approach for keeping textual information like a user_nickname, a company_name, or a product_title that will potentially be updated at some time and is routinely and
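One common pattern for this kind of question (not necessarily the answer given in the thread; schema names invented) is to keep the mutable text in a single lookup table and read it at display time, or to denormalize it into query tables and fan updates out to every copy:

    CREATE TABLE users_by_id (
        user_id       uuid PRIMARY KEY,
        user_nickname text
    );
    -- denormalized alternative: store user_nickname in each query table as well,
    -- and update every copy when it changes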

Re: New node has high network and disk usage.

2016-01-14 Thread James Griffin
A summary of what we've done this morning: - Noted that there are no GCInspector lines in system.log on the bad node (there are GCInspector logs on other healthy nodes) - Turned on GC logging, and noted logs stating the total time for which application threads were stopped
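Those "total time for which application threads were stopped" lines come from JVM flags along these lines (a representative subset; the exact file and flag set vary by Cassandra version):

    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
    JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"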

Re: New node has high network and disk usage.

2016-01-14 Thread Kai Wang
James, Can you post the result of "nodetool netstats" on the bad node? On Thu, Jan 14, 2016 at 9:09 AM, James Griffin < james.grif...@idioplatform.com> wrote: > A summary of what we've done this morning: > >- Noted that there are no GCInspector lines in system.log on bad node >(there

Re: Cassandra 3.1.1 with respect to HeapSpace

2016-01-14 Thread Tyler Hobbs
I don't think that's a known issue. Can you open a ticket at https://issues.apache.org/jira/browse/CASSANDRA and attach your schema along with the commitlog files and the mutation that was saved to /tmp? On Thu, Jan 14, 2016 at 10:56 AM, Jean Tremblay < jean.tremb...@zen-innovations.com> wrote:

Re: Encryption in cassandra

2016-01-14 Thread Jack Krupansky
Cassandra supports both client-to-node and inter-node security. IOW, Cassandra can also be a client to another Cassandra node. To repeat (and you seem to keep ignoring this): the presumption is that the user, outside of Cassandra, is responsible for securing the system, including the file

Encryption in cassandra

2016-01-14 Thread oleg yusim
Greetings, Guys, can you please help me understand the following: I'm reading through the way the keystore and truststore are implemented, and it is all fine and great, but at the end the Cassandra documentation instructs you to extract all the keystore content and leave all certs and keys in the clear. Do

Re: Cassandra Performance on a Single Machine

2016-01-14 Thread Anurag Khandelwal
Hi Jack, > So, your 1GB input size means roughly 716 thousand rows of data and 128GB > means roughly 92 million rows, correct? Yes, that's correct. > Are your gets and searches returning single rows, or a significant number of > rows? Like I mentioned in my first email, get always returns a

Re: Encryption in cassandra

2016-01-14 Thread oleg yusim
Jack, Thanks for your answer. I guess I'm a little confused by the general architecture choice. It doesn't seem consistent to me. I mean, if we are building a layer of database-specific security (i.e., we are saying: assume the intruder is on the box, and he is root; what can we do?), then

Re: Cassandra 3.1.1 with respect to HeapSpace

2016-01-14 Thread Sebastian Estevez
Try starting the other nodes. You may have to delete or mv the commitlog segment referenced in the error message for the node to come up, since apparently it is corrupted. All the best, Sebastián Estévez Solutions Architect | 954 905 8615 |
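A sketch of that workaround (the segment name is illustrative; use the file named in your error message, and note that the mutations it holds are lost):

    sudo service cassandra stop
    mv /var/lib/cassandra/commitlog/CommitLog-6-1452790000000.log /tmp/
    sudo service cassandra start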

Re: Encryption in cassandra

2016-01-14 Thread Jack Krupansky
Cassandra is definitely assuming that you, the user, are separately ensuring that no intruder gets access to the box/root/login. The keystore and truststore in Cassandra have nothing to do with system security; they are solely for Cassandra API security. System security and Cassandra API

Re: Encryption in cassandra

2016-01-14 Thread Jack Krupansky
The point of encryption in Cassandra is to protect data in flight between the cluster and clients (or between nodes in the cluster.) The presumption is that normal system network access control (e.g., remote login, etc.) will preclude bad actors from directly accessing the file system on a cluster
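Those two layers map to two blocks in cassandra.yaml (paths and passwords illustrative):

    server_encryption_options:            # node-to-node
        internode_encryption: all
        keystore: conf/.keystore
        keystore_password: cassandra
        truststore: conf/.truststore
        truststore_password: cassandra
    client_encryption_options:            # client-to-node
        enabled: true
        keystore: conf/.keystore
        keystore_password: cassandra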

Re: Encryption in cassandra

2016-01-14 Thread daemeon reiydelle
The keys don't have to be on the box. You do need a login/password for C*. sent from my mobile Daemeon C.M. Reiydelle USA 415.501.0198 London +44.0.20.8144.9872 On Jan 14, 2016 5:16 PM, "oleg yusim" wrote: > Greetings, > > Guys, can you please help me to understand

Re: Encryption in cassandra

2016-01-14 Thread oleg yusim
Daemeon, Can you please give me a bit more beef on that idea? I'm not sure I'm fully on board here. Thanks, Oleg On Thu, Jan 14, 2016 at 4:52 PM, daemeon reiydelle wrote: > The keys don't have to be on the box. You do need a login/password for C*. > > sent from my mobile >

Re: Encryption in cassandra

2016-01-14 Thread oleg yusim
Jack, thank you for the link, but I'm not sure what you are referring to by Cassandra API security. If you mean the TLS connection Cassandra establishes to clients and between nodes, then the keystore and truststore do not seem to participate in it at all, because Cassandra is using certs and keys,

Re: max connection per user

2016-01-14 Thread oleg yusim
Let me revive this thread a little. I see it is possible to limit concurrent connections based on IP or client:

    # The maximum number of concurrent client connections.
    # The default is -1, which means unlimited.
    # native_transport_max_concurrent_connections: -1
    # The maximum number of
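The companion setting in the same file is the per-source-IP variant; both default to unlimited:

    native_transport_max_concurrent_connections: -1
    native_transport_max_concurrent_connections_per_ip: -1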

Re: Slow performance after upgrading from 2.0.9 to 2.1.11

2016-01-14 Thread Jeff Jirsa
This may be due to https://issues.apache.org/jira/browse/CASSANDRA-10249 / https://issues.apache.org/jira/browse/CASSANDRA-8894 - whether or not this is really the case depends on how much of your data is in page cache, and whether or not you’re using mmap. Since the original question was asked

Re: Slow performance after upgrading from 2.0.9 to 2.1.11

2016-01-14 Thread Peddi, Praveen
Hi, We will try with a reduced "rar_buffer_size" of 4KB. However, CASSANDRA-10249 says "this only affects users who have 1. disabled compression, 2. switched to buffered i/o from mmap'd". Neither of these is true for us, I believe. We use default

Re: Sorting & pagination in apache cassandra 2.1

2016-01-14 Thread anuja jain
@Jonathan what do you mean by "you'll need to maintain your own materialized view tables"? Does it mean we have to create a new table for each query? On Wed, Jan 13, 2016 at 7:40 PM, Narendra Sharma wrote: > In the example you gave, the primary key user_name is the row
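In practice that means one table per query pattern, kept in sync by the application; an illustrative sketch:

    CREATE TABLE users_by_name (
        user_name text,
        user_id   uuid,
        PRIMARY KEY (user_name, user_id)
    );
    -- the application writes here alongside the base table whenever a user
    -- changes, e.g. inside a logged batch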

Re: Cassandra Performance on a Single Machine

2016-01-14 Thread Jack Krupansky
What exactly is "input size" here (1GB to 128GB)? I mean, the test spec says "The dataset used comprises of ~1.5KB records... there are 105 attributes in each record." Does each test run have exactly the same number of rows and columns, and you're just making each column bigger, or what? Cassandra

Re: Cassandra Performance on a Single Machine

2016-01-14 Thread Anurag Khandelwal
To clarify: Input size is the size of the dataset as a CSV file, before loading it into Cassandra; for each input size, the number of columns is fixed but the number of rows is different. By 1.5KB record, I meant that each row, when represented as a CSV entry, occupies 1500 bytes. I've used the

Re: Cassandra Performance on a Single Machine

2016-01-14 Thread Jonathan Haddad
I think you actually get a really useful metric by benchmarking 1 machine. You understand your cluster's theoretical maximum performance, which would be Nodes * number of queries. Yes, adding in replication and CL is important, but 1 machine lets you isolate certain performance metrics. On Thu,

Re: Slow performance after upgrading from 2.0.9 to 2.1.11

2016-01-14 Thread Jeff Jirsa
Sorry I wasn't as explicit as I should have been. The same buffer size is used by compressed reads as well, but it is tuned with the compression_chunk_size table property. It's likely true that if you lower compression_chunk_size, you'll see improved read performance. This was covered in the AWS
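Lowering it is a per-table change; in the 2.1-era CQL syntax the knob is chunk_length_kb inside the compression map (table name and value illustrative):

    ALTER TABLE ks.tbl
      WITH compression = {'sstable_compression': 'LZ4Compressor',
                          'chunk_length_kb': '4'};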

Re: Cassandra Performance on a Single Machine

2016-01-14 Thread Robert Wille
I disagree. I think that you can extrapolate very little information about RF>1 and CL>1 by benchmarking with RF=1 and CL=1. On Jan 13, 2016, at 8:41 PM, Anurag Khandelwal wrote: Hi John, Thanks for responding! The aim of this benchmark was

Re: New node has high network and disk usage.

2016-01-14 Thread Kai Wang
James, I may miss something. You mentioned your cluster had RF=3. Then why does "nodetool status" show each node owns 1/3 of the data especially after a full repair? On Thu, Jan 14, 2016 at 9:56 AM, James Griffin < james.grif...@idioplatform.com> wrote: > Hi Kai, > > Below - nothing going on

Re: New node has high network and disk usage.

2016-01-14 Thread James Griffin
Hi Kai, Below - nothing going on that I can see

    $ nodetool netstats
    Mode: NORMAL
    Not sending any streams.
    Read Repair Statistics:
    Attempted: 0
    Mismatch (Blocking): 0
    Mismatch (Background): 0
    Pool Name     Active   Pending   Completed
    Commands      n/a

Re: New node has high network and disk usage.

2016-01-14 Thread James Griffin
Hi Kai, Well observed - running `nodetool status` without specifying keyspace does report ~33% on each node. We have two keyspaces on this cluster - if I specify either of them the ownership reported by each node is 100%, so I believe the repair completed successfully. Best wishes, Griff
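For anyone hitting the same confusion: ownership is only meaningful per keyspace, because replication factor is a keyspace property (keyspace name illustrative):

    nodetool status               # no keyspace: ~33% effective ownership per node
    nodetool status my_keyspace   # with RF=3 on 3 nodes, each node reports 100%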

Cassandra 3.1.1 with respect to HeapSpace

2016-01-14 Thread Jean Tremblay
Hi, I have a small Cassandra cluster with 5 nodes, each having 16GB of RAM. I use Cassandra 3.1.1. I use the following setup for the memory: MAX_HEAP_SIZE="6G" HEAP_NEWSIZE="496M" I have been loading a lot of data into this cluster over the last 24 hours. The system behaved, I think, very nicely. It

Re: Cassandra Performance on a Single Machine

2016-01-14 Thread Jack Krupansky
Thanks for that clarification. So, your 1GB input size means roughly 716 thousand rows of data and 128GB means roughly 92 million rows, correct? FWIW, a best practice recommendation is that you avoid using secondary indexes in favor of using "query tables" - store the same data in multiple
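A quick sanity check on those row counts, taking 1.5KB as 1,500 bytes and 1GB as 2^30 bytes:

      1GB / 1.5KB = 1,073,741,824 / 1,500   ≈ 715,828 rows   (~716 thousand)
    128GB / 1.5KB = 137,438,953,472 / 1,500 ≈ 91.6 million   (~92 million)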