Creating a copy of a C* cluster

2017-08-07 Thread Robert Wille
We need to make a copy of a cluster. We’re going to do some testing against the 
copy and then discard it. What’s the best way of doing that? I created another 
datacenter, and then have tried to divorce it from the original datacenter, but 
have had troubles doing so.

Suggestions?

Thanks in advance

Robert


-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org


Re: Why does `now()` produce different times within the same query?

2016-11-30 Thread Robert Wille
In my opinion, this is not broken and “fixing” it would break existing code. 
Consider a batch that includes multiple inserts, each of which inserts the 
value returned by now(). Getting the same UUID for each insert would be a major 
problem.

Cheers

Robert

On Nov 30, 2016, at 4:46 PM, Todd Fast wrote:

FWIW I'd suggest opening a bug--this behavior is certainly quite unexpected and 
more than just a documentation issue. In general I can't imagine any desirable 
properties of the current implementation, and there are likely a bunch of 
latent bugs sitting out there, so it should be fixed.

Todd

On Wed, Nov 30, 2016 at 12:37 PM Terry Liu wrote:
Sorry for my typo. Obviously, I meant:
"It appears that a single query that calls Cassandra's`now()` time function 
multiple times may actually cause a query to write or return different times."

Less of a surprise now that I realize more about the implementation, but I 
agree that more explicit documentation around when exactly the "execution" of 
each now() statement happens and what implications it has for the resulting 
timestamps would be helpful when running into this.

Thanks for the quick responses!

-Terry



On Tue, Nov 29, 2016 at 2:45 PM, Marko Švaljek wrote:
Every now() call in a statement is, under the hood, "replaced" with a newly generated UUID.

It can happen that they belong to different milliseconds in time.

If you need the same timestamps, you need to set them on the client side.
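A minimal sketch of doing that with the DataStax Java driver (the tables, columns and values below are illustrative, not from this thread): generate the timeuuid once and bind the same value into every statement.

import com.datastax.driver.core.BatchStatement;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.utils.UUIDs;

import java.util.UUID;

public class ClientSideTimeuuid {
    // Generate the timeuuid once on the client so every write in the batch
    // carries exactly the same value.
    static void insertWithSharedTimeuuid(Session session, String user, String payload) {
        UUID ts = UUIDs.timeBased();

        PreparedStatement byUser = session.prepare(
                "INSERT INTO events_by_user (user, ts, payload) VALUES (?, ?, ?)");
        PreparedStatement byDay = session.prepare(
                "INSERT INTO events_by_day (day, ts, payload) VALUES (?, ?, ?)");

        BatchStatement batch = new BatchStatement();
        batch.add(byUser.bind(user, ts, payload));
        batch.add(byDay.bind("2016-11-30", ts, payload));
        session.execute(batch);
    }
}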


@msvaljek

2016-11-29 22:49 GMT+01:00 Terry Liu:
It appears that a single query that calls Cassandra's `now()` time function may 
actually cause a query to write or return different times.

Is this the expected or defined behavior, and if so, why does it behave like 
this rather than evaluating `now()` once across an entire statement?

This really affects UPDATE statements but to test it more easily, you could try 
something like:

SELECT toTimestamp(now()) as a, toTimestamp(now()) as b
FROM keyspace.table
LIMIT 100;

If you run that a few times, you should eventually see that the timestamp 
returned moves onto the next millisecond mid-query.

--
Software Engineer
Turnitin - http://www.turnitin.com
t...@turnitin.com




--
Software Engineer
Turnitin - http://www.turnitin.com
t...@turnitin.com



Re: Cannot mix counter and non counter columns in the same table

2016-11-01 Thread Robert Wille
I used to think it was terrible as well. But it really isn’t. Just put your 
non-counter columns in a separate table with the same primary key. If you want 
to query both counter and non-counter columns at the same time, just query both 
tables at the same time with asynchronous queries.
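
A minimal sketch of that pattern, reusing the page_view_counts example from the documentation (the companion table, its columns and the keyspace layout are made up for illustration; the DataStax Java driver is assumed):

import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class SplitCounterTables {
    static void createTables(Session session) {
        // Counter values live in their own table...
        session.execute("CREATE TABLE IF NOT EXISTS counterks.page_view_counts ("
                + " url_name varchar, page_name varchar, counter_value counter,"
                + " PRIMARY KEY (url_name, page_name))");
        // ...and the non-counter columns go in a companion table with the same primary key.
        session.execute("CREATE TABLE IF NOT EXISTS counterks.page_view_meta ("
                + " url_name varchar, page_name varchar,"
                + " title text, last_crawled timestamp,"
                + " PRIMARY KEY (url_name, page_name))");
    }

    // Query both tables at the same time with asynchronous queries.
    static void readBoth(Session session, String url, String page) {
        ResultSetFuture counts = session.executeAsync(
                "SELECT counter_value FROM counterks.page_view_counts"
                + " WHERE url_name = ? AND page_name = ?", url, page);
        ResultSetFuture meta = session.executeAsync(
                "SELECT title FROM counterks.page_view_meta"
                + " WHERE url_name = ? AND page_name = ?", url, page);

        Row countRow = counts.getUninterruptibly().one();
        Row metaRow = meta.getUninterruptibly().one();
        if (countRow != null && metaRow != null) {
            System.out.println(countRow.getLong("counter_value")
                    + " views of " + metaRow.getString("title"));
        }
    }
}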

On Nov 1, 2016, at 7:29 AM, Ali Akhtar wrote:

That's a terrible gotcha rule.

On Tue, Nov 1, 2016 at 6:27 PM, Cody Yancey wrote:

In your table schema, you have KEYS and you have VALUES. Your KEYS are text, 
but they could be any non-counter type or compound thereof. KEYS obviously 
cannot ever be counters.

Your VALUES, however, must be either all counters or all non-counters. The 
official example you posted conforms to this limitation.

Thanks,
Cody

On Nov 1, 2016 7:16 AM, "Ali Akhtar" wrote:
I'm not referring to the primary key, just to other columns.

My primary key is a text, and my table contains a mix of texts, ints, and 
timestamps.

If I try to change one of the ints to a counter and run the create table query, 
I get the error ' Cannot mix counter and non counter columns in the same table'


On Tue, Nov 1, 2016 at 6:11 PM, Cody Yancey wrote:

For counter tables, non-counter types are of course allowed in the primary key. 
Counters would be meaningless otherwise.

Thanks,
Cody

On Nov 1, 2016 7:00 AM, "Ali Akhtar" wrote:
In the documentation for counters:

https://docs.datastax.com/en/cql/3.1/cql/cql_using/use_counter_t.html

The example table is created via:

CREATE TABLE counterks.page_view_counts
  (counter_value counter,
  url_name varchar,
  page_name varchar,
  PRIMARY KEY (url_name, page_name)
);

Yet if I try to create a table with a mixture of texts, ints, timestamps, and 
counters, I get the error 'Cannot mix counter and non counter columns in the 
same table'.

Is that supposed to be allowed or not allowed, given that the official example 
contains a mix of counters and non-counters?





Re: Transaction failed because of timeout, retry failed because of the first try actually succeeded.

2016-06-30 Thread Robert Wille
I had this problem, and it was caused by my retry policy. For reasons I don’t 
remember (but which are documented in a C* Jira ticket), when onWriteTimeout() is 
called you cannot call RetryDecision.retry(cl), as it will pass a CL that is 
incompatible with LWT. After the fix (2.1.?), you can pass null and it will 
use the original CL.
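
A sketch of the retry policy shape being described (assuming the 2.1-era DataStax Java driver; whether to retry a non-idempotent CAS write at all is still the judgment call discussed below):

import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Statement;
import com.datastax.driver.core.WriteType;
import com.datastax.driver.core.policies.RetryPolicy;
import com.datastax.driver.core.policies.RetryPolicy.RetryDecision;

// Register with Cluster.builder()...withRetryPolicy(new CasAwareRetryPolicy()).
public class CasAwareRetryPolicy implements RetryPolicy {

    @Override
    public RetryDecision onWriteTimeout(Statement statement, ConsistencyLevel cl,
                                        WriteType writeType, int requiredAcks,
                                        int receivedAcks, int nbRetry) {
        if (nbRetry != 0) {
            return RetryDecision.rethrow();
        }
        // For CAS writes, don't hand a consistency level to retry(); passing null
        // makes the driver reuse the statement's original consistency level.
        return writeType == WriteType.CAS ? RetryDecision.retry(null) : RetryDecision.rethrow();
    }

    @Override
    public RetryDecision onReadTimeout(Statement statement, ConsistencyLevel cl,
                                       int requiredResponses, int receivedResponses,
                                       boolean dataRetrieved, int nbRetry) {
        return RetryDecision.rethrow();
    }

    @Override
    public RetryDecision onUnavailable(Statement statement, ConsistencyLevel cl,
                                       int requiredReplica, int aliveReplica, int nbRetry) {
        return RetryDecision.rethrow();
    }
}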

On Jun 30, 2016, at 6:11 PM, Justin Lin  wrote:

> Hi everyone,
> 
> I recently encountered a problem with a lightweight transaction (LWT). My query 
> inserts a row into a table if the row doesn't exist. It goes like this:
> 
> INSERT INTO mytable (key, col1, col2) VALUES ('key1', 1, 2) IF NOT EXISTS
> 
> In my case the driver somehow times out waiting for the coordinator to 
> respond, but the transaction actually succeeded, so my code retries the query 
> and fails.
> 
> This is not an idempotent write, so the retry might be a bad idea. And 
> honestly this is not a Cassandra issue. But I wonder if anyone in the 
> community has ever had this problem before, and how you would recommend 
> solving it?
> 
> Thanks
> 
> -- 
> come on



Re: Intermittent CAS error

2016-05-19 Thread Robert Wille
I bet that’s it. I have a shared library that I use with another project that 
is still on 2.0, so I’ve kept the driver at 2.0. The issue was addressed in 
version 2.1 of the driver.

Thanks

On May 19, 2016, at 9:03 AM, Joel Knighton 
<joel.knigh...@datastax.com<mailto:joel.knigh...@datastax.com>> wrote:

That particular error is thrown directly from the Java driver (unless it is 
also copied in other drivers, either way, not from Cassandra).

There has been a bug related to this in the past - 
JAVA-764<https://datastax-oss.atlassian.net/browse/JAVA-764>. You may be on an 
affected version, or you may have found a similar bug.

The Java driver mailing list is the best place to follow up on this. It can be 
found at 
https://groups.google.com/a/lists.datastax.com/forum/#!forum/java-driver-user.

On Thu, May 19, 2016 at 12:11 AM, Robert Wille 
<rwi...@fold3.com<mailto:rwi...@fold3.com>> wrote:
When executing bulk CAS queries, I intermittently get the following error:

SERIAL is not supported as conditional update commit consistency. Use ANY if 
you mean "make sure it is accepted but I don't care how many replicas commit it 
for non-SERIAL reads”

This doesn’t make any sense. Obviously, it IS supported because it works most 
of the time. Is this just a result of not enough replicas, and the error 
message is jacked up?

I’m running 2.1.13.

Thanks

Robert




--
Joel Knighton
Cassandra Developer | joel.knigh...@datastax.com



Intermittent CAS error

2016-05-18 Thread Robert Wille
When executing bulk CAS queries, I intermittently get the following error: 

SERIAL is not supported as conditional update commit consistency. Use ANY if 
you mean "make sure it is accepted but I don't care how many replicas commit it 
for non-SERIAL reads”

This doesn’t make any sense. Obviously, it IS supported because it works most 
of the time. Is this just a result of not enough replicas, and the error 
message is jacked up?

I’m running 2.1.13.

Thanks

Robert
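
For reference, the "commit consistency" the error message refers to is separate from the serial consistency used for the Paxos round, and the Java driver lets you set the two independently on a statement. A minimal sketch (the table is illustrative; a 2.1-era driver is assumed):

import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;

public class CasConsistencyExample {
    // SERIAL/LOCAL_SERIAL belongs on the serial (Paxos) setting; the commit
    // setting should be a regular level such as QUORUM.
    static void insertIfNotExists(Session session) {
        Statement stmt = new SimpleStatement(
                "INSERT INTO ks.t (key, col1, col2) VALUES ('key1', 1, 2) IF NOT EXISTS");
        stmt.setSerialConsistencyLevel(ConsistencyLevel.SERIAL);
        stmt.setConsistencyLevel(ConsistencyLevel.QUORUM);
        session.execute(stmt);
    }
}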



Re: Large primary keys

2016-04-14 Thread Robert Wille
That would be a nice solution, but 3.4 is way too bleeding edge. I’ll just go 
with the digest for now. Thanks for pointing it out. I’ll have to consider a 
migration in the future when production is on 3.x.
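
A minimal sketch of the digest-as-key approach (the table layout in the comment is illustrative, not from the thread):

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DocumentKey {
    // e.g. CREATE TABLE entity_cache (doc_digest text PRIMARY KEY, doc_text text, entities list<text>)
    // Derive a small, fixed-size partition key from arbitrarily large document text.
    static String sha256Hex(String documentText) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] hash = md.digest(documentText.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder(hash.length * 2);
        for (byte b : hash) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}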

On Apr 11, 2016, at 10:19 PM, Jack Krupansky 
<jack.krupan...@gmail.com<mailto:jack.krupan...@gmail.com>> wrote:

Check out the text indexing feature of the new SASI feature in Cassandra 3.4. 
You could write a custom tokenizer to extract entities and then be able to 
query for documents that contain those entities.

That said, using a SHA digest key for the primary key has merit for direct 
access to the document given the document text.

-- Jack Krupansky
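
For context, a SASI index with an analyzer is declared as a custom index in CQL; a minimal sketch against 3.4+ (the keyspace, table and columns are hypothetical, and a real deployment would plug in a custom analyzer for entity extraction):

import com.datastax.driver.core.Session;

public class SasiTextIndex {
    // The stock StandardAnalyzer is used here; a custom analyzer class would
    // take its place for entity extraction.
    static void createIndex(Session session) {
        session.execute(
            "CREATE CUSTOM INDEX IF NOT EXISTS doc_text_idx ON docs.documents (doc_text)"
            + " USING 'org.apache.cassandra.index.sasi.SASIIndex'"
            + " WITH OPTIONS = { 'mode': 'CONTAINS', 'analyzed': 'true',"
            + "   'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer' }");
    }

    // Documents containing a given term can then be found with LIKE.
    static void findDocuments(Session session, String term) {
        session.execute("SELECT doc_id FROM docs.documents WHERE doc_text LIKE ?", "%" + term + "%");
    }
}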

On Mon, Apr 11, 2016 at 7:12 PM, James Carman 
<ja...@carmanconsulting.com<mailto:ja...@carmanconsulting.com>> wrote:
S3 maybe?

On Mon, Apr 11, 2016 at 7:05 PM Robert Wille 
<rwi...@fold3.com<mailto:rwi...@fold3.com>> wrote:
I do realize it's kind of a weird use case, but it is legitimate. I have a 
collection of documents that I need to index, and I want to perform entity 
extraction on them and give the extracted entities special treatment in my 
full-text index. Because entity extraction costs money, and each document will 
end up being indexed multiple times, I want to cache them in Cassandra. The 
document text is the obvious key to retrieve entities from the cache. If I use 
the document ID, then I have to track timestamps. I know that sounds like a 
simple workaround, but I’m presenting a much-simplified view of my actual data 
model.

The reason for needing the text in the table, and not just a digest, is that 
sometimes entity extraction has to be deferred due to license limitations. In 
those cases, the entity extraction occurs on a background process, and the 
entities will be included in the index the next time the document is indexed.

I will use a digest as the key. I suspected that would be the answer, but it's 
good to get confirmation.

Robert

On Apr 11, 2016, at 4:36 PM, Jan Kesten 
<j.kes...@enercast.de<mailto:j.kes...@enercast.de>> wrote:

> Hi Robert,
>
> why do you need the actual text as a key? It sounds a bit unnatural, at least 
> to me. Keep in mind that you cannot do "like" queries on keys in Cassandra. 
> For performance and to keep things more readable, I would prefer hashing your 
> text and using the hash as the key.
>
> You should also consider storing the keys (hashes) in a separate 
> table per day / hour or something like that, so you can quickly get all keys 
> for a time range. A query without the partition key may be very slow.
>
> Jan
>
> Am 11.04.2016 um 23:43 schrieb Robert Wille:
>> I have a need to be able to use the text of a document as the primary key in 
>> a table. These texts are usually less than 1K, but can sometimes be 10’s of 
>> K’s in size. Would it be better to use a digest of the text as the key? I 
>> have a background process that will occasionally need to do a full table 
>> scan and retrieve all of the texts, so using the digest doesn’t eliminate 
>> the need to store the text. Anyway, is it better to keep primary keys small, 
>> or is C* okay with large primary keys?
>>
>> Robert
>>
>





Re: Large primary keys

2016-04-11 Thread Robert Wille
I do realize it's kind of a weird use case, but it is legitimate. I have a 
collection of documents that I need to index, and I want to perform entity 
extraction on them and give the extracted entities special treatment in my 
full-text index. Because entity extraction costs money, and each document will 
end up being indexed multiple times, I want to cache them in Cassandra. The 
document text is the obvious key to retrieve entities from the cache. If I use 
the document ID, then I have to track timestamps. I know that sounds like a 
simple workaround, but I’m presenting a much-simplified view of my actual data 
model.

The reason for needing the text in the table, and not just a digest, is that 
sometimes entity extraction has to be deferred due to license limitations. In 
those cases, the entity extraction occurs on a background process, and the 
entities will be included in the index the next time the document is indexed.

I will use a digest as the key. I suspected that would be the answer, but it's 
good to get confirmation.

Robert

On Apr 11, 2016, at 4:36 PM, Jan Kesten <j.kes...@enercast.de> wrote:

> Hi Robert,
> 
> why do you need the actual text as a key? It sounds a bit unnatural, at least 
> to me. Keep in mind that you cannot do "like" queries on keys in Cassandra. 
> For performance and to keep things more readable, I would prefer hashing your 
> text and using the hash as the key.
> 
> You should also consider storing the keys (hashes) in a separate 
> table per day / hour or something like that, so you can quickly get all keys 
> for a time range. A query without the partition key may be very slow.
> 
> Jan
> 
> Am 11.04.2016 um 23:43 schrieb Robert Wille:
>> I have a need to be able to use the text of a document as the primary key in 
>> a table. These texts are usually less than 1K, but can sometimes be 10’s of 
>> K’s in size. Would it be better to use a digest of the text as the key? I 
>> have a background process that will occasionally need to do a full table 
>> scan and retrieve all of the texts, so using the digest doesn’t eliminate 
>> the need to store the text. Anyway, is it better to keep primary keys small, 
>> or is C* okay with large primary keys?
>> 
>> Robert
>> 
> 



Large primary keys

2016-04-11 Thread Robert Wille
I have a need to be able to use the text of a document as the primary key in a 
table. These texts are usually less than 1K, but can sometimes be 10’s of K’s 
in size. Would it be better to use a digest of the text as the key? I have a 
background process that will occasionally need to do a full table scan and 
retrieve all of the texts, so using the digest doesn’t eliminate the need to 
store the text. Anyway, is it better to keep primary keys small, or is C* okay 
with large primary keys?

Robert



Re: disable compaction if all data are read-only?

2016-04-08 Thread Robert Wille
You still need compaction. Compaction is what organizes your data into levels. 
Without compaction, every query would have to look at every SSTable.

Also, due to commit log rotation, your memtable may get flushed from time to 
time before it is full, resulting in small SSTables that would benefit from 
compaction.

On Apr 8, 2016, at 5:49 AM, Yatong Zhang wrote:

I am using the leveled strategy. What if my data are 'append-only'? I mean there 
is always new data, but it will never be changed once written to Cassandra?

On Fri, Apr 8, 2016 at 6:33 PM, Pedro Gordo wrote:
Hi Yatong

My understanding is that if you have a table which is read-only and hence doesn't 
receive any writes, then no new SSTables will be created and hence no compaction 
will happen. What compaction strategy do you have on your table?

Best regards

Pedro Gordo

On 8 April 2016 at 10:42, Yatong Zhang wrote:
Hi there,
I am wondering if it is possible to disable compaction when all my data are 
read-only?





Re: Practical limit on number of column families

2016-02-29 Thread Robert Wille
Yes, there is memory overhead for each column family, effectively limiting the 
number of column families. The general wisdom is that you should limit yourself 
to a few hundred.

Robert

On Feb 29, 2016, at 10:30 AM, Fernando Jimenez wrote:

Hi all

I have a use case for Cassandra that would require creating a large number of 
column families. I have found references to early versions of Cassandra where 
each column family would require a fixed amount of memory on all nodes, 
effectively imposing an upper limit on the total number of CFs. I have also 
seen rumblings that this may have been fixed in later versions.

To put the question to rest, I have set up a DSE sandbox and created some code 
to generate column families populated with 3,000 entries each.

Unfortunately I have now hit this issue: 
https://issues.apache.org/jira/browse/CASSANDRA-9291

So I will have to retest against Cassandra 3.0 instead

However, I would like to understand the limitations regarding creation of 
column families.

* Is there a practical upper limit?
* Is this a fixed limit, or does it scale as more nodes are added into the 
cluster?
* Is there a difference between one keyspace with thousands of column families, 
vs thousands of keyspaces with only a few column families each?

I haven’t found any hard evidence/documentation to help me here, but if you can 
point me in the right direction, I will oblige and RTFM away.

Many thanks for your help!

Cheers
FJ





Re: Duplicated key with an IN statement

2016-02-04 Thread Robert Wille
You shouldn’t be using IN anyway. It is better to issue multiple queries, each 
for a single key, and issue them in parallel. Better performance. Less GC 
pressure.
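
A minimal sketch of the parallel-queries approach with the DataStax Java driver, using an illustrative ks.t table like the one in this thread:

import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

import java.util.ArrayList;
import java.util.List;

public class ParallelKeyLookup {
    // Instead of SELECT ... WHERE key IN (...), issue one async query per key;
    // each query goes straight to the replicas for that key, and the results
    // are collected as the futures complete.
    static List<Row> fetchAll(Session session, List<Integer> keys) {
        PreparedStatement ps = session.prepare("SELECT key, value FROM ks.t WHERE key = ?");
        List<ResultSetFuture> futures = new ArrayList<>();
        for (Integer key : keys) {
            futures.add(session.executeAsync(ps.bind(key)));
        }
        List<Row> rows = new ArrayList<>();
        for (ResultSetFuture f : futures) {
            Row row = f.getUninterruptibly().one();
            if (row != null) {
                rows.add(row);
            }
        }
        return rows;
    }
}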

On Feb 4, 2016, at 7:54 AM, Sylvain Lebresne wrote:

That behavior has been changed in 2.2 and upwards. If you don't like it, 
upgrade. In the meantime, it's probably not hard to avoid passing duplicate 
keys in IN.

On Thu, Feb 4, 2016 at 3:48 PM, Edouard COLE wrote:
Hello,

When running that kind of query with TRACING ON, I noticed the coordinator is 
also performing the same query multiple times.

Because the elements in the IN statement can involve many nodes, it makes sense 
to map/reduce the query, but running the same subquery multiple times should 
not happen. What if the result set changes? Imagine this query: SELECT * 
FROM t WHERE key IN (123, 123, …. X1000, 123), and while it runs, the 
data for 123 changes:

 key | value
-----+-------
 123 |   456
 123 |   456
 123 |   456
 123 |   789   <-- Change here :(
 123 |   789


There’s also something very important: when your table defines a row as unique 
for a specific key, it is a real problem for a result set to contain the same 
key multiple times when that key should be unique. This is why this does not 
happen in any SQL implementation.

I think this is a bug

Edouard COLE


From: Alain RODRIGUEZ [mailto:arodr...@gmail.com]
Sent: Thursday, February 04, 2016 11:55 AM
To: Edouard COLE
Cc: user@cassandra.apache.org
Subject: Re: Duplicated key with an IN statement

Hi,

This is interesting.

It seems rational that if you are looking up 2 keys and both exist (which is 
the case), it returns you 2 rows. Yet I just checked this kind of command 
on MySQL and it gives a one-line result. So here CQL differs from SQL (at least 
MySQL). I know we are trying to fit as closely as possible with SQL to avoid 
losing people, so we might want to change this.
Not sure if this behavior is intentional / known. Not even sure someone ever 
tried to do this kind of query before, actually :).

Does anyone know about that ? Should we raise a ticket ?

-
Alain Rodriguez
France

The Last Pickle
http://www.thelastpickle.com



2016-02-04 8:36 GMT+00:00 Edouard COLE:
Hello,

I just discovered this, and I think this is weird:

ed@debian:~$ cqlsh 192.168.10.8
Connected to _CLUSTER_ at 192.168.10.8:9160.
[cqlsh 4.0.1 | Cassandra 2.0.14.459 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
Use HELP for help.
cqlsh> USE ks-test ;
cqlsh:ks-test> CREATE TABLE t (
... key int,
... value int,
... PRIMARY KEY (key)
... );
cqlsh:ks-test> INSERT INTO t (key, value) VALUES (123, 456) ;
cqlsh:ks-test> SELECT * FROM t ;

 key | value
-+---
 123 |   456

(1 rows)

cqlsh:ks-test> SELECT * FROM t WHERE key IN (123, 123);

 key | value
-+---
 123 |   456
 123 |   456 <- WTF?

(2 rows)

Adding the same key multiple times to an IN statement makes the query return 
the tuple multiple times.

This looks weird to me, can anyone give me some feedback on such a behavior?

Edouard COLE





Re: Cassandra Performance on a Single Machine

2016-01-14 Thread Robert Wille
I disagree. I think that you can extrapolate very little information about RF>1 
and CL>1 by benchmarking with RF=1 and CL=1.

On Jan 13, 2016, at 8:41 PM, Anurag Khandelwal wrote:

Hi John,

Thanks for responding!

The aim of this benchmark was not to benchmark Cassandra as an end-to-end 
distributed system, but to understand a break down of the performance. For 
instance, if we understand the performance characteristics that we can expect 
from a single machine cassandra instance with RF=Consistency=1, we can have a 
good estimate of what the distributed performance with higher replication 
factors and consistency are going to look like. Even in the ideal case, the 
performance improvement would scale at most linearly with more machines and 
replicas.

That being said, I still want to understand whether this is the performance I 
should expect for the setup I described; if the performance for the current 
setup can be improved, then clearly the performance for a production setup 
(with multiple nodes, replicas) would also improve. Does that make sense?

Thanks!
Anurag

On Jan 6, 2016, at 9:31 AM, John Schulz wrote:

Anurag,

Unless you are planning on continuing to use only one machine with RF=1, 
benchmarking a single system using RF=Consistency=1 is mostly a waste of time. 
If you are going to use RF=1 and a single host, then why use Cassandra at all? 
Plain old relational DBs should do the job just fine.

Cassandra is designed to be distributed. You won't get the full impact of how 
it scales and the limits on scaling unless you benchmark a distributed system. 
For example the scaling impact of secondary indexes will not be visible on a 
single node.

John



On Tue, Jan 5, 2016 at 3:16 PM, Anurag Khandelwal wrote:
Hi,

I’ve been benchmarking Cassandra to get an idea of how the performance scales 
with more data on a single machine. I just wanted to get some feedback to 
whether these are the numbers I should expect.

The benchmarks are quite simple — I measure the latency and throughput for two 
kinds of queries:

1. get() queries - These fetch an entire row for a given primary key.
2. search() queries - These fetch all the primary keys for rows where a 
particular column matches a particular value (e.g., “name” is “John Smith”).

Indexes are constructed for all columns that are queried.

Dataset

The dataset used comprises ~1.5KB records (on average) when represented 
as CSV; there are 105 attributes in each record.

Queries

For get() queries, randomly generated primary keys are used.

For search() queries, column values are selected such that their total number 
of occurrences in the dataset is between 1 and 4000. For example, a query for 
“name” = “John Smith” would only be performed if the number of rows containing 
that value lies between 1 and 4000.

The results for the benchmarks are provided below:

Latency Measurements

The latency measurements are an average of 1 queries.





Throughput Measurements

The throughput measurements were repeated for 1-16 client threads, and the 
numbers reported for each input size are for the configuration (i.e., # client 
threads) with the highest throughput.





Any feedback here would be greatly appreciated!

Thanks!
Anurag




--

John H. Schulz

Principal Consultant

Pythian - Love your data


sch...@pythian.com |  Linkedin 
www.linkedin.com/pub/john-schulz/13/ab2/930/

Mobile: 248-376-3380

www.pythian.com


--







Re: Write/read heavy usecase in one cluster

2015-12-23 Thread Robert Wille
I would personally classify both of those use cases as light, and I wouldn’t 
have any qualms about using a single cluster for both of those.

On Dec 23, 2015, at 3:06 PM, cass savy  wrote:

> How do you determine if we can share a cluster in prod for 2 different 
> applications?
> 
> 1. Has anybody shared a cluster in prod between a write-heavy use case that 
> captures user login info (a few 100 rpm) and hardly performs a few reads per 
> day, and a read-heavy use case that is 92% reads with 10k requests per minute 
> and a higher consistency level of QUORUM?
> 
> 2. Use of in-memory tables for lookup tables that will be referred to for 
> every request prior to writing to transactional tables. Has anyone used them 
> in prod, and what issues were encountered? What tuning/recommendations should 
> be followed for prod?
> 
> 3. Use of multiple data directories for different applications, like having 
> different data partitions for the write-heavy and read-heavy workloads and a 
> separate one for commitlog/caches.
> 
> 4. We plan to use C* 2.1 with vnodes/murmur for the above use cases. Need 
> feedback on whether people have tried tuning heap size and off-heap parameters 
> in C* 2.0 and above in prod.
> 
> 5. Java 8 with C* 2.0 and higher: pros/cons, especially with G1GC garbage 
> collection.



Re: lots of tombstone after compaction

2015-12-07 Thread Robert Wille
The nulls in the original data created the tombstones. They won’t go away until 
gc_grace_seconds have passed (default is 10 days).
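
A small sketch of one way to avoid creating those tombstones in the first place when loading with the regular driver (table and columns are illustrative): write a cell only when the value is non-null, since an explicit null is written as a tombstone while an omitted column is not.

import com.datastax.driver.core.Session;
import com.datastax.driver.core.querybuilder.Insert;
import com.datastax.driver.core.querybuilder.QueryBuilder;

public class NullFreeInsert {
    static void insertRow(Session session, String id, String colA, String colB) {
        Insert insert = QueryBuilder.insertInto("ks", "wide_rows").value("id", id);
        // Only add columns that actually have a value; omitted columns leave no tombstone.
        if (colA != null) {
            insert.value("col_a", colA);
        }
        if (colB != null) {
            insert.value("col_b", colB);
        }
        session.execute(insert);
    }
}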

On Dec 7, 2015, at 4:46 PM, Kai Wang  wrote:

> I bulk loaded a few tables using CQLSSTableWriter/sstableloader. The data is a 
> large amount of wide rows with lots of nulls. It takes a day or two for 
> the compaction to complete. The SSTable count is in the single digits. Maximum 
> partition size is ~50M and mean size is ~5M. However, I am seeing frequent 
> read query timeouts caused by tombstone_failure_threshold (10). These 
> tables are basically read-only. There are no writes.
> 
> I just kicked off compaction on those tables using nodetool. Hopefully it can 
> remove those tombstones. But is it normal to have this many tombstones after 
> the initial compactions? Is this related to the fact that the original data 
> has lots of nulls?
> 
> Thanks.



Re: Behavior difference between 2.0 and 2.1

2015-12-04 Thread Robert Wille
2.0.16

I could argue that either way is correct. It’s just disconcerting that the 
behavior changed. I spent some time and found and fixed everywhere in my code 
where this change could be a problem, and I fixed it in such a way that it 
works for both behaviors. I’d hate for this to come back to bite me if I 
upgrade again and the behavior reverts back to how 2.0 works.
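
One way to make the calling code tolerate both behaviors (a sketch only, not necessarily how the fix above was done): skip any row whose clustering column comes back null.

import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class StaticOnlyPartitionGuard {
    // A partition holding only static data yields no rows on 2.0 and a single
    // row with a null clustering column on 2.1, so ignore null clustering values.
    static long countImages(Session session, int rollId) {
        ResultSet rs = session.execute("SELECT image FROM roll WHERE id = ?", rollId);
        long count = 0;
        for (Row row : rs) {
            if (row.isNull("image")) {
                continue;
            }
            count++;
        }
        return count;
    }
}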

I recently found a very simple and 100% reproducible paging bug in 2.0.16 
related to static columns (you get a duplicate row at page boundaries 100% of 
the time with a simple select by partition key). Even though static columns 
have existed for some time, it seems perhaps they aren’t fully baked.

Robert

On Dec 3, 2015, at 10:18 PM, Graham Sanderson 
<gra...@vast.com<mailto:gra...@vast.com>> wrote:

You didn’t specify which version of 2.0 you were on.

There were a number of inconsistencies with static columns fixed in 2.0.10

for example CASSANDRA-7490, and CASSANDRA-7455, but there were others, and the 
same bugs may have caused a bunch of other issues.

It very much depends exactly how you insert data (and indeed I believe this is a 
rare case where an UPDATE is not equivalent to an INSERT) whether a partition 
exists when it only has static columns. The behavior you see does make sense 
though, in that it should be possible to insert static data only, and thus the 
partition key must exist (so it is entirely reasonable to create CQL rows 
which have no actual - i.e. all null - values). Taking it a step further, if you 
have a TTL on all non-static (clustering and data) columns, you don’t 
(necessarily) want the static data to disappear when the other cells do - 
though you can achieve this with a statement-wide TTL on insertion of the 
static data.

On Dec 3, 2015, at 6:31 PM, Robert Wille 
<rwi...@fold3.com<mailto:rwi...@fold3.com>> wrote:

With this schema:

CREATE TABLE roll (
id INT,
image BIGINT,
data VARCHAR static,
PRIMARY KEY ((id), image)
) WITH gc_grace_seconds = 3456000 AND compaction = { 'class' : 
'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };

if I run SELECT image FROM roll WHERE id = X on 2.0, where partition X has only 
static data, no rows were returned. In 2.1.11, it returns one row with a null 
value. Was this change in behavior intentional? Is there an option to get the 
old behavior back? I potentially have broken code anywhere that I access a 
table with a static column. Kind of a mess, and not the kind of thing a person 
expects when upgrading.

Thanks

Robert





Behavior difference between 2.0 and 2.1

2015-12-03 Thread Robert Wille
With this schema:

CREATE TABLE roll (
id INT,
image BIGINT,
data VARCHAR static,
PRIMARY KEY ((id), image)
) WITH gc_grace_seconds = 3456000 AND compaction = { 'class' : 
'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };

if I run SELECT image FROM roll WHERE id = X on 2.0, where partition X has only 
static data, no rows were returned. In 2.1.11, it returns one row with a null 
value. Was this change in behavior intentional? Is there an option to get the 
old behavior back? I potentially have broken code anywhere that I access a 
table with a static column. Kind of a mess, and not the kind of thing a person 
expects when upgrading.

Thanks

Robert



Upgrade instructions don't make sense

2015-11-23 Thread Robert Wille
I’m wanting to upgrade from 2.0 to 2.1. The upgrade instructions at 
http://docs.datastax.com/en/upgrade/doc/upgrade/cassandra/upgradeCassandraDetails.html
 has the following, which leaves me with more questions than it answers:

If your cluster does not use vnodes, disable vnodes in each new cassandra.yaml 
before doing the rolling restart.
In Cassandra 2.0.x, virtual nodes (vnodes) are enabled by default. Disable 
vnodes in the 2.0.x version before upgrading.

  1. In the cassandra.yaml file, set num_tokens to 1.
  2. Uncomment the initial_token property and set it to 1 or to the value of a 
generated token for a multi-node cluster.

It seems strange that vnodes have to be disabled to upgrade, but whatever. If I 
use an initial token generator to set the initial_token property of each node, 
then I assume that my token ranges are all going to change, and that there’s 
going to be a whole bunch of streaming as the data is shuffled around. The docs 
don’t mention that. Should I wait until the streaming is done before proceeding 
with the upgrade?

The docs don’t talk about vnodes and initial_tokens post-upgrade. Can I turn 
vnodes back on? Am I forever after stuck with having to have manually generated 
initial tokens (and needing to have a unique cassandra.yaml for every node)? 
Can I just set num_tokens = 256 and comment out initial_token and do a rolling 
restart?

Thanks in advance

Robert



Re: Upgrade instructions don't make sense

2015-11-23 Thread Robert Wille
I guess I need to learn to read. Yes, I’m using vnodes, and yes, the 
instructions say to disable them if you aren’t using them, not if you are.

Sorry about cluttering up the mailing list.

Robert

On Nov 23, 2015, at 4:22 PM, Sebastian Estevez 
<sebastian.este...@datastax.com<mailto:sebastian.este...@datastax.com>> wrote:

If your cluster does not use vnodes, disable vnodes in each new cassandra.yaml

If your cluster does use vnodes do not disable them.

All the best,

Sebastián Estévez
Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com

DataStax is the fastest, most scalable distributed database technology, 
delivering Apache Cassandra to the world’s most innovative enterprises. 
Datastax is built to be agile, always-on, and predictably scalable to any size. 
With more than 500 customers in 45 countries, DataStax is the database 
technology and transactional backbone of choice for the worlds most innovative 
companies such as Netflix, Adobe, Intuit, and eBay.

On Mon, Nov 23, 2015 at 5:55 PM, Robert Wille 
<rwi...@fold3.com<mailto:rwi...@fold3.com>> wrote:
I’m wanting to upgrade from 2.0 to 2.1. The upgrade instructions at 
http://docs.datastax.com/en/upgrade/doc/upgrade/cassandra/upgradeCassandraDetails.html
 has the following, which leaves me with more questions than it answers:

If your cluster does not use vnodes, disable vnodes in each new cassandra.yaml 
before doing the rolling restart.
In Cassandra 2.0.x, virtual nodes (vnodes) are enabled by default. Disable 
vnodes in the 2.0.x version before upgrading.

  1. In the cassandra.yaml file 
(http://docs.datastax.com/en/upgrade/doc/upgrade/cassandra/upgradeCassandraDetails.html#upgradeCassandraDetails__cassandrayaml_unique_7), 
set num_tokens to 1.
  2. Uncomment the initial_token property and set it to 1 or to the value of a 
generated token 
(http://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configGenTokens_c.html) 
for a multi-node cluster.

It seems strange that vnodes have to be disabled to upgrade, but whatever. If I 
use an initial token generator to set the initial_token property of each node, 
then I assume that my token ranges are all going to change, and that there’s 
going to be a whole bunch of streaming as the data is shuffled around. The docs 
don’t mention that. Should I wait until the streaming is done before proceeding 
with the upgrade?

The docs don’t talk about vnodes and initial_tokens post-upgrade. Can I turn 
vnodes back on? Am I forever after stuck with having to have manually generated 
initial tokens (and needing to have a unique cassandra.yaml for every node)? 
Can I just set num_tokens = 256 and comment out initial_token and do a rolling 
restart?

Thanks in advance

Robert





2.1 counters and CL=ONE

2015-10-27 Thread Robert Wille
I’m planning an upgrade from 2.0 to 2.1, and was reading about counters, and 
ended up with a question. I read that in 2.0, counters are implemented by 
storing deltas, and in 2.1, read-before-write is used to store totals instead. 
What does this mean for the following scenario?

Suppose we have a cluster with two nodes, RF=2 and CL=ONE. With node 2 down, a 
previously nonexistent counter is incremented twice. With node 1 down, the 
counter is incremented once. When both nodes are up, repair is run.

Does this mean that 2.0 would repair the counter by replicating the missing 
deltas so that both nodes have all three increments, and 2.1 would repair the 
counter by replicating node 2’s total to node 1? With 2.0, the count would end 
up 3, and with 2.1 the count would end up 1?

I assume that the implementation isn’t that naive, but I need to make sure.

Thanks

Robert



Anything special about upgrading from 2.0 to 2.1

2015-10-22 Thread Robert Wille
I’m on 2.0.16 and want to upgrade to the latest 2.1.x. I’ve seen some comments 
about issues with counters not migrating properly. I have a lot of counters. 
Any concerns there? Do I need to run nodetool upgradesstables? Any other 
gotchas?

Thanks

Robert



Node won't go away

2015-10-08 Thread Robert Wille
We had some problems with a node, so we decided to rebootstrap it. My IT guy 
screwed up, and when he added -Dcassandra.replace_address to cassandra-env.sh, 
he forgot the closing quote. The node bootstrapped, and then refused to join 
the cluster. We shut it down, and then noticed that nodetool status no longer 
showed that node, and the “Owns” column had increased from ~10% per node to 
~11% (we originally had 10 nodes). I don’t know why Cassandra decided to 
automatically remove the node from the cluster, but it did. We figured it would 
be best to make sure the node was completely forgotten, and then add it back 
into the cluster as a new node. Problem is, it won’t completely go away.

nodetool status doesn’t list it, but it’s still in system.peers, and OpsCenter 
still shows it. When I run nodetool removenode, it says that it can’t find the 
node.

How do I completely get rid of it?

Thanks in advance

Robert



Re: Duplicate records returned

2015-10-08 Thread Robert Wille
If anyone is following this, I also logged the bug at 
https://datastax-oss.atlassian.net/browse/JAVA-943. I suspect that it’s a driver 
bug, so I anticipate CASSANDRA-10442 being closed, and hopefully the folks at 
datastax can get this fixed. This bug must affect a whole lot of people.

On Oct 3, 2015, at 2:33 PM, Robert Wille 
<rwi...@fold3.com<mailto:rwi...@fold3.com>> wrote:

It's a paging bug. I ALWAYS get a duplicated record every fetchSize records. 
Easily duplicated 100% of the time.

I’ve logged a bug: https://issues.apache.org/jira/browse/CASSANDRA-10442

Robert

On Oct 3, 2015, at 10:59 AM, Robert Wille 
<rwi...@fold3.com<mailto:rwi...@fold3.com>> wrote:

Oops, I was trimming out some irrelevant stuff, and trimmed out too much. The 
second snippet should be this:

ResultSet rs = cassandraUtil.executeNamedQuery(Q_LIST_IMAGES, rollId);

int total = 0;
long lastImageId = -1;

for (Row row : rs)
{
long imageId = row.getLong(PARAM_IMAGE_ID);

if (imageId == lastImageId)
{
logger.warn("Cassandra duplicated " + imageId);
continue;
}

total++;
lastImageId = imageId;
}


On Oct 3, 2015, at 10:54 AM, Robert Wille 
<rwi...@fold3.com<mailto:rwi...@fold3.com>> wrote:

I don’t think its an application problem. The following simple snippets produce 
different totals:

ResultSet rs = cassandraUtil.executeNamedQuery(Q_LIST_IMAGES, rollId);

int total = 0;
for (Row row : rs)
{
total++;
}

-

ResultSet rs = cassandraUtil.executeNamedQuery(Q_LIST_IMAGES, rollId);

int total = 0;
long lastImageId = -1;
for (Row row : rs)
{
long imageId = row.getLong(PARAM_IMAGE_ID);

total++;
lastImageId = imageId;
}

This doesn’t happen for all partitions. In fact most don’t have this problem 
(maybe 20% do this). But the ones that do repeat records, do so 
deterministically. I see this problem in multiple tables.

I’m only retrieving the clustering key, so it has nothing to do with the data 
field.

I suspect this is a paging problem, and might be a driver issue.

Robert

On Oct 3, 2015, at 9:29 AM, Eric Stevens 
<migh...@gmail.com<mailto:migh...@gmail.com>> wrote:

Can you give us an example of the duplicate records that comes back?  How 
reliable is it (i.e. is it every record, is it one record per read, etc)?  By 
any chance is it just the `data` field that duplicates while the other fields 
change per row?

> I don’t see duplicates in cqlsh.

I've never seen this, and I can't think of a failure mode which would cause it 
to happen.  Not to say it's impossible, but Cassandra's standard read path 
involves collapsing duplicate or otherwise overlapping answers from multiple 
replicas; such a thing would be a pretty substantial deviation.  Especially 
since you don't see the duplicates in cqlsh, I have a hunch this is an 
application bug.


On Fri, Oct 2, 2015 at 4:58 PM Robert Wille 
<rwi...@fold3.com<mailto:rwi...@fold3.com>> wrote:
When I run the query "SELECT image FROM roll WHERE roll = :roll“ against this 
table

CREATE TABLE roll (
roll INT,
image BIGINT,
data VARCHAR static,
mid VARCHAR,
imp_st VARCHAR,
PRIMARY KEY ((roll), image)
) WITH gc_grace_seconds = 3456000 AND compaction = { 'class' : 
'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };

I often get duplicate records back. Seems like a very simple query to botch. 
I’m running 2.0.16 with RF=3 and CL=QUORUM and Java client 2.0.10.1. I don’t 
see duplicates in cqlsh. Any thoughts?

Thanks

Robert







Re: Duplicate records returned

2015-10-03 Thread Robert Wille
Oops, I was trimming out some irrelevant stuff, and trimmed out too much. The 
second snippet should be this:

ResultSet rs = cassandraUtil.executeNamedQuery(Q_LIST_IMAGES, rollId);

int total = 0;
long lastImageId = -1;

for (Row row : rs)
{
    long imageId = row.getLong(PARAM_IMAGE_ID);

    if (imageId == lastImageId)
    {
        logger.warn("Cassandra duplicated " + imageId);
        continue;
    }

    total++;
    lastImageId = imageId;
}


On Oct 3, 2015, at 10:54 AM, Robert Wille 
<rwi...@fold3.com<mailto:rwi...@fold3.com>> wrote:

I don’t think its an application problem. The following simple snippets produce 
different totals:

ResultSet rs = cassandraUtil.executeNamedQuery(Q_LIST_IMAGES, rollId);

int total = 0;
for (Row row : rs)
{
total++;
}

-

ResultSet rs = cassandraUtil.executeNamedQuery(Q_LIST_IMAGES, rollId);

int total = 0;
long lastImageId = -1;
for (Row row : rs)
{
long imageId = row.getLong(PARAM_IMAGE_ID);

total++;
lastImageId = imageId;
}

This doesn’t happen for all partitions. In fact most don’t have this problem 
(maybe 20% do this). But the ones that do repeat records, do so 
deterministically. I see this problem in multiple tables.

I’m only retrieving the clustering key, so it has nothing to do with the data 
field.

I suspect this is a paging problem, and might be a driver issue.

Robert

On Oct 3, 2015, at 9:29 AM, Eric Stevens 
<migh...@gmail.com<mailto:migh...@gmail.com>> wrote:

Can you give us an example of the duplicate records that comes back?  How 
reliable is it (i.e. is it every record, is it one record per read, etc)?  By 
any chance is it just the `data` field that duplicates while the other fields 
change per row?

> I don’t see duplicates in cqlsh.

I've never seen this, and I can't think of a failure mode which would cause it 
to happen.  Not to say it's impossible, but Cassandra's standard read path 
involves collapsing duplicate or otherwise overlapping answers from multiple 
replicas; such a thing would be a pretty substantial deviation.  Especially 
since you don't see the duplicates in cqlsh, I have a hunch this is an 
application bug.


On Fri, Oct 2, 2015 at 4:58 PM Robert Wille 
<rwi...@fold3.com<mailto:rwi...@fold3.com>> wrote:
When I run the query "SELECT image FROM roll WHERE roll = :roll“ against this 
table

CREATE TABLE roll (
roll INT,
image BIGINT,
data VARCHAR static,
mid VARCHAR,
imp_st VARCHAR,
PRIMARY KEY ((roll), image)
) WITH gc_grace_seconds = 3456000 AND compaction = { 'class' : 
'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };

I often get duplicate records back. Seems like a very simple query to botch. 
I’m running 2.0.16 with RF=3 and CL=QUORUM and Java client 2.0.10.1. I don’t 
see duplicates in cqlsh. Any thoughts?

Thanks

Robert





Re: Duplicate records returned

2015-10-03 Thread Robert Wille
I don’t think its an application problem. The following simple snippets produce 
different totals:

ResultSet rs = cassandraUtil.executeNamedQuery(Q_LIST_IMAGES, rollId);

int total = 0;
for (Row row : rs)
{
    total++;
}

-

ResultSet rs = cassandraUtil.executeNamedQuery(Q_LIST_IMAGES, rollId);

int total = 0;
long lastImageId = -1;
for (Row row : rs)
{
    long imageId = row.getLong(PARAM_IMAGE_ID);

    total++;
    lastImageId = imageId;
}

This doesn’t happen for all partitions. In fact most don’t have this problem 
(maybe 20% do this). But the ones that do repeat records, do so 
deterministically. I see this problem in multiple tables.

I’m only retrieving the clustering key, so it has nothing to do with the data 
field.

I suspect this is a paging problem, and might be a driver issue.

Robert

On Oct 3, 2015, at 9:29 AM, Eric Stevens 
<migh...@gmail.com<mailto:migh...@gmail.com>> wrote:

Can you give us an example of the duplicate records that comes back?  How 
reliable is it (i.e. is it every record, is it one record per read, etc)?  By 
any chance is it just the `data` field that duplicates while the other fields 
change per row?

> I don’t see duplicates in cqlsh.

I've never seen this, and I can't think of a failure mode which would cause it 
to happen.  Not to say it's impossible, but Cassandra's standard read path 
involves collapsing duplicate or otherwise overlapping answers from multiple 
replicas; such a thing would be a pretty substantial deviation.  Especially 
since you don't see the duplicates in cqlsh, I have a hunch this is an 
application bug.


On Fri, Oct 2, 2015 at 4:58 PM Robert Wille 
<rwi...@fold3.com<mailto:rwi...@fold3.com>> wrote:
When I run the query "SELECT image FROM roll WHERE roll = :roll“ against this 
table

CREATE TABLE roll (
roll INT,
image BIGINT,
data VARCHAR static,
mid VARCHAR,
imp_st VARCHAR,
PRIMARY KEY ((roll), image)
) WITH gc_grace_seconds = 3456000 AND compaction = { 'class' : 
'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };

I often get duplicate records back. Seems like a very simple query to botch. 
I’m running 2.0.16 with RF=3 and CL=QUORUM and Java client 2.0.10.1. I don’t 
see duplicates in cqlsh. Any thoughts?

Thanks

Robert




Re: Duplicate records returned

2015-10-03 Thread Robert Wille
It's a paging bug. I ALWAYS get a duplicated record every fetchSize records. 
Easily duplicated 100% of the time.

I’ve logged a bug: https://issues.apache.org/jira/browse/CASSANDRA-10442

Robert

On Oct 3, 2015, at 10:59 AM, Robert Wille 
<rwi...@fold3.com<mailto:rwi...@fold3.com>> wrote:

Oops, I was trimming out some irrelevant stuff, and trimmed out too much. The 
second snippet should be this:

ResultSet rs = cassandraUtil.executeNamedQuery(Q_LIST_IMAGES, rollId);

int total = 0;
long lastImageId = -1;

for (Row row : rs)
{
long imageId = row.getLong(PARAM_IMAGE_ID);

if (imageId == lastImageId)
{
logger.warn("Cassandra duplicated " + imageId);
continue;
}

total++;
lastImageId = imageId;
}


On Oct 3, 2015, at 10:54 AM, Robert Wille 
<rwi...@fold3.com<mailto:rwi...@fold3.com>> wrote:

I don’t think its an application problem. The following simple snippets produce 
different totals:

ResultSet rs = cassandraUtil.executeNamedQuery(Q_LIST_IMAGES, rollId);

int total = 0;
for (Row row : rs)
{
total++;
}

-

ResultSet rs = cassandraUtil.executeNamedQuery(Q_LIST_IMAGES, rollId);

int total = 0;
long lastImageId = -1;
for (Row row : rs)
{
long imageId = row.getLong(PARAM_IMAGE_ID);

total++;
lastImageId = imageId;
}

This doesn’t happen for all partitions. In fact most don’t have this problem 
(maybe 20% do this). But the ones that do repeat records, do so 
deterministically. I see this problem in multiple tables.

I’m only retrieving the clustering key, so it has nothing to do with the data 
field.

I suspect this is a paging problem, and might be a driver issue.

Robert

On Oct 3, 2015, at 9:29 AM, Eric Stevens 
<migh...@gmail.com<mailto:migh...@gmail.com>> wrote:

Can you give us an example of the duplicate records that comes back?  How 
reliable is it (i.e. is it every record, is it one record per read, etc)?  By 
any chance is it just the `data` field that duplicates while the other fields 
change per row?

> I don’t see duplicates in cqlsh.

I've never seen this, and I can't think of a failure mode which would cause it 
to happen.  Not to say it's impossible, but Cassandra's standard read path 
involves collapsing duplicate or otherwise overlapping answers from multiple 
replicas; such a thing would be a pretty substantial deviation.  Especially 
since you don't see the duplicates in cqlsh, I have a hunch this is an 
application bug.


On Fri, Oct 2, 2015 at 4:58 PM Robert Wille 
<rwi...@fold3.com<mailto:rwi...@fold3.com>> wrote:
When I run the query "SELECT image FROM roll WHERE roll = :roll“ against this 
table

CREATE TABLE roll (
roll INT,
image BIGINT,
data VARCHAR static,
mid VARCHAR,
imp_st VARCHAR,
PRIMARY KEY ((roll), image)
) WITH gc_grace_seconds = 3456000 AND compaction = { 'class' : 
'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };

I often get duplicate records back. Seems like a very simple query to botch. 
I’m running 2.0.16 with RF=3 and CL=QUORUM and Java client 2.0.10.1. I don’t 
see duplicates in cqlsh. Any thoughts?

Thanks

Robert






Duplicate records returned

2015-10-02 Thread Robert Wille
When I run the query "SELECT image FROM roll WHERE roll = :roll“ against this 
table

CREATE TABLE roll (
roll INT,
image BIGINT,
data VARCHAR static,
mid VARCHAR,
imp_st VARCHAR,
PRIMARY KEY ((roll), image)
) WITH gc_grace_seconds = 3456000 AND compaction = { 'class' : 
'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };

I often get duplicate records back. Seems like a very simple query to botch. 
I’m running 2.0.16 with RF=3 and CL=QUORUM and Java client 2.0.10.1. I don’t 
see duplicates in cqlsh. Any thoughts?

Thanks

Robert



Re: Compaction not happening

2015-09-29 Thread Robert Wille
CASSANDRA-9662 definitely sounds like the source of my spikes. Good to know 
they are fake. Just wish I knew why it won’t compact when 50% of my data has 
been tombstoned. The other day it shed 10% of its size, and hasn’t grown since, 
so I guess that’s something.

On Sep 28, 2015, at 6:04 PM, Paulo Motta 
<pauloricard...@gmail.com<mailto:pauloricard...@gmail.com>> wrote:

I don't know about the other issues, but the compaction pending spikes looks 
like CASSANDRA-9662 (https://issues.apache.org/jira/browse/CASSANDRA-9662). 
Could you try upgrading to 2.0.17/2.1.9 and check if that is fixed?

Also, if you're not already doing this, try to monitor the droppable tombstone 
ratio JMX metric (or inspect sstables droppable tombstone ratio with 
sstablemetadata) and play with the tombstone compaction subproperties: 
tombstone_threshold, tombstone_compaction_interval and 
unchecked_tombstone_compaction (more details on: 
http://docs.datastax.com/en/cql/3.1/cql/cql_reference/compactSubprop.html)

Cheers,
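
A minimal sketch of applying those subproperties while keeping LCS (the table name is illustrative and the values are just starting points to experiment with):

import com.datastax.driver.core.Session;

public class TombstoneCompactionTuning {
    static void tune(Session session) {
        session.execute(
            "ALTER TABLE ks.churny_table WITH compaction = {"
            + " 'class': 'LeveledCompactionStrategy',"
            + " 'sstable_size_in_mb': '160',"
            + " 'tombstone_threshold': '0.1',"              // lower than the 0.2 default fires sooner
            + " 'tombstone_compaction_interval': '86400',"  // seconds since SSTable creation
            + " 'unchecked_tombstone_compaction': 'true' }");
    }
}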

2015-09-28 16:36 GMT-07:00 Dan Kinder 
<dkin...@turnitin.com<mailto:dkin...@turnitin.com>>:
+1 would be great to hear a response on this. I see similar strange behavior 
where "Compactions Pending" spikes up into the thousands. In my case it's a LCS 
table with fluctuating-but-sometimes-pretty-high write load and lots of 
(intentional) overwrite, infrequent deletes. C* 2.1.7.

On Thu, Sep 24, 2015 at 12:59 PM, Robert Wille 
<rwi...@fold3.com<mailto:rwi...@fold3.com>> wrote:
I have some tables that have quite a bit of churn, and the deleted data doesn’t 
get compacted out of them, in spite of gc_grace_seconds=0. I periodically get 
updates in bulk for some of my tables. I write the new data to the tables, and 
then delete the old data (no updates, just insertion with new primary keys, 
followed by deletion of the old records). On any given day, I might add and 
delete 5 to 10 percent of the records. The amount of disk space that these 
tables take up has historically been surprisingly constant. For many months, 
the space didn’t vary by more than a few gigs. A couple of months ago, the 
tables started growing. They grew to be about 50% bigger than they used to be, 
and they just kept growing. We decided to upgrade our cluster (from 2.0.14 to 
2.0.16), and right after the upgrade, the tables got compacted down to their 
original size. The size then stayed pretty constant and I was feeling pretty 
good about it. Unfortunately, a couple of weeks ago, they started growing 
again, and are now about twice their original size. I’m using leveled 
compaction.

One thing I’ve noticed is that back when compaction was working great, whenever 
I’d start writing to these tables, compactions would get triggered, and they 
would run for hours following the bulk writing. Now, when I’m writing records, 
I see short little compactions that take several seconds.

One other thing that may be relevant is that while I'm writing, max compactions 
pending can get into the thousands, but drops to 0 as soon as I’m done writing. 
Seems quite strange that Cassandra can chug through the pending compactions so 
quickly, while achieving so little. Half the data in these tables can be 
compacted out, and yet compaction does almost nothing.

This seems really strange to me:

 

Compactions pending shoots up when I’m done writing. Doesn’t make a lot of 
sense.

Any thoughts on how I can figure out what’s going on? Any idea what caused the 
tables to be compacted following the upgrade? Any thoughts on why I used to 
have compactions that took hours and actually did something, but now I get 
compactions that run really fast, but don’t really do anything? Perhaps if I’m 
patient enough, the space will eventually get compacted out, and yearning for 
the good-old days is just a waste of time. I can accept that, although if 
that’s the case, I may need to buy more nodes.

Thanks in advance

Robert






Re: High CPU usage on some of nodes

2015-09-10 Thread Robert Wille
It sounds like its probably GC. Grep for GC in system.log to verify. If it is 
GC, there are a myriad of issues that could cause it, but at least you’ve 
narrowed it down.

On Sep 9, 2015, at 11:05 PM, Roman Tkachenko  wrote:

> Hey guys,
> 
> We've been having issues in the past couple of days with CPU usage / load 
> average suddenly skyrocketing on some nodes of the cluster, affecting 
> performance significantly so majority of requests start timing out. It can go 
> on for several hours, with CPU spiking through the roof then coming back down 
> to norm and so on. Weirdly, it affects only a subset of nodes and it's always 
> the same ones. The boxes Cassandra is running on are pretty beefy, 24 cores, 
> and these CPU spikes go up to >1000%.
> 
> What is the best way to debug such kind of issues and find out what Cassandra 
> is doing during spikes like this? Doesn't seem to be compaction related as 
> sometimes during these spikes "nodetool compactionstats" says no compactions 
> are running.
> 
> Thanks!
> 



Re: Order By limitation or bug?

2015-09-07 Thread Robert Wille
Thanks. Based on what I know about the architecture, it seems like it should be 
pretty easy to support. Thanks for the confirmation and the ticket.

On Sep 4, 2015, at 3:30 PM, Tyler Hobbs 
<ty...@datastax.com<mailto:ty...@datastax.com>> wrote:

This query would be reasonable to support, so I've opened 
https://issues.apache.org/jira/browse/CASSANDRA-10271 to fix that.

On Thu, Sep 3, 2015 at 7:48 PM, Alec Collier 
<alec.coll...@macquarie.com<mailto:alec.coll...@macquarie.com>> wrote:
You should be able to execute the following

SELECT data FROM import_file WHERE roll = 1 AND type = 'foo' ORDER BY type, id 
DESC;

Essentially the order by clause has to specify the clustering columns in order 
in full. It doesn’t by default know that you have already essentially filtered 
by type.

Alec Collier | Workplace Service Design
Corporate Operations Group - Technology | Macquarie Group Limited •

From: Robert Wille [mailto:rwi...@fold3.com<mailto:rwi...@fold3.com>]
Sent: Friday, 4 September 2015 7:17 AM
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: Re: Order By limitation or bug?

If you only specify the partition key, and none of the clustering columns, you 
can order by in either direction:

SELECT data FROM import_file WHERE roll = 1 order by type;
SELECT data FROM import_file WHERE roll = 1 order by type DESC;

These are both valid. Seems like specifying the prefix of the clustering 
columns is just a specialization of an already-supported pattern.

Robert

On Sep 3, 2015, at 2:46 PM, DuyHai Doan 
<doanduy...@gmail.com<mailto:doanduy...@gmail.com>> wrote:


Limitation, not bug. The reason ?

On disk, data are sorted by type first, and FOR EACH type value, the data are 
sorted by id.

So to do an order by Id, C* will need to perform an in-memory re-ordering, not 
sure how bad it is for performance. In any case currently it's not possible, 
maybe you should create a JIRA to ask for lifting the limitation.

On Thu, Sep 3, 2015 at 10:27 PM, Robert Wille 
<rwi...@fold3.com<mailto:rwi...@fold3.com>> wrote:

Given this table:

CREATE TABLE import_file (
  roll int,
  type text,
  id timeuuid,
  data text,
  PRIMARY KEY ((roll), type, id)
)

This should be possible:

SELECT data FROM import_file WHERE roll = 1 AND type = 'foo' ORDER BY id DESC;

but it results in the following error:

Bad Request: Order by currently only support the ordering of columns following 
their declared order in the PRIMARY KEY

I am ordering in the declared order in the primary key. I don’t see why this 
shouldn’t be able to be supported. Is this a known limitation or a bug?

In this example, I can get the results I want by omitting the ORDER BY clause 
and adding WITH CLUSTERING ORDER BY (id DESC) to the schema. However, now I can 
only get descending order. I have to choose either ascending or descending 
order. I cannot get both.

Robert




This email, including any attachments, is confidential. If you are not the 
intended recipient, you must not disclose, distribute or use the information in 
this email in any way. If you received this email in error, please notify the 
sender immediately by return email and delete the message. Unless expressly 
stated otherwise, the information in this email should not be regarded as an 
offer to sell or as a solicitation of an offer to buy any financial product or 
service, an official confirmation of any transaction, or as an official 
statement of the entity sending this message. Neither Macquarie Group Limited, 
nor any of its subsidiaries, guarantee the integrity of any emails or attached 
files and are not responsible for any changes made to them by any other person.



--
Tyler Hobbs
DataStax<http://datastax.com/>



Re: Order By limitation or bug?

2015-09-03 Thread Robert Wille
If you only specify the partition key, and none of the clustering columns, you 
can order by in either direction:

SELECT data FROM import_file WHERE roll = 1 order by type;
SELECT data FROM import_file WHERE roll = 1 order by type DESC;

These are both valid. Seems like specifying the prefix of the clustering 
columns is just a specialization of an already-supported pattern.

Robert

On Sep 3, 2015, at 2:46 PM, DuyHai Doan 
<doanduy...@gmail.com<mailto:doanduy...@gmail.com>> wrote:

Limitation, not bug. The reason ?

On disk, data are sorted by type first, and FOR EACH type value, the data are 
sorted by id.

So to do an order by Id, C* will need to perform an in-memory re-ordering, not 
sure how bad it is for performance. In any case currently it's not possible, 
maybe you should create a JIRA to ask for lifting the limitation.

On Thu, Sep 3, 2015 at 10:27 PM, Robert Wille 
<rwi...@fold3.com<mailto:rwi...@fold3.com>> wrote:
Given this table:

CREATE TABLE import_file (
  roll int,
  type text,
  id timeuuid,
  data text,
  PRIMARY KEY ((roll), type, id)
)

This should be possible:

SELECT data FROM import_file WHERE roll = 1 AND type = 'foo' ORDER BY id DESC;

but it results in the following error:

Bad Request: Order by currently only support the ordering of columns following 
their declared order in the PRIMARY KEY

I am ordering in the declared order in the primary key. I don’t see why this 
shouldn’t be able to be supported. Is this a known limitation or a bug?

In this example, I can get the results I want by omitting the ORDER BY clause 
and adding WITH CLUSTERING ORDER BY (id DESC) to the schema. However, now I can 
only get descending order. I have to choose either ascending or descending 
order. I cannot get both.

Robert





Order By limitation or bug?

2015-09-03 Thread Robert Wille
Given this table:

CREATE TABLE import_file (
  roll int,
  type text,
  id timeuuid,
  data text,
  PRIMARY KEY ((roll), type, id)
)

This should be possible:

SELECT data FROM import_file WHERE roll = 1 AND type = 'foo' ORDER BY id DESC;

but it results in the following error:

Bad Request: Order by currently only support the ordering of columns following 
their declared order in the PRIMARY KEY

I am ordering in the declared order in the primary key. I don’t see why this 
shouldn’t be able to be supported. Is this a known limitation or a bug?

In this example, I can get the results I want by omitting the ORDER BY clause 
and adding WITH CLUSTERING ORDER BY (id DESC) to the schema. However, now I can 
only get descending order. I have to choose either ascending or descending 
order. I cannot get both.

Robert



Re: Written data is lost and no exception thrown back to the client

2015-08-21 Thread Robert Wille
RF=1 with QUORUM consistency. I know QUORUM is weird with RF=1, but it should 
be the same as ONE. It's QUORUM instead of ONE because production has RF=3, and
I was running this against my test cluster with RF=1.

On Aug 20, 2015, at 7:28 PM, Jason 
jkushm...@rocketfuelinc.commailto:jkushm...@rocketfuelinc.com wrote:

What consistency level were the writes?

From: Robert Willemailto:rwi...@fold3.com
Sent: ‎8/‎20/‎2015 18:25
To: user@cassandra.apache.orgmailto:user@cassandra.apache.org
Subject: Written data is lost and no exception thrown back to the client

I wrote a data migration application which I was testing, and I pushed it too 
hard and the FlushWriter thread pool blocked, and I ended up with dropped 
mutation messages. I compared the source data against what is in my cluster, 
and as expected I have missing records. The strange thing is that my 
application didn’t error out. I’ve been doing some forensics, and there’s a lot 
about this that makes no sense and makes me feel very uneasy.

I use a lot of asynchronous queries, and I thought it was possible that I had 
bad error handling, so I checked for errors in other, independent ways.

I have a retry policy that on the first failure logs the error and then 
requests a retry. On the second failure it logs the error and then rethrows. A 
few retryable errors appeared in my logs, but no fatal errors. In theory, I 
should have a fatal error in my logs for any error that gets reported back to 
the client.

I wrap my Session object, and all queries go through this wrapper. This wrapper 
logs all query errors. Synchronous queries are wrapped in a try/catch which 
logs and rethrows. Asynchronous queries use a FutureCallback to log any 
onFailure invocations.

My logs indicate that no errors whatsoever were reported back to me. I do not 
understand how I can get dropped mutation messages and not know about it. I am 
running 2.0.16 with datastax Java driver 2.0.8. Three node cluster with RF=1. 
If someone could help me understand how this can occur, I would greatly 
appreciate it. A database that errors out is one thing. A database that errors 
out and makes you think everything was fine is quite another.

Thanks

Robert




Re: Written data is lost and no exception thrown back to the client

2015-08-21 Thread Robert Wille
But it shouldn’t matter. I have missing data, and no errors, which shouldn’t be 
possible except with CL=ANY.

FWIW, I’m working on some sample code so I can post a Jira.

Robert

On Aug 21, 2015, at 5:04 AM, Robert Wille 
rwi...@fold3.commailto:rwi...@fold3.com wrote:

RF=1 with QUORUM consistency. I know QUORUM is weird with RF=1, but it should 
be the same as ONE. It's QUORUM instead of ONE because production has RF=3, and
I was running this against my test cluster with RF=1.

On Aug 20, 2015, at 7:28 PM, Jason 
jkushm...@rocketfuelinc.commailto:jkushm...@rocketfuelinc.com wrote:

What consistency level were the writes?

From: Robert Willemailto:rwi...@fold3.com
Sent: ‎8/‎20/‎2015 18:25
To: user@cassandra.apache.orgmailto:user@cassandra.apache.org
Subject: Written data is lost and no exception thrown back to the client

I wrote a data migration application which I was testing, and I pushed it too 
hard and the FlushWriter thread pool blocked, and I ended up with dropped 
mutation messages. I compared the source data against what is in my cluster, 
and as expected I have missing records. The strange thing is that my 
application didn’t error out. I’ve been doing some forensics, and there’s a lot 
about this that makes no sense and makes me feel very uneasy.

I use a lot of asynchronous queries, and I thought it was possible that I had 
bad error handling, so I checked for errors in other, independent ways.

I have a retry policy that on the first failure logs the error and then 
requests a retry. On the second failure it logs the error and then rethrows. A 
few retryable errors appeared in my logs, but no fatal errors. In theory, I 
should have a fatal error in my logs for any error that gets reported back to 
the client.

I wrap my Session object, and all queries go through this wrapper. This wrapper 
logs all query errors. Synchronous queries are wrapped in a try/catch which 
logs and rethrows. Asynchronous queries use a FutureCallback to log any 
onFailure invocations.

My logs indicate that no errors whatsoever were reported back to me. I do not 
understand how I can get dropped mutation messages and not know about it. I am 
running 2.0.16 with datastax Java driver 2.0.8. Three node cluster with RF=1. 
If someone could help me understand how this can occur, I would greatly 
appreciate it. A database that errors out is one thing. A database that errors 
out and makes you think everything was fine is quite another.

Thanks

Robert





Written data is lost and no exception thrown back to the client

2015-08-20 Thread Robert Wille
I wrote a data migration application which I was testing, and I pushed it too 
hard and the FlushWriter thread pool blocked, and I ended up with dropped 
mutation messages. I compared the source data against what is in my cluster, 
and as expected I have missing records. The strange thing is that my 
application didn’t error out. I’ve been doing some forensics, and there’s a lot 
about this that makes no sense and makes me feel very uneasy.

I use a lot of asynchronous queries, and I thought it was possible that I had 
bad error handling, so I checked for errors in other, independent ways.

I have a retry policy that on the first failure logs the error and then 
requests a retry. On the second failure it logs the error and then rethrows. A 
few retryable errors appeared in my logs, but no fatal errors. In theory, I 
should have a fatal error in my logs for any error that gets reported back to 
the client.
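
Roughly, that policy looks like the following sketch against the 2.0-era Java
driver. The class name, logger and messages are illustrative, not the actual
code from my project:

import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Statement;
import com.datastax.driver.core.WriteType;
import com.datastax.driver.core.policies.RetryPolicy;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class LogThenRetryOncePolicy implements RetryPolicy {

    private static final Logger log = LoggerFactory.getLogger(LogThenRetryOncePolicy.class);

    // First failure: log and retry at the same consistency level. Second failure: log and rethrow.
    private RetryDecision logAndDecide(String kind, Statement stmt, ConsistencyLevel cl, int nbRetry) {
        log.warn("{} for {} (attempt {})", kind, stmt, nbRetry);
        return nbRetry == 0 ? RetryDecision.retry(cl) : RetryDecision.rethrow();
    }

    @Override
    public RetryDecision onReadTimeout(Statement stmt, ConsistencyLevel cl, int requiredResponses,
                                       int receivedResponses, boolean dataRetrieved, int nbRetry) {
        return logAndDecide("read timeout", stmt, cl, nbRetry);
    }

    @Override
    public RetryDecision onWriteTimeout(Statement stmt, ConsistencyLevel cl, WriteType writeType,
                                        int requiredAcks, int receivedAcks, int nbRetry) {
        return logAndDecide("write timeout (" + writeType + ")", stmt, cl, nbRetry);
    }

    @Override
    public RetryDecision onUnavailable(Statement stmt, ConsistencyLevel cl, int requiredReplica,
                                       int aliveReplica, int nbRetry) {
        return logAndDecide("unavailable", stmt, cl, nbRetry);
    }
}

// Installed via Cluster.builder().addContactPoint(...).withRetryPolicy(new LogThenRetryOncePolicy()).build()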

I wrap my Session object, and all queries go through this wrapper. This wrapper 
logs all query errors. Synchronous queries are wrapped in a try/catch which 
logs and rethrows. Asynchronous queries use a FutureCallback to log any 
onFailure invocations.
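
A stripped-down sketch of what that wrapper looks like (class and logger names
here are illustrative, not the real thing):

import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.Statement;
import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.Futures;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class LoggingSession {

    private static final Logger log = LoggerFactory.getLogger(LoggingSession.class);
    private final Session session;

    public LoggingSession(Session session) { this.session = session; }

    // Synchronous path: log and rethrow.
    public ResultSet execute(Statement stmt) {
        try {
            return session.execute(stmt);
        } catch (RuntimeException e) {
            log.error("Query failed: {}", stmt, e);
            throw e;
        }
    }

    // Asynchronous path: attach a callback that logs any onFailure invocation.
    public ResultSetFuture executeAsync(final Statement stmt) {
        ResultSetFuture future = session.executeAsync(stmt);
        Futures.addCallback(future, new FutureCallback<ResultSet>() {
            public void onSuccess(ResultSet rs) { /* nothing to log on success */ }
            public void onFailure(Throwable t)  { log.error("Async query failed: {}", stmt, t); }
        });
        return future;
    }
}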

My logs indicate that no errors whatsoever were reported back to me. I do not 
understand how I can get dropped mutation messages and not know about it. I am 
running 2.0.16 with datastax Java driver 2.0.8. Three node cluster with RF=1. 
If someone could help me understand how this can occur, I would greatly 
appreciate it. A database that errors out is one thing. A database that errors 
out and makes you think everything was fine is quite another.

Thanks

Robert



Re: Schema questions for data structures with recently-modified access patterns

2015-07-24 Thread Robert Wille
When performing an update, the following needs to happen:

1. Read document.last_modified
2. Get the current timestamp
3. Update document with last_modified=current timestamp
4. Insert into doc_by_last_modified with last_modified=current timestamp
5. Delete from doc_by_last_modified with last_modified=the timestamp from step 1

If two parties do the above at roughly the same time, such that in step 1 they 
both read the same last_modified timestamp, then when they do step 5, they’ll 
both delete the same old record from doc_by_last_modified, and you’ll get two 
records for the same document in doc_by_last_modified.

Would it work to put steps 3-5 into an atomic batch and use a lightweight 
transaction for step 3? (e.g. UPDATE document SET doc = :doc, last_modified = 
:cur_ts WHERE docid = :docid IF last_modified = :prev_ts) If a lightweight 
transaction is batched with other statements on other tables, will the other 
statements get cancelled if the lightweight transaction is cancelled?

Robert

On Jul 23, 2015, at 9:49 PM, Jack Krupansky 
jack.krupan...@gmail.commailto:jack.krupan...@gmail.com wrote:

Concurrent update should not be problematic. Duplicate entries should not be 
created. If it appears to be, explain your apparent issue so we can see whether 
it is a real issue.

But at least from all of the details you have disclosed so far, there does not 
appear to be any indication that this type of time series would be anything 
other than a good fit for Cassandra.

Besides, the new materialized view feature of Cassandra 3.0 would make it an 
even easier fit.

-- Jack Krupansky

On Thu, Jul 23, 2015 at 6:30 PM, Robert Wille 
rwi...@fold3.commailto:rwi...@fold3.com wrote:
I obviously worded my original email poorly. I guess that’s what happens when 
you post at the end of the day just before quitting.

I want to get a list of documents, ordered from most-recently modified to 
least-recently modified, with each document appearing exactly once.

Jack, your schema does exactly that, and is essentially the same as mine (with
the exception of my missing the DESC clause, and that I have a partitioning
column while you only have clustering columns).

The problem I have with my schema (or Jack’s) is that it is very easy for a 
document to get in the list multiple times. Concurrent updates to the document, 
for example. Also, a consistency issue could cause the document to appear in 
the list more than once.

I think that Alec Collier’s comment is probably accurate, that this kind of a 
pattern just isn’t a good fit for Cassandra.

On Jul 23, 2015, at 1:54 PM, Jack Krupansky 
jack.krupan...@gmail.commailto:jack.krupan...@gmail.com wrote:

Maybe you could explain in more detail what you mean by recently modified 
documents, since that is precisely what I thought I suggested with descending 
ordering.

-- Jack Krupansky

On Thu, Jul 23, 2015 at 3:40 PM, Robert Wille 
rwi...@fold3.commailto:rwi...@fold3.com wrote:
Neither Carlos’s suggestion nor yours provided a way to query
recently-modified documents.

His updated suggestion provides a way to get recently-modified documents, but 
not ordered.

On Jul 22, 2015, at 4:19 PM, Jack Krupansky 
jack.krupan...@gmail.commailto:jack.krupan...@gmail.com wrote:

No way to query recently-modified documents.

I don't follow why you say that. I mean, that was the point of the data model 
suggestion I proposed. Maybe you could clarify.

I also wanted to mention that the new materialized view feature of Cassandra 
3.0 might handle this use case, including taking care of the delete, 
automatically.


-- Jack Krupansky

On Tue, Jul 21, 2015 at 12:37 PM, Robert Wille 
rwi...@fold3.commailto:rwi...@fold3.com wrote:
The time series doesn’t provide the access pattern I’m looking for. No way to 
query recently-modified documents.

On Jul 21, 2015, at 9:13 AM, Carlos Alonso 
i...@mrcalonso.commailto:i...@mrcalonso.com wrote:

Hi Robert,

What about modelling it as a time series?

CREATE TABLE document (
  docId UUID,
  doc TEXT,
  last_modified TIMESTAMP,
  PRIMARY KEY(docId, last_modified)
) WITH CLUSTERING ORDER BY (last_modified DESC);

This way, the latest modification will always be the first record in the
row, therefore accessing it should be as easy as:

SELECT * FROM document WHERE docId == the docId LIMIT 1;

And, if you experience diskspace issues due to very long rows, then you can 
always expire old ones using TTL or on a batch job. Tombstones will never be a 
problem in this case as, due to the specified clustering order, the latest 
modification will always be first record in the row.

Hope it helps.

Carlos Alonso | Software Engineer | @calonsohttps://twitter.com/calonso

On 21 July 2015 at 05:59, Robert Wille 
rwi...@fold3.commailto:rwi...@fold3.com wrote:
Data structures that have a recently-modified access pattern seem to be a poor 
fit for Cassandra. I’m wondering if any of you smart guys can provide 
suggestions.

For the sake of discussion, lets assume I have

Compaction no longer working properly

2015-07-24 Thread Robert Wille
I have a database which has a fair amount of churn. When I need to update a 
data structure, I create a new one, and when it is complete, I delete the old 
one. I have gc_grace_seconds=0, so the space for the old data structures should 
be reclaimed on the next compaction. This has been working fine for months. 
Unfortunately, compaction has stopped doing its job, and about a week ago disk 
space started increasing steadily. Disk space usage has increased almost 50% in 
the last week.

I do see compaction tasks get kicked off, but they complete very quickly, 
generally in well less than a minute. Compaction of the bigger tables used to 
take more than an hour. Admittedly, its been quite a while since I’ve watched 
compactions run, and I probably haven’t watched them since we last upgraded, so 
perhaps there’s been a change in how LeveledCompaction works. Our last upgrade 
was well more than a week ago, so an upgrade isn’t responsible for this change 
in behavior. We’re running 2.0.14 currently.

Looking at the logs, I see stuff like this:

INFO [CompactionExecutor:49180] 2015-07-24 09:31:45,960 CompactionTask.java 
(line 296) Compacted 2 sstables to 
[/var/lib/cassandra/data/fold31_browse/node/fold31_browse-node-jb-0,].  
117,066,921 bytes to 117,346,402 (~100% of original) in 33,766ms = 
3.314288MB/s.  1,444,907 total partitions merged to 1,400,563.  Partition merge 
counts were {1:1356219, 2:44344, }

So, compaction seems to be doing something, just not very much. I’ve checked 
the process that deletes stale data, and it is doing its job. Plenty of 
deletion going on, just no space reclamation.

Any thoughts?

Thanks in advance

Robert



Re: Schema questions for data structures with recently-modified access patterns

2015-07-23 Thread Robert Wille
Neither Carlos’s suggestion nor yours provided a way to query
recently-modified documents.

His updated suggestion provides a way to get recently-modified documents, but 
not ordered.

On Jul 22, 2015, at 4:19 PM, Jack Krupansky 
jack.krupan...@gmail.commailto:jack.krupan...@gmail.com wrote:

No way to query recently-modified documents.

I don't follow why you say that. I mean, that was the point of the data model 
suggestion I proposed. Maybe you could clarify.

I also wanted to mention that the new materialized view feature of Cassandra 
3.0 might handle this use case, including taking care of the delete, 
automatically.


-- Jack Krupansky

On Tue, Jul 21, 2015 at 12:37 PM, Robert Wille 
rwi...@fold3.commailto:rwi...@fold3.com wrote:
The time series doesn’t provide the access pattern I’m looking for. No way to 
query recently-modified documents.

On Jul 21, 2015, at 9:13 AM, Carlos Alonso 
i...@mrcalonso.commailto:i...@mrcalonso.com wrote:

Hi Robert,

What about modelling it as a time series?

CREATE TABLE document (
  docId UUID,
  doc TEXT,
  last_modified TIMESTAMP,
  PRIMARY KEY(docId, last_modified)
) WITH CLUSTERING ORDER BY (last_modified DESC);

This way, the latest modification will always be the first record in the
row, therefore accessing it should be as easy as:

SELECT * FROM document WHERE docId == the docId LIMIT 1;

And, if you experience diskspace issues due to very long rows, then you can 
always expire old ones using TTL or on a batch job. Tombstones will never be a 
problem in this case as, due to the specified clustering order, the latest 
modification will always be first record in the row.

Hope it helps.

Carlos Alonso | Software Engineer | @calonsohttps://twitter.com/calonso

On 21 July 2015 at 05:59, Robert Wille 
rwi...@fold3.commailto:rwi...@fold3.com wrote:
Data structures that have a recently-modified access pattern seem to be a poor 
fit for Cassandra. I’m wondering if any of you smart guys can provide 
suggestions.

For the sake of discussion, lets assume I have the following tables:

CREATE TABLE document (
docId UUID,
doc TEXT,
last_modified TIMEUUID,
PRIMARY KEY ((docid))
)

CREATE TABLE doc_by_last_modified (
date TEXT,
last_modified TIMEUUID,
docId UUID,
PRIMARY KEY ((date), last_modified)
)

When I update a document, I retrieve its last_modified time, delete the current 
record from doc_by_last_modified, and add a new one. Unfortunately, if you’d 
like each document to appear at most once in the doc_by_last_modified table, 
then this doesn’t work so well.

Documents can get into the doc_by_last_modified table multiple times if there 
is concurrent access, or if there is a consistency issue.

Any thoughts out there on how to efficiently provide recently-modified access 
to a table? This problem exists for many types of data structures, not just 
recently-modified. Any ordered data structure that can be dynamically reordered 
suffers from the same problems. As I’ve been doing schema design, this pattern 
keeps recurring. A nice way to address this problem has lots of applications.

Thanks in advance for your thoughts

Robert







Re: Schema questions for data structures with recently-modified access patterns

2015-07-23 Thread Robert Wille
I obviously worded my original email poorly. I guess that’s what happens when 
you post at the end of the day just before quitting.

I want to get a list of documents, ordered from most-recently modified to 
least-recently modified, with each document appearing exactly once.

Jack, your schema does exactly that, and is essentially the same as mine (with
the exception of my missing the DESC clause, and that I have a partitioning
column while you only have clustering columns).

The problem I have with my schema (or Jack’s) is that it is very easy for a 
document to get in the list multiple times. Concurrent updates to the document, 
for example. Also, a consistency issue could cause the document to appear in 
the list more than once.

I think that Alec Collier’s comment is probably accurate, that this kind of a 
pattern just isn’t a good fit for Cassandra.

On Jul 23, 2015, at 1:54 PM, Jack Krupansky 
jack.krupan...@gmail.commailto:jack.krupan...@gmail.com wrote:

Maybe you could explain in more detail what you mean by recently modified 
documents, since that is precisely what I thought I suggested with descending 
ordering.

-- Jack Krupansky

On Thu, Jul 23, 2015 at 3:40 PM, Robert Wille 
rwi...@fold3.commailto:rwi...@fold3.com wrote:
Neither Carlos’s suggestion nor yours provided a way to query
recently-modified documents.

His updated suggestion provides a way to get recently-modified documents, but 
not ordered.

On Jul 22, 2015, at 4:19 PM, Jack Krupansky 
jack.krupan...@gmail.commailto:jack.krupan...@gmail.com wrote:

No way to query recently-modified documents.

I don't follow why you say that. I mean, that was the point of the data model 
suggestion I proposed. Maybe you could clarify.

I also wanted to mention that the new materialized view feature of Cassandra 
3.0 might handle this use case, including taking care of the delete, 
automatically.


-- Jack Krupansky

On Tue, Jul 21, 2015 at 12:37 PM, Robert Wille 
rwi...@fold3.commailto:rwi...@fold3.com wrote:
The time series doesn’t provide the access pattern I’m looking for. No way to 
query recently-modified documents.

On Jul 21, 2015, at 9:13 AM, Carlos Alonso 
i...@mrcalonso.commailto:i...@mrcalonso.com wrote:

Hi Robert,

What about modelling it as a time series?

CREATE TABLE document (
  docId UUID,
  doc TEXT,
  last_modified TIMESTAMP,
  PRIMARY KEY(docId, last_modified)
) WITH CLUSTERING ORDER BY (last_modified DESC);

This way, the latest modification will always be the first record in the
row, therefore accessing it should be as easy as:

SELECT * FROM document WHERE docId == the docId LIMIT 1;

And, if you experience diskspace issues due to very long rows, then you can 
always expire old ones using TTL or on a batch job. Tombstones will never be a 
problem in this case as, due to the specified clustering order, the latest 
modification will always be first record in the row.

Hope it helps.

Carlos Alonso | Software Engineer | @calonsohttps://twitter.com/calonso

On 21 July 2015 at 05:59, Robert Wille 
rwi...@fold3.commailto:rwi...@fold3.com wrote:
Data structures that have a recently-modified access pattern seem to be a poor 
fit for Cassandra. I’m wondering if any of you smart guys can provide 
suggestions.

For the sake of discussion, lets assume I have the following tables:

CREATE TABLE document (
docId UUID,
doc TEXT,
last_modified TIMEUUID,
PRIMARY KEY ((docid))
)

CREATE TABLE doc_by_last_modified (
date TEXT,
last_modified TIMEUUID,
docId UUID,
PRIMARY KEY ((date), last_modified)
)

When I update a document, I retrieve its last_modified time, delete the current 
record from doc_by_last_modified, and add a new one. Unfortunately, if you’d 
like each document to appear at most once in the doc_by_last_modified table, 
then this doesn’t work so well.

Documents can get into the doc_by_last_modified table multiple times if there 
is concurrent access, or if there is a consistency issue.

Any thoughts out there on how to efficiently provide recently-modified access 
to a table? This problem exists for many types of data structures, not just 
recently-modified. Any ordered data structure that can be dynamically reordered 
suffers from the same problems. As I’ve been doing schema design, this pattern 
keeps recurring. A nice way to address this problem has lots of applications.

Thanks in advance for your thoughts

Robert









Re: Best Practise for Updating Index and Reporting Tables

2015-07-23 Thread Robert Wille
My guess is that you don’t understand what an atomic batch is, given that you
used the phrase “updated synchronously”. Atomic batches do not provide
isolation, and do not guarantee immediate consistency. The only thing an atomic 
batch guarantees is that all of the statements in the batch will eventually be 
executed. Both approaches are eventually consistent, so you have to deal with 
inconsistency either way.
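
For concreteness, a logged (“atomic”) batch across a transaction table and its
index tables looks roughly like the sketch below with the Java driver. The
table names, columns and statements are made up for illustration:

import java.util.UUID;
import com.datastax.driver.core.BatchStatement;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public class TxnWriter {

    private final Session session;
    private final PreparedStatement insertTxn;
    private final PreparedStatement insertByAccount;
    private final PreparedStatement insertByDay;

    public TxnWriter(Session session) {
        this.session = session;
        insertTxn       = session.prepare("INSERT INTO txn (txn_id, payload) VALUES (?, ?)");
        insertByAccount = session.prepare("INSERT INTO txn_by_account (account_id, txn_id) VALUES (?, ?)");
        insertByDay     = session.prepare("INSERT INTO txn_by_day (day, txn_id) VALUES (?, ?)");
    }

    // All three mutations are guaranteed to eventually apply, but a reader may briefly
    // see the transaction without its index entries (no isolation, no immediacy).
    public void write(UUID txnId, String payload, String accountId, String day) {
        BatchStatement batch = new BatchStatement(BatchStatement.Type.LOGGED);
        batch.add(insertTxn.bind(txnId, payload));
        batch.add(insertByAccount.bind(accountId, txnId));
        batch.add(insertByDay.bind(day, txnId));
        session.execute(batch);
    }
}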

On Jul 23, 2015, at 11:46 AM, Anuj Wadehra 
anujw_2...@yahoo.co.inmailto:anujw_2...@yahoo.co.in wrote:

We have a transaction table,3 manually created index tables and few tables for 
reporting.

One option is to go for atomic batch mutations so that for each transaction 
every index table and other reporting tables are updated synchronously.

The other option is to update the other tables async; there may be consistency
issues if some mutations drop under load or a node goes down. Logic for rolling
back or retrying idempotent updates will be at the client.

We dont have a persistent queue in the system yet and even if we introduce one 
so that transaction table is updated and other updates are done async via 
queue, we are bothered about its throughput as we go for around 1000 tps in 
large clusters. We value consistency but small delay in updating index and 
reporting table is acceptable.

Which design seems more appropriate?

Thanks
Anuj

Sent from Yahoo Mail on 
Androidhttps://overview.mail.yahoo.com/mobile/?.src=Android




Re: Schema questions for data structures with recently-modified access patterns

2015-07-21 Thread Robert Wille
The time series doesn’t provide the access pattern I’m looking for. No way to 
query recently-modified documents.

On Jul 21, 2015, at 9:13 AM, Carlos Alonso 
i...@mrcalonso.commailto:i...@mrcalonso.com wrote:

Hi Robert,

What about modelling it as a time series?

CREATE TABLE document (
  docId UUID,
  doc TEXT,
  last_modified TIMESTAMP,
  PRIMARY KEY(docId, last_modified)
) WITH CLUSTERING ORDER BY (last_modified DESC);

This way, the latest modification will always be the first record in the
row, therefore accessing it should be as easy as:

SELECT * FROM document WHERE docId == the docId LIMIT 1;

And, if you experience diskspace issues due to very long rows, then you can 
always expire old ones using TTL or on a batch job. Tombstones will never be a 
problem in this case as, due to the specified clustering order, the latest 
modification will always be first record in the row.

Hope it helps.

Carlos Alonso | Software Engineer | @calonsohttps://twitter.com/calonso

On 21 July 2015 at 05:59, Robert Wille 
rwi...@fold3.commailto:rwi...@fold3.com wrote:
Data structures that have a recently-modified access pattern seem to be a poor 
fit for Cassandra. I’m wondering if any of you smart guys can provide 
suggestions.

For the sake of discussion, lets assume I have the following tables:

CREATE TABLE document (
docId UUID,
doc TEXT,
last_modified TIMEUUID,
PRIMARY KEY ((docid))
)

CREATE TABLE doc_by_last_modified (
date TEXT,
last_modified TIMEUUID,
docId UUID,
PRIMARY KEY ((date), last_modified)
)

When I update a document, I retrieve its last_modified time, delete the current 
record from doc_by_last_modified, and add a new one. Unfortunately, if you’d 
like each document to appear at most once in the doc_by_last_modified table, 
then this doesn’t work so well.

Documents can get into the doc_by_last_modified table multiple times if there 
is concurrent access, or if there is a consistency issue.

Any thoughts out there on how to efficiently provide recently-modified access 
to a table? This problem exists for many types of data structures, not just 
recently-modified. Any ordered data structure that can be dynamically reordered 
suffers from the same problems. As I’ve been doing schema design, this pattern 
keeps recurring. A nice way to address this problem has lots of applications.

Thanks in advance for your thoughts

Robert





Re: Schema questions for data structures with recently-modified access patterns

2015-07-21 Thread Robert Wille
If last_modified is a clustering column, it needs a partitioning column, which 
is what date is for (although I should have named it day, and I also forgot to 
add the order by desc clause). This is essentially what I came up with. Still 
not liking how easy it is to get duplicates.

On Jul 21, 2015, at 9:31 AM, Jack Krupansky 
jack.krupan...@gmail.commailto:jack.krupan...@gmail.com wrote:

Keep the original document base table, but then the query table should have the 
PK as last_modified, docId, with last_modified descending, so that a query can 
get the n most recently modified documents.

Yes, you still need to manually delete the old entry for the document in the 
query table if duplicates are a problem for you.

Yeah, a TTL would be good if you don't care about documents modified a month or 
a week ago.

-- Jack Krupansky

On Tue, Jul 21, 2015 at 11:13 AM, Carlos Alonso 
i...@mrcalonso.commailto:i...@mrcalonso.com wrote:
Hi Robert,

What about modelling it as a time series?

CREATE TABLE document (
  docId UUID,
  doc TEXT,
  last_modified TIMESTAMP,
  PRIMARY KEY(docId, last_modified)
) WITH CLUSTERING ORDER BY (last_modified DESC);

This way, the latest modification will always be the first record in the
row, therefore accessing it should be as easy as:

SELECT * FROM document WHERE docId == the docId LIMIT 1;

And, if you experience diskspace issues due to very long rows, then you can 
always expire old ones using TTL or on a batch job. Tombstones will never be a 
problem in this case as, due to the specified clustering order, the latest 
modification will always be first record in the row.

Hope it helps.

Carlos Alonso | Software Engineer | @calonsohttps://twitter.com/calonso

On 21 July 2015 at 05:59, Robert Wille 
rwi...@fold3.commailto:rwi...@fold3.com wrote:
Data structures that have a recently-modified access pattern seem to be a poor 
fit for Cassandra. I’m wondering if any of you smart guys can provide 
suggestions.

For the sake of discussion, lets assume I have the following tables:

CREATE TABLE document (
docId UUID,
doc TEXT,
last_modified TIMEUUID,
PRIMARY KEY ((docid))
)

CREATE TABLE doc_by_last_modified (
date TEXT,
last_modified TIMEUUID,
docId UUID,
PRIMARY KEY ((date), last_modified)
)

When I update a document, I retrieve its last_modified time, delete the current 
record from doc_by_last_modified, and add a new one. Unfortunately, if you’d 
like each document to appear at most once in the doc_by_last_modified table, 
then this doesn’t work so well.

Documents can get into the doc_by_last_modified table multiple times if there 
is concurrent access, or if there is a consistency issue.

Any thoughts out there on how to efficiently provide recently-modified access 
to a table? This problem exists for many types of data structures, not just 
recently-modified. Any ordered data structure that can be dynamically reordered 
suffers from the same problems. As I’ve been doing schema design, this pattern 
keeps recurring. A nice way to address this problem has lots of applications.

Thanks in advance for your thoughts

Robert






Schema questions for data structures with recently-modified access patterns

2015-07-20 Thread Robert Wille
Data structures that have a recently-modified access pattern seem to be a poor 
fit for Cassandra. I’m wondering if any of you smart guys can provide 
suggestions.

For the sake of discussion, lets assume I have the following tables:

CREATE TABLE document (
docId UUID,
doc TEXT,
last_modified TIMEUUID,
PRIMARY KEY ((docid))
)

CREATE TABLE doc_by_last_modified (
date TEXT,
last_modified TIMEUUID,
docId UUID,
PRIMARY KEY ((date), last_modified)
)

When I update a document, I retrieve its last_modified time, delete the current 
record from doc_by_last_modified, and add a new one. Unfortunately, if you’d 
like each document to appear at most once in the doc_by_last_modified table, 
then this doesn’t work so well.
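
Spelled out against the tables above, the update goes something like the sketch
below (the day-bucketing helper is made up for illustration). The race is easy
to see: two concurrent updaters can read the same old last_modified, each
insert their own new entry, and delete only the one old entry, leaving the
document listed twice.

import java.util.UUID;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.utils.UUIDs;

public class DocWriter {

    private final Session session;

    public DocWriter(Session session) { this.session = session; }

    // Made-up bucketing: yyyy-MM-dd derived from the timeuuid's timestamp.
    private static String day(UUID timeuuid) {
        return new java.text.SimpleDateFormat("yyyy-MM-dd")
                .format(new java.util.Date(UUIDs.unixTimestamp(timeuuid)));
    }

    public void update(UUID docId, String doc) {
        Row row = session.execute(
                "SELECT last_modified FROM document WHERE docid = ?", docId).one();
        UUID oldTs = (row == null) ? null : row.getUUID("last_modified");
        UUID newTs = UUIDs.timeBased();

        session.execute("UPDATE document SET doc = ?, last_modified = ? WHERE docid = ?",
                doc, newTs, docId);
        session.execute("INSERT INTO doc_by_last_modified (date, last_modified, docid) VALUES (?, ?, ?)",
                day(newTs), newTs, docId);
        if (oldTs != null)
            session.execute("DELETE FROM doc_by_last_modified WHERE date = ? AND last_modified = ?",
                    day(oldTs), oldTs);
    }
}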

Documents can get into the doc_by_last_modified table multiple times if there 
is concurrent access, or if there is a consistency issue.

Any thoughts out there on how to efficiently provide recently-modified access 
to a table? This problem exists for many types of data structures, not just 
recently-modified. Any ordered data structure that can be dynamically reordered 
suffers from the same problems. As I’ve been doing schema design, this pattern 
keeps recurring. A nice way to address this problem has lots of applications.

Thanks in advance for your thoughts

Robert



Truncate really slow

2015-07-01 Thread Robert Wille
I have two test clusters, both 2.0.15. One has a single node and one has three 
nodes. Truncate on the three node cluster is really slow, but is quite fast on 
the single-node cluster. My test cases truncate tables before each test, and  
95% of the time in my test cases is spent truncating tables on the 3-node 
cluster. Auto-snapshotting is off. 

I know there’s some coordination that has to occur when a truncate happens, but 
it seems really excessive. Almost one second to truncate each table with an 
otherwise idle cluster.

Any thoughts?

Thanks in advance

Robert



Re: Missing data

2015-06-15 Thread Robert Wille
You can get tombstones from inserting null values. Not sure if that’s the 
problem, but it is another way of getting tombstones in your data.
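
For example (hypothetical table): binding a null with the Java driver stores a
tombstone for that cell, even though nothing was ever deleted:

import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public class NullInsertExample {

    static void insert(Session session, long id, String name, String email) {
        PreparedStatement ps = session.prepare(
                "INSERT INTO ks.users (id, name, email) VALUES (?, ?, ?)");
        // If email is null, this INSERT writes a tombstone for the email cell.
        session.execute(ps.bind(id, name, email));
    }
}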

On Jun 15, 2015, at 10:50 AM, Jean Tremblay 
jean.tremb...@zen-innovations.commailto:jean.tremb...@zen-innovations.com 
wrote:

Dear all,

I identified a bit more closely the root cause of my missing data.

The problem is occurring when I use

dependency
groupIdcom.datastax.cassandra/groupId
artifactIdcassandra-driver-core/artifactId
version2.1.6/version
/dependency

on my client against Cassandra 2.1.6.

I did not have the problem when I was using the driver 2.1.4 with C* 2.1.4.
Interestingly enough I don’t have the problem with the driver 2.1.4 with C* 
2.1.6.  !!

So as far as I can locate the problem, I would say that the version 2.1.6 of 
the driver is not working properly and is losing some of my records!!!

——

As far as my tombstones are concerned I don’t understand their origin.
I removed all location in my code where I delete items, and I do not use TTL 
anywhere ( I don’t need this feature in my project).

And yet I have many tombstones building up.

Is there another origin for tombstone beside TTL, and deleting items? Could the 
compaction of LeveledCompactionStrategy be the origin of them?

@Carlos thanks for your guidance.

Kind regards

Jean



On 15 Jun 2015, at 11:17 , Carlos Rolo 
r...@pythian.commailto:r...@pythian.com wrote:

Hi Jean,

The problem of that Warning is that you are reading too many tombstones per 
request.

If you do have Tombstones without doing DELETE, it is because you probably TTL'ed
the data when inserting (By mistake? Or did you set default_time_to_live in 
your table?). You can use nodetool cfstats to see how many tombstones per read 
slice you have. This is, probably, also the cause of your missing data. Data 
was tombstoned, so it is not available.



Regards,

Carlos Juzarte Rolo
Cassandra Consultant

Pythian - Love your data

rolo@pythian | Twitter: cjrolo | Linkedin: 
linkedin.com/in/carlosjuzarterolohttp://linkedin.com/in/carlosjuzarterolo
Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649
www.pythian.comhttp://www.pythian.com/

On Mon, Jun 15, 2015 at 10:54 AM, Jean Tremblay 
jean.tremb...@zen-innovations.commailto:jean.tremb...@zen-innovations.com 
wrote:
Hi,

I have reloaded the data in my cluster of 3 nodes RF: 2.
I have loaded about 2 billion rows in one table.
I use LeveledCompactionStrategy on my table.
I use version 2.1.6.
I use the default cassandra.yaml, only the ip address for seeds and throughput 
has been changed.

I loaded my data with simple insert statements. This took a bit more than one 
day to load the data… and one more day to compact the data on all nodes.
For me this is quite acceptable since I should not be doing this again.
I have done this with previous versions like 2.1.3 and others and I basically 
had absolutely no problems.

Now I read the log files on the client side, there I see no warning and no 
errors.
On the nodes side there I see many WARNING, all related with tombstones, but 
there are no ERRORS.

My problem is that I see *many missing records* in the DB, and I have
never observed this with previous versions.

1) Is this a know problem?
2) Do you have any idea how I could track down this problem?
3) What is the meaning of this WARNING (the only type of ERROR | WARN  I could 
find)?

WARN  [SharedPool-Worker-2] 2015-06-15 10:12:00,866 SliceQueryFilter.java:319 - 
Read 2990 live and 16016 tombstone cells in gttdata.alltrades_co_rep_pcode for 
key: D:07 (see tombstone_warn_threshold). 5000 columns were requested, 
slices=[388:201001-388:201412:!]


4) Is it possible to have Tombstone when we make no DELETE statements?

I’m lost…

Thanks for your help.



--







Re: Dropped mutation messages

2015-06-13 Thread Robert Wille

Internode messages which are received by a node, but do not get to be
processed within rpc_timeout, are dropped rather than processed, as the
coordinator node will no longer be waiting for a response. If the coordinator
node does not receive Consistency Level responses before the rpc_timeout it
will return a TimedOutException to the client.

I understand that, but that’s where this makes no sense. I’m running with RF=1, 
and CL=QUORUM, which means each update goes to one node, and I need one 
response for a success. I have many thousands of dropped mutation messages, but 
no TimedOutExceptions thrown back to the client. If I have GC problems, or 
other issues that are making my cluster unresponsive, I can deal with that. But 
having writes that fail and no error is clearly not acceptable. How is it 
possible to be getting errors and not be informed about them?

Thanks

Robert



Dropped mutation messages

2015-06-12 Thread Robert Wille
I am preparing to migrate a large amount of data to Cassandra. In order to test 
my migration code, I’ve been doing some dry runs to a test cluster. My test 
cluster is 2.0.15, 3 nodes, RF=1 and CL=QUORUM. I know RF=1 and CL=QUORUM is a 
weird combination, but my production cluster that will eventually receive this 
data is RF=3. I am running with RF=1 so its faster while I work out the kinks 
in the migration.

There are a few things that have puzzled me, after writing several 10’s of 
millions records to my test cluster.

My main concern is that I have a few tens of thousands of dropped mutation 
messages. I’m overloading my cluster. I never have more than about 10% CPU 
utilization (even my I/O wait is negligible). A curious thing about that is 
that the driver hasn’t thrown any exceptions, even though mutations have been 
dropped. I’ve seen dropped mutation messages on my production cluster, but like 
this, I’ve never gotten errors back from the client. I had always assumed that 
one node dropped mutation messages, but the other two did not, and so quorum 
was satisfied. With RF=1, I don’t understand how mutation messages are being 
dropped and the client doesn’t tell me about it. Does this mean my cluster is 
missing data, and I have no idea?

Each node has a couple dozen all-time blocked FlushWriters. Is that bad?

I have around 100 dropped counter mutations, which is very weird because I 
don’t write any counters. I have counters in my schema for tracking view 
counts, but the migration code doesn’t write them. How could I get dropped 
counter mutation messages when I don’t modify them?

Any insights would be appreciated. Thanks in advance.

Robert



Re: Dropped mutation messages

2015-06-12 Thread Robert Wille
I meant to say I’m *not* overloading my cluster.

On Jun 12, 2015, at 6:52 PM, Robert Wille rwi...@fold3.com wrote:

 I am preparing to migrate a large amount of data to Cassandra. In order to 
 test my migration code, I’ve been doing some dry runs to a test cluster. My 
 test cluster is 2.0.15, 3 nodes, RF=1 and CL=QUORUM. I know RF=1 and 
 CL=QUORUM is a weird combination, but my production cluster that will 
 eventually receive this data is RF=3. I am running with RF=1 so its faster 
 while I work out the kinks in the migration.
 
 There are a few things that have puzzled me, after writing several 10’s of 
 millions records to my test cluster.
 
 My main concern is that I have a few tens of thousands of dropped mutation 
 messages. I’m overloading my cluster. I never have more than about 10% CPU 
 utilization (even my I/O wait is negligible). A curious thing about that is 
 that the driver hasn’t thrown any exceptions, even though mutations have been 
 dropped. I’ve seen dropped mutation messages on my production cluster, but 
 like this, I’ve never gotten errors back from the client. I had always 
 assumed that one node dropped mutation messages, but the other two did not, 
 and so quorum was satisfied. With RF=1, I don’t understand how mutation 
 messages are being dropped and the client doesn’t tell me about it. Does this 
 mean my cluster is missing data, and I have no idea?
 
 Each node has a couple dozen all-time blocked FlushWriters. Is that bad?
 
 I have around 100 dropped counter mutations, which is very weird because I 
 don’t write any counters. I have counters in my schema for tracking view 
 counts, but the migration code doesn’t write them. How could I get dropped 
 counter mutation messages when I don’t modify them?
 
 Any insights would be appreciated. Thanks in advance.
 
 Robert
 



Coordination of expired TTLs compared to tombstones

2015-05-29 Thread Robert Wille
I was wondering something about Cassandra’s internals.

Suppose I have CL > 1 and I read a partition with a bunch of tombstones. Those
tombstones have to be sent to the coordinator for consistency reasons so that 
if another replica produces non-tombstone data that is older than the 
tombstone, it can know that the data has been deleted.

I was wondering how that compares to cells with expired TTLs. Does the node get 
to skip sending data back to the coordinator for an expired TTL? I am under the 
impression that expired data doesn’t have to be sent to the coordinator, but as 
I think about it, it seems like that might not be true. 

Suppose you wrote a cell with no TTL, and then updated it with a TTL. Suppose 
that node 1 got both writes, but node 2 only got the first one. If you asked 
for the cell after it expired, and node 1 did not send anything to the 
coordinator, it seems to me that that could violate consistency levels. Also, 
read repair could never fix node 2. So, how does that work?

On a related note, do cells with expired TTLs have to wait gc_grace_seconds 
before they can be compacted out? It seems to me that if they could get 
compacted out immediately after expiration, you could get zombie data, just 
like you can with tombstones. For example, write a cell with no TTL to all 
replicas, shut down one replica, update the cell with a TTL, compact after the 
TTL has expired, then bring the other node back up. Voila, the formerly down 
node has a value that will replicate to the other nodes.

Thanks in advance

Robert



Re: After running nodetool clean up, the used disk space was increased

2015-05-15 Thread Robert Wille
Have you cleared snapshots?

On May 15, 2015, at 2:24 PM, Analia Lorenzatto 
analialorenza...@gmail.commailto:analialorenza...@gmail.com wrote:

The Replication Factor = 2.  The RP is the default, but not sure how to check 
it.
I am attaching the output of: nodetool ring

Thanks a lot!

On Fri, May 15, 2015 at 4:17 PM, Kiran mk 
coolkiran2...@gmail.commailto:coolkiran2...@gmail.com wrote:

run cleanup on all the nodes and wait till it completes.

On May 15, 2015 10:47 PM, Analia Lorenzatto 
analialorenza...@gmail.commailto:analialorenza...@gmail.com wrote:
Hello guys,

I have a cassandra cluster = 2.1.0-2 comprised of 3 nodes.  I successfully 
added the third node last week.  After that, I ran nodetool cleanup on one of 
the other two nodes, and it finished well but it increased the used disk space.
Before running the clean up the node was 197 GB of used space, and after that 
it is 329GB used.  It is my understanding that the clean up frees up some 
space, but in this case it was highly increased.

I am running out of space, that's why I added a third node.  Do you have any 
clue on how to proceed with that situation?

Thanks in advance!!

--
Saludos / Regards.

Analía Lorenzatto.

“It's possible to commit no errors and still lose. That is not weakness.  That 
is life.  By Captain Jean-Luc Picard.



--
Saludos / Regards.

Analía Lorenzatto.

“It's possible to commit no errors and still lose. That is not weakness.  That 
is life.  By Captain Jean-Luc Picard.



Re: Updating only modified records (where lastModified current date)

2015-05-13 Thread Robert Wille
You probably shouldn’t use batch updates. Your records are probably unrelated 
to each other, and therefore there really is no reason to use batches. Use 
asynchronous queries to improve performance. executeAsync() is your friend.

A common misconception is that batches will improve performance. They don’t. 
Mostly they just increase the load on your cluster.

In my project, I have written a collection of classes that help me manage 
asynchronous queries. They aren’t complicated and didn’t take very long to 
write, but they take away most of the pain that occurs when you need to execute 
a whole bunch of asynchronous queries, and want to meter them out, wait for 
them to complete, etc. I probably execute 75% of my queries asynchronously. It's
relatively painless.
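
Roughly, the core of it is just a Semaphore metering out executeAsync(). A
simplified sketch (not my actual classes; the limit is something you tune):

import java.util.concurrent.Semaphore;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.Statement;
import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.Futures;

public class ThrottledWriter {

    private static final int MAX_IN_FLIGHT = 128;   // tune for your cluster
    private final Semaphore permits = new Semaphore(MAX_IN_FLIGHT);
    private final Session session;

    public ThrottledWriter(Session session) { this.session = session; }

    public void write(Statement stmt) throws InterruptedException {
        permits.acquire();                            // blocks once MAX_IN_FLIGHT queries are outstanding
        ResultSetFuture f = session.executeAsync(stmt);
        Futures.addCallback(f, new FutureCallback<ResultSet>() {
            public void onSuccess(ResultSet rs) { permits.release(); }
            public void onFailure(Throwable t)  { permits.release(); /* log or collect the error */ }
        });
    }

    // Call when done: blocks until everything still in flight has completed.
    public void awaitCompletion() throws InterruptedException {
        permits.acquire(MAX_IN_FLIGHT);
        permits.release(MAX_IN_FLIGHT);
    }
}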

On May 13, 2015, at 6:51 AM, Ali Akhtar 
ali.rac...@gmail.commailto:ali.rac...@gmail.com wrote:

Can lightweight txns be used in a batch update?

On Wed, May 13, 2015 at 5:48 PM, Ali Akhtar 
ali.rac...@gmail.commailto:ali.rac...@gmail.com wrote:
The 6k is only the starting value, its expected to scale up to ~200 million 
records.

On Wed, May 13, 2015 at 5:44 PM, Robert Wille 
rwi...@fold3.commailto:rwi...@fold3.com wrote:
You could use lightweight transactions to update only if the record is newer. 
It doesn’t avoid the read, it just happens under the covers, so it’s not really 
going to be faster compared to a read-before-write pattern (which is an 
anti-pattern, BTW). It is probably the easiest way to avoid getting a whole 
bunch of copies of each record.

But even with a read-before-write pattern, I don’t understand why you are 
worried about 6K records per hour. That’s nothing. You’re probably looking at 
several milliseconds to do the read and write for each record (depending on 
your storage, RF and CL), so you’re probably looking at under a minute to do 6K 
records. If you do them in parallel, you’re probably looking at several 
seconds. I don’t get why something that probably takes less than a minute that 
is done once an hour is a problem.

BTW, I wouldn’t do all 6K in parallel. I’d use some kind of limiter (e.g. a 
semaphore) to ensure that you don’t execute more than X queries at a time.

Robert

On May 13, 2015, at 6:20 AM, Ali Akhtar 
ali.rac...@gmail.commailto:ali.rac...@gmail.com wrote:

But your previous email talked about when T1 is different:

 Assume timestamp T1 < T2 and you stored value V with timestamp T2. Then you
 store V' with timestamp T1.

What if you issue an update twice, but with the same timestamp? E.g if you ran:

Update  where foo=bar USING TIMESTAMP = 1000

and 1 hour later, you ran exactly the same query again. In this case, the value 
of T is the same for both queries. Would that still cause multiple values to be 
stored?

On Wed, May 13, 2015 at 5:17 PM, Peer, Oded 
oded.p...@rsa.commailto:oded.p...@rsa.com wrote:
It will cause an overhead (compaction and read) as I described in the previous 
email.

From: Ali Akhtar [mailto:ali.rac...@gmail.commailto:ali.rac...@gmail.com]
Sent: Wednesday, May 13, 2015 3:13 PM

To: user@cassandra.apache.orgmailto:user@cassandra.apache.org
Subject: Re: Updating only modified records (where lastModified  current date)


 I don’t understand the ETL use case and its relevance here. Can you provide 
 more details?

Basically, every 1 hour a job runs which queries an external API and gets some 
records. Then, I want to take only new or updated records, and insert / update 
them in cassandra. For records that are already in cassandra and aren't 
modified, I want to ignore them.

Each record returns a lastModified datetime, I want to use that to determine 
whether a record was changed or not (if it was, it'd be updated, if not, it'd 
be ignored).

The issue was, I'm having to do a 'select lastModified from table where id = ?' 
query for every record, in order to determine if db lastModified < api
lastModified or not. I was wondering if there was a way to avoid that.

If I use 'USING TIMESTAMP', would subsequent updates where lastModified is a 
value that was previously used, still create that overhead, or will they be 
ignored?

E.g if I issued an update where TIMESTAMP is X, then 1 hour later I issued 
another update where TIMESTAMP is still X, will that 2nd update essentially get 
ignored, or will it cause any overhead?

On Wed, May 13, 2015 at 5:02 PM, Peer, Oded 
oded.p...@rsa.commailto:oded.p...@rsa.com wrote:
USING TIMESTAMP doesn’t avoid compaction overhead.
When you modify data the value is stored along with a timestamp indicating the 
timestamp of the value.
Assume timestamp T1 < T2 and you stored value V with timestamp T2. Then you
store V' with timestamp T1.
Now you have two values of V in the DB: (V, T2) and (V', T1).
When you read the value of V from the DB you read both (V, T2) and (V', T1);
Cassandra resolves the conflict by comparing the timestamps and returns V.
Compaction will later take care of removing (V', T1) from the DB.

I don’t understand the ETL use case and its relevance

Re: Updating only modified records (where lastModified current date)

2015-05-13 Thread Robert Wille
You could use lightweight transactions to update only if the record is newer. 
It doesn’t avoid the read, it just happens under the covers, so it’s not really 
going to be faster compared to a read-before-write pattern (which is an 
anti-pattern, BTW). It is probably the easiest way to avoid getting a whole 
bunch of copies of each record.
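
A sketch of that conditional update (table and column names are invented, and
the inequality in IF assumes a Cassandra version that accepts it — otherwise
compare for equality against the value you last read):

import java.util.Date;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;

public class UpsertIfNewer {

    // Returns true if the row was rewritten, false if the stored copy was already at
    // least as new. Note: a row that does not exist yet will not be applied here, so
    // brand-new records need an INSERT ... IF NOT EXISTS (or similar) first.
    // In real code, prepare the statement once and reuse it.
    static boolean upsertIfNewer(Session session, String id, String payload, Date apiLastModified) {
        PreparedStatement ps = session.prepare(
                "UPDATE records SET payload = ?, last_modified = ? WHERE id = ? IF last_modified < ?");
        ResultSet rs = session.execute(ps.bind(payload, apiLastModified, id, apiLastModified));
        return rs.one().getBool("[applied]");
    }
}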

But even with a read-before-write pattern, I don’t understand why you are 
worried about 6K records per hour. That’s nothing. You’re probably looking at 
several milliseconds to do the read and write for each record (depending on 
your storage, RF and CL), so you’re probably looking at under a minute to do 6K 
records. If you do them in parallel, you’re probably looking at several 
seconds. I don’t get why something that probably takes less than a minute that 
is done once an hour is a problem.

BTW, I wouldn’t do all 6K in parallel. I’d use some kind of limiter (e.g. a 
semaphore) to ensure that you don’t execute more than X queries at a time.

Robert

On May 13, 2015, at 6:20 AM, Ali Akhtar 
ali.rac...@gmail.commailto:ali.rac...@gmail.com wrote:

But your previous email talked about when T1 is different:

 Assume timestamp T1 < T2 and you stored value V with timestamp T2. Then you
 store V' with timestamp T1.

What if you issue an update twice, but with the same timestamp? E.g if you ran:

Update  where foo=bar USING TIMESTAMP = 1000

and 1 hour later, you ran exactly the same query again. In this case, the value 
of T is the same for both queries. Would that still cause multiple values to be 
stored?

On Wed, May 13, 2015 at 5:17 PM, Peer, Oded 
oded.p...@rsa.commailto:oded.p...@rsa.com wrote:
It will cause an overhead (compaction and read) as I described in the previous 
email.

From: Ali Akhtar [mailto:ali.rac...@gmail.commailto:ali.rac...@gmail.com]
Sent: Wednesday, May 13, 2015 3:13 PM

To: user@cassandra.apache.orgmailto:user@cassandra.apache.org
Subject: Re: Updating only modified records (where lastModified  current date)


 I don’t understand the ETL use case and its relevance here. Can you provide 
 more details?

Basically, every 1 hour a job runs which queries an external API and gets some 
records. Then, I want to take only new or updated records, and insert / update 
them in cassandra. For records that are already in cassandra and aren't 
modified, I want to ignore them.

Each record returns a lastModified datetime, I want to use that to determine 
whether a record was changed or not (if it was, it'd be updated, if not, it'd 
be ignored).

The issue was, I'm having to do a 'select lastModified from table where id = ?' 
query for every record, in order to determine if db lastModified < api
lastModified or not. I was wondering if there was a way to avoid that.

If I use 'USING TIMESTAMP', would subsequent updates where lastModified is a 
value that was previously used, still create that overhead, or will they be 
ignored?

E.g if I issued an update where TIMESTAMP is X, then 1 hour later I issued 
another update where TIMESTAMP is still X, will that 2nd update essentially get 
ignored, or will it cause any overhead?

On Wed, May 13, 2015 at 5:02 PM, Peer, Oded 
oded.p...@rsa.commailto:oded.p...@rsa.com wrote:
USING TIMESTAMP doesn’t avoid compaction overhead.
When you modify data the value is stored along with a timestamp indicating the 
timestamp of the value.
Assume timestamp T1 < T2 and you stored value V with timestamp T2. Then you
store V' with timestamp T1.
Now you have two values of V in the DB: (V, T2) and (V', T1).
When you read the value of V from the DB you read both (V, T2) and (V', T1);
Cassandra resolves the conflict by comparing the timestamps and returns V.
Compaction will later take care of removing (V', T1) from the DB.

I don’t understand the ETL use case and its relevance here. Can you provide 
more details?

UPDATE in Cassandra updates specific rows. All of them are updated, nothing is 
ignored.


From: Ali Akhtar [mailto:ali.rac...@gmail.commailto:ali.rac...@gmail.com]
Sent: Wednesday, May 13, 2015 2:43 PM

To: user@cassandra.apache.orgmailto:user@cassandra.apache.org
Subject: Re: Updating only modified records (where lastModified  current date)

Its rare for an existing record to have changes, but the etl job runs every 
hour, therefore it will send updates each time, regardless of whether there 
were changes or not.

(I'm assuming that USING TIMESTAMP here will avoid the compaction overhead, 
since that will cause it to not run any updates unless the timestamp is 
actually > last update timestamp?)

Also, is there a way to get the number of rows which were updated / ignored?

On Wed, May 13, 2015 at 4:37 PM, Peer, Oded 
oded.p...@rsa.commailto:oded.p...@rsa.com wrote:
The cost of issuing an UPDATE that won’t update anything is compaction 
overhead. Since you stated it’s rare for rows to be updated then the overhead 
should be negligible.

The easiest way to convert a milliseconds timestamp long value to microseconds 
is to multiply by 1000.
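Putting the two points together, here is a minimal sketch with a 2.x DataStax Java driver of re-issuing an update with the API’s lastModified (converted to microseconds) as the write timestamp. The contact point, keyspace, table and column names are illustrative, not from the thread.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public class LastModifiedUpsert {
    public static void main(String[] args) {
        // Illustrative schema, not from the thread:
        //   CREATE TABLE records (id text PRIMARY KEY, payload text);
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("my_keyspace")) {

            PreparedStatement ps = session.prepare(
                "UPDATE records USING TIMESTAMP ? SET payload = ? WHERE id = ?");

            long lastModifiedMillis = 1431523200000L;          // value reported by the external API
            long writeTimeMicros = lastModifiedMillis * 1000;  // USING TIMESTAMP expects microseconds

            session.execute(ps.bind(writeTimeMicros, "row contents", "record-123"));
            // Re-running the exact same statement an hour later (same timestamp, same
            // values) just writes another cell with an identical write time; readers
            // still see a single value, and compaction eventually discards the
            // duplicate, so the only cost is the compaction/read overhead described above.
        }
    }
}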

From: 

Re: Consistency Issues

2015-05-13 Thread Robert Wille
Timestamps have millisecond granularity. If you make multiple writes within the 
same millisecond, then the outcome is not deterministic.

Also, make sure you are running ntp. Clock skew will manifest itself similarly.
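If the client supplies its own write timestamps (USING TIMESTAMP), one way to keep two writes from the same client process from landing on the same timestamp is a monotonic microsecond generator. This is only a sketch of that idea; it does nothing about clock skew between different clients, which is what ntp is for.

import java.util.concurrent.atomic.AtomicLong;

public class MonotonicTimestamps {
    // Last timestamp handed out, in microseconds since the epoch.
    private static final AtomicLong last = new AtomicLong(System.currentTimeMillis() * 1000L);

    // Returns a strictly increasing microsecond timestamp, so two writes issued
    // within the same millisecond by this process can never tie.
    public static long next() {
        while (true) {
            long now = System.currentTimeMillis() * 1000L;
            long prev = last.get();
            long candidate = Math.max(now, prev + 1);
            if (last.compareAndSet(prev, candidate)) {
                return candidate;
            }
        }
    }

    public static void main(String[] args) {
        // Two back-to-back calls in the same millisecond still differ:
        System.out.println(next());
        System.out.println(next());
        // Each value would then be bound to a statement such as
        //   UPDATE t USING TIMESTAMP ? SET v = ? WHERE k = ?
    }
}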

On May 13, 2015, at 3:47 AM, Jared Rodriguez 
jrodrig...@kitedesk.commailto:jrodrig...@kitedesk.com wrote:

Thanks for the feedback.  We have dug in deeper and upgraded to Cassandra 
2.0.14 and are seeing the same issue.  What appears to be happening is that if 
a record is initially written, then the first read is fine.  But if we 
immediately update that record with a second write, then the second read 
is problematic.

We have a 4 node cluster and a replication factor of 2.  What seems to be 
happening is that on the initial write the record is sent to nodes A and B.  If a 
secondary write (update) of the record occurs while the record is in the 
memtable and not yet written to the sstable of A or B, then the next read 
returns nothing.

We are continuing to dig in and get as much detail as possible before opening 
this as a JIRA.

On Tue, May 12, 2015 at 6:51 PM, Robert Coli 
rc...@eventbrite.commailto:rc...@eventbrite.com wrote:
On Tue, May 12, 2015 at 12:35 PM, Michael Shuler 
mich...@pbandjelly.orgmailto:mich...@pbandjelly.org wrote:
This is a 4 node cluster running Cassandra 2.0.6

Can you reproduce the same issue on 2.0.14? (or better yet, the cassandra-2.0 
branch HEAD, which will soon ship 2.0.15) If you get the same results, please, 
open a JIRA with the reproduction steps.

And if you do file such a JIRA, please let the list know the JIRA URL, to close 
the loop!

=Rob




--
Jared Rodriguez




Re: query contains IN on the partition key and an ORDER BY

2015-05-02 Thread Robert Wille
Bag the IN clause and execute multiple parallel queries instead. It’s more 
performant anyway.
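A minimal sketch of that approach with the DataStax Java driver: one async query per partition key, then the single surviving row per partition is merged client side. This assumes the 2.1-era driver (where Row.getDate returns a java.util.Date); the contact point and cutoff value are placeholders.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Date;
import java.util.List;

public class ParallelPartitionQueries {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("gps")) {

            PreparedStatement ps = session.prepare(
                "SELECT imeih, dtime FROM log WHERE imeih = ? AND dtime < ? " +
                "ORDER BY dtime DESC LIMIT 1");

            List<String> partitions = Arrays.asList(
                "862170011627815@2015-01-29@03",
                "862170011627815@2015-01-30@21",
                "862170011627815@2015-01-30@04");
            Date cutoff = new Date();   // stand-in for '2015-01-30 23:59:59'

            // One async query per partition key instead of one IN query.
            List<ResultSetFuture> futures = new ArrayList<>();
            for (String imeih : partitions) {
                futures.add(session.executeAsync(ps.bind(imeih, cutoff)));
            }

            // Merge client side: each result holds at most one row (LIMIT 1),
            // so just keep the newest dtime across partitions.
            Row newest = null;
            for (ResultSetFuture future : futures) {
                Row row = future.getUninterruptibly().one();
                if (row == null) continue;
                if (newest == null || row.getDate("dtime").after(newest.getDate("dtime"))) {
                    newest = row;
                }
            }
            System.out.println("Newest row across partitions: " + newest);
        }
    }
}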

On May 2, 2015, at 11:46 AM, Abhishek Singh Bailoo 
abhishek.singh.bai...@gmail.commailto:abhishek.singh.bai...@gmail.com wrote:

Hi

I have run into the following issue 
https://issues.apache.org/jira/browse/CASSANDRA-6722 when running a query 
(contains IN on the partition key and an ORDER BY ) using datastax driver for 
Java.

However, I am able to run this query alright in cqlsh.

cqlsh: show version;
[cqlsh 5.0.1 | Cassandra 2.1.2 | CQL spec 3.2.0 | Native protocol v3]

cqlsh:gps select * from log where imeih in 
('862170011627815@2015-01-29@03','862170011627815@2015-01-30@21','862170011627815@2015-01-30@04')
 and dtime < '2015-01-30 23:59:59' order by dtime desc limit 1;

The same query when run via datastax Java driver gives the following error:

Exception in thread main 
com.datastax.driver.core.exceptions.InvalidQueryException: Cannot page queries 
with both ORDER BY and a IN restriction on the partition key; you must either 
remove the ORDER BY or the IN and sort client side, or disable paging for this 
query
at 
com.datastax.driver.core.exceptions.InvalidQueryException.copy(InvalidQueryException.java:35)

Any ideas?

Thanks,
Abhishek.



Re: Inserting null values

2015-04-29 Thread Robert Wille
I’ve come across the same thing. I have a table with at least half a dozen 
columns that could be null, in any combination. Having a prepared statement for 
each permutation of null columns just isn’t going to happen. I don’t want to 
build custom queries each time because I have a really cool system of managing 
my queries that relies on them being prepared.

Fortunately for me, I should have at most a handful of tombstones in each 
partition, and most of my records are written exactly once. So, I just let the 
tombstones get written and they’ll eventually get compacted out and life will 
go on.

It’s annoying and not ideal, but what can you do?
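If the tombstones ever did become a problem, one possible alternative (not what the thread settled on) is to build the INSERT dynamically with the driver’s QueryBuilder and include only the non-null columns, at the cost of giving up prepared statements for those writes. The keyspace, table and column names below are made up.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.querybuilder.Insert;
import com.datastax.driver.core.querybuilder.QueryBuilder;

public class NullFreeInsert {
    // Only bind columns that actually have values, so no tombstones are
    // written for the missing ones.
    static void insertRow(Session session, String id, String colA, String colB, Integer colC) {
        Insert insert = QueryBuilder.insertInto("my_keyspace", "my_table")
                .value("id", id);                       // partition key is always present
        if (colA != null) insert.value("col_a", colA);
        if (colB != null) insert.value("col_b", colB);
        if (colC != null) insert.value("col_c", colC);
        session.execute(insert);
    }

    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            insertRow(session, "row-1", "hello", null, 42);   // col_b is simply omitted
        }
    }
}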

On Apr 29, 2015, at 2:36 AM, Matthew Johnson 
matt.john...@algomi.commailto:matt.john...@algomi.com wrote:

Hi all,

I have some fields that I am storing into Cassandra, but some of them could be 
null at any given point. As there are quite a lot of them, it makes the code 
much more readable if I don’t check each one for null before adding it to the 
INSERT.

I can see a few Jiras around CQL 3 supporting inserting nulls:

https://issues.apache.org/jira/browse/CASSANDRA-3783
https://issues.apache.org/jira/browse/CASSANDRA-5648

But I have tested inserting null and it seems to work fine (when querying the 
table with cqlsh, it shows up as a red lowercase null).

Are there any obvious pitfalls to look out for that I have missed? Could it be 
a performance concern to insert a row with some nulls, as opposed to checking 
the values first and inserting the row and just omitting those columns?

Thanks!
Matt



Re: Reading hundreds of thousands of rows at once?

2015-04-22 Thread Robert Wille
Add more nodes to your cluster

On Apr 22, 2015, at 1:39 AM, John Anderson 
son...@gmail.commailto:son...@gmail.com wrote:

Hey, I'm looking at querying around 500,000 rows that I need to pull into a 
Pandas data frame for processing.  Currently testing this on a single cassandra 
node it takes around 21 seconds:

https://gist.github.com/sontek/4ca95f5c5aa539663eaf

I tried introducing multiprocessing so I could use 4 processes at a time to 
query this and I got it down to 14 seconds:

https://gist.github.com/sontek/542f13307ef9679c0094

Although shaving off 7 seconds is great it still isn't really where I would 
like to be in regards to performance, for this many rows I'd really like to get 
down to a max of 1-2 seconds query time.

What types of optimization's can I make to improve the read performance when 
querying a large set of data?  Will this timing speed up linearly as I add more 
nodes?

This is what the schema looks like currently:

https://gist.github.com/sontek/d6fa3fc1b6d085ad3fa4


I'm not tied to the current schema at all, its mostly just a replication of 
what we have in SQL Server. I'm more interested in what things I can change to 
make querying it faster.

Thanks,
John



Re: OperationTimedOut in select count statement in cqlsh

2015-04-22 Thread Robert Wille
I should have been more clear. What I meant was that it’s about the same amount 
of work for the cluster to do a “select count(l)” as it is to do a “select l” 
(unlike in the RDBMS world, where count(l) can use the primary key index). The 
reason why is that the coordinator has to retrieve all the rows from all the nodes 
and count them. The only thing you’re saving is that the rows don’t have to be 
sent to the client.

I heard from another Cassandra user that they found “select l” to be faster 
than “select count(l)”. I don’t know why that would be, but I’ve seen stranger 
things.

Robert

On Apr 22, 2015, at 7:49 AM, Mich Talebzadeh 
m...@peridale.co.ukmailto:m...@peridale.co.uk wrote:

Thanks Robert,

In RDBMS select count(1) basically returns the rows.

1 select count(1) from t
2 go

---
  30

(1 row affected)

Is count(1) fundamentally different in Cassandra?

Does count(1) means return (in my case) 1 three hundred thousand time?

Cheers,


Mich Talebzadeh

http://talebzadehmich.wordpress.comhttp://talebzadehmich.wordpress.com/

Author of the books A Practitioner’s Guide to Upgrading to Sybase ASE 15, 
ISBN 978-0-9563693-0-7.
co-author Sybase Transact SQL Guidelines Best Practices, ISBN 
978-0-9759693-0-4
Publications due shortly:
Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and 
Coherence Cache
Oracle and Sybase, Concepts and Contrasts, ISBN: 978-0-9563693-1-4, volume one 
out shortly

NOTE: The information in this email is proprietary and confidential. This 
message is for the designated recipient only, if you are not the intended 
recipient, you should destroy it immediately. Any information in this message 
shall not be understood as given or endorsed by Peridale Ltd, its subsidiaries 
or their employees, unless expressly so stated. It is the responsibility of the 
recipient to ensure that this email is virus free, therefore neither Peridale 
Ltd, its subsidiaries nor their employees accept any responsibility.

From: Robert Wille [mailto:rwi...@fold3.com]
Sent: 22 April 2015 14:44
To: user@cassandra.apache.orgmailto:user@cassandra.apache.org
Subject: Re: OperationTimedOut in select count statement in cqlsh

Keep in mind that select count(l) and select l amount to essentially the 
same thing.

On Apr 22, 2015, at 3:41 AM, Tommy Stendahl 
tommy.stend...@ericsson.commailto:tommy.stend...@ericsson.com wrote:


Hi,

Checkout CASSANDRA-8899, my guess is that you have to increase the timeout in 
cqlsh.

/Tommy
On 2015-04-22 11:15, Mich Talebzadeh wrote:
Hi,

I have a table of 300,000 rows.

When I try to do a simple

cqlsh:ase select count(1) from t;
OperationTimedOut: errors={}, last_host=127.0.0.1

Appreciate any feedback

Thanks,

Mich


NOTE: The information in this email is proprietary and confidential. This 
message is for the designated recipient only, if you are not the intended 
recipient, you should destroy it immediately. Any information in this message 
shall not be understood as given or endorsed by Peridale Ltd, its subsidiaries 
or their employees, unless expressly so stated. It is the responsibility of the 
recipient to ensure that this email is virus free, therefore neither Peridale 
Ltd, its subsidiaries nor their employees accept any responsibility.



Re: OperationTimedOut in select count statement in cqlsh

2015-04-22 Thread Robert Wille
Keep in mind that select count(l) and select l amount to essentially the 
same thing.

On Apr 22, 2015, at 3:41 AM, Tommy Stendahl 
tommy.stend...@ericsson.commailto:tommy.stend...@ericsson.com wrote:

Hi,

Checkout CASSANDRA-8899, my guess is that you have to increase the timeout in 
cqlsh.

/Tommy

On 2015-04-22 11:15, Mich Talebzadeh wrote:
Hi,

I have a table of 300,000 rows.

When I try to do a simple

cqlsh:ase select count(1) from t;
OperationTimedOut: errors={}, last_host=127.0.0.1

Appreciate any feedback

Thanks,

Mich


NOTE: The information in this email is proprietary and confidential. This 
message is for the designated recipient only, if you are not the intended 
recipient, you should destroy it immediately. Any information in this message 
shall not be understood as given or endorsed by Peridale Ltd, its subsidiaries 
or their employees, unless expressly so stated. It is the responsibility of the 
recipient to ensure that this email is virus free, therefore neither Peridale 
Ltd, its subsidiaries nor their employees accept any responsibility.






Re: OperationTimedOut in select count statement in cqlsh

2015-04-22 Thread Robert Wille
Use a counter table to maintain the count so you don’t have to compute it. When 
you do something that affects the count, it’s generally easy to issue an 
asynchronous query to update the counter in parallel with the actual work. It 
definitely complicates the code, especially if you have a lot of places where 
you do things that affect the count, but generally doesn’t cost much, if 
anything, in terms of performance.

Due to Cassandra’s eventually consistent model and lack of atomicity, you need to 
write your code to deal gracefully with the possibility of the counter being 
inaccurate. How hard that is really depends a lot on your data model.

Robert
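A rough sketch of the counter-table approach, assuming the DataStax Java driver; the keyspace, table and column names are made up, and as noted above the counter should be treated as approximate.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Session;

public class RowCounter {
    public static void main(String[] args) {
        // The counter table would be created once with something like:
        //   CREATE TABLE ase.t_count (name text PRIMARY KEY, cnt counter);
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("ase")) {

            PreparedStatement insert = session.prepare(
                "INSERT INTO t (object_id, payload) VALUES (?, ?)");   // payload column is illustrative
            PreparedStatement bump = session.prepare(
                "UPDATE t_count SET cnt = cnt + 1 WHERE name = 'rows'");

            // Fire the counter update asynchronously alongside the real write,
            // instead of ever running a full count over the table.
            session.execute(insert.bind(12345L, "some payload"));
            ResultSetFuture counterUpdate = session.executeAsync(bump.bind());
            counterUpdate.getUninterruptibly();   // or simply let it complete in the background

            long approxRows = session.execute(
                "SELECT cnt FROM t_count WHERE name = 'rows'").one().getLong("cnt");
            System.out.println("approximate row count: " + approxRows);
        }
    }
}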

On Apr 22, 2015, at 8:07 AM, Mich Talebzadeh 
m...@peridale.co.ukmailto:m...@peridale.co.uk wrote:

Thanks Robert for explanation.

Please correct me if I am wrong.

Currently running a single node cluster of Cassandra. There is the primary key 
on object_id column in both RDBMS and Cassandra.

As you correctly pointed out RDBMS does not need to touch the base table. It 
can just go through the primary key B-tree index to work out the rows


   |ROOT:EMIT Operator (VA = 2)
   |
   |   |SCALAR AGGREGATE Operator (VA = 1)
   |   |  Evaluate Ungrouped COUNT AGGREGATE.
   |   |
   |   |   |SCAN Operator (VA = 0)
   |   |   |  FROM TABLE
   |   |   |  t
   |   |   |  Using Clustered Index.
   |   |   |  Index : t_ui
   |   |   |  Forward Scan.
   |   |   |  Positioning at index start.
   |   |   |  Index contains all needed columns. Base table will not be 
read.
   |   |   |  Using I/O Size 64 Kbytes for index leaf pages.
   |   |   |  With LRU Buffer Replacement Strategy for index leaf pages.


Total estimated I/O cost for statement 1 (at line 1): 144996.


---
  30


Whereas in Cassandra it has to retrieve every row and count the total of the 
rows without sending results back?

What are the other alternatives to make it faster if any?


Cheers,


Mich Talebzadeh

http://talebzadehmich.wordpress.comhttp://talebzadehmich.wordpress.com/

Author of the books A Practitioner’s Guide to Upgrading to Sybase ASE 15, 
ISBN 978-0-9563693-0-7.
co-author Sybase Transact SQL Guidelines Best Practices, ISBN 
978-0-9759693-0-4
Publications due shortly:
Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and 
Coherence Cache
Oracle and Sybase, Concepts and Contrasts, ISBN: 978-0-9563693-1-4, volume one 
out shortly

NOTE: The information in this email is proprietary and confidential. This 
message is for the designated recipient only, if you are not the intended 
recipient, you should destroy it immediately. Any information in this message 
shall not be understood as given or endorsed by Peridale Ltd, its subsidiaries 
or their employees, unless expressly so stated. It is the responsibility of the 
recipient to ensure that this email is virus free, therefore neither Peridale 
Ltd, its subsidiaries nor their employees accept any responsibility.

From: Robert Wille [mailto:rwi...@fold3.com]
Sent: 22 April 2015 15:00
To: user@cassandra.apache.orgmailto:user@cassandra.apache.org
Subject: Re: OperationTimedOut in select count statement in cqlsh

I should have been more clear. What I meant was that it’s about the same amount 
of work for the cluster to do a “select count(l)” as it is to do a “select l” 
(unlike in the RDBMS world, where count(l) can use the primary key index). The 
reason why is that the coordinator has to retrieve all the rows from all the nodes 
and count them. The only thing you’re saving is that the rows don’t have to be 
sent to the client.

I heard from another Cassandra user that they found “select l” to be faster 
than “select count(l)”. I don’t know why that would be, but I’ve seen stranger 
things.

Robert

On Apr 22, 2015, at 7:49 AM, Mich Talebzadeh 
m...@peridale.co.ukmailto:m...@peridale.co.uk wrote:


Thanks Robert,

In RDBMS select count(1) basically returns the rows.

1 select count(1) from t
2 go

---
  30

(1 row affected)

Is count(1) fundamentally different in Cassandra?

Does count(1) means return (in my case) 1 three hundred thousand time?

Cheers,


Mich Talebzadeh

http://talebzadehmich.wordpress.comhttp://talebzadehmich.wordpress.com/

Author of the books A Practitioner’s Guide to Upgrading to Sybase ASE 15, 
ISBN 978-0-9563693-0-7.
co-author Sybase Transact SQL Guidelines Best Practices, ISBN 
978-0-9759693-0-4
Publications due shortly:
Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and 
Coherence Cache
Oracle and Sybase, Concepts and Contrasts, ISBN: 978-0-9563693-1-4, volume one 
out shortly

NOTE: The information in this email is proprietary and confidential. This 
message is for the designated recipient only, if you are not the intended 
recipient, you should destroy it immediately. Any information in this message 
shall not be understood as given or endorsed by Peridale Ltd, its subsidiaries 
or their employees

Re: OperationTimedOut in select count statement in cqlsh

2015-04-22 Thread Robert Wille
And I should have read the post more clearly. I thought it was count(l), not 
count(1). But, either way, you’re counting the number of records in the table, 
which in the RDBMS world means scanning an index, and in Cassandra means the 
coordinator has to select all the records from all the nodes.

In general, counting records in Cassandra is bad. People are accustomed to 
counting being a cheap operation, but in any distributed database with 
replication, it is going to be expensive. If your data model requires that you 
count large numbers of records, then I recommend you revise your data model and 
maintain a counter. I know that can be a pain, but there really is no way to 
count records fast.

On Apr 22, 2015, at 7:49 AM, Mich Talebzadeh 
m...@peridale.co.ukmailto:m...@peridale.co.uk wrote:

Thanks Robert,

In RDBMS select count(1) basically returns the rows.

1 select count(1) from t
2 go

---
  30

(1 row affected)

Is count(1) fundamentally different in Cassandra?

Does count(1) means return (in my case) 1 three hundred thousand time?

Cheers,


Mich Talebzadeh

http://talebzadehmich.wordpress.comhttp://talebzadehmich.wordpress.com/

Author of the books A Practitioner’s Guide to Upgrading to Sybase ASE 15, 
ISBN 978-0-9563693-0-7.
co-author Sybase Transact SQL Guidelines Best Practices, ISBN 
978-0-9759693-0-4
Publications due shortly:
Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and 
Coherence Cache
Oracle and Sybase, Concepts and Contrasts, ISBN: 978-0-9563693-1-4, volume one 
out shortly

NOTE: The information in this email is proprietary and confidential. This 
message is for the designated recipient only, if you are not the intended 
recipient, you should destroy it immediately. Any information in this message 
shall not be understood as given or endorsed by Peridale Ltd, its subsidiaries 
or their employees, unless expressly so stated. It is the responsibility of the 
recipient to ensure that this email is virus free, therefore neither Peridale 
Ltd, its subsidiaries nor their employees accept any responsibility.

From: Robert Wille [mailto:rwi...@fold3.com]
Sent: 22 April 2015 14:44
To: user@cassandra.apache.orgmailto:user@cassandra.apache.org
Subject: Re: OperationTimedOut in select count statement in cqlsh

Keep in mind that select count(l) and select l amount to essentially the 
same thing.

On Apr 22, 2015, at 3:41 AM, Tommy Stendahl 
tommy.stend...@ericsson.commailto:tommy.stend...@ericsson.com wrote:


Hi,

Checkout CASSANDRA-8899, my guess is that you have to increase the timeout in 
cqlsh.

/Tommy
On 2015-04-22 11:15, Mich Talebzadeh wrote:
Hi,

I have a table of 300,000 rows.

When I try to do a simple

cqlsh:ase select count(1) from t;
OperationTimedOut: errors={}, last_host=127.0.0.1

Appreciate any feedback

Thanks,

Mich


NOTE: The information in this email is proprietary and confidential. This 
message is for the designated recipient only, if you are not the intended 
recipient, you should destroy it immediately. Any information in this message 
shall not be understood as given or endorsed by Peridale Ltd, its subsidiaries 
or their employees, unless expressly so stated. It is the responsibility of the 
recipient to ensure that this email is virus free, therefore neither Peridale 
Ltd, its subsidiaries nor their employees accept any responsibility.



Re: Delete-only work loads crash Cassandra

2015-04-15 Thread Robert Wille
I can readily reproduce the bug, and filed a JIRA ticket: 
https://issues.apache.org/jira/browse/CASSANDRA-9194

I’m posting for posterity

On Apr 13, 2015, at 11:59 AM, Robert Wille 
rwi...@fold3.commailto:rwi...@fold3.com wrote:

Unfortunately, I’ve switched email systems and don’t have my emails from that 
time period. I did not file a Jira, and I don’t remember who made the patch for 
me or if he filed a Jira on my behalf.

I vaguely recall seeing the fix in the Cassandra change logs, but I just went 
and read them and I don’t see it. I’m probably remembering wrong.

My suspicion is that the original patch did not make it into the main branch, 
and I just have always had enough concurrent writing to keep Cassandra happy.

Hopefully the author of the patch will read this and be able to chime in.

This issue is very reproducible. I’ll try to come up with some time to write a 
simple program that illustrates the problem and file a Jira.

Thanks

Robert

On Apr 13, 2015, at 10:39 AM, Philip Thompson 
philip.thomp...@datastax.commailto:philip.thomp...@datastax.com wrote:

Did the original patch make it into upstream? That's unclear. If so, what was 
the JIRA #? Have you filed a JIRA for the new problem?

On Mon, Apr 13, 2015 at 12:21 PM, Robert Wille 
rwi...@fold3.commailto:rwi...@fold3.com wrote:
Back in 2.0.4 or 2.0.5 I ran into a problem with delete-only workloads. If I 
did lots of deletes and no upserts, Cassandra would report that the memtable 
was 0 bytes because of an accounting error. The memtable would never flush and 
Cassandra would eventually die. Someone was kind enough to create a patch, 
which seemed to have fixed the problem, but last night it reared its ugly head.

I’m now running 2.0.14. I ran a cleanup process on my cluster (10 nodes, RF=3, 
CL=1). The workload was pretty light, because this cleanup process is 
single-threaded and does everything synchronously. It was performing 4 reads 
per second and about 3000 deletes per second. Over the course of many hours, 
heap slowly grew on all nodes. CPU utilization also increased as GC consumed an 
ever-increasing amount of time. Eventually a couple of nodes shed 3.5 GB of 
their 7.5 GB. Other nodes weren’t so fortunate and started flapping due to 30 
second GC pauses.

The workaround is pretty simple. This cleanup process can simply write a dummy 
record with a TTL periodically so that Cassandra can flush its memtables and 
function properly. However, I think this probably ought to be fixed. 
Delete-only workloads can’t be that rare. I can’t be the only one that needs to 
go through and cleanup their tables.

Robert






Re: Delete-only work loads crash Cassandra

2015-04-13 Thread Robert Wille
Unfortunately, I’ve switched email systems and don’t have my emails from that 
time period. I did not file a Jira, and I don’t remember who made the patch for 
me or if he filed a Jira on my behalf.

I vaguely recall seeing the fix in the Cassandra change logs, but I just went 
and read them and I don’t see it. I’m probably remembering wrong.

My suspicion is that the original patch did not make it into the main branch, 
and I just have always had enough concurrent writing to keep Cassandra happy.

Hopefully the author of the patch will read this and be able to chime in.

This issue is very reproducible. I’ll try to come up with some time to write a 
simple program that illustrates the problem and file a Jira.

Thanks

Robert

On Apr 13, 2015, at 10:39 AM, Philip Thompson 
philip.thomp...@datastax.commailto:philip.thomp...@datastax.com wrote:

Did the original patch make it into upstream? That's unclear. If so, what was 
the JIRA #? Have you filed a JIRA for the new problem?

On Mon, Apr 13, 2015 at 12:21 PM, Robert Wille 
rwi...@fold3.commailto:rwi...@fold3.com wrote:
Back in 2.0.4 or 2.0.5 I ran into a problem with delete-only workloads. If I 
did lots of deletes and no upserts, Cassandra would report that the memtable 
was 0 bytes because of an accounting error. The memtable would never flush and 
Cassandra would eventually die. Someone was kind enough to create a patch, 
which seemed to have fixed the problem, but last night it reared its ugly head.

I’m now running 2.0.14. I ran a cleanup process on my cluster (10 nodes, RF=3, 
CL=1). The workload was pretty light, because this cleanup process is 
single-threaded and does everything synchronously. It was performing 4 reads 
per second and about 3000 deletes per second. Over the course of many hours, 
heap slowly grew on all nodes. CPU utilization also increased as GC consumed an 
ever-increasing amount of time. Eventually a couple of nodes shed 3.5 GB of 
their 7.5 GB. Other nodes weren’t so fortunate and started flapping due to 30 
second GC pauses.

The workaround is pretty simple. This cleanup process can simply write a dummy 
record with a TTL periodically so that Cassandra can flush its memtables and 
function properly. However, I think this probably ought to be fixed. 
Delete-only workloads can’t be that rare. I can’t be the only one that needs to 
go through and cleanup their tables.

Robert





Delete-only work loads crash Cassandra

2015-04-13 Thread Robert Wille
Back in 2.0.4 or 2.0.5 I ran into a problem with delete-only workloads. If I 
did lots of deletes and no upserts, Cassandra would report that the memtable 
was 0 bytes because of an accounting error. The memtable would never flush and 
Cassandra would eventually die. Someone was kind enough to create a patch, 
which seemed to have fixed the problem, but last night it reared its ugly head.

I’m now running 2.0.14. I ran a cleanup process on my cluster (10 nodes, RF=3, 
CL=1). The workload was pretty light, because this cleanup process is 
single-threaded and does everything synchronously. It was performing 4 reads 
per second and about 3000 deletes per second. Over the course of many hours, 
heap slowly grew on all nodes. CPU utilization also increased as GC consumed an 
ever-increasing amount of time. Eventually a couple of nodes shed 3.5 GB of 
their 7.5 GB. Other nodes weren’t so fortunate and started flapping due to 30 
second GC pauses.

The workaround is pretty simple. This cleanup process can simply write a dummy 
record with a TTL periodically so that Cassandra can flush its memtables and 
function properly. However, I think this probably ought to be fixed. 
Delete-only workloads can’t be that rare. I can’t be the only one that needs to 
go through and cleanup their tables.

Robert
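A sketch of the dummy-write workaround, assuming the DataStax Java driver. The table name, primary key, TTL and interval are arbitrary; the insert presumably goes into the same table the cleanup job is deleting from, so that its memtable has real live data to account for and flushes normally, and the TTL’d row expires on its own.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class KeepMemtablesFlushing {
    public static void main(String[] args) {
        // Long-running cleanup process, so the cluster/session are left open here.
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("my_keyspace");

        // Once a minute, write a throwaway row that disappears by itself after two minutes.
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(
            () -> session.execute(
                "INSERT INTO my_table (id) VALUES ('memtable-heartbeat') USING TTL 120"),
            0, 60, TimeUnit.SECONDS);

        // ... the delete-only cleanup work runs elsewhere in this process ...
    }
}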



Help understanding aftermath of death by GC

2015-03-31 Thread Robert Wille
I moved my site over to Cassandra a few months ago, and everything has been 
just peachy until a few hours ago (yes, it would be in the middle of the night) 
when my entire cluster suffered death by GC. By death by GC, I mean this:

[rwille@cas031 cassandra]$ grep GC system.log | head -5
 INFO [ScheduledTasks:1] 2015-03-31 02:49:57,480 GCInspector.java (line 116) GC 
for ConcurrentMarkSweep: 30219 ms for 1 collections, 7664429440 used; max is 
8329887744
 INFO [ScheduledTasks:1] 2015-03-31 02:50:32,180 GCInspector.java (line 116) GC 
for ConcurrentMarkSweep: 30673 ms for 1 collections, 7707488712 used; max is 
8329887744
 INFO [ScheduledTasks:1] 2015-03-31 02:51:05,108 GCInspector.java (line 116) GC 
for ConcurrentMarkSweep: 30453 ms for 1 collections, 7693634672 used; max is 
8329887744
 INFO [ScheduledTasks:1] 2015-03-31 02:51:38,787 GCInspector.java (line 116) GC 
for ConcurrentMarkSweep: 30691 ms for 1 collections, 7686028472 used; max is 
8329887744
 INFO [ScheduledTasks:1] 2015-03-31 02:52:12,452 GCInspector.java (line 116) GC 
for ConcurrentMarkSweep: 30346 ms for 1 collections, 7701401200 used; max is 
8329887744

I’m pretty sure I know what triggered it. When I first started developing to 
Cassandra, I found the IN clause to be supremely useful, and I used it a lot. 
Later I figured out it was a bad thing and repented and fixed my code, but I 
missed one spot. A maintenance task spent a couple of hours repeatedly issuing 
queries with IN clauses with 1000 items in the clause and the whole system went 
belly up.

I get that my bad queries caused Cassandra to require more heap than was 
available, but here’s what I don’t understand. When the crap hit the fan, the 
maintenance task died due to a timeout error, but the cluster never recovered. 
I would have expected that when I was no longer issuing the bad queries, that 
the heap would get cleaned up and life would resume to normal. Can anybody help 
me understand why Cassandra wouldn’t recover? How is it that GC pressure will 
cause heap to be permanently uncollectable?

This makes me pretty worried. I can fix my code, but I don’t really have 
control over spikes. If memory pressure spikes, I can tolerate some timeouts 
and errors, but if it can’t come back when the pressure is gone, that seems 
pretty bad.

Any insights would be greatly appreciated

Robert




Re: Arbitrary nested tree hierarchy data model

2015-03-28 Thread Robert Wille
Ben Bromhead sent an email to me directly and expressed an interest in seeing 
some of my queries. I may as well post them for everyone. Here are my queries 
for the part of my code that reads and cleans up browse trees.

@NamedCqlQueries({
@NamedCqlQuery(
name = DocumentBrowseDaoImpl.Q_CHECK_TREE_EXISTS,
query = SELECT tree FROM tree WHERE tree = :tree,
keyspace = KeyspaceFamilyImpl.BROWSE,
consistencyLevel = ConsistencyLevel.LOCAL_QUORUM
),
@NamedCqlQuery(
name = DocumentBrowseDaoImpl.Q_GET_ALL_TREES,
query = SELECT tree, atime, pub, rhpath FROM tree,
keyspace = KeyspaceFamilyImpl.BROWSE,
consistencyLevel = ConsistencyLevel.LOCAL_QUORUM
),
@NamedCqlQuery(
name = DocumentBrowseDaoImpl.Q_GET_ALL_DOC_BROWSE_TREE,
query = SELECT tree FROM tree,
keyspace = KeyspaceFamilyImpl.BROWSE,
consistencyLevel = ConsistencyLevel.LOCAL_QUORUM
),
@NamedCqlQuery(
name = DocumentBrowseDaoImpl.Q_GET_ALL_DOC_BROWSE_NODE,
query = SELECT hpath, tree FROM node,
keyspace = KeyspaceFamilyImpl.BROWSE,
consistencyLevel = ConsistencyLevel.LOCAL_ONE
),
@NamedCqlQuery(
name = DocumentBrowseDaoImpl.Q_GET_ALL_DOC_BROWSE_INDEX_PAGE,
query = SELECT page, tree FROM path_by_page,
keyspace = KeyspaceFamilyImpl.BROWSE,
consistencyLevel = ConsistencyLevel.LOCAL_ONE
),
@NamedCqlQuery(
name = DocumentBrowseDaoImpl.Q_GET_ALL_DOC_BROWSE_INDEX_PUB,
query = SELECT distinct tree, bucket FROM path_by_pub,
keyspace = KeyspaceFamilyImpl.BROWSE,
consistencyLevel = ConsistencyLevel.LOCAL_ONE
),
@NamedCqlQuery(
name = DocumentBrowseDaoImpl.Q_GET_ALL_DOC_BROWSE_INDEX_CHILD,
query = SELECT distinct phpath, bucket, tree FROM path_by_parent,
keyspace = KeyspaceFamilyImpl.BROWSE,
consistencyLevel = ConsistencyLevel.LOCAL_ONE
),
@NamedCqlQuery(
name = DocumentBrowseDaoImpl.Q_CLEAN_DOC_BROWSE_TREE,
query = DELETE FROM tree WHERE tree IN :tree,
keyspace = KeyspaceFamilyImpl.BROWSE,
consistencyLevel = ConsistencyLevel.LOCAL_ONE
),
@NamedCqlQuery(
name = DocumentBrowseDaoImpl.Q_CLEAN_DOC_BROWSE_NODE,
query = DELETE FROM node WHERE hpath IN :hpath AND tree = :tree,
keyspace = KeyspaceFamilyImpl.BROWSE,
consistencyLevel = ConsistencyLevel.LOCAL_ONE
),
@NamedCqlQuery(
name = DocumentBrowseDaoImpl.Q_CLEAN_DOC_BROWSE_INDEX_PAGE,
query = DELETE FROM path_by_page WHERE page IN :page AND tree = :tree,
keyspace = KeyspaceFamilyImpl.BROWSE,
consistencyLevel = ConsistencyLevel.LOCAL_ONE
),
@NamedCqlQuery(
name = DocumentBrowseDaoImpl.Q_CLEAN_DOC_BROWSE_INDEX_PUB,
query = DELETE FROM path_by_pub WHERE tree = :tree AND bucket IN :bucket,
keyspace = KeyspaceFamilyImpl.BROWSE,
consistencyLevel = ConsistencyLevel.LOCAL_ONE
),
@NamedCqlQuery(
name = DocumentBrowseDaoImpl.Q_CLEAN_DOC_BROWSE_INDEX_CHILD,
query = DELETE FROM path_by_parent WHERE phpath = :phpath AND bucket = :bucket 
AND tree IN :tree,
keyspace = KeyspaceFamilyImpl.BROWSE,
consistencyLevel = ConsistencyLevel.LOCAL_ONE
),
@NamedCqlQuery(
name = DocumentBrowseDaoImpl.Q_GET_MAX_ORDINAL,
query = SELECT pord FROM path_by_pub WHERE tree = :tree AND bucket = :bucket 
ORDER BY pord DESC,
keyspace = KeyspaceFamilyImpl.BROWSE,
consistencyLevel = ConsistencyLevel.LOCAL_QUORUM
),
@NamedCqlQuery(
name = DocumentBrowseDaoImpl.Q_GET_PAGE,
query = SELECT page, tree, ord, hpath FROM path_by_page WHERE page = :page AND 
tree = :tree,
keyspace = KeyspaceFamilyImpl.BROWSE,
consistencyLevel = ConsistencyLevel.LOCAL_ONE
),
@NamedCqlQuery(
name = DocumentBrowseDaoImpl.Q_GET_PAGE_ALL_TREES,
query = SELECT page, tree, ord, hpath FROM path_by_page WHERE page = :page,
keyspace = KeyspaceFamilyImpl.BROWSE,
consistencyLevel = ConsistencyLevel.LOCAL_ONE
),
@NamedCqlQuery(
name = DocumentBrowseDaoImpl.Q_GET_NODE,
query = SELECT tree, hpath, node, ccount FROM node WHERE hpath = :hpath AND 
tree = :tree,
keyspace = KeyspaceFamilyImpl.BROWSE,
consistencyLevel = ConsistencyLevel.LOCAL_ONE
),
@NamedCqlQuery(
name = DocumentBrowseDaoImpl.Q_GET_NODE_ALL_TREES,
query = SELECT tree, hpath, node, ccount FROM node WHERE hpath = :hpath,
keyspace = KeyspaceFamilyImpl.BROWSE,
consistencyLevel = ConsistencyLevel.LOCAL_ONE
),
@NamedCqlQuery(
name = DocumentBrowseDaoImpl.Q_GET_TREE_FOR_HASHPATH,
query = SELECT tree, node FROM node WHERE hpath = :hpath,
keyspace = KeyspaceFamilyImpl.BROWSE,
consistencyLevel = ConsistencyLevel.LOCAL_ONE
),
@NamedCqlQuery(
name = DocumentBrowseDaoImpl.Q_GET_CHILDREN,
query = SELECT hpath FROM path_by_parent WHERE phpath = :phpath AND bucket = 
:bucket AND tree = :tree AND ord >= :ord,
keyspace = KeyspaceFamilyImpl.BROWSE,
consistencyLevel = ConsistencyLevel.LOCAL_ONE
),
@NamedCqlQuery(
name = DocumentBrowseDaoImpl.Q_GET_ALL_CHILDREN,
query = SELECT hpath FROM path_by_parent WHERE phpath = :phpath AND bucket = 
:bucket AND tree = :tree,
keyspace = KeyspaceFamilyImpl.BROWSE,
consistencyLevel = ConsistencyLevel.LOCAL_ONE
),
@NamedCqlQuery(
name = DocumentBrowseDaoImpl.Q_GET_NEIGHBORS_NEXT,
query = SELECT hpath FROM path_by_pub WHERE tree = :tree AND bucket = :bucket 
AND pord > :pord ORDER BY pord,
keyspace = KeyspaceFamilyImpl.BROWSE,

Re: Arbitrary nested tree hierarchy data model

2015-03-27 Thread Robert Wille
 it in memory and maintain inverted 
indexes. Not caching this table would result in one additional trip to 
Cassandra for all API’s. Each tree is assigned a random tree ID (which is an 
INT instead of UUID for reasons beyond this discussion). All my tables have a 
tree ID in them so I can know which tree each node belongs to, since the hash 
path is not unique.

CREATE TABLE node (
hpath VARCHAR,
tree INT,
node VARCHAR,
ccount INT,
PRIMARY KEY (hpath, tree)
) WITH gc_grace_seconds = 0 AND compaction = { 'class' : 
'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };

This is the main table that holds all the text and ordinal information for all 
my nodes. This table contains the lion’s share of the data in my cluster. The 
node column is a JSON document. ccount is the number of child nodes.

CREATE TABLE path_by_page (
page BIGINT,
tree INT,
hpath VARCHAR,
pord INT,
ord INT,
PRIMARY KEY (page, tree)
) WITH gc_grace_seconds = 0 AND compaction = { 'class' : 
'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };

This table allows me to get the hash path for any image. page is the primary 
key of the page from my relational database. pord is the image’s ordinal in the 
publication filmstrip. ord is the page’s ordinal amongst its siblings.

CREATE TABLE path_by_pub (
tree INT,
bucket INT,
pord INT,
ord INT,
hpath VARCHAR,
page BIGINT,
PRIMARY KEY ((tree, bucket), pord)
) WITH gc_grace_seconds = 0 AND compaction = { 'class' : 
'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };

This table allows me to do the filmstrip. pord is what I key off of to paginate.

CREATE TABLE path_by_parent (
phpath VARCHAR,
bucket INT,
tree INT,
ord INT,
hpath VARCHAR,
PRIMARY KEY ((phpath, bucket, tree), ord)
) WITH gc_grace_seconds = 0 AND compaction = { 'class' : 
'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };

This table allows me to get the children for a node. ord is the node's ordinal 
within its siblings. It is what I key off of to paginate. The link I provided 
above is to a publication that has a fanout of 1 for the leaf node’s parent 
nodes, so isn’t very interesting (although the content is very interesting). 
Here’s a more interesting node that has bigger fanout: 
http://www.fold3.com/browse.php#246|hrRUXqv6Rj6GFxKAohttp://www.fold3.com/browse.php#246%7ChrRUXqv6Rj6GFxKAo.
 And finally, here’s a node with a fanout of 378622: 
http://www.fold3.com/browse.php#1|hhqJwp03TQBwCAyoDhttp://www.fold3.com/browse.php#1%7ChhqJwp03TQBwCAyoD.
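As a concrete illustration of the paging described above, here is roughly how one page of children could be fetched from path_by_parent with the DataStax Java driver. The keyspace name, bucket value, hash path and page size are assumptions, not taken from the actual application.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class BrowseChildren {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("browse")) {

            PreparedStatement page = session.prepare(
                "SELECT ord, hpath FROM path_by_parent " +
                "WHERE phpath = ? AND bucket = ? AND tree = ? AND ord >= ? LIMIT ?");

            String parentHashPath = "hhqJwp03TQBwCAyoD";  // hash path of the parent node
            int bucket = 0;                               // bucket the caller already knows
            int tree = 1;                                 // tree ID
            int startOrd = 0;                             // resume point from the previous page
            int pageSize = 50;

            for (Row row : session.execute(page.bind(parentHashPath, bucket, tree, startOrd, pageSize))) {
                System.out.println(row.getInt("ord") + " -> " + row.getString("hpath"));
                startOrd = row.getInt("ord") + 1;         // next page starts after the last ord seen
            }
        }
    }
}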

As long as this post is, it probably wasn’t enough to fully understand 
everything I do with my schema. I have dozens of queries. If anyone would like 
me to dig a little deeper, I’d be happy to. Just email me.

Robert

On Mar 27, 2015, at 5:35 PM, Ben Bromhead 
b...@instaclustr.commailto:b...@instaclustr.com wrote:

+1 would love to see how you do it

On 27 March 2015 at 07:18, Jonathan Haddad 
j...@jonhaddad.commailto:j...@jonhaddad.com wrote:
I'd be interested to see that data model. I think the entire list would benefit!

On Thu, Mar 26, 2015 at 8:16 PM Robert Wille 
rwi...@fold3.commailto:rwi...@fold3.com wrote:
I have a cluster which stores tree structures. I keep several hundred unrelated 
trees. The largest has about 180 million nodes, and the smallest has 1 node. 
The largest fanout is almost 400K. Depth is arbitrary, but in practice is 
probably less than 10. I am able to page through children and siblings. It 
works really well.

Doesn’t sound like its exactly like what you’re looking for, but if you want 
any pointers on how I went about implementing mine, I’d be happy to share.

On Mar 26, 2015, at 3:05 PM, List 
l...@airstreamcomm.netmailto:l...@airstreamcomm.net wrote:

 Not sure if this is the right place to ask, but we are trying to model a 
 user-generated tree hierarchy in which they create child objects of a root 
 node, and can create an arbitrary number of children (and children of 
 children, and on and on).  So far we have looked at storing each tree 
 structure as a single document in JSON format and reading/writing it out in 
 it's entirety, doing materialized paths where we store the root id with every 
 child and the tree structure above the child as a map, and some form of an 
 adjacency list (which does not appear to be very viable as looking up the 
 entire tree would be ridiculous).

 The hope is to end up with a data model that allows us to display the entire 
 tree quickly, as well as see the entire path to a leaf when selecting that 
 leaf.  If anyone has some suggestions/experience on how to model such a tree 
 heirarchy we would greatly appreciate your input.





--

Ben Bromhead

Instaclustr | www.instaclustr.comhttps://www.instaclustr.com/ | 
@instaclustrhttp://twitter.com/instaclustr | (650) 284 9692



Re: Arbitrary nested tree hierarchy data model

2015-03-26 Thread Robert Wille
I have a cluster which stores tree structures. I keep several hundred unrelated 
trees. The largest has about 180 million nodes, and the smallest has 1 node. 
The largest fanout is almost 400K. Depth is arbitrary, but in practice is 
probably less than 10. I am able to page through children and siblings. It 
works really well. 

Doesn’t sound like its exactly like what you’re looking for, but if you want 
any pointers on how I went about implementing mine, I’d be happy to share.

On Mar 26, 2015, at 3:05 PM, List l...@airstreamcomm.net wrote:

 Not sure if this is the right place to ask, but we are trying to model a 
 user-generated tree hierarchy in which they create child objects of a root 
 node, and can create an arbitrary number of children (and children of 
 children, and on and on).  So far we have looked at storing each tree 
 structure as a single document in JSON format and reading/writing it out in 
 it's entirety, doing materialized paths where we store the root id with every 
 child and the tree structure above the child as a map, and some form of an 
 adjacency list (which does not appear to be very viable as looking up the 
 entire tree would be ridiculous).
 
 The hope is to end up with a data model that allows us to display the entire 
 tree quickly, as well as see the entire path to a leaf when selecting that 
 leaf.  If anyone has some suggestions/experience on how to model such a tree 
 heirarchy we would greatly appreciate your input.
 



Re: using or in select query in cassandra

2015-03-02 Thread Robert Wille
I would also like to add that if you avoid IN and use async queries instead, it 
is pretty trivial to use a semaphore or some other limiting mechanism to put a 
ceiling on the amount on concurrent work you are sending to the cluster. If you 
use a query with an IN clause with a thousand things, you’ll make the cluster 
look for a thousand records concurrently. If you issue a thousand asyncQueries, 
and use a limiting mechanism, then you can control how much load you are 
placing on the server.

I built a nice wrapper around the Session object, and one of the things that is 
built into the wrapper is the ability to limit the number of concurrent async 
queries. It’s a really nice and simple feature to have.

Robert
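A minimal sketch of such a limiting wrapper around Session, assuming the DataStax Java driver; the permit count is arbitrary.

import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.Statement;
import java.util.concurrent.Executor;
import java.util.concurrent.Semaphore;

public class ThrottledSession {
    // Runs the completion callback on the driver's I/O thread; fine for a release().
    private static final Executor DIRECT = new Executor() {
        @Override public void execute(Runnable command) { command.run(); }
    };

    private final Session session;
    private final Semaphore permits;

    public ThrottledSession(Session session, int maxInFlight) {
        this.session = session;
        this.permits = new Semaphore(maxInFlight);
    }

    // Blocks the caller once maxInFlight queries are outstanding; the permit is
    // returned when the server responds, whether the query succeeded or failed.
    public ResultSetFuture executeAsync(Statement statement) {
        permits.acquireUninterruptibly();
        final ResultSetFuture future = session.executeAsync(statement);
        future.addListener(new Runnable() {
            @Override public void run() { permits.release(); }
        }, DIRECT);
        return future;
    }
}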

On Mar 2, 2015, at 10:33 AM, Jonathan Haddad 
j...@jonhaddad.commailto:j...@jonhaddad.com wrote:

I'd like to add that in() is usually a bad idea.  It is convenient, but not 
really what you want in production.  Go with Jens' original suggestion of 
multiple queries.

I recommend reading Ryan Svihla's post on why in() is generally a bad thing: 
http://lostechies.com/ryansvihla/2014/09/22/cassandra-query-patterns-not-using-the-in-query-for-multiple-partitions/

On Mon, Mar 2, 2015 at 12:36 AM Jens Rantil 
jens.ran...@tink.semailto:jens.ran...@tink.se wrote:
Hi Rahul,

No, you can't do this in a single query. You will need to execute two separate 
queries if the requirements are on different columns. However, if you'd like to 
select multiple rows of with restriction on the same column you can do that 
using the `IN` construct:

select * from table where id IN (123,124);

See [1] for reference.

[1] 
http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/select_r.html

Cheers,
Jens

On Mon, Mar 2, 2015 at 7:06 AM, Rahul Srivastava 
srivastava.robi...@gmail.commailto:srivastava.robi...@gmail.com wrote:
Hi
 I want to enforce uniqueness for my data, so I need to add an OR clause to my WHERE 
clause.
ex: select * from table where id = 123 OR name = 'abc'
So in the above I want to get data if my id is 123 or my name is 'abc'.

is there any possibility in cassandra to achieve this .




--
Jens Rantil
Backend engineer
Tink AB

Email: jens.ran...@tink.semailto:jens.ran...@tink.se
Phone: +46 708 84 18 32
Web: www.tink.sehttp://www.tink.se/

Facebookhttps://www.facebook.com/#!/tink.se 
Linkedinhttp://www.linkedin.com/company/2735919?trk=vsrp_companies_res_phototrkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary
 Twitterhttps://twitter.com/tink



Unexplained query slowness

2015-02-25 Thread Robert Wille
Our Cassandra database just rolled to live last night. I’m looking at our query 
performance, and overall it is very good, but perhaps 1 in 10,000 queries takes 
several hundred milliseconds (up to a full second). I’ve grepped for GC in the 
system.log on all nodes, and there aren’t any recent GC events. I’m executing 
~500 queries per second, which produces negligible load and CPU utilization. I 
have very minimal writes (one every few minutes). The slow queries are across 
the board. There isn’t one particular query that is slow.

I’m running 2.0.12 with SSD’s. I’ve got a 10 node cluster with RF=3.

I have no idea where to even begin to look. Any thoughts on where to start 
would be greatly appreciated.

Robert



Why does C* repeatedly compact the same tables over and over?

2015-01-08 Thread Robert Wille
After bootstrapping a node, the node repeatedly compacts the same tables over 
and over, even though my cluster is completely idle. I’ve noticed the same 
behavior after extended periods of heavy writes. I realize that during 
bootstrapping (or extended periods of heavy writes) that compaction could get 
seriously behind, but once a table has been compacted, I don’t see the need to 
recompact the table dozens more times.

Possibly related, I often see that OpsCenter reports that nodes have a large 
number of pending tasks, when the Pending column of the Thread Pool Stats doesn’t 
reflect that.

Robert



Re: does consistency=ALL for deletes obviate the need for tombstones?

2014-12-16 Thread Robert Wille
Tombstones have to be created. The SSTables are immutable, so the data cannot 
be deleted. Therefore, a tombstone is required. The value you deleted will be 
physically removed during compaction.

My workload sounds similar to yours in some respects, and I was able to get C* 
working for me. I have large chunks of data which I periodically replace. I 
write the new data, update a reference, and then delete the old data. I 
designed my schema to be tombstone-friendly, and C* works great. For some of my 
tables I am able to delete entire partitions. Because of the reference that I 
updated, I never try to access the old data, and therefore the tombstones for 
these partitions are never read. The old data simply has to wait for 
compaction. Other tables require deleting records within partitions. These 
tombstones do get read, so there are performance implications. I was able to 
design my schema so that no partition ever has more than a few tombstones (one 
for each generation of deleted data, which is usually no more than one).

Hope this helps.

Robert

On Dec 16, 2014, at 8:22 AM, Ian Rose 
ianr...@fullstory.commailto:ianr...@fullstory.com wrote:

Howdy all,

Our use of cassandra unfortunately makes use of lots of deletes.  Yes, I know 
that C* is not well suited to this kind of workload, but that's where we are, 
and before I go looking for an entirely new data layer I would rather explore 
whether C* could be tuned to work well for us.

However, deletions are never driven by users in our app - deletions always 
occur by backend processes to clean up data after it has been processed, and 
thus they do not need to be 100% available.  So this made me think, what if I 
did the following?

  *   gc_grace_seconds = 0, which ensures that tombstones are never created
  *   replication factor = 3
  *   for writes that are inserts, consistency = QUORUM, which ensures that 
writes can proceed even if 1 replica is slow/down
  *   for deletes, consistency = ALL, which ensures that when we delete a 
record it disappears entirely (no need for tombstones)
  *   for reads, consistency = QUORUM

Also, I should clarify that our data is essentially append-only, so I don’t need 
to worry about inconsistencies created by partial updates (e.g. value gets 
changed on one machine but not another).  Sometimes there will be duplicate 
writes, but I think that should be fine since the value is always identical.

Any red flags with this approach?  Has anyone tried it and have experiences to 
share?  Also, I *think* that this means that I don't need to run repairs, which 
from an ops perspective is great.

Thanks, as always,
- Ian
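For what it’s worth, a minimal sketch of wiring up the per-operation consistency levels described above with the DataStax Java driver. The table and column names are made up, and note that, per the reply above, the deletes still write tombstones regardless of consistency level.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public class PerOperationConsistency {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("my_keyspace")) {

            PreparedStatement insert = session.prepare(
                "INSERT INTO events (id, body) VALUES (?, ?)")
                .setConsistencyLevel(ConsistencyLevel.QUORUM);   // inserts survive one slow replica
            PreparedStatement read = session.prepare(
                "SELECT body FROM events WHERE id = ?")
                .setConsistencyLevel(ConsistencyLevel.QUORUM);
            PreparedStatement delete = session.prepare(
                "DELETE FROM events WHERE id = ?")
                .setConsistencyLevel(ConsistencyLevel.ALL);      // deletes need every replica up

            session.execute(insert.bind("e1", "payload"));
            System.out.println(session.execute(read.bind("e1")).one());
            session.execute(delete.bind("e1"));   // fails (UnavailableException) if any replica is
                                                  // down, which is the availability trade-off here
        }
    }
}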




Re: Hinted handoff not working

2014-12-16 Thread Robert Wille
Nope. I added millions of records and several GB to the cluster while one node 
was down, and then ran nodetool flush system hints on a couple of nodes that 
were up, and system/hints has less than 200K in it.

Here’s the relevant part of nodetool cfstats system.hints:

Keyspace: system
Read Count: 28572
Read Latency: 0.01806502869942601 ms.
Write Count: 351
Write Latency: 0.04547008547008547 ms.
Pending Tasks: 0
Table: hints
SSTable count: 1
Space used (live), bytes: 7446
Space used (total), bytes: 80062
SSTable Compression Ratio: 0.2651441528992549
Number of keys (estimate): 128
Memtable cell count: 1
Memtable data size, bytes: 1740

The hints are definitely not being stored.

Robert

On Dec 14, 2014, at 11:44 PM, Jens Rantil 
jens.ran...@tink.semailto:jens.ran...@tink.se wrote:

Hi Robert ,

Maybe you need to flush your memtables to actually see the disk usage increase? 
This applies to both hosts.

Cheers,
Jens




On Sun, Dec 14, 2014 at 3:52 PM, Robert Wille 
rwi...@fold3.commailto:rwi...@fold3.com wrote:

I have a cluster with RF=3. If I shut down one node, add a bunch of data to the 
cluster, I don’t see a bunch of records added to system.hints. Also, du of 
/var/lib/cassandra/data/system/hints of the nodes that are up shows that hints 
aren’t being stored. When I start the down node, its data doesn’t grow until I 
run repair, which then takes a really long time because it is significantly out 
of date. Is there some magic setting I cannot find in the documentation to 
enable hinted handoff? I’m running 2.0.11. Any insights would be greatly 
appreciated.

Thanks

Robert





Hinted handoff not working

2014-12-14 Thread Robert Wille
I have a cluster with RF=3. If I shut down one node, add a bunch of data to the 
cluster, I don’t see a bunch of records added to system.hints. Also, du of 
/var/lib/cassandra/data/system/hints of the nodes that are up shows that hints 
aren’t being stored. When I start the down node, its data doesn’t grow until I 
run repair, which then takes a really long time because it is significantly out 
of date. Is there some magic setting I cannot find in the documentation to 
enable hinted handoff? I’m running 2.0.11. Any insights would be greatly 
appreciated. 

Thanks

Robert



Re: Hinted handoff not working

2014-12-14 Thread Robert Wille
I’ve got hinted_handoff_enabled: true in cassandra.yaml. My settings are all 
default except for the DC, listen addresses and snitch. I should have mentioned 
this in my original post.

On Dec 14, 2014, at 8:02 AM, Rahul Neelakantan ra...@rahul.be wrote:

 http://www.datastax.com/documentation/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html?scroll=reference_ds_qfg_n1r_1k__hinted_handoff_enabled
 
 Rahul
 
 On Dec 14, 2014, at 9:46 AM, Robert Wille rwi...@fold3.com wrote:
 
 I have a cluster with RF=3. If I shut down one node, add a bunch of data to 
 the cluster, I don’t see a bunch of records added to system.hints. Also, du 
 of /var/lib/cassandra/data/system/hints of the nodes that are up shows that 
 hints aren’t being stored. When I start the down node, its data doesn’t grow 
 until I run repair, which then takes a really long time because it is 
 significantly out of date. Is there some magic setting I cannot find in the 
 documentation to enable hinted handoff? I’m running 2.0.11. Any insights 
 would be greatly appreciated. 
 
 Thanks
 
 Robert
 



Observations/concerns with repair and hinted handoff

2014-12-09 Thread Robert Wille
I have spent a lot of time working with single-node, RF=1 clusters in my 
development. Before I deploy a cluster to our live environment, I have spent 
some time learning how to work with a multi-node cluster with RF=3. There were 
some surprises. I’m wondering if people here can enlighten me. I don’t exactly 
have that warm, fuzzy feeling.

I created a three-node cluster with RF=3. I then wrote to the cluster pretty 
heavily to cause some dropped mutation messages. The dropped messages didn’t 
trickle in, but came in a burst. I suspect full GC is the culprit, but I don’t 
really know. Anyway, I ended up with 17197 dropped mutation messages on node 1, 
6422 on node 2, and none on node 3. In order to learn about repair, I waited 
for compaction to finish doing its thing, recorded the size and estimated 
number of keys for each table, started up repair (nodetool repair <keyspace>) 
on all three nodes, and waited for it to complete before doing anything else 
(even reads). When repair and compaction were done, I checked the size and 
estimated number of keys for each table. All tables on all nodes grew in size 
and estimated number of keys. The estimated number of keys for each node grew 
by 65k, 272k and 247k (.2%, .7% and .6%) for nodes 1, 2 and 3 respectively. I 
expected some growth, but that’s significantly more new keys than I had dropped 
mutation messages. I also expected the most new data on node 1, and none on 
node 3, which didn’t come close to what actually happened. Perhaps a mutation 
message contains more than one record? Perhaps the dropped mutation message 
counter is incremented on the coordinator, not the node that was overloaded?

I repeated repair, and the second time around the tables remained unchanged, as 
expected. I would hope that repair wouldn’t do anything to the tables if they 
were in sync. 

Just to be clear, I’m not overly concerned about the unexpected increase in 
number of keys. I’m pretty sure that repair did the needful thing and did bring 
the nodes in sync. The unexpected results more likely indicates that I’m 
ignorant, and it really bothers me when I don’t understand something. If you 
have any insights, I’d appreciate them.

One of the dismaying things about repair was that the first time around it took 
about 4 hours, with a completely idle cluster (except for repairs, of course), 
and only 6 GB of data on each node. I can bootstrap a node with 6 GB of data in 
a couple of minutes. That makes repair something like 50 to 100 times more 
expensive than bootstrapping. I know I should run repair on one node at a time, 
but even if you divide by three, that’s still a horrifically long time for such 
a small amount of data. The second time around, repair only took 30 minutes. 
That’s much better, but best-case is still about 10x longer than bootstrapping. 
Should repair really be taking this long? When I have 300 GB of data, is a 
best-case repair going to take 25 hours, and a repair with a modest amount of 
work more than 100 hours? My records are quite small. Those 6 GB contain almost 
40 million partitions. 

Following my repair experiment, I added a fourth node, and then tried killing a 
node and importing a bunch of data while the node was down. As far as repair is 
concerned, this seems to work fine (although again, glacially). However, I 
noticed that hinted handoff doesn’t seem to be working. I added several million 
records (with consistency=one), and nothing appeared in system.hints (du -hs 
showed a few dozen K bytes), nor did I get any pending Hinted Handoff tasks in 
the Thread Pool Stats. When I started up the down node (less than 3 hours 
later), the missed data didn’t appear to get sent to it. The tables did not 
grow, compaction events didn’t schedule, and there wasn’t any appreciable CPU 
utilization by the cluster. With millions of records that were missed while it 
was down, I should have noticed something if it actually was replaying the 
hints. Is there some magic setting to turn on hinted handoffs? Were there too 
many hints and so it just deleted them? My assumption is that if hinted handoff 
is working, then my need for repair should be much less, which given my 
experience so far, would be a really good thing.

Given the horrifically long time it takes to repair a node, and hinted handoff 
apparently not working, if a node goes down, is it better to bootstrap a new 
one than to repair the node that went down? I would expect that even if I chose 
to bootstrap a new node, it would need to be repaired anyway, since it would 
probably miss writes while bootstrapping.

Thanks in advance

Robert



Pros and cons of lots of very small partitions versus fewer larger partitions

2014-12-05 Thread Robert Wille
At the data modeling class at the Cassandra Summit, the instructor said that 
lots of small partitions are just fine. I’ve heard on this list that that is 
not true, and that it’s better to cluster small partitions into fewer, larger 
partitions. Due to conflicting information on this issue, I’d be interested in 
hearing people’s opinions.

For the sake of discussion, lets compare two tables:

CREATE TABLE a (
id INT,
value INT,
PRIMARY KEY (id)
)

CREATE TABLE b (
bucket INT,
id INT,
value INT,
PRIMARY KEY ((bucket), id)
)

And let’s say that bucket is computed as id / N. For analysis purposes, let’s 
assume I have 100 million ids to store.
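For concreteness, reads against the two tables differ only in that the client derives the bucket for table b. A small sketch with the DataStax Java driver; the contact point, keyspace and the choice N = 1000 are assumptions for illustration.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class BucketedRead {
    static final int N = 1000;   // ids per bucket (an assumption for illustration)

    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("my_keyspace")) {

            PreparedStatement readA = session.prepare("SELECT value FROM a WHERE id = ?");
            PreparedStatement readB = session.prepare("SELECT value FROM b WHERE bucket = ? AND id = ?");

            int id = 123456789;
            Row fromA = session.execute(readA.bind(id)).one();
            Row fromB = session.execute(readB.bind(id / N, id)).one();   // bucket is derived, never looked up
            System.out.println(fromA.getInt("value") + " " + fromB.getInt("value"));
        }
    }
}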

Table a is obviously going to have a larger bloom filter. That’s a clear 
negative.

When I request a record, table a will have less data to load from disk, so that 
seems like a positive.

Table a will never have its columns scattered across multiple SSTables, but 
table b might. If I only want one row from a partition in table b, does 
fragmentation matter (I think probably not, but I’m not sure)?

It’s not clear to me which will fit more efficiently on disk, but I would guess 
that table a wins.

Smaller partitions means sending less data during repair, but I suspect that 
when computing the Merkle tree for the table, more partitions might mean more 
overhead, but that’s only a guess. Which one repairs more efficiently?

In your opinion, which one is best and why? If you think table b is best, what 
would you choose N to be?

Robert



Repair taking many snapshots per minute

2014-12-04 Thread Robert Wille
This is a follow-up to my previous post “Cassandra taking snapshots 
automatically?”. I’ve renamed the thread to better describe the new information 
I’ve discovered.

I have a four node, RF=3, 2.0.11 cluster that was producing snapshots at a 
prodigious rate. I let the cluster sit idle overnight to settle down, and 
deleted all the snapshots. I waited for a while to make sure it really was done 
creating snapshots. I then ran "nodetool repair test2_browse" on one node and 
immediately got snapshots on three of my four nodes. Here’s what my 
/var/lib/cassandra/data/test2_browse/path_by_parent/snapshots directory looks 
like after a few minutes:

drwxr-xr-x 2 cassandra cassandra 40960 Dec  4 07:09 
33adb6b0-7bbf-11e4-893d-d96c3e745723
drwxr-xr-x 2 cassandra cassandra 40960 Dec  4 07:09 
33aea110-7bbf-11e4-893d-d96c3e745723
drwxr-xr-x 2 cassandra cassandra 40960 Dec  4 07:09 
33af6460-7bbf-11e4-893d-d96c3e745723
drwxr-xr-x 2 cassandra cassandra 40960 Dec  4 07:09 
33b027b0-7bbf-11e4-893d-d96c3e745723
drwxr-xr-x 2 cassandra cassandra 40960 Dec  4 07:09 
33b0c3f0-7bbf-11e4-893d-d96c3e745723
drwxr-xr-x 2 cassandra cassandra 40960 Dec  4 07:09 
33b1ae50-7bbf-11e4-893d-d96c3e745723
drwxr-xr-x 2 cassandra cassandra 40960 Dec  4 07:09 
33b24a90-7bbf-11e4-893d-d96c3e745723
drwxr-xr-x 2 cassandra cassandra 40960 Dec  4 07:09 
3a2d1300-7bbf-11e4-893d-d96c3e745723
drwxr-xr-x 2 cassandra cassandra 40960 Dec  4 07:09 
3a2daf40-7bbf-11e4-893d-d96c3e745723
drwxr-xr-x 2 cassandra cassandra 40960 Dec  4 07:09 
3a2e4b80-7bbf-11e4-893d-d96c3e745723
drwxr-xr-x 2 cassandra cassandra 40960 Dec  4 07:09 
3a2ee7c0-7bbf-11e4-893d-d96c3e745723
drwxr-xr-x 2 cassandra cassandra 40960 Dec  4 07:09 
3a2f5cf0-7bbf-11e4-893d-d96c3e745723
drwxr-xr-x 2 cassandra cassandra 40960 Dec  4 07:09 
3a2ff930-7bbf-11e4-893d-d96c3e745723
drwxr-xr-x 2 cassandra cassandra 40960 Dec  4 07:10 
40bbb190-7bbf-11e4-893d-d96c3e745723
drwxr-xr-x 2 cassandra cassandra 40960 Dec  4 07:10 
40bc74e0-7bbf-11e4-893d-d96c3e745723
drwxr-xr-x 2 cassandra cassandra 40960 Dec  4 07:10 
40bd1120-7bbf-11e4-893d-d96c3e745723
drwxr-xr-x 2 cassandra cassandra 40960 Dec  4 07:10 
40bdd470-7bbf-11e4-893d-d96c3e745723
drwxr-xr-x 2 cassandra cassandra 40960 Dec  4 07:10 
40be70b0-7bbf-11e4-893d-d96c3e745723
drwxr-xr-x 2 cassandra cassandra 40960 Dec  4 07:10 
474b3a80-7bbf-11e4-893d-d96c3e745723
drwxr-xr-x 2 cassandra cassandra 40960 Dec  4 07:10 
474c24e0-7bbf-11e4-893d-d96c3e745723
drwxr-xr-x 2 cassandra cassandra 40960 Dec  4 07:10 
474d3650-7bbf-11e4-893d-d96c3e745723
drwxr-xr-x 2 cassandra cassandra 40960 Dec  4 07:10 
4dd9d910-7bbf-11e4-893d-d96c3e745723
drwxr-xr-x 2 cassandra cassandra 40960 Dec  4 07:10 
4ddac370-7bbf-11e4-893d-d96c3e745723
drwxr-xr-x 2 cassandra cassandra 40960 Dec  4 07:10 
546877a0-7bbf-11e4-893d-d96c3e745723
drwxr-xr-x 2 cassandra cassandra 40960 Dec  4 07:10 
54696200-7bbf-11e4-893d-d96c3e745723
drwxr-xr-x 2 cassandra cassandra 40960 Dec  4 07:10 
546a7370-7bbf-11e4-893d-d96c3e745723
drwxr-xr-x 2 cassandra cassandra 40960 Dec  4 07:10 
546b36c0-7bbf-11e4-893d-d96c3e745723
drwxr-xr-x 2 cassandra cassandra 40960 Dec  4 07:10 
5af73d40-7bbf-11e4-893d-d96c3e745723
drwxr-xr-x 2 cassandra cassandra 40960 Dec  4 07:11 
60ee7dd0-7bbf-11e4-893d-d96c3e745723
drwxr-xr-x 2 cassandra cassandra 40960 Dec  4 07:11 
60ef4120-7bbf-11e4-893d-d96c3e745723
drwxr-xr-x 2 cassandra cassandra 40960 Dec  4 07:11 
60efdd60-7bbf-11e4-893d-d96c3e745723
drwxr-xr-x 2 cassandra cassandra 40960 Dec  4 07:11 
60f0a0b0-7bbf-11e4-893d-d96c3e745723
drwxr-xr-x 2 cassandra cassandra 40960 Dec  4 07:11 
60f16400-7bbf-11e4-893d-d96c3e745723
drwxr-xr-x 2 cassandra cassandra 40960 Dec  4 07:11 
677d1c60-7bbf-11e4-893d-d96c3e745723
drwxr-xr-x 2 cassandra cassandra 40960 Dec  4 07:11 
677e06c0-7bbf-11e4-893d-d96c3e745723
drwxr-xr-x 2 cassandra cassandra 40960 Dec  4 07:11 
6e0bbaf0-7bbf-11e4-893d-d96c3e745723
drwxr-xr-x 2 cassandra cassandra 40960 Dec  4 07:11 
749a5980-7bbf-11e4-893d-d96c3e745723
drwxr-xr-x 2 cassandra cassandra 40960 Dec  4 07:11 
7b28f810-7bbf-11e4-893d-d96c3e745723
drwxr-xr-x 2 cassandra cassandra 40960 Dec  4 07:11 
81b796a0-7bbf-11e4-893d-d96c3e745723
drwxr-xr-x 2 cassandra cassandra 40960 Dec  4 07:12 
87ae3af0-7bbf-11e4-893d-d96c3e745723
drwxr-xr-x 2 cassandra cassandra 40960 Dec  4 07:12 
87aed730-7bbf-11e4-893d-d96c3e745723
drwxr-xr-x 2 cassandra cassandra 40960 Dec  4 07:12 
87af7370-7bbf-11e4-893d-d96c3e745723

I also get lots of events like these in system.log:

ERROR [AntiEntropySessions:1] 2014-12-03 13:35:40,541 CassandraDaemon.java 
(line 199) Exception in thread Thread[AntiEntropySessions:1,5,RMI Runtime]
java.lang.RuntimeException: java.io.IOException: Failed during snapshot 
creation.
at com.google.common.base.Throwables.propagate(Throwables.java:160)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at 

Cassandra taking snapshots automatically?

2014-12-03 Thread Robert Wille
I built my first multi-node cluster and populated it with a bunch of data, and 
ran out of space far more quickly than I expected. On one node, I ended up with 
76 snapshots, consuming a total of 220 GB of space. I only have 40 GB of data. 
It took several snapshots per hour, sometimes within a minute of each other. I 
don’t know why it would have any snapshots at all. I never consciously asked it 
to take a snapshot. I didn’t truncate or drop any CF’s or keyspaces or make any 
schema changes, and certainly not 76 times.

Any idea what would have caused Cassandra to take all these snapshots, and how 
I can make it stop?

Thanks in advance

Robert



Re: Cassandra taking snapshots automatically?

2014-12-03 Thread Robert Wille
No. auto_snapshot is turned on, but not snapshot_before_compaction.

On Dec 3, 2014, at 10:30 AM, Eric Stevens 
migh...@gmail.commailto:migh...@gmail.com wrote:

Do you  have snapshot_before_compaction enabled?
http://datastax.com/documentation/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html#reference_ds_qfg_n1r_1k__snapshot_before_compaction

On Wed Dec 03 2014 at 10:25:12 AM Robert Wille 
rwi...@fold3.commailto:rwi...@fold3.com wrote:
I built my first multi-node cluster and populated it with a bunch of data, and 
ran out of space far more quickly than I expected. On one node, I ended up with 
76 snapshots, consuming a total of 220 GB of space. I only have 40 GB of data. 
It took several snapshots per hour, sometimes within a minute of each other. I 
don’t know why it would have any snapshots at all. I never consciously asked it 
to take a snapshot. I didn’t truncate or drop any CF’s or keyspaces or make any 
schema changes, and certainly not 76 times.

Any idea what would have caused Cassandra to take all these snapshots, and how 
I can make it stop?

Thanks in advance

Robert




Re: Recommissioned node is much smaller

2014-12-03 Thread Robert Wille
Load and ownership didn’t correlate nearly as well as I expected. I have lots 
and lots of very small records. I would expect very high correlation.

I think the moral of the story is that I shouldn’t delete the system directory. 
If I have issues with a node, I should recommission it properly.

Robert

On Dec 3, 2014, at 10:23 AM, Eric Stevens 
migh...@gmail.commailto:migh...@gmail.com wrote:

How does the difference in load compare to the effective ownership?  If you 
deleted the system directory as well, you should end up with new ranges, so I'm 
wondering if perhaps you just ended up with a really bad shuffle. Did you run 
removenode on the old host after you took it down (I assume so since all nodes 
are in UN status)?  Is the test node in its own seeds list?

On Tue Dec 02 2014 at 4:10:10 PM Robert Wille 
rwi...@fold3.commailto:rwi...@fold3.com wrote:
I didn’t do anything except kill the server process, delete /var/lib/cassandra, 
and start it back up again. nodetool status shows all nodes as UN, and doesn’t 
display any unexpected nodes.

I don’t know if this sheds any light on the issue, but I’ve added a 
considerable amount of data to the cluster since I did the aforementioned test. 
The difference in size between the nodes is shrinking. The other nodes are 
growing more slowly than the one I recommissioned. That was definitely not 
something that I expected, and I don’t have any explanation for that either.

Robert

On Dec 2, 2014, at 3:38 PM, Tyler Hobbs 
ty...@datastax.commailto:ty...@datastax.com wrote:


On Tue, Dec 2, 2014 at 2:21 PM, Robert Wille 
rwi...@fold3.commailto:rwi...@fold3.com wrote:
As a test, I took down a node, deleted /var/lib/cassandra and restarted it.

Did you decommission or removenode it when you took it down?  If you didn't, 
the old node is still in the ring, and affects the replication.


--
Tyler Hobbs
DataStax - http://datastax.com/




Re: Cassandra taking snapshots automatically?

2014-12-03 Thread Robert Wille
No. auto_snapshot is turned on, but snapshot_before_compaction is off.

Maybe this will shed some light on it. I tried running nodetool repair. I got 
several messages saying "Lost notification. You should check server log for 
repair status of keyspace test2_browse".

I looked in system.log, and I have errors where repair is trying to create a 
snapshot. Not sure why repair is trying to create snapshots, or why it is 
failing. I also now have about 200 snapshots. One table has just one. Another 
table has 124.

There’s so much that’s so odd about this.

I’m running 2.0.11.

Robert

On Dec 3, 2014, at 10:30 AM, Eric Stevens 
migh...@gmail.commailto:migh...@gmail.com wrote:

Do you  have snapshot_before_compaction enabled?
http://datastax.com/documentation/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html#reference_ds_qfg_n1r_1k__snapshot_before_compaction

On Wed Dec 03 2014 at 10:25:12 AM Robert Wille 
rwi...@fold3.commailto:rwi...@fold3.com wrote:
I built my first multi-node cluster and populated it with a bunch of data, and 
ran out of space far more quickly than I expected. On one node, I ended up with 
76 snapshots, consuming a total of 220 GB of space. I only have 40 GB of data. 
It took several snapshots per hour, sometimes within a minute of each other. I 
don’t know why it would have any snapshots at all. I never consciously asked it 
to take a snapshot. I didn’t truncate or drop any CF’s or keyspaces or make any 
schema changes, and certainly not 76 times.

Any idea what would have caused Cassandra to take all these snapshots, and how 
I can make it stop?

Thanks in advance

Robert




Recommissioned node is much smaller

2014-12-02 Thread Robert Wille
As a test, I took down a node, deleted /var/lib/cassandra and restarted it. 
After it joined the cluster, it’s about 75% the size of its neighbors (both in 
terms of bytes and numbers of keys). Prior to my test it was approximately the 
same size. I have no explanation for why that node would shrink so much, other 
than data loss. I have no deleted data, and no TTL’s. Only a small percentage 
of my data has had any updates (and some of my tables have had only inserts, 
and those have shrunk by 25% as well). I don’t really know how to check if I 
have records that have fewer than three replicas (RF=3).

Any thoughts would be greatly appreciated.

Thanks

Robert



Re: Recommissioned node is much smaller

2014-12-02 Thread Robert Wille
I meant to mention that I had run repair, but neglected to do so. Sorry about 
that. Repair runs pretty quick (a fraction of the time that compaction takes) 
and doesn’t seem to do anything.

On Dec 2, 2014, at 1:44 PM, Robert Coli 
rc...@eventbrite.commailto:rc...@eventbrite.com wrote:

On Tue, Dec 2, 2014 at 12:21 PM, Robert Wille 
rwi...@fold3.commailto:rwi...@fold3.com wrote:
As a test, I took down a node, deleted /var/lib/cassandra and restarted it. 
After it joined the cluster, it’s about 75% the size of its neighbors (both in 
terms of bytes and numbers of keys). Prior to my test it was approximately the 
same size. I have no explanation for why that node would shrink so much, other 
than data loss. I have no deleted data, and no TTL’s. Only a small percentage 
of my data has had any updates (and some of my tables have had only inserts, 
and those have shrunk by 25% as well). I don’t really know how to check if I 
have records that have fewer than three replicas (RF=3).

Sounds suspicious, actually. I would suspect partial-bootstrap.

To determine if you have under-replicated data, run repair. That's what it's 
for.

=Rob




Re: Recommissioned node is much smaller

2014-12-02 Thread Robert Wille
I didn’t do anything except kill the server process, delete /var/lib/cassandra, 
and start it back up again. nodetool status shows all nodes as UN, and doesn’t 
display any unexpected nodes.

I don’t know if this sheds any light on the issue, but I’ve added a 
considerable amount of data to the cluster since I did the aforementioned test. 
The difference in size between the nodes is shrinking. The other nodes are 
growing more slowly than the one I recommissioned. That was definitely not 
something that I expected, and I don’t have any explanation for that either.

Robert

On Dec 2, 2014, at 3:38 PM, Tyler Hobbs 
ty...@datastax.commailto:ty...@datastax.com wrote:


On Tue, Dec 2, 2014 at 2:21 PM, Robert Wille 
rwi...@fold3.commailto:rwi...@fold3.com wrote:
As a test, I took down a node, deleted /var/lib/cassandra and restarted it.

Did you decommission or removenode it when you took it down?  If you didn't, 
the old node is still in the ring, and affects the replication.


--
Tyler Hobbs
DataStax - http://datastax.com/



Partial replication to a DC

2014-11-25 Thread Robert Wille
Is it possible to replicate a subset of the keyspaces to a data center? For 
example, if I want to run reports without impacting my production nodes, can I 
put the relevant column families in a keyspace and create a DC for reporting 
that replicates only that keyspace?
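To sketch what I mean (DC names are made up): because NetworkTopologyStrategy 
is configured per keyspace, a keyspace that names the reporting DC would be 
replicated there, while keyspaces that omit it would not:

CREATE KEYSPACE reporting WITH replication = 
    {'class': 'NetworkTopologyStrategy', 'MAIN': 3, 'REPORTING': 2};

CREATE KEYSPACE everything_else WITH replication = 
    {'class': 'NetworkTopologyStrategy', 'MAIN': 3};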

Robert

Re: Getting the counters with the highest values

2014-11-24 Thread Robert Wille
We do get a large number of documents getting counts each day, which is why I’m 
thinking the running totals table should be ((doc_id), day) rather than ((day), 
doc_id). We have too many documents per day to materialize in memory, so 
querying per day and aggregating the results isn’t really possible.

I’m planning on bucketing the materialized ordering because we get enough 
unique document views per day that the rows will be quite large. Not so large 
as to be unmanageable, but pushing the limits. If we were so lucky as to get a 
significant increase in traffic, I might start having issues. I didn’t include 
bucketing in my post because I didn’t want to complicate my question. I hadn’t 
considered that I could bucket by hour and then use a local midnight instead of 
a global midnight. Interesting idea.

Thanks for your response.

Robert

On Nov 24, 2014, at 9:40 AM, Eric Stevens 
migh...@gmail.commailto:migh...@gmail.com wrote:

You're right that there's no way to use the counter data type to materialize a 
view ordered by the counter.  Computing this post hoc is the way to go if your 
needs allow for it (if not, something like Summingbird or vanilla Storm may be 
necessary).

I might suggest that you make your primary key for your running totals by day 
table be ((day), doc_id) because it will make it easy to compute the 
materialized ordered view (SELECT doc_id, count FROM running_totals WHERE 
day=?) unless you expect to have a really large number of documents getting 
counts each day.

For your materialized ordering, I'd suggest a primary key of ((day), count) as 
then for a given day you'll be able to select top by count (SELECT count, 
doc_id FROM doc_counts WHERE day=? ORDER BY count DESC).

One more thing to consider if your users are not all in a single timezone is 
having your time bucket be hour instead of day so that you can answer by-day 
questions relative to local midnight (except for the handful of locations that 
use half-hour timezone offsets) instead of a single global midnight.  You can then 
either include either just each hour in each row (and aggregate at read time), 
or you can make each row a rolling 24 hours (aggregating at write time), 
depending on which use case fits your needs better.

On Sun Nov 23 2014 at 8:42:11 AM Robert Wille 
rwi...@fold3.commailto:rwi...@fold3.com wrote:
I’m working on moving a bunch of counters out of our relational database to 
Cassandra. For the most part, Cassandra is a very nice fit, except for one 
feature on our website. We manage a time series of view counts for each 
document, and display a list of the most popular documents in the last seven 
days. This seems like a pretty strong anti-pattern for Cassandra, but also 
seems like something a lot of people would want to do. If you’re keeping 
counters, it's pretty likely that you'd want to know which ones have the highest 
counts.

Here’s what I came up with to implement this feature. Create a counter table 
with primary key (doc_id, day) and a single counter. Whenever a document is 
viewed, increment the counter for the document for today and the previous six 
days. Sometime after midnight each day, compile the counters into a table with 
primary key (day, count, doc_id) and no additional columns. For each partition 
in the counter table, I would sum up the counters, delete any counters that are 
over a week old, and put the sum into the second table with day = today. When I 
query the table, I would ask for data where day = yesterday. During the 
compilation process, I would delete old partitions. In theory I’d only need two 
partitions. One that is being built, and one for querying.

I’d be interested to hear critiques on this strategy, as well as hearing how 
other people have implemented a most-popular feature using Cassandra counters.

Robert




Getting the counters with the highest values

2014-11-23 Thread Robert Wille
I’m working on moving a bunch of counters out of our relational database to 
Cassandra. For the most part, Cassandra is a very nice fit, except for one 
feature on our website. We manage a time series of view counts for each 
document, and display a list of the most popular documents in the last seven 
days. This seems like a pretty strong anti-pattern for Cassandra, but also 
seems like something a lot of people would want to do. If you’re keeping 
counters, it's pretty likely that you'd want to know which ones have the highest 
counts. 

Here’s what I came up with to implement this feature. Create a counter table 
with primary key (doc_id, day) and a single counter. Whenever a document is 
viewed, increment the counter for the document for today and the previous six 
days. Sometime after midnight each day, compile the counters into a table with 
primary key (day, count, doc_id) and no additional columns. For each partition 
in the counter table, I would sum up the counters, delete any counters that are 
over a week old, and put the sum into the second table with day = today. When I 
query the table, I would ask for data where day = yesterday. During the 
compilation process, I would delete old partitions. In theory I’d only need two 
partitions. One that is being built, and one for querying.
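A rough CQL sketch of the two tables described above (names are illustrative, 
not an existing schema):

CREATE TABLE doc_view_counts (
    doc_id bigint,
    day int,
    views counter,
    PRIMARY KEY ((doc_id), day)
);

CREATE TABLE popular_docs (
    day int,
    view_count bigint,
    doc_id bigint,
    PRIMARY KEY ((day), view_count, doc_id)
) WITH CLUSTERING ORDER BY (view_count DESC, doc_id DESC);

-- yesterday's most popular documents, in order, in a single slice
SELECT doc_id, view_count FROM popular_docs WHERE day = ? LIMIT 100;

The clustering order on the compiled table is what lets the most-popular list 
come back ordered without any client-side sorting.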

I’d be interested to hear critiques on this strategy, as well as hearing how 
other people have implemented a most-popular feature using Cassandra counters.

Robert



LOCAL_* consistency levels

2014-10-14 Thread Robert Wille
I’m wondering if there’s a best practice for an annoyance I’ve come across.

Currently all my environments (dev, staging and live) have a single DC. In the 
future my live environment will most likely have a second DC. When that 
happens, I’ll want to use LOCAL_* consistency levels. However, if I write my 
code with LOCAL_* consistency levels, an exception is thrown. I’ve forgotten 
the exact verbiage, but it's something about having a NetworkTopologyStrategy 
that doesn't support local consistency levels. I don't really want to change 
all my queries when I have a second DC, nor do I want to check my environment 
for every query. Is there a nice way to use LOCAL_* consistency levels and have 
Cassandra do the appropriate thing when there is a single DC?
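One workaround I can sketch (an assumption on my part, since I don't have the 
exact error text in front of me, but it usually comes down to the replication 
strategy): define the keyspace with NetworkTopologyStrategy even while there is 
only one DC, so that LOCAL_QUORUM and friends behave exactly like their 
non-local counterparts and the queries never need to change:

CREATE KEYSPACE myks WITH replication = 
    {'class': 'NetworkTopologyStrategy', 'DC1': 3};

-- later, when the second DC comes online
ALTER KEYSPACE myks WITH replication = 
    {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};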

Thanks in advance

Robert



Deleting counters

2014-10-09 Thread Robert Wille
At the Cassandra Summit I became aware that there are issues with deleting 
counters. I have a few questions about that. What is the bad thing that happens 
(or can possibly happen) when a counter is deleted? Is it safe to delete an 
entire row of counters? Is there any 2.0.x version of Cassandra in which it is 
safe to delete counters? Is there an access pattern in which it is safe to 
delete counters in 2.0.x?

Thanks

Robert



Re: IN versus multiple asynchronous queries

2014-10-06 Thread Robert Wille
As far as latency is concerned, it seems like it wouldn't matter very much if 
the coordinator has to wait for all the responses to come back, or the client 
waits for all the responses to come back. I’ve got the same latency either way.

I would assume that 50 coordinations is more expensive than one coordination 
that does 50 times the work, but that’s probably insignificant when compared to 
the actual fetching of the data from the SSTables.

I do see the point about putting stress on coordinator memory. In general, the 
documents will be very small, but there will occasionally be some rather large 
ones, potentially several megabytes in size. Definitely better to not make the 
coordinator hold on to that memory while it waits for other requests to come 
back.
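For concreteness, the two shapes being compared look something like this 
(table and column names are placeholders):

-- one round trip; a single coordinator holds all the results until the last 
-- replica responds
SELECT doc_id, body FROM documents WHERE doc_id IN (?, ?, ?);   -- up to 50 markers

-- 50 single-partition reads, issued concurrently from the client, each one 
-- potentially hitting a different coordinator
SELECT doc_id, body FROM documents WHERE doc_id = ?;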

Robert

On Oct 4, 2014, at 8:34 AM, DuyHai Doan 
doanduy...@gmail.commailto:doanduy...@gmail.com wrote:

Definitely 50 concurrent queries, possibly in async mode.

If you're using the IN clause with 50 values, the coordinator will block, 
waiting for 50 partitions to be fetched from different nodes (worst case = 50 
nodes) before responding to the client. In addition to the very high latency, 
you'll put stress on the coordinator's memory.



On Sat, Oct 4, 2014 at 3:09 PM, Robert Wille 
rwi...@fold3.commailto:rwi...@fold3.com wrote:
I have a table of small documents (less than 1K) that are often accessed 
together as a group. The group size is always less than 50. Which produces less 
load on the server, one query using an IN clause to get all 50 back together, 
or 50 concurrent queries? Which one is fastest?

Thanks

Robert





IN versus multiple asynchronous queries

2014-10-04 Thread Robert Wille
I have a table of small documents (less than 1K) that are often accessed 
together as a group. The group size is always less than 50. Which produces less 
load on the server, one query using an IN clause to get all 50 back together, 
or 50 concurrent queries? Which one is fastest?

Thanks

Robert



Cassandra + Solr

2014-10-04 Thread Robert Wille
I am architecting a solution for moving a large number of documents out of our 
MySQL database to C*. We use Solr to index these documents. I’ve recently 
become aware of a few different packages that integrate C* and Solr. At first 
blush, this seems like the perfect fit, as it would eliminate a complicated and 
somewhat fragile system that manages the indexing of our documents. However, I 
have a significant concern that would certainly be a show-stopper, unless I’m 
mistaken about some assumptions I’ve made. If any of you can confirm that my 
concern is justified, or let me know where I’m wrong, I’d greatly appreciate it.

So here’s what I think will be an issue. The guiding principle in setting up 
our current Solr cluster (which was done before my time), is that the index has 
to fit in the heap. Each node in our current Solr cluster has 32 GB of RAM and 
runs with a 28 GB heap, and serves up less than 28 GB of index. When I think 
about combining Solr and C*, it would seem that I’d need to cough up 8 GB for 
C*, leaving 20 GB for Solr. The entire Solr index and the entire MySQL database 
are roughly the same size. If I assume that the data will consume roughly the 
same space on C* as it does on MySQL, then each C* node would be limited to roughly 
the same amount of data as the Solr index, or about 20 GB. If I have a 
replication factor of 3, then I need a node for each 7 GB of data. That is 
obviously a problem. We have about 1 TB of data. It would take about 150 nodes.

Is C*/Solr integration only feasible for applications for which a small subset 
of the data is indexed in Solr, or am I mistaken about the requirement that the 
Solr index be able to fit in heap?

Thanks in advance

Robert



Re: Manually deleting sstables

2014-08-21 Thread Robert Wille
 
 2) Are there any other recommended procedures for this?

0) stop writes to columnfamily
1) TRUNCATE columnfamily;
2) nodetool clearsnapshot # on the snapshot that results
3) DROP columnfamily;

My two cents here is that this process is extremely difficult to automate,
making testing that involves dropping column families very difficult.

Robert





Re: Securing Cassandra database

2014-04-05 Thread Robert Wille
Password protection doesn't protect against an engineer accidentally running
test cases using the live config file instead of the test config file. To
protect against that, our RDBMS system will only accept connections from
certain IP addresses. Is there an equivalent thing in Cassandra, or should
we configure firewall software for that?

From:  Mark Reddy mark.re...@boxever.com
Reply-To:  user@cassandra.apache.org
Date:  Saturday, April 5, 2014 at 12:38 AM
To:  user@cassandra.apache.org
Subject:  Re: Securing Cassandra database

Ok so you want to enable auth on Cassandra itself. You will want to look
into the authentication and authorisation functionality then.

Here is a quick overview:
http://www.datastax.com/dev/blog/a-quick-tour-of-internal-authentication-and
-authorization-security-in-datastax-enterprise-and-apache-cassandra

This section of the docs should give you the technical details needed to
move forward on this:
http://www.datastax.com/documentation/cassandra/1.2/cassandra/security/secur
ityTOC.html
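For reference, once an authenticator and authorizer are enabled in 
cassandra.yaml, the account setup itself is plain CQL, roughly along these 
lines (user and keyspace names are illustrative):

CREATE USER app_team WITH PASSWORD 'choose-a-real-password' NOSUPERUSER;
GRANT ALL PERMISSIONS ON KEYSPACE myapp TO app_team;
ALTER USER cassandra WITH PASSWORD 'not-the-default-any-more';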


Mark 


On Sat, Apr 5, 2014 at 7:31 AM, Check Peck comptechge...@gmail.com wrote:
 Just to add, nobody should be able to read and write into our Cassandra
 database through any API or any CQL client as well only our team should be
 able to do that.
 
 
 On Fri, Apr 4, 2014 at 11:29 PM, Check Peck comptechge...@gmail.com wrote:
 Thanks Mark. But what about Cassandra database? I don't want anybody to read
 and write into our Cassandra database through any API only just our team
 should be able to do that.
 
 We are using CQL based tables so data doesn't get shown on the OPSCENTER.
 
 In our case, we would like to secure database itself. Is this possible to do
 as well anyhow?
 
 
 
 
 
 On Fri, Apr 4, 2014 at 11:24 PM, Mark Reddy mark.re...@boxever.com wrote:
 Hi, 
 
 If you want to just secure OpsCenter itself take a look here:
 http://www.datastax.com/documentation/opscenter/4.1/opsc/configure/opscAssig
 ningAccessRoles_t.html
 
 
 If you want to enable internal authentication and still allow OpsCenter
 access, you can create an OpsCenter user and once you have auth turned
 within the cluster update the cluster config with the user name and password
 for the OpsCenter user.
 
 Depending on your installation type you will find the cluster config in one
 of the following locations:
 Packaged installs: /etc/opscenter/clusters/cluster_specific.conf
 Binary installs: install_location/conf/clusters/cluster_specific.conf
 Windows installs: Program Files (x86)\DataStax
 Community\opscenter\conf\clusters\cluster_specific.conf
 
 Open the file and update the username and password values under the
 [cassandra] section:
 
 [cassandra]
 username = 
 seed_hosts = 
 api_port =
 password = 
 
 After changing properties in this file, restart OpsCenter for the changes to
 take effect.
 
 
 Mark
 
 
 On Sat, Apr 5, 2014 at 6:54 AM, Check Peck comptechge...@gmail.com wrote:
 Hi All,
 
 
 We would like to secure our Cassandra database. We don't want anybody to
 read/write on our Cassandra database except our team members.
  
 We are using Cassandra 1.2.9 in Production and we have 36 node Cassandra
 cluster. 12 in each colo as we have three datacenters.
 
 
 
 But we would like to have OPSCENTER working as it is working currently.
  
 Is this possible to do anyhow? Is there any settings in yaml file which we
 can enforce? 
 
 
  
 
 
 





Re: Flushing after dropping a column family

2014-02-26 Thread Robert Wille
I use truncate between my test cases. Never had a problem with one test
case inheriting the data from the previous one. I'm using a single node,
so that may be why.

On 2/26/14, 9:27 AM, Ben Hood 0x6e6...@gmail.com wrote:

On Wed, Feb 26, 2014 at 3:58 PM, DuyHai Doan doanduy...@gmail.com wrote:
 Try truncate foo instead of drop table foo.

 About the nodetool clearsnapshot, I've experienced the same behavior
also
 before. Snapshots cleaning is not immediate

I get the same behavior with truncate as well.




Re: [OT]: Can I have a non-delivering subscription?

2014-02-22 Thread Robert Wille
Yeah, it's called a rule. Set one up to delete everything from
user@cassandra.apache.org.

On 2/22/14, 10:32 AM, Paul LeoNerd Evans leon...@leonerd.org.uk
wrote:

A question about the mailing list itself, rather than Cassandra.

I've re-subscribed simply because I have to be subscribed in order to
send to the list, as I sometimes try to when people Cc questions about
my Net::Async::CassandraCQL perl module to me. However, if I want to
read the list, I usually do so on the online archives and not by mail.

Is it possible to have a non-delivering subscription, which would let
me send messages, but doesn't deliver anything back to me?

-- 
Paul LeoNerd Evans

leon...@leonerd.org.uk
ICQ# 4135350   |  Registered Linux# 179460
http://www.leonerd.org.uk/




Re: Lots of deletions results in death by GC

2014-02-05 Thread Robert Wille
Yes. It's kind of an unusual workload. An insertion phase followed by a
deletion phase, generally not overlapping.

From:  Benedict Elliott Smith belliottsm...@datastax.com
Reply-To:  user@cassandra.apache.org
Date:  Tuesday, February 4, 2014 at 5:29 PM
To:  user@cassandra.apache.org
Subject:  Re: Lots of deletions results in death by GC

Is it possible you are generating exclusively deletes for this table?


On 5 February 2014 00:10, Robert Wille rwi...@fold3.com wrote:
 I ran my test again, and Flush Writer's "All time blocked" increased to 2 and
 then shortly thereafter GC went into its death spiral. I doubled
 memtable_flush_writers (to 2) and memtable_flush_queue_size (to 8) and tried
 again.
 
 This time, the table that always sat with Memtable data size = 0 now showed
 increases in Memtable data size. That was encouraging. It never flushed, which
 isn't too surprising, because that table has relatively few rows and they are
 pretty wide. However, on the fourth table to clean, Flush Writer's "All time
 blocked" went to 1, and then there were no more completed events, and about 10
 minutes later GC went into its death spiral. I assume that each time Flush
 Writer completes an event, that means a table was flushed. Is that right?
 Also, I got two dropped mutation messages at the same time that Flush Writer's
 All time blocked incremented.
 
 I then increased the writers and queue size to 3 and 12, respectively, and ran
 my test again. This time All time blocked remained at 0, but I still suffered
 death by GC.
 
 I would almost think that this is caused by high load on the server, but I've
 never seen CPU utilization go above about two of my eight available cores. If
 high load triggers this problem, then that is very disconcerting. That means
 that a CPU spike could permanently cripple a node. Okay, not permanently, but
 until a manual flush occurs.
 
 If anyone has any further thoughts, I'd love to hear them. I'm quite at the
 end of my rope.
 
 Thanks in advance
 
 Robert
 
 From:  Nate McCall n...@thelastpickle.com
 Reply-To:  user@cassandra.apache.org
 Date:  Saturday, February 1, 2014 at 9:25 AM
 To:  Cassandra Users user@cassandra.apache.org
 Subject:  Re: Lots of deletions results in death by GC
 
 What's the output of 'nodetool tpstats' while this is happening? Specifically
 is Flush Writer All time blocked increasing? If so, play around with turning
 up memtable_flush_writers and memtable_flush_queue_size and see if that helps.
 
 
 On Sat, Feb 1, 2014 at 9:03 AM, Robert Wille rwi...@fold3.com wrote:
 A few days ago I posted about an issue I'm having where GC takes a long time
 (20-30 seconds), and it happens repeatedly and basically no work gets done.
 I've done further investigation, and I now believe that I know the cause. If
 I do a lot of deletes, it creates memory pressure until the memtables are
 flushed, but Cassandra doesn't flush them. If I manually flush, then life is
 good again (although that takes a very long time because of the GC issue). If
 I just leave the flushing to Cassandra, then I end up with death by GC. I
 believe that when the memtables are full of tombstones, Cassandra doesn't
 realize how much memory the memtables are actually taking up, and so it
 doesn't proactively flush them in order to free up heap.
 
 As I was deleting records out of one of my tables, I was watching it via
 nodetool cfstats, and I found a very curious thing:
 
 Memtable cell count: 1285
 Memtable data size, bytes: 0
 Memtable switch count: 56
 
 As the deletion process was chugging away, the memtable cell count increased,
 as expected, but the data size stayed at 0. No flushing occurred.
 
 Here's the schema for this table:
 
 CREATE TABLE bdn_index_pub (
 
 tshard VARCHAR,
 
 pord INT,
 
 ord INT,
 
 hpath VARCHAR,
 
 page BIGINT,
 
 PRIMARY KEY (tshard, pord)
 
 ) WITH gc_grace_seconds = 0 AND compaction = { 'class' :
 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };
 
 
 I have a few tables that I run this cleaning process on, and not all of them
 exhibit this behavior. One of them reported an increasing number of bytes, as
 expected, and it also flushed as expected. Here's the schema for that table:
 
 
 CREATE TABLE bdn_index_child (
 
 ptshard VARCHAR,
 
 ord INT,
 
 hpath VARCHAR,
 
 PRIMARY KEY (ptshard, ord)
 
 ) WITH gc_grace_seconds = 0 AND compaction = { 'class' :
 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };
 
 
 In both cases, I¹m deleting the entire record (i.e. specifying just the first
 component of the primary key in the delete statement). Most records in
 bdn_index_pub have 10,000 rows per record. bdn_index_child usually has just a
 handful of rows, but a few records can have up 10,000.
 
 Still a further mystery, 1285 tombstones in the bdn_index_pub memtable
 doesn't seem like nearly enough to create a memory problem. Perhaps there are
 other flaws in the memory metering. Or perhaps there is some

Re: Lots of deletions results in death by GC

2014-02-04 Thread Robert Wille
I ran my test again, and Flush Writer's "All time blocked" increased to 2
and then shortly thereafter GC went into its death spiral. I doubled
memtable_flush_writers (to 2) and memtable_flush_queue_size (to 8) and tried
again.

This time, the table that always sat with Memtable data size = 0 now showed
increases in Memtable data size. That was encouraging. It never flushed,
which isn't too surprising, because that table has relatively few rows and
they are pretty wide. However, on the fourth table to clean, Flush Writer's
"All time blocked" went to 1, and then there were no more completed events,
and about 10 minutes later GC went into its death spiral. I assume that each
time Flush Writer completes an event, that means a table was flushed. Is
that right? Also, I got two dropped mutation messages at the same time that
Flush Writer's All time blocked incremented.

I then increased the writers and queue size to 3 and 12, respectively, and
ran my test again. This time All time blocked remained at 0, but I still
suffered death by GC.

I would almost think that this is caused by high load on the server, but
I've never seen CPU utilization go above about two of my eight available
cores. If high load triggers this problem, then that is very disconcerting.
That means that a CPU spike could permanently cripple a node. Okay, not
permanently, but until a manual flush occurs.

If anyone has any further thoughts, I'd love to hear them. I'm quite at the
end of my rope.

Thanks in advance

Robert

From:  Nate McCall n...@thelastpickle.com
Reply-To:  user@cassandra.apache.org
Date:  Saturday, February 1, 2014 at 9:25 AM
To:  Cassandra Users user@cassandra.apache.org
Subject:  Re: Lots of deletions results in death by GC

What's the output of 'nodetool tpstats' while this is happening?
Specifically is Flush Writer All time blocked increasing? If so, play
around with turning up memtable_flush_writers and memtable_flush_queue_size
and see if that helps.


On Sat, Feb 1, 2014 at 9:03 AM, Robert Wille rwi...@fold3.com wrote:
 A few days ago I posted about an issue I'm having where GC takes a long time
 (20-30 seconds), and it happens repeatedly and basically no work gets done.
 I've done further investigation, and I now believe that I know the cause. If I
 do a lot of deletes, it creates memory pressure until the memtables are
 flushed, but Cassandra doesn't flush them. If I manually flush, then life is
 good again (although that takes a very long time because of the GC issue). If
 I just leave the flushing to Cassandra, then I end up with death by GC. I
 believe that when the memtables are full of tombstones, Cassandra doesn't
 realize how much memory the memtables are actually taking up, and so it
 doesn't proactively flush them in order to free up heap.
 
 As I was deleting records out of one of my tables, I was watching it via
 nodetool cfstats, and I found a very curious thing:
 
 Memtable cell count: 1285
 Memtable data size, bytes: 0
 Memtable switch count: 56
 
 As the deletion process was chugging away, the memtable cell count increased,
 as expected, but the data size stayed at 0. No flushing occurred.
 
 Here's the schema for this table:
 
 CREATE TABLE bdn_index_pub (
 
 tshard VARCHAR,
 
 pord INT,
 
 ord INT,
 
 hpath VARCHAR,
 
 page BIGINT,
 
 PRIMARY KEY (tshard, pord)
 
 ) WITH gc_grace_seconds = 0 AND compaction = { 'class' :
 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };
 
 
 I have a few tables that I run this cleaning process on, and not all of them
 exhibit this behavior. One of them reported an increasing number of bytes, as
 expected, and it also flushed as expected. Here's the schema for that table:
 
 
 CREATE TABLE bdn_index_child (
 
 ptshard VARCHAR,
 
 ord INT,
 
 hpath VARCHAR,
 
 PRIMARY KEY (ptshard, ord)
 
 ) WITH gc_grace_seconds = 0 AND compaction = { 'class' :
 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };
 
 
 In both cases, I'm deleting the entire record (i.e. specifying just the first
 component of the primary key in the delete statement). Most records in
 bdn_index_pub have 10,000 rows per record. bdn_index_child usually has just a
 handful of rows, but a few records can have up to 10,000.
 
 Still a further mystery, 1285 tombstones in the bdn_index_pub memtable doesn't
 seem like nearly enough to create a memory problem. Perhaps there are other
 flaws in the memory metering. Or perhaps there is some other issue that causes
 Cassandra to mismanage the heap when there are a lot of deletes. One other
 thought I had is that I page through these tables and clean them out as I go.
 Perhaps there is some interaction between the paging and the deleting that
 causes the GC problems and I should create a list of keys to delete and then
 delete them after I've finished reading the entire table.
 
 I reduced memtable_total_space_in_mb from the default (probably 2.7 GB) to 1
 GB, in hopes that it would force Cassandra to flush

Lots of deletions results in death by GC

2014-02-01 Thread Robert Wille
A few days ago I posted about an issue I'm having where GC takes a long time
(20-30 seconds), and it happens repeatedly and basically no work gets done.
I've done further investigation, and I now believe that I know the cause. If
I do a lot of deletes, it creates memory pressure until the memtables are
flushed, but Cassandra doesn't flush them. If I manually flush, then life is
good again (although that takes a very long time because of the GC issue).
If I just leave the flushing to Cassandra, then I end up with death by GC. I
believe that when the memtables are full of tombstones, Cassandra doesn't
realize how much memory the memtables are actually taking up, and so it
doesn't proactively flush them in order to free up heap.

As I was deleting records out of one of my tables, I was watching it via
nodetool cfstats, and I found a very curious thing:

Memtable cell count: 1285
Memtable data size, bytes: 0
Memtable switch count: 56

As the deletion process was chugging away, the memtable cell count
increased, as expected, but the data size stayed at 0. No flushing occurred.

Here's the schema for this table:

CREATE TABLE bdn_index_pub (

tshard VARCHAR,

pord INT,

ord INT,

hpath VARCHAR,

page BIGINT,

PRIMARY KEY (tshard, pord)

) WITH gc_grace_seconds = 0 AND compaction = { 'class' :
'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };


I have a few tables that I run this cleaning process on, and not all of them
exhibit this behavior. One of them reported an increasing number of bytes,
as expected, and it also flushed as expected. Here's the schema for that
table:


CREATE TABLE bdn_index_child (

ptshard VARCHAR,

ord INT,

hpath VARCHAR,

PRIMARY KEY (ptshard, ord)

) WITH gc_grace_seconds = 0 AND compaction = { 'class' :
'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };


In both cases, I'm deleting the entire record (i.e. specifying just the
first component of the primary key in the delete statement). Most records in
bdn_index_pub have 10,000 rows per record. bdn_index_child usually has just
a handful of rows, but a few records can have up to 10,000.
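Concretely, the deletes in question are partition-level deletes of this shape 
(bind values are placeholders):

DELETE FROM bdn_index_pub WHERE tshard = ?;
DELETE FROM bdn_index_child WHERE ptshard = ?;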

Still a further mystery, 1285 tombstones in the bdn_index_pub memtable
doesn't seem like nearly enough to create a memory problem. Perhaps there
are other flaws in the memory metering. Or perhaps there is some other issue
that causes Cassandra to mismanage the heap when there are a lot of deletes.
One other thought I had is that I page through these tables and clean them
out as I go. Perhaps there is some interaction between the paging and the
deleting that causes the GC problems and I should create a list of keys to
delete and then delete them after I've finished reading the entire table.

I reduced memtable_total_space_in_mb from the default (probably 2.7 GB) to 1
GB, in hopes that it would force Cassandra to flush tables before I ran into
death by GC, but it didn't seem to help.

I'm using Cassandra 2.0.4.

Any insights would be greatly appreciated. I can't be the only one that has
periodic delete-heavy workloads. Hopefully someone else has run into this
and can give advice.

Thanks

Robert




GC taking a long time

2014-01-29 Thread Robert Wille
I read through the recent thread "Cassandra mad GC", which seemed very
similar to my situation, but didn't really help.

Here is what I get from my logs when I grep for GCInspector. Note that this
is the middle of the night on a dev server, so there should have been almost
no load.

 INFO [ScheduledTasks:1] 2014-01-29 02:41:16,579 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 341 ms for 1 collections, 8001582816 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:41:29,135 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 350 ms for 1 collections, 802776 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:41:41,646 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 364 ms for 1 collections, 8075851136 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:41:54,223 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 375 ms for 1 collections, 8124762400 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:42:24,258 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 22995 ms for 2 collections, 7385470288 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:45:21,328 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 218 ms for 1 collections, 7582480104 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:45:33,418 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 222 ms for 1 collections, 7584743872 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:45:45,527 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 217 ms for 1 collections, 7588514264 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:45:57,594 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 223 ms for 1 collections, 7590223632 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:46:09,686 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 226 ms for 1 collections, 7592826720 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:46:21,867 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 229 ms for 1 collections, 7595464520 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:46:33,869 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 227 ms for 1 collections, 7597109672 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:46:45,962 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 230 ms for 1 collections, 7599909296 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:46:57,964 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 230 ms for 1 collections, 7601584048 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:47:10,018 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 229 ms for 1 collections, 7604217952 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:47:22,136 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 236 ms for 1 collections, 7605867784 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:47:34,277 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 239 ms for 1 collections, 7607521456 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:47:46,292 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 235 ms for 1 collections, 7610667376 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:47:58,537 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 261 ms for 1 collections, 7650345088 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:48:10,783 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 269 ms for 1 collections, 7653016592 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:48:23,786 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 298 ms for 1 collections, 7716831032 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:48:35,988 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 308 ms for 1 collections, 7745178616 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:48:48,434 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 319 ms for 1 collections, 7796207088 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:49:00,902 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 320 ms for 1 collections, 7821378680 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:49:13,344 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 338 ms for 1 collections, 7859905288 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:49:25,471 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 352 ms for 1 collections, 7911145688 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:49:38,473 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 359 ms for 1 collections, 7938204144 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:49:50,895 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 368 ms for 1 collections, 7988088408 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:50:03,345 GCInspector.java 
