Re: Cassandra 3.10: ClassCastException in ThreadAwareSecurityManager

2017-03-31 Thread Edward Capriolo
I created https://issues.apache.org/jira/browse/CASSANDRA-13396 for you


/**
* The purpose of this class is
*/

The purpose of this class is ...what? This class is who? Sicka sicka slim
shady.

On Thu, Mar 30, 2017 at 1:48 PM, Anton PASSIOUK <
anton.passi...@hsoftware.com> wrote:

> Hello
>
> After upgrading from Cassandra 3.6 to 3.10 I have suddenly started having
> errors like this:
>
> java.lang.ClassCastException: org.slf4j.impl.JDK14LoggerAdapter cannot be
> cast to ch.qos.logback.classic.Logger
> at org.apache.cassandra.cql3.functions.ThreadAwareSecurityManager.install(ThreadAwareSecurityManager.java:82)
> at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:193)
> at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:601)
> at com.ingalys.cassandra.CassandraWrapper.<init>(CassandraWrapper.java:150)
> at com.ingalys.cassandra.Builder.build(Builder.java:22)
> at com.ingalys.soa.ServiceContainer$1.lambda$run$0(ServiceContainer.java:172)
> at com.ingalys.fmk2.util.ThrowingFunction.apply(ThrowingFunction.java:14)
> at com.ingalys.fmk2.util.PromiseImpl.lambda$thenCompose$5(PromiseImpl.java:166)
>
> I am embedding Cassandra nodes in a container of mine and it happens that
> there are several slf4j bindings that are transitively brought to the
> classpath by other dependencies.
> I have read that in this case slf4j chooses one of the bindings
> more-or-less arbitrarily; in my case it picks the "jdk14" implementation,
> which makes the Cassandra daemon (and me) unhappy because there is a
> hard-coded cast to ch.qos.logback.classic.Logger in ThreadAwareSecurityManager:
>
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/functions/ThreadAwareSecurityManager.java#L83
>
> Of course this crashes if another slf4j binding is used (by accident, like
> me, or as a conscious choice), so I was wondering whether this code should
> check the type of the logger before the cast and adopt some fallback
> behavior when slf4j is not bound to logback?
>
> Thanks and regards,
> --
> Anton PASSIOUK
> Horizon Software - Trade Your Way
> http://www.hsoftware.com/
>
>
>
>
>
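For illustration, a minimal sketch of the guard being asked for (a
hypothetical helper, not the actual CASSANDRA-13396 patch): check which
slf4j binding is active before doing anything logback-specific.

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public final class LogbackGuard
    {
        private LogbackGuard() {}

        // Apply logback-specific configuration only when logback is actually
        // the bound slf4j implementation; with any other binding (e.g. the
        // jdk14 adapter) skip it instead of throwing ClassCastException.
        public static void configureIfLogback(String loggerName)
        {
            Logger logger = LoggerFactory.getLogger(loggerName);
            if (logger instanceof ch.qos.logback.classic.Logger)
            {
                ch.qos.logback.classic.Logger lb = (ch.qos.logback.classic.Logger) logger;
                lb.setLevel(ch.qos.logback.classic.Level.INFO); // stand-in for the real logback-only setup
            }
        }
    }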


Re: Assertions being hit on Cassandra 3.5 cluster (UnfilteredRowIterators.concat)

2017-03-22 Thread Edward Capriolo
On Wed, Mar 22, 2017 at 4:34 PM, Daniel Miranda  wrote:

> I found out the problem is conditioned on having the row cache enabled.
> Whenever a query would return an empty result set on a particular table, it
> would instead fail, with the exception being thrown on all nodes.
>
> Disabling the row cache for that particular table fixes the problem.
> Re-enabling it has not caused the issue again yet. I do not have row cache
> saving enabled, and the issue persisted between node restarts, which is
> somewhat strange.
>
> I couldn't reproduce the issue with any other tables. I can produce a
> sanitized dump of the SSTable if a developer desires.
>
> --
> Regards,
> Daniel
>
> On Wed, Mar 22, 2017 at 2:33 PM Daniel Miranda  wrote:
>
>> Thank you for the pointer Michael, I'll try to investigate if this is the
>> same bug I am seeing.
>>
>> I am afraid it might not be, since I'm observing the error periodically,
>> not just during compactions, and the traceback seems different.
>>
>> Regards,
>> Daniel
>>
>> On Wed, Mar 22, 2017 at 1:27 PM Michael Shuler 
>> wrote:
>>
>> Possibly https://issues.apache.org/jira/browse/CASSANDRA-12336, which
>> shows as fixed in 3.0.9 and 3.8. There are a couple of related bug reports
>> listed there which you might investigate as well.
>>
>> --
>> Kind regards,
>> Michael
>>
>> On 03/22/2017 11:21 AM, Daniel Miranda wrote:
>> > Greetings,
>> >
>> > Recently I've started to see an assertion (traceback follows at the
>> > end of the message) causing exceptions in a 3-node Cassandra 3.5 cluster
>> > (running on Ubuntu 14.04 on Amazon EC2). It seems to happen on all
>> > nodes. Repairs run fine without indicating any errors.
>> >
>> > I can't seem to find any information about it from someone else or any
>> > bug reports.
>> >
>> > Should I bother running an SSTable scrub? Is it a known issue that is
>> > fixed in subsequent versions?
>> >
>> > Thanks in advance,
>> > Daniel
>> >
>> > ---
>> > WARN  [SharedPool-Worker-1] 2017-03-16 18:54:35,587
>> > AbstractLocalAwareExecutorService.java:169 - Uncaught exception on
>> > thread Thread[SharedPool-Worker-1,5,main]: {}
>> > java.lang.AssertionError: null
>> > at org.apache.cassandra.db.rows.UnfilteredRowIterators.concat(UnfilteredRowIterators.java:157) ~[apache-cassandra-3.5.jar:3.5]
>> > at org.apache.cassandra.db.SinglePartitionReadCommand.getThroughCache(SinglePartitionReadCommand.java:420) ~[apache-cassandra-3.5.jar:3.5]
>> > at org.apache.cassandra.db.SinglePartitionReadCommand.queryStorage(SinglePartitionReadCommand.java:324) ~[apache-cassandra-3.5.jar:3.5]
>> > at org.apache.cassandra.db.ReadCommand.executeLocally(ReadCommand.java:366) ~[apache-cassandra-3.5.jar:3.5]
>> > at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1797) ~[apache-cassandra-3.5.jar:3.5]
>> > at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2466) ~[apache-cassandra-3.5.jar:3.5]
>> > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_111]
>> > at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164) ~[apache-cassandra-3.5.jar:3.5]
>> > at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136) [apache-cassandra-3.5.jar:3.5]
>> > at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-3.5.jar:3.5]
>> > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_111]
>> > --
>> > *Daniel Miranda*
>> >
>> > DevOps Engineering Intern
>> > (11) 991959845
>> > www.cobli.co 
>>
>>
I would strongly advise not using the row cache in the tick-tock series. It
is not on by default and, as a result, I believe it is not heavily used. I
can't find the reference, but a few months ago on this list someone pointed
out that it was not doing anything at all in their version. More trouble
than it is worth, IMHO; just use the standard caching.
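For reference, a minimal sketch of switching a table back to the standard
key cache only, via the DataStax Java driver (3.x API; the contact point,
keyspace and table names are hypothetical):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class DisableRowCache
    {
        public static void main(String[] args)
        {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            try (Session session = cluster.connect())
            {
                // Keep the key cache, drop row caching for the problem table.
                session.execute("ALTER TABLE ks.mytable WITH caching = " +
                                "{'keys': 'ALL', 'rows_per_partition': 'NONE'}");
            }
            cluster.close();
        }
    }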


Ye old singleton debate

2017-03-15 Thread Edward Capriolo
This question came up today:

OK, say you mock, how do you construct a working multi-process
representation of how C* actually works from within a unit test without
running the code that actually constructs the cluster?

1) Don't do that (construct a multinode cluster in a test); just mock the
crap out of it.

http://www.baeldung.com/mockito-verify
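A minimal sketch of option 1 with Mockito (all class names here are
hypothetical): the collaborator is mocked and only the interaction is
verified, so no cluster ever has to be started.

    import static org.mockito.Mockito.mock;
    import static org.mockito.Mockito.verify;

    public class CoordinatorTest
    {
        interface ReplicaClient { void send(String key, String value); }

        static class Coordinator
        {
            private final ReplicaClient replica;
            Coordinator(ReplicaClient replica) { this.replica = replica; }
            void write(String key, String value) { replica.send(key, value); }
        }

        @org.junit.Test
        public void writesGoToReplica()
        {
            ReplicaClient replica = mock(ReplicaClient.class);
            new Coordinator(replica).write("k", "v");
            verify(replica).send("k", "v"); // interaction verified, no I/O at all
        }
    }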

2) dtests
Dtests don't actually do this in the classic sense. One challenge is code
coverage. For many projects I use Cobertura:
http://www.mojohaus.org/cobertura-maven-plugin/. Cobertura can't (as far as
I know) instrument N JVMs and give you coverage. Bringing up a full-on
cluster to test something is slow, compute intensive, and quite hard.

3) Fix it
https://issues.apache.org/jira/browse/CASSANDRA-7837
https://issues.apache.org/jira/browse/CASSANDRA-10283

*Impossible you say!  No NoSQL JAVA DATABASE CAN DO THIS!!!*

https://www.elastic.co/guide/en/elasticsearch/reference/current/integration-tests.html

Wouldn't that be just the bee's knees???


Re: scylladb

2017-03-12 Thread Edward Capriolo
On Sun, Mar 12, 2017 at 3:45 PM, Dor Laor <d...@scylladb.com> wrote:

> On Sun, Mar 12, 2017 at 12:11 PM, Edward Capriolo <edlinuxg...@gmail.com>
> wrote:
>
>> The simple claim that "Scylla IS a drop-in replacement for C*" shows
>> that they clearly don't know as much as they think they do.
>>
>> Even if it did supposedly "support everything", it would not actually work
>> like that. For example, some things in Cassandra work "the way they work".
>> They are not specifically defined in a unit test or a document that
>> describes how they are supposed to work. During a massive code port you
>> cannot reason about whether the code still works the same way in all
>> situations.
>>
>> Example: without using SEDA and using something else, it definitely won't
>> work the same way when the thread pools fill up and it starts blocking,
>> dropping, whatever. There is so much implicitly undefined behavior.
>>
>
> According to your definition there is no such thing as a drop-in
> replacement, is there?
>
> One of our users asked us to add a protocol verb that identifies Scylla as
> Scylla so they'll know which is which for the time they run 2 clusters.
>
> Look, if we claim we have all the features and someone checks and sees we
> don't have LWT, that makes us a bad service. Usually when we get someone
> (specific) interested, we map their C* usage and say which features aren't
> there yet. So far it's just the lack of those not-yet-implemented features
> that holds users back. We do try to mimic the exact behaviour of C*.
>
> Clearly, I can't defend a 100% drop-in replacement. Once we implement
> someone's selected featureset, then we're a drop-in replacement for them,
> and we're not a good match for others. We're not after quick wins, quite
> the opposite.
>
>
>> Also, just for argument's sake: YCSB proves nothing. Nothing. It generates
>> key-value data, and frankly that is not the primary use case of
>> Cassandra. So again: know what you don't know.
>>
>>
> a. We do not pretend we know it all.
> We have 3 years of mileage with Cassandra and 2.5 with Scylla, and we
> gained some knowledge... before we decided to go down the C* path, we
> considered reimplementing Mongo, HDFS, Kafka and a few more, and the
> fact that we chose C* shows our appreciation of this project, not the
> opposite.
>
> b. YCSB is an industry standard, and that's why everybody uses it.
> We don't like it at all since it doesn't have prepared statements (it's
> time someone merged that support).
> It's not a plain K/V since it's a table of 10 columns of 100b each.
> We do support wide rows and learned (the hard way) their challenges,
> especially with compaction, repair and streaming. The current Scylla code
> doesn't cache wide rows beyond 10MB, which isn't ideal. In 1.8 (next
> month) we have partial row caching, which is supposed to be very good.
> During the past 20 months since our beta we tried to focus on a good
> out-of-the-box experience for all real workloads, and we knowingly
> deferred features like LWT since we wanted a good, solid base before we
> reach feature parity. If we do a good job on a benchmark but a bad one on
> a real workload, we've just shot ourselves in the foot. This was the case
> around our beta, but it was just a beta. Today we think we're in a very
> solid position. We still have lots to complete around repair (which is OK
> but not great). There is work in progress to switch from Merkle trees to
> a new algorithm with reduced latency (almost there). We have mixed
> feelings about anti-compaction for incremental repair, but we're likely
> to go down that route too.
>
>
>>
>>
>>
>> On Sun, Mar 12, 2017 at 2:15 PM, Jonathan Haddad <j...@jonhaddad.com>
>> wrote:
>>
>>> I don't think Jeff comes across as angry.  He's simply pointing out that
>>> ScyllaDB isn't a drop-in replacement for Cassandra.  Saying that it is,
>>> is very misleading.  The marketing material should really say something
>>> like "drop-in replacement for some workloads" or "aims to be a drop-in
>>> replacement".  As is, it doesn't support everything, so it's not a drop-in.
>>>
>>>
>>> On Sat, Mar 11, 2017 at 10:34 PM Dor Laor <d...@scylladb.com> wrote:
>>>
>>>> On Sat, Mar 11, 2017 at 10:02 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>>>>
>>>>
>>>>
>>>> On 2017-03-10 09:57 (-0800), Rakesh Kum

Re: scylladb

2017-03-12 Thread Edward Capriolo
The simple claim that "Scylla IS a drop-in replacement for C*" shows that
they clearly don't know as much as they think they do.

Even if it did supposedly "support everything", it would not actually work
like that. For example, some things in Cassandra work "the way they work".
They are not specifically defined in a unit test or a document that
describes how they are supposed to work. During a massive code port you
cannot reason about whether the code still works the same way in all
situations.

Example: without using SEDA and using something else, it definitely won't
work the same way when the thread pools fill up and it starts blocking,
dropping, whatever. There is so much implicitly undefined behavior.

Also, just for argument's sake: YCSB proves nothing. Nothing. It generates
key-value data, and frankly that is not the primary use case of Cassandra.
So again: know what you don't know.





On Sun, Mar 12, 2017 at 2:15 PM, Jonathan Haddad  wrote:

> I don't think Jeff comes across as angry.  He's simply pointing out that
> ScyllaDB isn't a drop-in replacement for Cassandra.  Saying that it is,
> is very misleading.  The marketing material should really say something
> like "drop-in replacement for some workloads" or "aims to be a drop-in
> replacement".  As is, it doesn't support everything, so it's not a drop-in.
>
>
> On Sat, Mar 11, 2017 at 10:34 PM Dor Laor  wrote:
>
>> On Sat, Mar 11, 2017 at 10:02 PM, Jeff Jirsa  wrote:
>>
>>
>>
>> On 2017-03-10 09:57 (-0800), Rakesh Kumar wrote:
>> > Cassandra vs Scylla is a valid comparison because they both are
>> compatible. Scylla is a drop-in replacement for Cassandra.
>>
>> No, they aren't, and no, it isn't
>>
>>
>> Jeff is angry with us for some reason. I don't know why; it's natural
>> that when a new opponent arrives there are objections, and the burden of
>> proof lies on us. We go to great lengths to provide it, and we don't just
>> throw out comments without backing.
>>
>> Scylla IS a drop-in replacement for C*. We support the same CQL (from
>> version 1.7 it's CQL 3.3.1, protocol v4) and the same SSTable format
>> (based on 2.1.8). In the 1.7 release we support the CQL uploader from 3.x.
>> We will support the SSTable format of 3.x natively in 3 months' time. Soon
>> all of the feature set will be implemented. We have always been using this
>> page (not 100% up to date, we'll update it this week):
>> http://www.scylladb.com/technology/status/
>>
>> We added a jmx-proxy daemon in Java in order to make the transition as
>> smooth as possible. Almost all the nodetool commands just work, certainly
>> all the important ones.
>> Btw: we have a REST API and Prometheus-format metrics, much better than
>> the hairy JMX one.
>>
>> Spark, KairosDB, Presto and probably Titan work too (we added Thrift just
>> for legacy users, and we don't intend to decommission an API).
>>
>> Regarding benchmarks, if someone finds a flaw in them, we'll do our best
>> to fix it.
>> Let's ignore them and just hear what our users have to say:
>> http://www.scylladb.com/users/
>>
>>
>>


Re: scylladb

2017-03-12 Thread Edward Capriolo
On Sun, Mar 12, 2017 at 11:40 AM, Edward Capriolo <edlinuxg...@gmail.com>
wrote:

>
>
> On Sun, Mar 12, 2017 at 1:38 AM, benjamin roth <brs...@gmail.com> wrote:
>
>> There is no reason to be angry. This is progress. This is the circle of
>> life.
>>
>> It happens anywhere at any time.
>>
>> Am 12.03.2017 07:34 schrieb "Dor Laor" <d...@scylladb.com>:
>>
>>> On Sat, Mar 11, 2017 at 10:02 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>>>
>>>>
>>>>
>>>> On 2017-03-10 09:57 (-0800), Rakesh Kumar wrote:
>>>> > Cassandra vs Scylla is a valid comparison because they both are
>>>> compatible. Scylla is a drop-in replacement for Cassandra.
>>>>
>>>> No, they aren't, and no, it isn't
>>>>
>>>
>>> Jeff is angry with us for some reason. I don't know why; it's natural
>>> that when a new opponent arrives there are objections, and the burden of
>>> proof lies on us. We go to great lengths to provide it, and we don't just
>>> throw out comments without backing.
>>>
>>> Scylla IS a drop-in replacement for C*. We support the same CQL (from
>>> version 1.7 it's CQL 3.3.1, protocol v4) and the same SSTable format
>>> (based on 2.1.8). In the 1.7 release we support the CQL uploader from 3.x.
>>> We will support the SSTable format of 3.x natively in 3 months' time. Soon
>>> all of the feature set will be implemented. We have always been using this
>>> page (not 100% up to date, we'll update it this week):
>>> http://www.scylladb.com/technology/status/
>>>
>>> We added a jmx-proxy daemon in Java in order to make the transition as
>>> smooth as possible. Almost all the nodetool commands just work, certainly
>>> all the important ones.
>>> Btw: we have a REST API and Prometheus-format metrics, much better than
>>> the hairy JMX one.
>>>
>>> Spark, KairosDB, Presto and probably Titan work too (we added Thrift just
>>> for legacy users, and we don't intend to decommission an API).
>>>
>>> Regarding benchmarks, if someone finds a flaw in them, we'll do our best
>>> to fix it.
>>> Let's ignore them and just hear what our users have to say:
>>> http://www.scylladb.com/users/
>>>
>>>
>>>
>
> Scylla is NOT a drop-in replacement for Cassandra. Cassandra is a trademark.
> Cassandra is NOT a certification body. You are not a certification body.
>
> "Scylla IS a drop in replacement for C*. We support the same CQL (from
> version 1.7 it's cql 3.3.1, protocol v4), the same SStable format (based on
> 2.1.8). In 1.7 release we support cql uploader
> from 3.x. We will support the SStable format of 3.x natively in 3 month
> time. Soon all of the feature set will be implemented. We always have been
> using this page (not 100% up to date, we'll update it this week):
> http://www.scylladb.com/technology/status/ "
>
> No matter how "compatible" you believe Scylla is, you cannot assert this
> claim.
>
>
>
Also, there is no reason to say Jeff is 'angry' just because he asserted
his belief in a fact.

"No, they aren't, and no, it isn't"

does not sound angry.

Besides, your own words prove the point:

"Scylla IS a drop-in replacement for C*"
"Soon all of the feature set will be implemented"

Something is NOT a "drop-in replacement" when it does NOT have all the
features.

Also, knowing Jeff, who is very even-keeled, I highly doubt that he is
"angry" because he made a short, concise statement.

That being said, I am a little bit angry about the shameless self-promotion
and jabbering on you seem to be doing. We get it: you know about kernels
and page faults and want to talk endlessly about them.


Re: scylladb

2017-03-12 Thread Edward Capriolo
On Sun, Mar 12, 2017 at 1:38 AM, benjamin roth  wrote:

> There is no reason to be angry. This is progress. This is the circle of
> life.
>
> It happens anywhere at any time.
>
> Am 12.03.2017 07:34 schrieb "Dor Laor" :
>
>> On Sat, Mar 11, 2017 at 10:02 PM, Jeff Jirsa  wrote:
>>
>>>
>>>
>>> On 2017-03-10 09:57 (-0800), Rakesh Kumar wrote:
>>> > Cassandra vs Scylla is a valid comparison because they both are
>>> compatible. Scylla is a drop-in replacement for Cassandra.
>>>
>>> No, they aren't, and no, it isn't
>>>
>>
>> Jeff is angry with us for some reason. I don't know why; it's natural
>> that when a new opponent arrives there are objections, and the burden of
>> proof lies on us. We go to great lengths to provide it, and we don't just
>> throw out comments without backing.
>>
>> Scylla IS a drop-in replacement for C*. We support the same CQL (from
>> version 1.7 it's CQL 3.3.1, protocol v4) and the same SSTable format
>> (based on 2.1.8). In the 1.7 release we support the CQL uploader from 3.x.
>> We will support the SSTable format of 3.x natively in 3 months' time. Soon
>> all of the feature set will be implemented. We have always been using this
>> page (not 100% up to date, we'll update it this week):
>> http://www.scylladb.com/technology/status/
>>
>> We added a jmx-proxy daemon in Java in order to make the transition as
>> smooth as possible. Almost all the nodetool commands just work, certainly
>> all the important ones.
>> Btw: we have a REST API and Prometheus-format metrics, much better than
>> the hairy JMX one.
>>
>> Spark, KairosDB, Presto and probably Titan work too (we added Thrift just
>> for legacy users, and we don't intend to decommission an API).
>>
>> Regarding benchmarks, if someone finds a flaw in them, we'll do our best
>> to fix it.
>> Let's ignore them and just hear what our users have to say:
>> http://www.scylladb.com/users/
>>
>>
>>

Scylla is NOT a drop-in replacement for Cassandra. Cassandra is a trademark.
Cassandra is NOT a certification body. You are not a certification body.

"Scylla IS a drop in replacement for C*. We support the same CQL (from
version 1.7 it's cql 3.3.1, protocol v4), the same SStable format (based on
2.1.8). In 1.7 release we support cql uploader
from 3.x. We will support the SStable format of 3.x natively in 3 month
time. Soon all of the feature set will be implemented. We always have been
using this page (not 100% up to date, we'll update it this week):
http://www.scylladb.com/technology/status/ "

No matter how "compatible" you believe Scylla is you can not assert this
claim.


Re: scylladb

2017-03-11 Thread Edward Capriolo
On Sat, Mar 11, 2017 at 9:41 PM, daemeon reiydelle 
wrote:

> Recall that garbage collection on a busy node can occur minutes or seconds
> apart. Note that stop the world GC also happens as frequently as every
> couple of minutes on every node. Remove that and do the simple arithmetic.
>
>
> sent from my mobile
> Daemeon Reiydelle
> skype daemeon.c.m.reiydelle
> USA 415.501.0198
>
> On Mar 10, 2017 8:59 AM, "Bhuvan Rawal"  wrote:
>
>> Agreed, C++ gives an added advantage in talking to the underlying hardware
>> with better efficiency. It sounds good, but can a piece of code written in
>> C++ give 1000% the throughput of a Java app? Is a TPC design 10X more
>> performant than a SEDA arch?
>>
>> And if C/C++ is indeed that fast, how can Aerospike (which is itself
>> written in C) claim to be 10X faster than Scylla here:
>> http://www.aerospike.com/benchmarks/scylladb-initial/ ? (Combining
>> yours and Aerospike's benchmarks, it appears that Aerospike is 100X more
>> performant than C* - I highly doubt that!!)
>>
>> For a moment let's forget about evaluating 2 different databases: one can
>> observe a 10X performance difference between a mistuned Cassandra cluster
>> and one that's tuned to fit the data model - there are so many tunables in
>> the yaml as well as in the table configs.
>>
>> The idea is: in order to strengthen your claim, you need to provide
>> complete system metrics (disk, CPU, network) and the configs used, and
>> show where the ops increase starts to decay. Plain ops per second and p99
>> latency are a black box.
>>
>> Regards,
>> Bhuvan
>>
>> On Fri, Mar 10, 2017 at 12:47 PM, Avi Kivity  wrote:
>>
>>> ScyllaDB engineer here.
>>>
>>> C++ is really an enabling technology here. It is directly responsible
>>> for a small fraction of the gain by executing faster than Java.  But it is
>>> indirectly responsible for the gain by allowing us direct control over
>>> memory and threading.  Just as an example, Scylla starts by taking over
>>> almost all of the machine's memory, and dynamically assigning it to
>>> memtables, cache, and working memory needed to handle requests in flight.
>>> Memory is statically partitioned across cores, allowing us to exploit NUMA
>>> fully.  You can't do these things in Java.
>>>
>>> I would say the major contributors to Scylla performance are:
>>>  - thread-per-core design
>>>  - replacement of the page cache with a row cache
>>>  - careful attention to many small details, each contributing a little,
>>> but with a large overall impact
>>>
>>> While I'm here I can say that performance is not the only goal here, it
>>> is stable and predictable performance over varying loads and during
>>> maintenance operations like repair, without any special tuning.  We measure
>>> the amount of CPU and I/O spent on foreground (user) and background
>>> (maintenance) tasks and divide them fairly.  This work is not complete but
>>> already makes operating Scylla a lot simpler.
>>>
>>>
>>> On 03/10/2017 01:42 AM, Kant Kodali wrote:
>>>
>>> I don't think ScyllaDB's performance is because of C++. The design
>>> decisions in ScyllaDB are indeed different from Cassandra's, such as
>>> getting rid of SEDA, moving to TPC, and so on.
>>>
>>> If someone thinks it is because of C++, then just show the benchmarks
>>> that prove it is indeed the C++ that gave the 10X performance boost
>>> ScyllaDB claims, instead of just stating it.
>>>
>>>
>>> On Thu, Mar 9, 2017 at 3:22 PM, Richard L. Burton III <
>>> mrbur...@gmail.com> wrote:
>>>
 They spend an enormous amount of time focusing on performance. You can
 expect them to continue on with their optimization and keep crushing it.

 P.S., I don't work for ScyllaDB.

 On Thu, Mar 9, 2017 at 6:02 PM, Rakesh Kumar <
 rakeshkumar...@outlook.com> wrote:

> In all of their presentations they keep harping on the fact that
> ScyllaDB is written in C++ and does not carry the overhead of Java. Still,
> the difference looks staggering.
> 
> From: daemeon reiydelle 
> Sent: Thursday, March 9, 2017 14:21
> To: user@cassandra.apache.org
> Subject: Re: scylladb
>
> The comparison is fair, and conservative. I did substantial performance
> comparisons for two clients; both returned throughputs that were faster
> than the published comparisons (15x as I recall). At that time the client
> preferred to utilize a Cassandra COTS solution and use a caching solution
> for OLA compliance.
>
>
> ...
>
> Daemeon C.M. Reiydelle
> USA (+1) 415.501.0198
> London (+44) (0) 20 8144 9872
>
> On Thu, Mar 9, 2017 at 11:04 AM, Robin Verlangen > wrote:
> I was wondering how people feel about the comparison that's made here
> between Cassandra and ScyllaDB : 
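As a toy illustration of the thread-per-core idea Avi describes above (a
Java stand-in only; Scylla itself does this in C++ with Seastar): shard
work by key hash onto one single-threaded executor per core, so each
shard's state is only ever touched by one thread.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    final class ShardedExecutor
    {
        private final ExecutorService[] shards;

        ShardedExecutor()
        {
            int cores = Runtime.getRuntime().availableProcessors();
            shards = new ExecutorService[cores];
            for (int i = 0; i < cores; i++)
                shards[i] = Executors.newSingleThreadExecutor();
        }

        // All operations on a given key run on the same thread, so
        // per-shard state needs no locking.
        void submit(Object key, Runnable task)
        {
            shards[Math.floorMod(key.hashCode(), shards.length)].submit(task);
        }
    }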

Re: scylladb

2017-03-11 Thread Edward Capriolo
On Sat, Mar 11, 2017 at 2:08 PM, Bhuvan Rawal  wrote:

> "Lastly, why don't you test Scylla yourself?  It's pretty easy to set up,
> there's nothing to tune."
>  - The details are indeed compelling to have a go ahead and test it for
> specific use case.
>
> If it works out good it can lead to good cost cut in infra costs as well
> as having to manage less servers plus probably less time to bootstrap &
> decommission nodes!
>
> It will also be interesting to have a benchmark with Cassandra 3 version
> as well, as the new storage engine is said to have better performance:
> https://www.datastax.com/2015/12/storage-engine-30
>
> Regards,
> Bhuvan
>
> On Sat, Mar 11, 2017 at 2:59 PM, Avi Kivity  wrote:
>
>> There is no magic 10X bullet.  It's a mix of multiple factors, which can
>> come up to less than 10X in some circumstances and more than 10X in others,
>> as has been reported on this thread by others.
>>
>> TPC doesn't give _any_ advantage when you have just one core, and can
>> give more than 10X on a machine with a large number of cores.  These are
>> becoming more and more common, think of the recent AMD Naples announcement;
>> with 32 cores per socket you can have 128 logical cores in a two-socket
>> server; or the AWS i3.16xlarge instance with 32 cores / 64 vcpus.
>>
>> You're welcome to browse our site to learn more about the architecture,
>> or watch this technical talk [1] I gave in QConSF that highlights some of
>> the techniques we use.
>>
>> Of course it's possible to mistune Cassandra to give bad results, that is
>> why we spent a lot more time tuning Cassandra and documenting everything
>> than we spent on Scylla.  You can read the report in [2], it is very
>> detailed, and provides a wealth of metrics like you'd expect.
>>
>> I'm not going to comment about the Aerospike numbers, I haven't studied
>> them in detail.  And no, you can't multiply results like that unless they
>> were done with very similar configurations and test harnesses.
>>
>> Lastly, why don't you test Scylla yourself?  It's pretty easy to set up,
>> there's nothing to tune.
>>
>> Avi
>>
>> [1] https://www.infoq.com/presentations/scylladb
>> [2] http://www.scylladb.com/technology/cassandra-vs-scylla-benchmark-cluster-1/
>>
>>
>> On 03/10/2017 06:58 PM, Bhuvan Rawal wrote:
>>
>> Agreed, C++ gives an added advantage in talking to the underlying hardware
>> with better efficiency. It sounds good, but can a piece of code written in
>> C++ give 1000% the throughput of a Java app? Is a TPC design 10X more
>> performant than a SEDA arch?
>>
>> And if C/C++ is indeed that fast, how can Aerospike (which is itself
>> written in C) claim to be 10X faster than Scylla here:
>> http://www.aerospike.com/benchmarks/scylladb-initial/ ? (Combining
>> yours and Aerospike's benchmarks, it appears that Aerospike is 100X more
>> performant than C* - I highly doubt that!!)
>>
>> For a moment let's forget about evaluating 2 different databases: one can
>> observe a 10X performance difference between a mistuned Cassandra cluster
>> and one that's tuned to fit the data model - there are so many tunables in
>> the yaml as well as in the table configs.
>>
>> The idea is: in order to strengthen your claim, you need to provide
>> complete system metrics (disk, CPU, network) and the configs used, and
>> show where the ops increase starts to decay. Plain ops per second and p99
>> latency are a black box.
>>
>> Regards,
>> Bhuvan
>>
>> On Fri, Mar 10, 2017 at 12:47 PM, Avi Kivity  wrote:
>>
>>> ScyllaDB engineer here.
>>>
>>> C++ is really an enabling technology here. It is directly responsible
>>> for a small fraction of the gain by executing faster than Java.  But it is
>>> indirectly responsible for the gain by allowing us direct control over
>>> memory and threading.  Just as an example, Scylla starts by taking over
>>> almost all of the machine's memory, and dynamically assigning it to
>>> memtables, cache, and working memory needed to handle requests in flight.
>>> Memory is statically partitioned across cores, allowing us to exploit NUMA
>>> fully.  You can't do these things in Java.
>>>
>>> I would say the major contributors to Scylla performance are:
>>>  - thread-per-core design
>>>  - replacement of the page cache with a row cache
>>>  - careful attention to many small details, each contributing a little,
>>> but with a large overall impact
>>>
>>> While I'm here I can say that performance is not the only goal here, it
>>> is stable and predictable performance over varying loads and during
>>> maintenance operations like repair, without any special tuning.  We measure
>>> the amount of CPU and I/O spent on foreground (user) and background
>>> (maintenance) tasks and divide them fairly.  This work is not complete but
>>> already makes operating Scylla a lot simpler.
>>>
>>>
>>> On 03/10/2017 01:42 AM, Kant Kodali wrote:
>>>
>>> I don't think ScyllaDB's performance is because of C++. The design
>>> decisions in ScyllaDB are indeed 

Re: OOM on Apache Cassandra on 30 Plus node at the same time

2017-03-04 Thread Edward Capriolo
On Saturday, March 4, 2017, Thakrar, Jayesh 
wrote:

> LCS does not rule out frequent updates - it just means that there will be
> more frequent compaction, which can potentially increase compaction
> activity (which again can be throttled as needed).
>
> But STCS will guarantee OOM when you have large datasets.
>
> Did you have a look at the offheap + onheap size of your JVM using
> "nodetool info"?
>
>
>
>
>
> *From: *Shravan C  >
> *Date: *Friday, March 3, 2017 at 11:11 PM
> *To: *Joaquin Casares  >, "
> user@cassandra.apache.org
> " <
> user@cassandra.apache.org
> >
> *Subject: *Re: OOM on Apache Cassandra on 30 Plus node at the same time
>
>
>
> We run C* at 32 GB and all servers have 96GB RAM. We use STCS. LCS is not
> an option for us as we have frequent updates.
>
>
>
> Thanks,
>
> Shravan
> --
>
> *From:* Thakrar, Jayesh  >
> *Sent:* Friday, March 3, 2017 3:47:27 PM
> *To:* Joaquin Casares; user@cassandra.apache.org
> 
> *Subject:* Re: OOM on Apache Cassandra on 30 Plus node at the same time
>
>
>
> I had been fighting a similar battle, but am now over the hump for the most part.
>
>
>
> Get info on the server config (e.g. memory, cpu, free memory (free -g),
> etc)
>
> Run "nodetool info" on the nodes to get heap and off-heap sizes
>
> Run "nodetool tablestats" or "nodetool tablestats ."
> on the key large tables
>
> Essentially the purpose is to see if you really had a true OOM or whether
> your machine was running out of memory.
>
>
>
> Cassandra can use offheap memory very well - so "nodetool info" will give
> you both heap and offheap.
>
>
>
> Also, what is the compaction strategy of your tables?
>
>
>
> Personally, I have found STCS to be awful at large scale - when you have
> sstables that are 100+ GB in size.
>
> See https://issues.apache.org/jira/browse/CASSANDRA-10821?focusedCommentId=15389451&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15389451
>
>
>
> LCS seems better and should be the default (my opinion) unless you want
> DTCS
>
>
>
> A good description of all three compactions is here -
> http://docs.scylladb.com/kb/compaction/
>
>
>
>
>
>
>
>
> *From: *Joaquin Casares  >
> *Date: *Friday, March 3, 2017 at 11:34 AM
> *To: * >
> *Subject: *Re: OOM on Apache Cassandra on 30 Plus node at the same time
>
>
>
> Hello Shravan,
>
>
>
> Typically asynchronous requests are recommended over batch statements
> since batch statements will cause more work on the coordinator node while
> individual requests, when using a TokenAwarePolicy, will hit a specific
> coordinator, perform a local disk seek, and return the requested
> information.
>
>
>
> The only time that using batch statements is ideal is when writing to the
> same partition key, even if it's across multiple tables when using the same
> hashing algorithm (like murmur3).
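As an illustration of the async-requests recommendation above, a minimal
sketch with the DataStax Java driver 3.x (the session, prepared insert and
row data are assumed to exist):

    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.ResultSetFuture;
    import com.datastax.driver.core.Session;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;

    final class AsyncInserts
    {
        // Fire individual token-aware async inserts and then wait, instead of
        // packing unrelated partitions into one batch that a single
        // coordinator must fan out.
        static void insertAll(Session session, PreparedStatement insert, Map<String, String> rows)
        {
            List<ResultSetFuture> futures = new ArrayList<>();
            for (Map.Entry<String, String> e : rows.entrySet())
                futures.add(session.executeAsync(insert.bind(e.getKey(), e.getValue())));
            for (ResultSetFuture f : futures)
                f.getUninterruptibly(); // real code would cap in-flight requests
        }
    }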
>
>
>
> Could you provide a bit of insight into what the batch statement was
> trying to accomplish and how many child statements were bundled up within
> that batch?
>
>
>
> Cheers,
>
>
>
> Joaquin
>
>
> Joaquin Casares
>
> Consultant
>
> Austin, TX
>
>
>
> Apache Cassandra Consulting
>
> http://www.thelastpickle.com
>
>
>
>
> On Fri, Mar 3, 2017 at 11:18 AM, Shravan Ch  > wrote:
>
> Hello,
>
> More than 30 Cassandra servers in the primary DC went down with the OOM
> exception below. What puzzles me is the scale at which it happened (at the
> same minute). I will share some more details below.
>
> System Log: http://pastebin.com/iPeYrWVR
>
> GC Log: http://pastebin.com/CzNNGs0r
>
> During the OOM I saw a lot of WARNings like the one below (these had been
> there for quite some time, maybe weeks):
> *WARN  [SharedPool-Worker-81] 2017-03-01 19:55:41,209
> BatchStatement.java:252 - Batch of prepared statements for [keyspace.table]
> is of size 225455, exceeding 

Re: question of keyspace that just disappeared

2017-03-03 Thread Edward Capriolo
On Fri, Mar 3, 2017 at 7:56 AM, Romain Hardouin  wrote:

> I suspect a lack of 3.x reliability. Cassandra could have given up with
> dropped messages, but not with a "drop keyspace". I mean, I have already
> seen spark jobs with too many executors produce a high load average on a
> DC. I saw a C* node with a 1 min. load avg of 140 that could still have a
> P99 read latency of 40ms. But I never saw a disappearing keyspace. There
> are old tickets regarding C* 1.x, but as far as I remember those were due
> to a create/drop/create keyspace.
>
>
> Le Vendredi 3 mars 2017 13h44, George Webster  a
> écrit :
>
>
> Thank you for your reply and good to know about the debug statement. I
> haven't
>
> We never dropped or re-created the keyspace before. We haven't even
> performed writes to that keyspace in months. I also checked the permissions
> of Apache, that user had read only access.
>
> Unfortunately, I reverted from a backup recently. I cannot say for sure
> anymore whether I saw something in the system tables before the revert.
>
> Anyway, hopefully it was just a fluke. We have some crazy ML libraries
> running on it - maybe Cassandra just gave up? Oh well, Cassandra is a
> champ and we haven't really had issues with it before.
>
> On Thu, Mar 2, 2017 at 6:51 PM, Romain Hardouin 
> wrote:
>
> Did you inspect the system tables to see if there are any traces of your
> keyspace? Did you ever drop and re-create this keyspace before that?
>
> Lines in debug appear because the fd interval is > 2 seconds (logs are in
> nanoseconds). You can override the intervals via the
> -Dcassandra.fd_initial_value_ms and -Dcassandra.fd_max_interval_ms
> properties. Are you sure you didn't have these lines in the debug logs
> before? I used to see them a lot prior to increasing the intervals to 4
> seconds.
>
> Best,
>
> Romain
>
> Le Mardi 28 février 2017 18h25, George Webster  a
> écrit :
>
>
> Hey Cassandra Users,
>
> We recently encountered an issue where a keyspace just disappeared. I was
> curious if anyone has had this occur before and can provide some insight.
>
> We are using Cassandra 3.10: 2 DCs, 3 nodes each.
> The data was still located in the storage folder but is no longer visible
> inside Cassandra.
>
> I searched the logs for any hints of errors or of commands being executed
> that could have caused the loss of a keyspace. Unfortunately I found
> nothing. The only unusual issue I saw in the logs was a series of read
> timeouts that occurred right around when the keyspace went away. Since then
> I see numerous entries in the debug log like the following:
>
> DEBUG [GossipStage:1] 2017-02-28 18:14:12,580 FailureDetector.java:457 -
> Ignoring interval time of 2155674599 for /x.x.x..12
> DEBUG [GossipStage:1] 2017-02-28 18:14:16,580 FailureDetector.java:457 -
> Ignoring interval time of 2945213745 for /x.x.x.81
> DEBUG [GossipStage:1] 2017-02-28 18:14:19,590 FailureDetector.java:457 -
> Ignoring interval time of 2006530862 for /x.x.x..69
> DEBUG [GossipStage:1] 2017-02-28 18:14:27,434 FailureDetector.java:457 -
> Ignoring interval time of 3441841231 for /x.x.x.82
> DEBUG [GossipStage:1] 2017-02-28 18:14:29,588 FailureDetector.java:457 -
> Ignoring interval time of 2153964846 for /x.x.x.82
> DEBUG [GossipStage:1] 2017-02-28 18:14:33,582 FailureDetector.java:457 -
> Ignoring interval time of 2588593281 for /x.x.x.82
> DEBUG [GossipStage:1] 2017-02-28 18:14:37,588 FailureDetector.java:457 -
> Ignoring interval time of 2005305693 for /x.x.x.69
> DEBUG [GossipStage:1] 2017-02-28 18:14:38,592 FailureDetector.java:457 -
> Ignoring interval time of 2009244850 for /x.x.x.82
> DEBUG [GossipStage:1] 2017-02-28 18:14:43,584 FailureDetector.java:457 -
> Ignoring interval time of 2149192677 for /x.x.x.69
> DEBUG [GossipStage:1] 2017-02-28 18:14:45,605 FailureDetector.java:457 -
> Ignoring interval time of 2021180918 for /x.x.x.85
> DEBUG [GossipStage:1] 2017-02-28 18:14:46,432 FailureDetector.java:457 -
> Ignoring interval time of 2436026101 for /x.x.x.81
> DEBUG [GossipStage:1] 2017-02-28 18:14:46,432 FailureDetector.java:457 -
> Ignoring interval time of 2436187894 for /x.x.x.82
>
> During the time of the disappearing keyspace we had two concurrent
> activities:
> 1) Running a Spark job (via HDP 2.5.3 in Yarn) that was performing a
> countByKey. It was using the keyspace that disappeared. The operation
> crashed.
> 2) We created a new keyspace to test out a schema. The only "fancy" thing
> in that keyspace is a few materialized view tables. Data was being loaded
> into that keyspace during the crash. The load process was extracting
> information and then just writing to Cassandra.
>
> Any ideas? Anyone seen this before?
>
> Thanks,
> George
>
>
>
>
>
>
Cassandra takes snapshots for certain events. Does this extend to drop
keyspace commands? Maybe it should.


Re: Is periodic manual repair necessary?

2017-02-27 Thread Edward Capriolo
There are 4 anti-entropy systems in Cassandra:

Hinted handoff
Read repair
Commit logs
Repair command

All are basically best effort.

Commit logs get corrupt and only flush periodically.

Bits rot on disk and while crossing the network.

Read repair is async and only happens randomly.

Hinted handoff stops after some time and is not guaranteed.
On Monday, February 27, 2017, Thakrar, Jayesh 
wrote:

> Thanks Roth and Oskar for your quick responses.
>
>
>
> This is a single datacenter, multi-rack setup.
>
>
>
> > A TTL is technically similar to a delete - in the end both create
> tombstones.
>
> >If you want to eliminate the possibility of resurrected deleted data, you
> should run repairs.
>
> So why do I need to worry about data resurrection?
>
> Because the TTL for the data is specified at the row level (at least in
> this case), i.e. across ALL columns across ALL replicas.
>
> So they all will have the same data or won't have the data at all (i.e. it
> would have been tombstoned).
>
>
>
>
>
> > If you can guarantee 100% that data is read-repaired before
> gc_grace_seconds after the data has been TTL'ed, you won't need an extra
> repair.
>
> Why read-repaired before "gc_grace_period"?
>
> Isn't gc_grace_period the grace period for compaction to occur?
>
> So if the data was not consistent and read-repair happens before that,
> then well and good.
>
> Does read-repair not happen after gc/compaction?
>
> If this table has data being constantly/periodically inserted, then
> compaction will also happen accordingly, right?
>
>
>
> Thanks,
>
> Jayesh
>
>
>
>
>
> *From: *Benjamin Roth  >
> *Date: *Monday, February 27, 2017 at 11:53 AM
> *To: * >
> *Subject: *Re: Is periodic manual repair necessary?
>
>
>
> A TTL is technically similar to a delete - in the end both create
> tombstones.
>
> If you want to eliminate the possibility of resurrected deleted data, you
> should run repairs.
>
>
>
> If you can guarantee 100% that data is read-repaired before
> gc_grace_seconds after the data has been TTL'ed, you won't need an extra
> repair.
>
>
>
> 2017-02-27 18:29 GMT+01:00 Oskar Kjellin  >:
>
> Are you running multi dc?
>
> Skickat från min iPad
>
>
> 27 feb. 2017 kl. 16:08 skrev Thakrar, Jayesh  >:
>
> Suppose I have an application, where there are no deletes, only 5-10% of
> rows being occasionally updated (and that too only once) and a lot of reads.
>
>
>
> Furthermore, I have replication = 3 and both read and write are configured
> for local_quorum.
>
>
>
> Occasionally, servers do go into maintenance.
>
>
>
> I understand that when the maintenance is longer than the period for which
> hinted handoffs are preserved, the hints are lost and servers may have
> stale data.
>
> But I do expect it to be rectified on reads. If the stale data is not read
> again, I don’t care for it to be corrected as then the data will be
> automatically purged because of TTL.
>
>
>
> In such a situation, do I need to have a periodic (weekly?) manual/batch
> read_repair process?
>
>
>
> Thanks,
>
> Jayesh Thakrar
>
>
>
>
>
> --
>
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>


-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.


Re: Pluggable throttling of read and write queries

2017-02-22 Thread Edward Capriolo
On Wed, Feb 22, 2017 at 1:20 PM, Abhishek Verma <ve...@uber.com> wrote:

> We have lots of dedicated Cassandra clusters for large use cases, but we
> have a long tail of (~100) of internal customers who want to store < 200GB
> of data with < 5k qps and non-critical data. It does not make sense to
> create a 3 node dedicated cluster for each of these small use cases. So we
> have a shared cluster into which we onboard these users.
>
> But once in a while, one of the customers will run an ingest job from HDFS
> which will pound the shared cluster and break our SLA on the cluster for
> all the other customers. Currently, I don't see any way to signal
> back-pressure to the ingestion jobs or to throttle their requests. Another
> example is one customer doing a large number of range queries, which has
> the same effect.
>
> A simple way to avoid this is to throttle the read or write requests based
> on some quota limits for each keyspace or user.
>
> Please see replies inlined:
>
> On Mon, Feb 20, 2017 at 11:46 PM, vincent gromakowski <
> vincent.gromakow...@gmail.com> wrote:
>
>> Aren't you using the Mesos Cassandra framework to manage your multiple
>> clusters? (I saw a presentation at the Cassandra summit.)
>>
> Yes we are using https://github.com/mesosphere/dcos-cassandra-service and
> contribute heavily to it. I am aware of the presentation (
> https://www.youtube.com/watch?v=4Ap-1VT2ChU) at the Cassandra summit as I
> was the one who gave it :)
> This has helped us automate the creation and management of these clusters.
>
>> What's wrong with your current Mesos approach?
>>
> Hardware efficiency: Spinning up dedicated clusters for each use case
> wastes a lot of hardware resources. One of the approaches we have taken is
> spinning up multiple Cassandra nodes belonging to different clusters on the
> same physical machine. However, we still have overhead of managing these
> separate multi-tenant clusters.
>
>> I am also thinking it's better to split a large cluster into smaller ones,
>> unless you also manage the client layer that queries Cassandra and can put
>> some back-pressure or rate limiting in it.
>>
> We have an internal storage API layer that some of the clients use, but
> there are many customers who use the vanilla DataStax Java or Python
> driver. Implementing throttling in each of those clients does not seem like
> a viable approach.
>
> Le 21 févr. 2017 2:46 AM, "Edward Capriolo" <edlinuxg...@gmail.com> a
>> écrit :
>>
>>> Older versions had a request scheduler api.
>>
>> I am not aware of the history behind it. Can you please point me to the
> JIRA tickets and/or why it was removed?
>
> On Monday, February 20, 2017, Ben Slater <ben.sla...@instaclustr.com>
>>> wrote:
>>>
>>>> We’ve actually had several customers where we’ve done the opposite -
>>>> split large clusters apart to separate use cases. We found that this
>>>> allowed us to better align hardware with use case requirements (for example
>>>> using AWS c3.2xlarge for very hot data at low latency, m4.xlarge for more
>>>> general purpose data); we can also tune JVM settings, etc. to meet those
>>>> use cases.
>>>>
>>> There have been several instances where we have moved customers out of
> the shared cluster to their own dedicated clusters because they outgrew our
> limitations. But I don't think it makes sense to move all the small use
> cases into their separate clusters.
>
> On Mon, 20 Feb 2017 at 22:21 Oleksandr Shulgin <
>>>> oleksandr.shul...@zalando.de> wrote:
>>>>
>>>>> On Sat, Feb 18, 2017 at 3:12 AM, Abhishek Verma <ve...@uber.com>
>>>>> wrote:
>>>>>
>>>>>> Cassandra is being used on a large scale at Uber. We usually create
>>>>>> dedicated clusters for each of our internal use cases, however that is
>>>>>> difficult to scale and manage.
>>>>>>
>>>>>> We are investigating the approach of using a single shared cluster
>>>>>> with 100s of nodes and handle 10s to 100s of different use cases for
>>>>>> different products in the same cluster. We can define different keyspaces
>>>>>> for each of them, but that does not help in case of noisy neighbors.
>>>>>>
>>>>>> Does anybody in the community have similar large shared clusters
>>>>>> and/or face noisy neighbor issues?
>>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> We've never tried this approach and 
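As one illustration of the per-keyspace quota idea discussed in this
thread, a minimal client-side sketch using Guava's RateLimiter (class name
and quota are hypothetical; this would live in a storage API layer, not in
Cassandra itself):

    import com.google.common.util.concurrent.RateLimiter;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    final class KeyspaceThrottle
    {
        // One token bucket per keyspace; callers block briefly when a tenant
        // exceeds its quota instead of flooding the shared cluster.
        private final ConcurrentMap<String, RateLimiter> limiters = new ConcurrentHashMap<>();
        private final double permitsPerSecond;

        KeyspaceThrottle(double permitsPerSecond) { this.permitsPerSecond = permitsPerSecond; }

        void acquire(String keyspace)
        {
            limiters.computeIfAbsent(keyspace, k -> RateLimiter.create(permitsPerSecond))
                    .acquire(); // blocks until a permit is available
        }
    }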

Re: How does cassandra achieve Linearizability?

2017-02-22 Thread Edward Capriolo
On Wed, Feb 22, 2017 at 9:47 AM, Ariel Weisberg <ar...@weisberg.ws> wrote:

> Hi,
>
> No it's not going to be in 3.11.x. The earliest release it could make it
> into is 4.0.
>
> Ariel
>
> On Wed, Feb 22, 2017, at 03:34 AM, Kant Kodali wrote:
>
> Hi Ariel,
>
> Can we really expect the fix in 3.11.x as the ticket
> https://issues.apache.org/jira/browse/CASSANDRA-6246
> <https://issues.apache.org/jira/browse/CASSANDRA-6246?jql=text%20~%20%22epaxos%22>
>  says?
>
> Thanks,
> kant
>
> On Thu, Feb 16, 2017 at 2:12 PM, Ariel Weisberg <ar...@weisberg.ws> wrote:
>
>
> Hi,
>
> That would work and would help a lot with the dueling proposer issue.
>
> A lot of the leader election stuff is designed to reduce the number of
> roundtrips and not just address the dueling proposer issue. Those will have
> downtime because it's there for correctness. Just adding an affinity for a
> specific proposer is probably a free lunch.
>
> I don't think you can group keys because the Paxos proposals are per
> partition which is why we get linear scale out for Paxos. I don't believe
> it's linearizable across multiple partitions. You can use the clustering
> key and deterministically pick one of the live replicas for that clustering
> key. Sort the list of replicas by IP, hash the clustering key, use the hash
> as an index into the list of replicas.
>
> Batching is of limited usefulness because we only use Paxos for CAS I
> think? So in a batch by definition all but one will fail the CAS. This is
> something where a distinguished coordinator could help by failing the rest
> of the contending requests more inexpensively than it currently does.
>
>
> Ariel
>
> On Thu, Feb 16, 2017, at 04:55 PM, Edward Capriolo wrote:
>
>
>
> On Thu, Feb 16, 2017 at 4:33 PM, Ariel Weisberg <ar...@weisberg.ws> wrote:
>
>
> Hi,
>
> Classic Paxos doesn't have a leader. There are variants on the original
> Lamport approach that will elect a leader (or some other variation like
> Mencius) to improve throughput, latency, and performance under contention.
> Cassandra implements the approach from the beginning of "Paxos Made Simple"
> (https://goo.gl/SrP0Wb) with no additional optimizations that I am aware
> of. There is no distinguished proposer (leader).
>
> That paper does go on to discuss electing a distinguished proposer, but
> that was never done for C*. I believe it's not considered a good fit for C*
> philosophically.
>
> Ariel
>
> On Thu, Feb 16, 2017, at 04:20 PM, Kant Kodali wrote:
>
> @Ariel Weisberg EPaxos looks very interesting, as it seems it doesn't need
> any designated leader for C*. But I am assuming the Paxos that is
> implemented today for LWTs requires leader election, and if so, don't we
> need an odd number of nodes or racks or DCs to satisfy the N = 2F + 1
> constraint to tolerate F failures? I understand it is not needed when not
> using LWTs, since Cassandra is a masterless system.
>
> On Fri, Feb 10, 2017 at 10:25 AM, Kant Kodali <k...@peernova.com> wrote:
>
> Thanks Ariel! Yes I knew there are so many variations and optimizations of
> Paxos. I just wanted to see if we had any plans on improving the existing
> Paxos implementation and it is great to see the work is under progress! I
> am going to follow that ticket and read up the references pointed in it
>
>
> On Fri, Feb 10, 2017 at 8:33 AM, Ariel Weisberg <ar...@weisberg.ws> wrote:
>
>
> Hi,
>
> Cassandra's implementation of Paxos doesn't implement many optimizations
> that would drastically improve throughput and latency. You need consensus,
> but it doesn't have to be exorbitantly expensive and fall over under any
> kind of contention.
>
> For instance you could implement EPaxos
> https://issues.apache.org/jira/browse/CASSANDRA-6246
> <https://issues.apache.org/jira/browse/CASSANDRA-6246?jql=text%20~%20%22epaxos%22>,
> batch multiple operations into the same Paxos round, have an affinity for a
> specific proposer for a specific partition, implement asynchronous commit,
> use a more efficient implementation of the Paxos log, and maybe other
> things.
>
>
> Ariel
>
>
>
> On Fri, Feb 10, 2017, at 05:31 AM, Benjamin Roth wrote:
>
> Hi Kant,
>
> If you read the published papers about Paxos, you will most probably
> recognize that there is no way to "do it better". This is a conceptual
> thing due to the nature of distributed systems + the CAP theorem.
> If you want A+P in the triangle, then C is very expensive. C* is made for
> A+P mostly with tunable C. In ACID databases this is a completely different
> thing as they are mostly either not partition tolerant, n
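A minimal sketch of the proposer-affinity idea Ariel describes above (sort
the live replicas, hash the key, index into the list); all names here are
illustrative:

    import java.net.InetAddress;
    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;

    final class ProposerAffinity
    {
        // Every well-behaved client deterministically prefers the same
        // replica for a given key, so Paxos rounds rarely duel.
        static InetAddress preferredProposer(List<InetAddress> liveReplicas, String clusteringKey)
        {
            List<InetAddress> sorted = new ArrayList<>(liveReplicas);
            sorted.sort(Comparator.comparing(InetAddress::getHostAddress));
            return sorted.get(Math.floorMod(clusteringKey.hashCode(), sorted.size()));
        }
    }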

Re: Trouble implementing CAS operation with LWT query

2017-02-22 Thread Edward Capriolo
On Wed, Feb 22, 2017 at 8:42 AM, 안정아  wrote:

> Hi, all
>
>
>
> I'm trying to implement a typical CAS operation with an LWT query
> (conditional update).
>
> But I'm having trouble keeping the integrity of the result when a
> WriteTimeoutException occurs.
>
> According to
> http://www.datastax.com/dev/blog/cassandra-error-handling-done-right :
>
> "If the paxos phase fails, the driver will throw a WriteTimeoutException
> with a WriteType.CAS as retrieved with
> WriteTimeoutException#getWriteType(). In this situation you can’t know if
> the CAS operation has been applied..."
>
> 1) Doesn't it ruin the whole point of using LWT for a CAS operation if you
> can't be sure whether the query was applied or not?
>
> 2-1) Is there any way to know whether the query was applied when the
> timeout occurred?
>
> 2-2) If we can't tell, is there any way to work around this and keep the
> CAS integrity?
>
>
>
> Thanks!
>
>
>
>
>

What you might first try to do is count the timeouts:

https://github.com/edwardcapriolo/ec/blob/master/src/test/java/Base/CompareAndSwapTest.java

https://github.com/edwardcapriolo/ec/blob/master/src/test/java/Base/CompareAndSwapTest.java#L99

This tactic does not work.

However, you can keep re-reading at ConsistencyLevel.SERIAL to determine if
the update applied.

What I found this to mean is you CAN'T do this:

for (int i = 0; i < 2000; i++) {
    new Thread(() -> doCasInsert()).start();
}

Assert.assertEquals(2000, getTotalInserts());

But you CAN do this:

for (int i = 0; i < 2000; i++) {
    new Thread(() -> {
        // re-read at SERIAL consistency to observe the outcome of any
        // in-flight Paxos round before deciding whether to insert
        long count = selectCount(ConsistencyLevel.SERIAL); // SELECT count(1) FROM ...
        if (count < 2000) {
            doCasInsert();
        }
    }).start();
}


Essentially, because you cannot know whether a CAS operation applied when
the client times out, you cannot "COUNT" on the insert side; you have to
verify on the read side.
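A sketch of that read-side check with the DataStax Java driver 3.x (the
statement variables are hypothetical, and the existence check assumes only
this writer can create the row):

    import com.datastax.driver.core.ConsistencyLevel;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.Statement;
    import com.datastax.driver.core.WriteType;
    import com.datastax.driver.core.exceptions.WriteTimeoutException;

    final class CasWithFallback
    {
        static boolean casInsertApplied(Session session, Statement casInsert, Statement readByKey)
        {
            try
            {
                return session.execute(casInsert).wasApplied();
            }
            catch (WriteTimeoutException e)
            {
                if (e.getWriteType() != WriteType.CAS)
                    throw e;
                // A SERIAL read forces any in-progress Paxos round to complete,
                // so the row's presence/absence reflects the true outcome.
                readByKey.setConsistencyLevel(ConsistencyLevel.SERIAL);
                return session.execute(readByKey).one() != null;
            }
        }
    }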


Pluggable throttling of read and write queries

2017-02-20 Thread Edward Capriolo
Older versions had a request scheduler api.

On Monday, February 20, 2017, Ben Slater > wrote:

> We’ve actually had several customers where we’ve done the opposite - split
> large clusters apart to separate use cases. We found that this allowed us
> to better align hardware with use case requirements (for example using AWS
> c3.2xlarge for very hot data at low latency, m4.xlarge for more general
> purpose data); we can also tune JVM settings, etc. to meet those use cases.
>
> Cheers
> Ben
>
> On Mon, 20 Feb 2017 at 22:21 Oleksandr Shulgin <
> oleksandr.shul...@zalando.de> wrote:
>
>> On Sat, Feb 18, 2017 at 3:12 AM, Abhishek Verma  wrote:
>>
>>> Cassandra is being used on a large scale at Uber. We usually create
>>> dedicated clusters for each of our internal use cases, however that is
>>> difficult to scale and manage.
>>>
>>> We are investigating the approach of using a single shared cluster with
>>> 100s of nodes and handle 10s to 100s of different use cases for different
>>> products in the same cluster. We can define different keyspaces for each of
>>> them, but that does not help in case of noisy neighbors.
>>>
>>> Does anybody in the community have similar large shared clusters and/or
>>> face noisy neighbor issues?
>>>
>>
>> Hi,
>>
>> We've never tried this approach, and given my limited experience I would
>> find this a terrible idea from the perspective of maintenance (remember
>> the old saying about eggs and baskets?)
>>
>> What potential benefits do you see?
>>
>> Regards,
>> --
>> Alex
>>
>> --
> 
> Ben Slater
> Chief Product Officer
> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
> +61 437 929 798
>


-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.


Re: cassandra user request log

2017-02-20 Thread Edward Capriolo
Not directly. Consider proxying requests through an application server and
logging at that level.

On Friday, February 10, 2017, Benjamin Roth  wrote:

> If you want to audit write operations only, you could maybe use CDC; it is
> a quite new feature in 3.x (I think it was introduced in 3.9 or 3.10).
>
> 2017-02-10 10:10 GMT+01:00 vincent gromakowski <
> vincent.gromakow...@gmail.com
> >:
>
>> tx
>>
>> 2017-02-10 10:01 GMT+01:00 Benjamin Roth > >:
>>
>>> you could write a custom trigger that logs access to specific CFs. But
>>> be aware that this may have a big performance impact.
>>>
>>> 2017-02-10 9:58 GMT+01:00 vincent gromakowski <
>>> vincent.gromakow...@gmail.com
>>> >:
>>>
 GDPR compliancy...we need to trace user activity on personal data.
 Maybe there is another way ?

 2017-02-10 9:46 GMT+01:00 Benjamin Roth >:

> On a cluster with just a little bit load, that would cause zillions of
> petabytes of logs (just roughly ;)). I don't think this is viable.
> There are many many JMX metrics on an aggregated level. But none per
> authed used.
> What exactly do you want to find out? Is it for debugging purposes?
>
>
> 2017-02-10 9:42 GMT+01:00 vincent gromakowski <
> vincent.gromakow...@gmail.com
> >:
>
>> Hi all,
>> Is there any way to trace user activity at the server level to see
>> which user is accessing which data ? Do you thin it would be simple to
>> implement ?
>> Tx
>>
>
>
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 <+49%207161%203048806> · Fax +49 7161 304880-1
> <+49%207161%203048801>
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>


>>>
>>>
>>> --
>>> Benjamin Roth
>>> Prokurist
>>>
>>> Jaumo GmbH · www.jaumo.com
>>> Wehrstraße 46 · 73035 Göppingen · Germany
>>> Phone +49 7161 304880-6 <+49%207161%203048806> · Fax +49 7161 304880-1
>>> <+49%207161%203048801>
>>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>>
>>
>>
>
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>


-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.


Re: Count(*) is not working

2017-02-20 Thread Edward Capriolo
Seems worth it to file a bug since some here are under the impression it
almost always works and others are under the impression it almost never
works.
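
For what it's worth, a count that leans on driver paging looks roughly like
this with the DataStax Java driver (a sketch; it assumes the partition key
column is named id, and a full scan is still slow and still subject to the
tombstone thresholds per range):

    import com.datastax.driver.core.*;

    class PagedCount {
        static long count(Session session) {
            // Small pages keep any single read from materializing the whole
            // table; the driver fetches the next page transparently.
            Statement stmt = new SimpleStatement("SELECT id FROM ks.table")
                .setFetchSize(1000);
            long n = 0;
            for (Row ignored : session.execute(stmt))
                n++;
            return n;
        }
    }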

On Friday, February 17, 2017, kurt greaves  wrote:

> really... well that's good to know. it still almost never works though. i
> guess every time I've seen it it must have timed out due to tombstones.
>
> On 17 Feb. 2017 22:06, "Sylvain Lebresne"  > wrote:
>
> On Fri, Feb 17, 2017 at 11:54 AM, kurt greaves  > wrote:
>
>> if you want a reliable count, you should use spark. performing a count
>> (*) will inevitably fail unless you make your server read timeouts and
>> tombstone fail thresholds ridiculous
>>
>
> That's just not true. count(*) is paged internally so while it is not
> particular fast, it shouldn't require bumping neither the read timeout nor
> the tombstone fail threshold in any way to work.
>
> In that case, it seems the partition does have many tombstones (more than
> live rows) and so the tombstone threshold is doing its job of warning about
> it.
>
>
>>
>> On 17 Feb. 2017 04:34, "Jan" > > wrote:
>>
>>> Hi,
>>>
>>> could you post the output of nodetool cfstats for the table?
>>>
>>> Cheers,
>>>
>>> Jan
>>>
>>> Am 16.02.2017 um 17:00 schrieb Selvam Raman:
>>>
>>> I am not getting count as result. Where i keep on getting n number of
>>> results below.
>>>
>>> Read 100 live rows and 1423 tombstone cells for query SELECT * FROM
>>> keysace.table WHERE token(id) > token(test:ODP0144-0883E-022R-002/047-052)
>>> LIMIT 100 (see tombstone_warn_threshold)
>>>
>>> On Thu, Feb 16, 2017 at 12:37 PM, Jan Kesten >> > wrote:
>>>
 Hi,

 do you got a result finally?

 Those messages are simply warnings telling you that c* had to read many
 tombstones while processing your query - rows that are deleted but not
 garbage collected/compacted. This warning gives you some explanation why
 things might be much slower than expected because per 100 rows that count
 c* had to read about 15 times rows that were deleted already.

 Apart from that, count(*) is almost always slow - and there is a
 default limit of 10.000 rows in a result.

 Do you really need the actual live count? To get a idea you can always
 look at nodetool cfstats (but those numbers also contain deleted rows).


 Am 16.02.2017 um 13:18 schrieb Selvam Raman:

 Hi,

 I want to know the total records count in table.

 I fired the below query:
select count(*) from tablename;

 and i have got the below output

 Read 100 live rows and 1423 tombstone cells for query SELECT * FROM
 keysace.table WHERE token(id) > token(test:ODP0144-0883E-022R-002/047-052)
 LIMIT 100 (see tombstone_warn_threshold)

 Read 100 live rows and 1435 tombstone cells for query SELECT * FROM
 keysace.table WHERE token(id) > token(test:2565-AMK-2) LIMIT 100 (see
 tombstone_warn_threshold)

 Read 96 live rows and 1385 tombstone cells for query SELECT * FROM
 keysace.table WHERE token(id) > token(test:-2220-UV033/04) LIMIT 100 (see
 tombstone_warn_threshold).




 Can you please help me to get the total count of the table.

 --
 Selvam Raman
 "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"


>>>
>>>
>>> --
>>> Selvam Raman
>>> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
>>>
>>>
>>>
>
>

-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.


Re: High disk io read load

2017-02-19 Thread Edward Capriolo
On Sat, Feb 18, 2017 at 3:35 PM, Benjamin Roth 
wrote:

> We are talking about a read IO increase of over 2000% with 512 tokens
> compared to 256 tokens. 100% increase would be linear which would be
> perfect. 200% would even okay, taking the RAM/Load ratio for caching into
> account. But > 20x the read IO is really incredible.
> The nodes are configured with puppet, they share the same roles and no
> manual "optimizations" are applied. So I can't imagine, a different
> configuration is responsible for it.
>
> 2017-02-18 21:28 GMT+01:00 Benjamin Roth :
>
>> This is status of the largest KS of these both nodes:
>> UN  10.23.71.10  437.91 GiB  512  49.1%
>> 2679c3fa-347e-4845-bfc1-c4d0bc906576  RAC1
>> UN  10.23.71.9   246.99 GiB  256  28.3%
>> 2804ef8a-26c8-4d21-9e12-01e8b6644c2f  RAC1
>>
>> So roughly as expected.
>>
>> 2017-02-17 23:07 GMT+01:00 kurt greaves :
>>
>>> what's the Owns % for the relevant keyspace from nodetool status?
>>>
>>
>>
>>
>> --
>> Benjamin Roth
>> Prokurist
>>
>> Jaumo GmbH · www.jaumo.com
>> Wehrstraße 46 · 73035 Göppingen · Germany
>> Phone +49 7161 304880-6 <07161%203048806> · Fax +49 7161 304880-1
>> <07161%203048801>
>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>
>
>
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 <+49%207161%203048806> · Fax +49 7161 304880-1
> <+49%207161%203048801>
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>

When I read articles like this:

http://www.doanduyhai.com/blog/?p=1930

and see the word hot-spot:

"Another performance consideration worth mentioning is hot-spot. Similar to
manual denormalization, if your view partition key is chosen poorly, you'll
end up with hot spots in your cluster. A simple example with our *user* table
is to create a materialized view *user_by_gender*."

It leads me to ask a question back: what can you say about hotspots in your
data? Even if your nodes had an identical number of tokens, this author seems
to be suggesting that you could still have hotspots. Maybe the node with 2x
the tokens also has 2x the hotspots, or your application has a hotspot that
would be present even with perfect token balancing.


Re: How does cassandra achieve Linearizability?

2017-02-16 Thread Edward Capriolo
On Thu, Feb 16, 2017 at 4:33 PM, Ariel Weisberg  wrote:

> Hi,
>
> Classic Paxos doesn't have a leader. There are variants on the original
> Lamport approach that will elect a leader (or some other variation like
> Mencius) to improve throughput, latency, and performance under contention.
> Cassandra implements the approach from the beginning of "Paxos Made Simple"
> (https://goo.gl/SrP0Wb) with no additional optimizations that I am aware
> of. There is no distinguished proposer (leader).
>
> That paper does go on to discuss electing a distinguished proposer, but
> that was never done for C*. I believe it's not considered a good fit for C*
> philosophically.
>
> Ariel
>
> On Thu, Feb 16, 2017, at 04:20 PM, Kant Kodali wrote:
>
> @Ariel Weisberg EPaxos looks very interesting as it looks like it doesn't
> need any designated leader for C* but I am assuming the paxos that is
> implemented today for LWT's requires Leader election and If so, don't we
> need to have an odd number of nodes or racks or DC's to satisfy N = 2F + 1
> constraint to tolerate F failures ? I understand it is not needed when not
> using LWT's since Cassandra is a master-less system.
>
> On Fri, Feb 10, 2017 at 10:25 AM, Kant Kodali  wrote:
>
> Thanks Ariel! Yes I knew there are so many variations and optimizations of
> Paxos. I just wanted to see if we had any plans on improving the existing
> Paxos implementation and it is great to see the work is under progress! I
> am going to follow that ticket and read up the references pointed in it
>
>
> On Fri, Feb 10, 2017 at 8:33 AM, Ariel Weisberg  wrote:
>
>
> Hi,
>
> Cassandra's implementation of Paxos doesn't implement many optimizations
> that would drastically improve throughput and latency. You need consensus,
> but it doesn't have to be exorbitantly expensive and fall over under any
> kind of contention.
>
> For instance you could implement EPaxos https://issues.apache.o
> rg/jira/browse/CASSANDRA-6246
> ,
> batch multiple operations into the same Paxos round, have an affinity for a
> specific proposer for a specific partition, implement asynchronous commit,
> use a more efficient implementation of the Paxos log, and maybe other
> things.
>
>
> Ariel
>
>
>
> On Fri, Feb 10, 2017, at 05:31 AM, Benjamin Roth wrote:
>
> Hi Kant,
>
> If you read the published papers about Paxos, you will most probably
> recognize that there is no way to "do it better". This is a conceptional
> thing due to the nature of distributed systems + the CAP theorem.
> If you want A+P in the triangle, then C is very expensive. CS is made for
> A+P mostly with tunable C. In ACID databases this is a completely different
> thing as they are mostly either not partition tolerant, not highly
> available or not scalable (in a distributed manner, not speaking of
> "monolithic super servers").
>
> There is no free lunch ...
>
>
> 2017-02-10 11:09 GMT+01:00 Kant Kodali :
>
> "That’s the safety blanket everyone wants but is extremely expensive,
> especially in Cassandra."
>
> yes LWT's are expensive. Are there any plans to make this better?
>
> On Fri, Feb 10, 2017 at 12:17 AM, Kant Kodali  wrote:
>
> Hi Jon,
>
> Thanks a lot for your response. I am well aware that the LWW != LWT but I
> was talking more in terms of LWW with respective to LWT's which I believe
> you answered. so thanks much!
>
>
> kant
>
>
> On Thu, Feb 9, 2017 at 6:01 PM, Jon Haddad 
> wrote:
>
> LWT != Last Write Wins.  They are totally different.
>
> LWTs give you (assuming you also read at SERIAL) “atomic consistency”,
> meaning you are able to perform operations atomically and in isolation.
> That’s the safety blanket everyone wants but is extremely expensive,
> especially in Cassandra.  The lightweight part, btw, may be a little
> optimistic, especially if a key is under contention.  With regard to the
> “last write” part you’re asking about - w/ LWT Cassandra provides the
> timestamp and manages it as part of the ballot, and it always is
> increasing.  See 
> org.apache.cassandra.service.ClientState#getTimestampForPaxos.
> From the code:
>
>  * Returns a timestamp suitable for paxos given the timestamp of the last
> known commit (or in progress update).
>  * Paxos ensures that the timestamp it uses for commits respects the
> serial order of those commits. It does so
>  * by having each replica reject any proposal whose timestamp is not
> strictly greater than the last proposal it
>  * accepted. So in practice, which timestamp we use for a given proposal
> doesn't affect correctness but it does
>  * affect the chance of making progress (if we pick a timestamp lower than
> what has been proposed before, our
>  * new proposal will just get rejected).
>
> Effectively paxos removes the ability to use custom timestamps and
> addresses clock variance by 

Re: High disk io read load

2017-02-16 Thread Edward Capriolo
On Thu, Feb 16, 2017 at 12:38 AM, Benjamin Roth <benjamin.r...@jaumo.com>
wrote:

> It doesn't really look like that:
> https://cl.ly/2c3Z1u2k0u2I
>
> Thats the ReadLatency.count metric aggregated by host which represents the
> actual read operations, correct?
>
> 2017-02-15 23:01 GMT+01:00 Edward Capriolo <edlinuxg...@gmail.com>:
>
>> I think it has more than double the load. It is double the data. More
>> read repair chances. More load can swing it's way during node failures etc.
>>
>> On Wednesday, February 15, 2017, Benjamin Roth <benjamin.r...@jaumo.com>
>> wrote:
>>
>>> Hi there,
>>>
>>> Following situation in cluster with 10 nodes:
>>> Node A's disk read IO is ~20 times higher than the read load of node B.
>>> The nodes are exactly the same except:
>>> - Node A has 512 tokens and Node B 256. So it has double the load (data).
>>> - Node A also has 2 SSDs, Node B only 1 SSD (according to load)
>>>
>>> Node A has roughly 460GB, Node B 260GB total disk usage.
>>> Both nodes have 128GB RAM and 40 cores.
>>>
>>> Of course I assumed that Node A does more reads because cache / load
>>> ratio is worse but a factor of 20 makes me very sceptic.
>>>
>>> Of course Node A has a much higher and less predictable latency due to
>>> the wait states.
>>>
>>> Has anybody experienced similar situations?
>>> Any hints how to analyze or optimize this - I mean 128GB cache for 460GB
>>> payload is not that few. I am pretty sure that not the whole dataset of
>>> 460GB is "hot".
>>>
>>> --
>>> Benjamin Roth
>>> Prokurist
>>>
>>> Jaumo GmbH · www.jaumo.com
>>> Wehrstraße 46 · 73035 Göppingen · Germany
>>> Phone +49 7161 304880-6 <07161%203048806> · Fax +49 7161 304880-1
>>> <07161%203048801>
>>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>>
>>
>>
>> --
>> Sorry this was sent from mobile. Will do less grammar and spell check
>> than usual.
>>
>
>
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 <+49%207161%203048806> · Fax +49 7161 304880-1
> <+49%207161%203048801>
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>

You could be correct. I also think a few things smooth out the curves.

- Intelligent clients
- Dynamic snitch

For example, when testing out an awesome JVM tune, you might see CPU usage go
down. From there you assume the tune worked, but what can happen is that the
two dynamic mechanisms shift some small percentage of traffic away. Those
effects cascade as well: dynamic_snitch claims to shift load once performance
is $threshold worse.
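
The relevant cassandra.yaml knobs, with what I believe are the defaults:

    # route around a replica once its score is ~10% worse than the best
    dynamic_snitch_badness_threshold: 0.1
    dynamic_snitch_update_interval_in_ms: 100
    dynamic_snitch_reset_interval_in_ms: 600000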


Re: High disk io read load

2017-02-15 Thread Edward Capriolo
I think it has more than double the load. It is double the data. More read
repair chances. More load can swing its way during node failures, etc.

On Wednesday, February 15, 2017, Benjamin Roth 
wrote:

> Hi there,
>
> Following situation in cluster with 10 nodes:
> Node A's disk read IO is ~20 times higher than the read load of node B.
> The nodes are exactly the same except:
> - Node A has 512 tokens and Node B 256. So it has double the load (data).
> - Node A also has 2 SSDs, Node B only 1 SSD (according to load)
>
> Node A has roughly 460GB, Node B 260GB total disk usage.
> Both nodes have 128GB RAM and 40 cores.
>
> Of course I assumed that Node A does more reads because cache / load ratio
> is worse but a factor of 20 makes me very sceptic.
>
> Of course Node A has a much higher and less predictable latency due to the
> wait states.
>
> Has anybody experienced similar situations?
> Any hints how to analyze or optimize this - I mean 128GB cache for 460GB
> payload is not that few. I am pretty sure that not the whole dataset of
> 460GB is "hot".
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>


-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.


Re: Why does Cassandra recommends Oracle JVM instead of OpenJDK?

2017-02-13 Thread Edward Capriolo
On Monday, February 13, 2017, Brice Dutheil <brice.duth...@gmail.com> wrote:

> The Android battle is another thing that I wouldn't consider for OracleJDK
> / OpenJDK.
> While I do like what Google did from a technical point of view, Google may
> have overstepped fair use (or not – I don't know). Anyway Sun didn't like
> what Google did, they probably considered going to court at that time.
>
>
>
>
> -- Brice
>
> On Mon, Feb 13, 2017 at 10:20 AM, kurt greaves <k...@instaclustr.com> wrote:
>
>> are people actually trying to imply that Google is less evil than oracle?
>> what is this shill fest
>>
>>
>> On 12 Feb. 2017 8:24 am, "Kant Kodali" <k...@peernova.com> wrote:
>>
>> Saw this one today...
>>
>> https://news.ycombinator.com/item?id=13624062
>>
>> On Tue, Jan 3, 2017 at 6:27 AM, Eric Evans <john.eric.ev...@gmail.com> wrote:
>>
>>> On Mon, Jan 2, 2017 at 2:26 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>>> > Lets be clear:
>>> > What I am saying is avoiding being loose with the word "free"
>>> >
>>> > https://en.wikipedia.org/wiki/Free_software_license
>>> >
>>> > Many things with the JVM are free too. Most importantly it is free to
>>> use.
>>> >
>>> > https://www.java.com/en/download/faq/distribution.xml
>>> >
>>> > As it relates to this conversation: I am not aware of anyone running
>>> > Cassandra that has modified upstream JVM to make Cassandra run
>>> > better/differently *. Thus the license around the Oracle JVM is roughly
>>> > meaningless to the user/developer of cassandra.
>>> >
>>> > * The only group I know that took an action to modify upstream was
>>> Acunu.
>>> > They had released a modified Linux Kernel with a modified Apache
>>> Cassandra.
>>> > http://cloudtweaks.com/2011/02/data-storage-startup-acunu-ra
>>> ises-3-6-million-to-launch-its-first-product/.
>>> > That product no longer exists.
>>> >
>>> > "I don't how to read any of this.  It sounds like you're saying that a
>>> > JVM is something that cannot be produced as a Free Software project,"
>>> >
>>> > What I am saying is something like the JVM "could" be produced as a
>>> "free
>>> > software project". However, the argument that I was making is that the
>>> > popular viable languages/(including vms or runtime to use them) today
>>> > including Java, C#, Go, Swift are developed by the largest tech
>>> companies in
>>> > the world, and as such I do believe a platform would be viable.
>>> Specifically
>>> > I believe without Oracle driving Java OpenJDK would not be viable.
>>> >
>>> > There are two specific reasons.
>>> > 1) I do not see large costly multi-year initiatives like G1 happening
>>> > 2) Without guidance/leadership that sun/oracle I do not see new
>>> features
>>> > that change the language like lambda's and try multi-catch happening
>>> in a
>>> > sane way.
>>> >
>>> > I expanded upon #2 be discussing my experience with standards like c++
>>> 11,
>>> > 14,17 and attempting to take compiling working lambda code on linux
>>> GCC to
>>> > microsoft visual studio and having it not compile. In my opinion, Java
>>> only
>>> > wins because as a platform it is very portable as both source and
>>> binary
>>> > code. Without leadership on that front I believe that over time the
>>> language
>>> > would suffer.
>>>
>>> I realize that you're trying to be pragmatic about all of this, but
>>> what I don't think you realize, is that so am I.
>>>
>>> Java could change hands at any time (it has once already), or Oracle
>>> leadership could decide to go in a different direction.  Imagine for
>>> example that they relicensed it to exclude use by orientation or
>>> religion, Cassandra would implicitly carry these restrictions as well.
>>> Imagine that they decided to provide a back-door to the NSA, Cassandra
>>> would then also contain such a back-door.  These mi

Re: ByteOrdered partitioner when using sha-1 as partition key

2017-02-11 Thread Edward Capriolo
On Sat, Feb 11, 2017 at 1:47 PM, Micha <mich...@fantasymail.de> wrote:

> I think I was not clear enough...
>
> I have *one* table for which the row data contains (among other values)
> a sha-1 sum. There are no collisions.  I thought computing a murmur hash
> for a sha-1 sum is just wasted time, as the murmur hash doesn't make the
> data more random than it already is.   So it's just one table where this
> matters.
>
>
>  Michael
>
>
> Am 11.02.2017 um 16:54 schrieb Jonathan Haddad:
> > The odds of only using a sha1 as your partition key for every table you
> > ever create is low. You will regret BOP until the end of time.
> > On Sat, Feb 11, 2017 at 5:53 AM Edward Capriolo <edlinuxg...@gmail.com
> > <mailto:edlinuxg...@gmail.com>> wrote:
> >
> > Probably best to avoid bop even if you are aflready hashing keys
> > yourself. What do you do when checksuma collide? It is possible
> right?
> >
> > On Saturday, February 11, 2017, Micha <mich...@fantasymail.de
> > <mailto:mich...@fantasymail.de>> wrote:
> >
> > Hi,
> >
> > my table has a sha-1 sum as partition key. Would in this case the
> > ByteOrdered partitioner be a better choice than the
> > Murmur3partitioner,
> > since the keys are quite random?
> >
> >
> > cheers,
> >  Michael
> >
> >
> >
> > --
> > Sorry this was sent from mobile. Will do less grammar and spell
> > check than usual.
> >
>

The problem with using BOP is that the partitioner is not set at the
table/keyspace level but cluster wide. So if you have two tables with
different key distributions, there is no way to balance them out.

At this point I would almost consider BOP quasi-supported:

http://stackoverflow.com/questions/27939234/cassandra-byteorderedpartitioner

"no, seriously, you're doing it wrong"

I have thought about this often: if you really need BOP, for example you are
generating a web index and you want to co-locate data for the same domain so
you can scan it, Cassandra is a bad fit. I'm not convinced that a secondary
index/MV fills the need. HBase seems a more logical choice (to me), where the
data is logically ordered by key and the protocol splits regions as they grow.


Re: ByteOrdered partitioner when using sha-1 as partition key

2017-02-11 Thread Edward Capriolo
On Sat, Feb 11, 2017 at 10:54 AM, Jonathan Haddad <j...@jonhaddad.com> wrote:

> The odds of only using a sha1 as your partition key for every table you
> ever create is low. You will regret BOP until the end of time.
> On Sat, Feb 11, 2017 at 5:53 AM Edward Capriolo <edlinuxg...@gmail.com>
> wrote:
>
>> Probably best to avoid bop even if you are aflready hashing keys
>> yourself. What do you do when checksuma collide? It is possible right?
>>
>> On Saturday, February 11, 2017, Micha <mich...@fantasymail.de> wrote:
>>
>> Hi,
>>
>> my table has a sha-1 sum as partition key. Would in this case the
>> ByteOrdered partitioner be a better choice than the Murmur3partitioner,
>> since the keys are quite random?
>>
>>
>> cheers,
>>  Michael
>>
>>
>>
>> --
>> Sorry this was sent from mobile. Will do less grammar and spell check
>> than usual.
>>
>
Yes, the odds are low.

https://en.wikipedia.org/wiki/Birthday_problem

This has already been addressed for RP:

https://issues.apache.org/jira/browse/CASSANDRA-1034

If you wanted to use BOP and hash yourself, you would have to make your
primary key something like (sha_value, actual_value) to ensure that two keys
do not overwrite each other.
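
For concreteness, a hypothetical schema along those lines (all names are
illustrative):

    CREATE TABLE ks.by_hash (
        sha_value blob,
        actual_value text,
        payload blob,
        PRIMARY KEY (sha_value, actual_value)
    );

Colliding hashes land in the same partition but remain distinct rows thanks
to the clustering column.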


Re: ByteOrdered partitioner when using sha-1 as partition key

2017-02-11 Thread Edward Capriolo
Probably best to avoid BOP even if you are already hashing keys yourself.
What do you do when checksums collide? It is possible, right?

On Saturday, February 11, 2017, Micha  wrote:

> Hi,
>
> my table has a sha-1 sum as partition key. Would in this case the
> ByteOrdered partitioner be a better choice than the Murmur3partitioner,
> since the keys are quite random?
>
>
> cheers,
>  Michael
>


-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.


Re: Composite partition key token

2017-02-09 Thread Edward Capriolo
On Thu, Feb 9, 2017 at 9:26 AM, Michael Burman  wrote:

> Hi,
>
> How about taking it from the BoundStatement directly?
>
> ByteBuffer routingKey = b.getRoutingKey(ProtocolVersion.NEWEST_SUPPORTED,
> codecRegistry);
> Token token = metadata.newToken(routingKey);
>
> In this case the b is the "BoundStatement". Replace codecRegistry &
> ProtocolVersion with what you have. codecRegistry for example from the
> codecRegistry = session.getCluster().getConfig
> uration().getCodecRegistry();
>
>   - Micke
>
>
> On 02/08/2017 08:58 PM, Branislav Janosik -T (bjanosik - AAP3 INC at
> Cisco) wrote:
>
>>
>> Hi,
>>
>> I would like to ask how to calculate token for composite partition key
>> using java api?
>>
>> For partition key made of one column I use cluster.getMetadata().newToken
>> (newBuffer);
>>
>> But what if my key looks like this PRIMARY KEY
>> ((parentResourceId,timeRT), childName)?
>>
>> I read that “:” is a separator but it doesn’t seem to be the case.
>>
>> How can I create ByteBuffer with multiple values so that the token would
>> be actually correct?
>>
>> Thank you,
>>
>> Branislav
>>
>>
>
This could help:
https://github.com/edwardcapriolo/simple-cassandra-tools/blob/master/src/main/java/io/teknek/cassandra/simple/CompositeTool.java
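
If you want to build the routing key by hand instead, the composite encoding
is, as far as I recall, a two-byte length, the component bytes, then a zero
end-of-component byte per component. A rough sketch:

    import java.nio.ByteBuffer;

    class CompositeKey {
        // Serialize partition-key components the way CompositeType does:
        // unsigned short length + bytes + 0x00 end-of-component marker.
        static ByteBuffer compose(ByteBuffer... components) {
            int size = 0;
            for (ByteBuffer c : components)
                size += 2 + c.remaining() + 1;
            ByteBuffer out = ByteBuffer.allocate(size);
            for (ByteBuffer c : components) {
                out.putShort((short) c.remaining());
                out.put(c.duplicate());
                out.put((byte) 0);
            }
            out.flip();
            return out;
        }
    }

The result can then be passed to cluster.getMetadata().newToken(...) as in
the single-column case.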


Re: [RELEASE] Apache Cassandra 3.10 released

2017-02-03 Thread Edward Capriolo
On Fri, Feb 3, 2017 at 6:52 PM, Michael Shuler 
wrote:

> The Cassandra team is pleased to announce the release of Apache
> Cassandra version 3.10.
>
> Apache Cassandra is a fully distributed database. It is the right choice
> when you need scalability and high availability without compromising
> performance.
>
>  http://cassandra.apache.org/
>
> Downloads of source and binary distributions are listed in our download
> section:
>
>  http://cassandra.apache.org/download/
>
> This version is a new feature and bug fix release[1] on the 3.X series.
> As always, please pay attention to the release notes[2] and Let us
> know[3] if you were to encounter any problem.
>
> This is the last tick-tock feature release of Apache Cassandra. Version
> 3.11.0 will continue bug fixes from this point on the cassandra-3.11
> branch in git.
>
> Enjoy!
>
> [1]: (CHANGES.txt) https://goo.gl/J0VghF
> [2]: (NEWS.txt) https://goo.gl/00KNVW
> [3]: https://issues.apache.org/jira/browse/CASSANDRA
>
>
Great job all on this release.


Re: implementing a 'sorted set' on top of cassandra

2017-01-17 Thread Edward Capriolo
On Tue, Jan 17, 2017 at 11:47 AM, Mike Torra  wrote:

> Thanks for the feedback everyone! Redis `zincryby` and `zrangebyscore` is
> indeed what we use today.
>
> Caching the resulting 'sorted sets' in redis is exactly what I plan to do.
> There will be tens of thousands of these sorted sets, each generally with
> <10k items (with maybe a few exceptions going a bit over that). The reason
> to periodically calculate the set and store it in cassandra is to avoid
> having the client do that work, when the client only really cares about the
> top 100 or so items at any given time. Being truly "real time" is not
> critical for us, but it is a selling point to be as up to date as possible.
>
> I'd like to understand the performance issue of frequently updating these
> sets. I understand that every time I 'regenerate' the sorted set, any rows
> that change will create a tombstone - for example, if "item_1" is in first
> place and "item_2" is in second place, then they switch on the next update,
> that would be two tombstones. Do you think this will be a big enough
> problem that it is worth doing the sorting work client side, on demand, and
> just try to eat the performance hit there? My thought was to make a
> tradeoff by using more cassandra disk space (ie pre calculating all sets),
> in exchange for faster reads when requests actually come in that need this
> data.
>
> From: Benjamin Roth 
> Reply-To: "user@cassandra.apache.org" 
> Date: Saturday, January 14, 2017 at 1:25 PM
> To: "user@cassandra.apache.org" 
> Subject: Re: implementing a 'sorted set' on top of cassandra
>
> Mike mentioned "increment" in his initial post. That let me think of a
> case with increments and fetching a top list by a counter like
> https://redis.io/commands/zincrby
> https://redis.io/commands/zrangebyscore
>
> 1. Cassandra is absolutely not made to sort by a counter (or a non-counter
> numeric incrementing value) but it is made to store counters. In this case
> a partition could be seen as a set.
> 2. I thought of CS for persistence and - depending on the app requirements
> like real-time and set size - still use redis as a read cache
>
> 2017-01-14 18:45 GMT+01:00 Jonathan Haddad :
>
>> Sorted sets don't have a requirement of incrementing / decrementing.
>> They're commonly used for thing like leaderboards where the values are
>> arbitrary.
>>
>> In Redis they are implemented with 2 data structures for efficient
>> lookups of either key or value. No getting around that as far as I know.
>>
>> In Cassandra they would require using the score as a clustering column in
>> order to select top N scores (and paginate). That means a tombstone
>> whenever the value for a key in the set changes. In sets with high rates of
>> change that means a lot of tombstones and thus terrible performance.
>> On Sat, Jan 14, 2017 at 9:40 AM DuyHai Doan  wrote:
>>
>>> Sorting on an "incremented" numeric value has always been a nightmare to
>>> be done properly in C*
>>>
>>> Either use Counter type but then no sorting is possible since counter
>>> cannot be used as type for clustering column (which allows sort)
>>>
>>> Or use simple numeric type on clustering column but then to increment
>>> the value *concurrently* and *safely* it's prohibitive (SELECT to fetch
>>> current value + UPDATE ... IF value = ) + retry
>>>
>>>
>>>
>>> On Sat, Jan 14, 2017 at 8:54 AM, Benjamin Roth 
>>> wrote:
>>>
>>> If your proposed solution is crazy depends on your needs :)
>>> It sounds like you can live with not-realtime data. So it is ok to cache
>>> it. Why preproduce the results if you only need 5% of them? Why not use
>>> redis as a cache with expiring sorted sets that are filled on demand from
>>> cassandra partitions with counters?
>>> So redis has much less to do and can scale much better. And you are not
>>> limited on keeping all data in ram as cache data is volatile and can be
>>> evicted on demand.
>>> If this is effective also depends on the size of your sets. CS wont be
>>> able to sort them by score for you, so you will have to load the complete
>>> set to redis for caching and / or do sorting in your app on demand. This
>>> certainly won't work out well with sets with millions of entries.
>>>
>>> 2017-01-13 23:14 GMT+01:00 Mike Torra :
>>>
>>> We currently use redis to store sorted sets that we increment many, many
>>> times more than we read. For example, only about 5% of these sets are ever
>>> read. We are getting to the point where redis is becoming difficult to
>>> scale (currently at >20 nodes).
>>>
>>> We've started using cassandra for other things, and now we are
>>> experimenting to see if having a similar 'sorted set' data structure is
>>> feasible in cassandra. My approach so far is:
>>>
>>>1. Use a counter CF to store the values I want to sort by
>>>2. 

Re: implementing a 'sorted set' on top of cassandra

2017-01-13 Thread Edward Capriolo
On Fri, Jan 13, 2017 at 8:14 PM, Jonathan Haddad  wrote:

> I've thought about this for years and have never arrived on a particularly
> great implementation.  Your idea will be maybe OK if the sets are very
> small and if the values don't change very often.  But in a system where the
> values of the keys in the set change frequently (lots of tombstones) or the
> sets are large I think you're going to experience quite a bit of pain.
>
> On Fri, Jan 13, 2017 at 2:14 PM Mike Torra  wrote:
>
> We currently use redis to store sorted sets that we increment many, many
> times more than we read. For example, only about 5% of these sets are ever
> read. We are getting to the point where redis is becoming difficult to
> scale (currently at >20 nodes).
>
> We've started using cassandra for other things, and now we are
> experimenting to see if having a similar 'sorted set' data structure is
> feasible in cassandra. My approach so far is:
>
>1. Use a counter CF to store the values I want to sort by
>2. Periodically read in all key/values in the counter CF and sort in
>the client application (~every five minutes or so)
>3. Write back to a different CF with the ordered keys I care about
>
> Does this seem crazy? Is there a simpler way to do this in cassandra?
>
>
Redis is the other side of the coin.

Fast:
https://groups.google.com/forum/#!topic/redis-db/4TAItKMyUEE

http://stackoverflow.com/questions/6076342/is-there-a-practical-limit-to-the-number-of-elements-in-a-sorted-set-in-redis

320MB of memory for 2,000,000 email addresses is hard to scale. If you are
only maintaining a single list, great, but if you have millions of lists this
memory/cost profile is not ideal.
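
For reference, the periodic read-sort-write cycle Mike describes might look
roughly like this with the Java driver (table and column names are
hypothetical; note that rewriting ranks creates the tombstone churn Jon warns
about in this thread):

    import com.datastax.driver.core.*;
    import java.util.*;

    class SortedSetRebuild {
        // Read all counters for one set, sort client-side, write back the top N.
        static void rebuild(Session session, String setId, int topN) {
            ResultSet rs = session.execute(
                "SELECT item, score FROM ks.scores WHERE set_id = ?", setId);
            List<Row> rows = new ArrayList<>();
            for (Row r : rs)
                rows.add(r);
            rows.sort((a, b) -> Long.compare(b.getLong("score"), a.getLong("score")));
            for (int rank = 0; rank < Math.min(topN, rows.size()); rank++)
                session.execute(
                    "INSERT INTO ks.ranked (set_id, rank, item) VALUES (?, ?, ?)",
                    setId, rank, rows.get(rank).getString("item"));
        }
    }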


Re: implementing a 'sorted set' on top of cassandra

2017-01-13 Thread Edward Capriolo
On Fri, Jan 13, 2017 at 5:14 PM, Mike Torra  wrote:

> We currently use redis to store sorted sets that we increment many, many
> times more than we read. For example, only about 5% of these sets are ever
> read. We are getting to the point where redis is becoming difficult to
> scale (currently at >20 nodes).
>
> We've started using cassandra for other things, and now we are
> experimenting to see if having a similar 'sorted set' data structure is
> feasible in cassandra. My approach so far is:
>
>1. Use a counter CF to store the values I want to sort by
>2. Periodically read in all key/values in the counter CF and sort in
>the client application (~every five minutes or so)
>3. Write back to a different CF with the ordered keys I care about
>
> Does this seem crazy? Is there a simpler way to do this in cassandra?
>

Have you considered using only the keys in Cassandra's map type?

I proposed an implementation that I wanted to experiment with for adding to a
set: https://issues.apache.org/jira/browse/CASSANDRA-6870 . Even though Redis
and its feature set are wildly popular, there is no great consensus that
Cassandra should do those things as manipulations of a single column.
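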


Re: Strange issue wherein cassandra not being started from cron

2017-01-10 Thread Edward Capriolo
On Tuesday, January 10, 2017, Jonathan Haddad  wrote:

> Last I checked, cron doesn't load the same, full environment you see when
> you log in. Also, why put Cassandra on a cron?
> On Mon, Jan 9, 2017 at 9:47 PM Bhuvan Rawal  > wrote:
>
>> Hi Ajay,
>>
>> Have you had a look at cron logs? - mine is in path /var/log/cron
>>
>> Thanks & Regards,
>>
>> On Tue, Jan 10, 2017 at 9:45 AM, Ajay Garg > > wrote:
>>
>>> Hi All.
>>>
>>> Facing a very weird issue, wherein the command
>>>
>>> */etc/init.d/cassandra start*
>>>
>>> causes cassandra to start when the command is run from command-line.
>>>
>>>
>>> However, if I put the above as a cron job
>>>
>>>
>>>
>>> ** * * * * /etc/init.d/cassandra start*
>>> cassandra never starts.
>>>
>>>
>>> I have checked, and "cron" service is running.
>>>
>>>
>>> Any ideas what might be wrong?
>>> I am pasting the cassandra script for brevity.
>>>
>>>
>>> Thanks and Regards,
>>> Ajay
>>>
>>>
>>> 
>>> 
>>> #! /bin/sh
>>> ### BEGIN INIT INFO
>>> # Provides:  cassandra
>>> # Required-Start:$remote_fs $network $named $time
>>> # Required-Stop: $remote_fs $network $named $time
>>> # Should-Start:  ntp mdadm
>>> # Should-Stop:   ntp mdadm
>>> # Default-Start: 2 3 4 5
>>> # Default-Stop:  0 1 6
>>> # Short-Description: distributed storage system for structured data
>>> # Description:   Cassandra is a distributed (peer-to-peer) system for
>>> #the management and storage of structured data.
>>> ### END INIT INFO
>>>
>>> # Author: Eric Evans >> >
>>>
>>> DESC="Cassandra"
>>> NAME=cassandra
>>> PIDFILE=/var/run/$NAME/$NAME.pid
>>> SCRIPTNAME=/etc/init.d/$NAME
>>> CONFDIR=/etc/cassandra
>>> WAIT_FOR_START=10
>>> CASSANDRA_HOME=/usr/share/cassandra
>>> FD_LIMIT=10
>>>
>>> [ -e /usr/share/cassandra/apache-cassandra.jar ] || exit 0
>>> [ -e /etc/cassandra/cassandra.yaml ] || exit 0
>>> [ -e /etc/cassandra/cassandra-env.sh ] || exit 0
>>>
>>> # Read configuration variable file if it is present
>>> [ -r /etc/default/$NAME ] && . /etc/default/$NAME
>>>
>>> # Read Cassandra environment file.
>>> . /etc/cassandra/cassandra-env.sh
>>>
>>> if [ -z "$JVM_OPTS" ]; then
>>> echo "Initialization failed; \$JVM_OPTS not set!" >&2
>>> exit 3
>>> fi
>>>
>>> export JVM_OPTS
>>>
>>> # Export JAVA_HOME, if set.
>>> [ -n "$JAVA_HOME" ] && export JAVA_HOME
>>>
>>> # Load the VERBOSE setting and other rcS variables
>>> . /lib/init/vars.sh
>>>
>>> # Define LSB log_* functions.
>>> # Depend on lsb-base (>= 3.0-6) to ensure that this file is present.
>>> . /lib/lsb/init-functions
>>>
>>> #
>>> # Function that returns 0 if process is running, or nonzero if not.
>>> #
>>> # The nonzero value is 3 if the process is simply not running, and 1 if
>>> the
>>> # process is not running but the pidfile exists (to match the exit codes
>>> for
>>> # the "status" command; see LSB core spec 3.1, section 20.2)
>>> #
>>> CMD_PATT="cassandra.+CassandraDaemon"
>>> is_running()
>>> {
>>> if [ -f $PIDFILE ]; then
>>> pid=`cat $PIDFILE`
>>> grep -Eq "$CMD_PATT" "/proc/$pid/cmdline" 2>/dev/null && return 0
>>> return 1
>>> fi
>>> return 3
>>> }
>>> #
>>> # Function that starts the daemon/service
>>> #
>>> do_start()
>>> {
>>> # Return
>>> #   0 if daemon has been started
>>> #   1 if daemon was already running
>>> #   2 if daemon could not be started
>>>
>>> ulimit -l unlimited
>>> ulimit -n "$FD_LIMIT"
>>>
>>> cassandra_home=`getent passwd cassandra | awk -F ':' '{ print $6; }'`
>>> heap_dump_f="$cassandra_home/java_`date +%s`.hprof"
>>> error_log_f="$cassandra_home/hs_err_`date +%s`.log"
>>>
>>> [ -e `dirname "$PIDFILE"` ] || \
>>> install -d -ocassandra -gcassandra -m755 `dirname $PIDFILE`
>>>
>>>
>>>
>>> start-stop-daemon -S -c cassandra -a /usr/sbin/cassandra -q -p
>>> "$PIDFILE" -t >/dev/null || return 1
>>>
>>> start-stop-daemon -S -c cassandra -a /usr/sbin/cassandra -b -p
>>> "$PIDFILE" -- \
>>> -p "$PIDFILE" -H "$heap_dump_f" -E "$error_log_f" >/dev/null ||
>>> return 2
>>>
>>> }
>>>
>>> #
>>> # Function that stops the daemon/service
>>> #
>>> do_stop()
>>> {
>>> # Return
>>> #   0 if daemon has been stopped
>>> #   1 if daemon was already stopped
>>> #   2 if daemon could not be stopped
>>> #   other if a failure occurred
>>> start-stop-daemon -K -p "$PIDFILE" -R TERM/30/KILL/5 >/dev/null
>>> RET=$?
>>> rm -f "$PIDFILE"
>>> return $RET
>>> }
>>>
>>> case "$1" in
>>>   start)
>>> [ "$VERBOSE" != no ] && log_daemon_msg "Starting $DESC" "$NAME"
>>> do_start
>>> 

Re: Help

2017-01-09 Thread Edward Capriolo
On Sun, Jan 8, 2017 at 11:30 PM, Anshu Vajpayee 
wrote:

> Gossip shows - all nodes are up.
>
> But when  we perform writes , coordinator stores the hints. It means  -
> coordinator was not able to deliver the writes to few nodes after meeting
> consistency requirements.
>
> The nodes for which  writes were failing, are in different DC. Those nodes
> do not have any load.
>
> Gossips shows everything is up.  I already set write timeout to 60 sec,
> but no help.
>
> Can anyone encounter this scenario ? Network side everything is fine.
>
> Cassandra version is 2.1.13
>
> --
> *Regards,*
> *Anshu *
>
>
>
This suggests you have some intermittent network issues. I would suggest
using query tracing

http://cassandra.apache.org/doc/latest/tools/cqlsh.html

Hopefully you can use that to determine why some operations are failing.
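
In cqlsh that looks like (substitute one of the failing statements):

    cqlsh> TRACING ON;
    cqlsh> INSERT INTO ks.t (id, val) VALUES (1, 'x');

The trace printed after the statement shows per-replica activity and
latencies, which should point at the slow or unreachable nodes.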


Re: Logs appear to contradict themselves during bootstrap steps

2017-01-06 Thread Edward Capriolo
On Fri, Jan 6, 2017 at 6:45 PM, Sotirios Delimanolis 
wrote:

> I forgot to check nodetool gossipinfo. Still, why does the first check
> think that the address exists, but the second doesn't?
>
>
> On Friday, January 6, 2017 1:11 PM, David Berry 
> wrote:
>
>
> I’ve encountered this previously where after removing a node, gossip info
> is retained for 72 hours which doesn’t allow the IP to be reused during
> that period.   You can check how long gossip will retain this information
> using “nodetool gossipinfo” where the epoch time will be shown with status
>
> For example….
>
> Nodetool gossipinfo
>
> /10.236.70.199
>   generation:1482436691
>   heartbeat:3942407
>   STATUS:3942404:LEFT,3074457345618261000,1483995662276
>   LOAD:3942267:3.60685807E8
>   SCHEMA:223625:acbf0adb-1bbe-384a-acd7-6a46609497f1
>   DC:20:orion
>   RACK:22:r1
>   RELEASE_VERSION:4:2.1.16
>   RPC_ADDRESS:3:10.236.70.199
>   SEVERITY:3942406:0.25094103813171387
>   NET_VERSION:1:8
>   HOST_ID:2:cd2a767f-3716-4717-9106-52f0380e6184
>   TOKENS:15:
>
> Converting it from epoch…..
>
> local@img2116saturn101:~$ date -d @$((1483995662276/1000))
> Mon Jan  9 21:01:02 UTC 2017
>
> At the time we waited the 72 hour period before reusing the IP, I’ve not
> used replace_address previously.
>
>
> *From:* Sotirios Delimanolis [mailto:sotodel...@yahoo.com]
> *Sent:* Friday, January 6, 2017 2:38 PM
> *To:* User 
> *Subject:* Logs appear to contradict themselves during bootstrap steps
>
> We had a node go down in our cluster and its disk had to be wiped. During
> that time, all nodes in the cluster have restarted at least once.
>
> We want to add the bad node back to the ring. It has the same IP/hostname.
> I follow the steps here
> 
>  for
> "Adding nodes to an existing cluster."
>
> When the process is started up, it reports
>
> A node with address / already exists, cancelling join.
> Use cassandra.replace_address if you want to replace this node.
>
> I found this error message in the StorageService using the Gossiper
> instance to look up the node's state. Apparently, the node knows about it.
> So I followed the instructions and added the cassandra.replace_address
> system property and restarted the process.
>
> But it reports
>
> Cannot replace_address / because it doesn't exist in gossip
>
> So which one is it? Does the ring know about it or not? Running "nodetool
> ring" does show it on all other nodes.
>
> I've seen CASSANDRA-8138
>  andthe conditions
> are the same, but I can't understand why it thinks it's not part of gossip.
> What's the difference between the gossip check used to make this
> determination and the gossip check used for the first error message? Can
> someone explain?
>
> I've since retrieved the node's id and used it to "nodetool removenode".
> After rebalancing, I added the node back and "nodetool cleaned" up.
> Everything's up and running, but I'd like to understand what Cassandra was
> doing.
>
>
>
>
>
>
In case you have not seen it, check out
http://docs.datastax.com/en/archived/cassandra/3.x/cassandra/tools/toolsAssassinate.html
- this is what you use when you really want something to go away from gossip.
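
For example, with the gossip state shown above:

    nodetool assassinate 10.236.70.199

(On versions whose nodetool lacks the command, the same operation is exposed
over JMX as Gossiper.unsafeAssassinateEndpoint.)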


Re: weird jvm metrics

2017-01-05 Thread Edward Capriolo
On Thu, Jan 5, 2017 at 1:53 PM, Alain Rastoul <alf.mmm@gmail.com> wrote:

> On 01/04/2017 11:12 PM, Edward Capriolo wrote:
>
>> The metric-reporter is actually leveraged from another project.
>>
>> https://github.com/addthis/metrics-reporter-config
>>
>> Check the version of metric-reporter (in cassandra/lib) and see if it
>> has changed from your old version to your new version.
>>
>>
>>
>>
>> On Wed, Jan 4, 2017 at 12:02 PM, Mike Torra <mto...@demandware.com
>> <mailto:mto...@demandware.com>> wrote:
>>
>> Just bumping - has anyone seen this before?
>>
>> http://stackoverflow.com/questions/41446352/cassandra-3-9-
>> jvm-metrics-have-bad-name
>> <http://stackoverflow.com/questions/41446352/cassandra-3-9-
>> jvm-metrics-have-bad-name>
>>
>> From: Mike Torra <mto...@demandware.com <mailto:mto...@demandware.com
>> >>
>> Reply-To: "user@cassandra.apache.org
>> <mailto:user@cassandra.apache.org>" <user@cassandra.apache.org
>> <mailto:user@cassandra.apache.org>>
>> Date: Wednesday, December 28, 2016 at 4:49 PM
>> To: "user@cassandra.apache.org <mailto:user@cassandra.apache.org>"
>> <user@cassandra.apache.org <mailto:user@cassandra.apache.org>>
>> Subject: weird jvm metrics
>>
>> Hi There -
>>
>> I recently upgraded from cassandra 3.5 to 3.9 (DDC), and I noticed
>> that the "new" jvm metrics are reporting with an extra '.' character
>> in them. Here is a snippet of what I see from one of my nodes:
>>
>> ubuntu@ip-10-0-2-163:~$ sudo tcpdump -i eth0 -v dst port 2003 -A |
>> grep 'jvm'
>>
>> tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture
>> size 65535 bytes
>>
>> .Je..l>.pi.cassandra.us-east-1.cassy-node1.jvm.buffers..dire
>> ct.capacity
>> 762371494 1482960946
>>
>> pi.cassandra.us-east-1.cassy-node1.jvm.buffers..direct.count 3054
>> 1482960946
>>
>> pi.cassandra.us-east-1.cassy-node1.jvm.buffers..direct.used
>> 762371496 1482960946
>>
>> pi.cassandra.us-east-1.cassy-node1.jvm.buffers..mapped.capacity
>> 515226631134 1482960946
>>
>> pi.cassandra.us-east-1.cassy-node1.jvm.buffers..mapped.count 45572
>> 1482960946
>>
>> pi.cassandra.us-east-1.cassy-node1.jvm.buffers..mapped.used
>> 515319762610 1482960946
>>
>> pi.cassandra.us-east-1.cassy-node1.jvm.fd.usage 0.00 1482960946
>>
>>
>> My metrics.yaml looks like this:
>>
>> graphite:
>>-
>>  period: 60
>>  timeunit: 'SECONDS'
>>  prefix: 'pi.cassandra.us-east-1.cassy-node1'
>>  hosts:
>>   - host: '#RELAY_HOST#'
>> port: 2003
>>  predicate:
>>color: "white"
>>useQualifiedName: true
>>patterns:
>>  - "^org.+"
>>  - "^jvm.+"
>>  - "^java.lang.+"
>>
>> All the org.* metrics come through fine, and the jvm.fd.usage metric
>> strangely comes through fine, too. The rest of the jvm.* metrics
>> have this extra '.' character that causes them to not show up in
>> graphite.
>>
>> Am I missing something silly here? Appreciate any help or suggestions.
>>
>> - Mike
>>
>>
>> Hi,
>
> I also noticed this problem recently.
> Some jvm metrics have a double dot in name like:
> jvm.memory..total.max , jvm.memory..total.init (etc).
> Investigating a bit, it seems that an extra dot is added at the end of the
> name in CassandraDaemon.java, around line 367 (in 3.0.10):
> ...
> // enable metrics provided by metrics-jvm.jar
>
> CassandraMetricsRegistry.Metrics.register("jvm.buffers.", new
> BufferPoolMetricSet(ManagementFactory.getPlatformMBeanServer()));
> CassandraMetricsRegistry.Metrics.register("jvm.gc.", new
> GarbageCollectorMetricSet());
>
> CassandraMetricsRegistry.Metrics.register("jvm.memory.", new
> MemoryUsageGaugeSet());
>
> and also added in append method of MetricRegistry.
> Call stack is:
> MetricRegistry>>registerAll(String prefix, MetricSet metrics)
> MetricRegistry>>static String name(String name, String... names)
> MetricRegistry>>static void append(StringBuilder builder, String part)
>
> in append the dot is also added:
> ...
> if(builder.length() > 0) {
> builder.append('.');
> }
> builder.append(part);
> ...
>
> The codahale MetricRegistry class seems to have no recent modification of
> name or append methods, so it looks like a small (but annoying) bug.
> May be the fix could be to simply remove the final dot ?
>
> Is it worth opening an issue in JIRA ?
>
>
> --
> best,
> Alain
>
>
Good troubleshooting. I would open a Jira. It seems like a good solution
would be to replace '..' with '.' somehow. It seems like no one would ever
want '..' in a metric name.
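
A sketch of what that could look like in the registration code quoted above -
drop the trailing dot and let MetricRegistry.name() insert the separator:

    // register without the trailing dot; MetricRegistry joins parts with '.'
    CassandraMetricsRegistry.Metrics.register("jvm.buffers",
        new BufferPoolMetricSet(ManagementFactory.getPlatformMBeanServer()));
    CassandraMetricsRegistry.Metrics.register("jvm.gc",
        new GarbageCollectorMetricSet());
    CassandraMetricsRegistry.Metrics.register("jvm.memory",
        new MemoryUsageGaugeSet());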


Re: weird jvm metrics

2017-01-04 Thread Edward Capriolo
The metric-reporter is actually leveraged from another project.

https://github.com/addthis/metrics-reporter-config

Check the version of metric-reporter (in cassandra/lib) and see if it has
changed from your old version to your new version.




On Wed, Jan 4, 2017 at 12:02 PM, Mike Torra  wrote:

> Just bumping - has anyone seen this before?
>
> http://stackoverflow.com/questions/41446352/cassandra-
> 3-9-jvm-metrics-have-bad-name
>
> From: Mike Torra 
> Reply-To: "user@cassandra.apache.org" 
> Date: Wednesday, December 28, 2016 at 4:49 PM
> To: "user@cassandra.apache.org" 
> Subject: weird jvm metrics
>
> Hi There -
>
> I recently upgraded from cassandra 3.5 to 3.9 (DDC), and I noticed that
> the "new" jvm metrics are reporting with an extra '.' character in them.
> Here is a snippet of what I see from one of my nodes:
>
> ubuntu@ip-10-0-2-163:~$ sudo tcpdump -i eth0 -v dst port 2003 -A | grep
> 'jvm'
>
> tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size
> 65535 bytes
>
> .Je..l>.pi.cassandra.us-east-1.cassy-node1.jvm.buffers..direct.capacity
> 762371494 1482960946
>
> pi.cassandra.us-east-1.cassy-node1.jvm.buffers..direct.count 3054
> 1482960946
>
> pi.cassandra.us-east-1.cassy-node1.jvm.buffers..direct.used 762371496
> 1482960946
>
> pi.cassandra.us-east-1.cassy-node1.jvm.buffers..mapped.capacity
> 515226631134 1482960946
>
> pi.cassandra.us-east-1.cassy-node1.jvm.buffers..mapped.count 45572
> 1482960946
>
> pi.cassandra.us-east-1.cassy-node1.jvm.buffers..mapped.used 515319762610
> 1482960946
>
> pi.cassandra.us-east-1.cassy-node1.jvm.fd.usage 0.00 1482960946
>
> My metrics.yaml looks like this:
>
> graphite:
>   -
> period: 60
> timeunit: 'SECONDS'
> prefix: 'pi.cassandra.us-east-1.cassy-node1'
> hosts:
>  - host: '#RELAY_HOST#'
>port: 2003
> predicate:
>   color: "white"
>   useQualifiedName: true
>   patterns:
> - "^org.+"
> - "^jvm.+"
> - "^java.lang.+"
>
> All the org.* metrics come through fine, and the jvm.fd.usage metric
> strangely comes through fine, too. The rest of the jvm.* metrics have this
> extra '.' character that causes them to not show up in graphite.
>
> Am I missing something silly here? Appreciate any help or suggestions.
>
> - Mike
>


Re: Why does Cassandra recommends Oracle JVM instead of OpenJDK?

2017-01-02 Thread Edward Capriolo
On Mon, Jan 2, 2017 at 8:30 PM, Kant Kodali <k...@peernova.com> wrote:

> This is a subjective question and of course it would turn into opinionated
> answers and I think we should welcome that (Nothing wrong in debating a
> topic). we have many such debates as SE's such as programming language
> comparisons, Architectural debates, Framework/Library debates and so on.
> people who don't like this conversation can simply refrain from following
> this thread right. I don't know why they choose to Jump in if they dont
> like a topic
>
> Sun is a great company no doubt! I don't know if Oracle is. Things like
> this https://www.extremetech.com/mobile/220136-google-
> plans-to-remove-oracles-java-apis-from-android-n is what pisses me about
> Oracle which gives an impression that they are not up for open source. It
> would be awesome to see JVM running on more and more devices (not less) so
> Google taking away Oracle Java API's from Android is a big failure from
> Oracle.
>
> JVM is a great piece of Software and by far there isn't anything yet that
> comes close. And there are great people who worked at SUN at that time.
> open the JDK source code and read it. you will encounter some great ideas
> and Algorithms.
>
>
>
>
>
> On Mon, Jan 2, 2017 at 1:04 PM, Edward Capriolo <edlinuxg...@gmail.com>
> wrote:
>
>>
>> On Mon, Jan 2, 2017 at 3:51 PM, Benjamin Roth <benjamin.r...@jaumo.com>
>> wrote:
>>
>>> Does this discussion really make sense any more? To me it seems it
>>> turned opinionated and religious. From my point of view anything that has
>>> to be said was said.
>>>
>>> Am 02.01.2017 21:27 schrieb "Edward Capriolo" <edlinuxg...@gmail.com>:
>>>
>>>>
>>>>
>>>> On Mon, Jan 2, 2017 at 11:56 AM, Eric Evans <john.eric.ev...@gmail.com>
>>>> wrote:
>>>>
>>>>> On Fri, Dec 23, 2016 at 9:15 PM, Edward Capriolo <
>>>>> edlinuxg...@gmail.com> wrote:
>>>>> > "I don't really have any opinions on Oracle per say, but Cassandra
>>>>> is a
>>>>> > Free Software project and I would prefer that we not depend on
>>>>> > commercial software, (and that's kind of what we have here, an
>>>>> > implicit dependency)."
>>>>> >
>>>>> > We are a bit loose here with terms "free" and "commercial". The
>>>>> oracle JVM
>>>>> > is open source, it is free to use and the trademark is owned by a
>>>>> company.
>>>>>
>>>>> Are we?  There are many definitions for the word "free", only one of
>>>>> which means "without cost"; I assumed it was obvious that I was
>>>>> talking about licensing terms (and of course the implications of that
>>>>> licensing).
>>>>>
>>>>> Cassandra is Free Software by virtue of the fact that it is Apache
>>>>> Licensed.  You are Free (as in Freedom) to modify and redistribute it.
>>>>>
>>>>> The Oracle JVM ships with a commercial license.  It is free only in
>>>>> the sense that you are not required to pay anything to use it, (i.e.
>>>>> you are not Free to do much of anything other than use it to run Java
>>>>> software).
>>>>>
>>>>> > That is not much different then using a tool for cassandra like a
>>>>> driver
>>>>> > hosted on github but made my a company.
>>>>>
>>>>> It is very different IME.  Cassandra requires a JVM to function, this
>>>>> is a hard dependency.  A driver is merely a means to make use of it.
>>>>>
>>>>> > The thing about a JVM is that like a kernel you want really smart
>>>>> dedicated
>>>>> > people working on it. Oracle has moved the JVM forward since taking
>>>>> over
>>>>> > sun. You can not just manage a JVM like say the freebsd port of x
>>>>> maintained
>>>>> > by 3 part time dudes that all get paid to do something else.
>>>>>
>>>>> I don't how to read any of this.  It sounds like you're saying that a
>>>>> JVM is something that cannot be produced as a Free Software project,
>>>>> or maybe that you just really like Oracle, I'm honestly not sure.  It
>>>>> doesn't seem relevant though, because there is in fact a Free Software
>>>>> JVM (and in addition to

Re: Why does Cassandra recommends Oracle JVM instead of OpenJDK?

2017-01-02 Thread Edward Capriolo
On Mon, Jan 2, 2017 at 3:51 PM, Benjamin Roth <benjamin.r...@jaumo.com>
wrote:

> Does this discussion really make sense any more? To me it seems it turned
> opinionated and religious. From my point of view anything that has to be
> said was said.
>
> Am 02.01.2017 21:27 schrieb "Edward Capriolo" <edlinuxg...@gmail.com>:
>
>>
>>
>> On Mon, Jan 2, 2017 at 11:56 AM, Eric Evans <john.eric.ev...@gmail.com>
>> wrote:
>>
>>> On Fri, Dec 23, 2016 at 9:15 PM, Edward Capriolo <edlinuxg...@gmail.com>
>>> wrote:
>>> > "I don't really have any opinions on Oracle per say, but Cassandra is a
>>> > Free Software project and I would prefer that we not depend on
>>> > commercial software, (and that's kind of what we have here, an
>>> > implicit dependency)."
>>> >
>>> > We are a bit loose here with terms "free" and "commercial". The oracle
>>> JVM
>>> > is open source, it is free to use and the trademark is owned by a
>>> company.
>>>
>>> Are we?  There are many definitions for the word "free", only one of
>>> which means "without cost"; I assumed it was obvious that I was
>>> talking about licensing terms (and of course the implications of that
>>> licensing).
>>>
>>> Cassandra is Free Software by virtue of the fact that it is Apache
>>> Licensed.  You are Free (as in Freedom) to modify and redistribute it.
>>>
>>> The Oracle JVM ships with a commercial license.  It is free only in
>>> the sense that you are not required to pay anything to use it, (i.e.
>>> you are not Free to do much of anything other than use it to run Java
>>> software).
>>>
>>> > That is not much different then using a tool for cassandra like a
>>> driver
>>> > hosted on github but made my a company.
>>>
>>> It is very different IME.  Cassandra requires a JVM to function, this
>>> is a hard dependency.  A driver is merely a means to make use of it.
>>>
>>> > The thing about a JVM is that like a kernel you want really smart
>>> dedicated
>>> > people working on it. Oracle has moved the JVM forward since taking
>>> over
>>> > sun. You can not just manage a JVM like say the freebsd port of x
>>> maintained
>>> > by 3 part time dudes that all get paid to do something else.
>>>
>>> I don't how to read any of this.  It sounds like you're saying that a
>>> JVM is something that cannot be produced as a Free Software project,
>>> or maybe that you just really like Oracle, I'm honestly not sure.  It
>>> doesn't seem relevant though, because there is in fact a Free Software
>>> JVM (and in addition to some mere mortals, the fine people at Oracle
>>> do contribute to it).
>>>
>>>
>>> --
>>> Eric Evans
>>> john.eric.ev...@gmail.com
>>>
>>
>> Are we?  There are many definitions for the word "free", only one of
>> which means "without cost"; I assumed it was obvious that I was
>> talking about licensing terms (and of course the implications of that
>> licensing).
>>
>> Lets be clear:
>> What I am saying is avoiding being loose with the word "free"
>>
>> https://en.wikipedia.org/wiki/Free_software_license
>>
>> Many things with the JVM are free too. Most importantly it is free to
>> use.
>>
>> https://www.java.com/en/download/faq/distribution.xml
>>
>> As it relates to this conversation: I am not aware of anyone running
>> Cassandra that has modified upstream JVM to make Cassandra run
>> better/differently *. Thus the license around the Oracle JVM is roughly
>> meaningless to the user/developer of cassandra.
>>
>> * The only group I know that took an action to modify upstream was Acunu.
>> They had released a modified Linux Kernel with a modified Apache Cassandra.
>> http://cloudtweaks.com/2011/02/data-storage-start
>> up-acunu-raises-3-6-million-to-launch-its-first-product/. That product
>> no longer exists.
>>
>> "I don't how to read any of this.  It sounds like you're saying that a
>> JVM is something that cannot be produced as a Free Software project,"
>>
>> What I am saying is something like the JVM "could" be produced as a "free
>> software project". However, the argument that I was making is that the
>> popular viable languages/(including vms or runtime to use them)

Re: Why does Cassandra recommends Oracle JVM instead of OpenJDK?

2017-01-02 Thread Edward Capriolo
On Mon, Jan 2, 2017 at 11:56 AM, Eric Evans <john.eric.ev...@gmail.com>
wrote:

> On Fri, Dec 23, 2016 at 9:15 PM, Edward Capriolo <edlinuxg...@gmail.com>
> wrote:
> > "I don't really have any opinions on Oracle per say, but Cassandra is a
> > Free Software project and I would prefer that we not depend on
> > commercial software, (and that's kind of what we have here, an
> > implicit dependency)."
> >
> > We are a bit loose here with terms "free" and "commercial". The oracle
> JVM
> > is open source, it is free to use and the trademark is owned by a
> company.
>
> Are we?  There are many definitions for the word "free", only one of
> which means "without cost"; I assumed it was obvious that I was
> talking about licensing terms (and of course the implications of that
> licensing).
>
> Cassandra is Free Software by virtue of the fact that it is Apache
> Licensed.  You are Free (as in Freedom) to modify and redistribute it.
>
> The Oracle JVM ships with a commercial license.  It is free only in
> the sense that you are not required to pay anything to use it, (i.e.
> you are not Free to do much of anything other than use it to run Java
> software).
>
> > That is not much different then using a tool for cassandra like a driver
> > hosted on github but made my a company.
>
> It is very different IME.  Cassandra requires a JVM to function, this
> is a hard dependency.  A driver is merely a means to make use of it.
>
> > The thing about a JVM is that like a kernel you want really smart
> dedicated
> > people working on it. Oracle has moved the JVM forward since taking over
> > sun. You can not just manage a JVM like say the freebsd port of x
> maintained
> > by 3 part time dudes that all get paid to do something else.
>
> I don't know how to read any of this.  It sounds like you're saying that a
> JVM is something that cannot be produced as a Free Software project,
> or maybe that you just really like Oracle, I'm honestly not sure.  It
> doesn't seem relevant though, because there is in fact a Free Software
> JVM (and in addition to some mere mortals, the fine people at Oracle
> do contribute to it).
>
>
> --
> Eric Evans
> john.eric.ev...@gmail.com
>

Are we?  There are many definitions for the word "free", only one of
which means "without cost"; I assumed it was obvious that I was
talking about licensing terms (and of course the implications of that
licensing).

Let's be clear:
What I am saying is: avoid being loose with the word "free".

https://en.wikipedia.org/wiki/Free_software_license

Many things with the JVM are free too. Most importantly it is free to use.

https://www.java.com/en/download/faq/distribution.xml

As it relates to this conversation: I am not aware of anyone running
Cassandra who has modified the upstream JVM to make Cassandra run
better/differently*. Thus the license around the Oracle JVM is roughly
meaningless to the user/developer of Cassandra.

* The only group I know of that took action to modify upstream was Acunu.
They released a modified Linux kernel with a modified Apache Cassandra.
http://cloudtweaks.com/2011/02/data-storage-startup-acunu-raises-3-6-million-to-launch-its-first-product/.
That product no longer exists.

"I don't how to read any of this.  It sounds like you're saying that a
JVM is something that cannot be produced as a Free Software project,"

What I am saying is that something like the JVM "could" be produced as a "free
software project". However, the argument I was making is that the popular,
viable languages (including the VMs or runtimes needed to use them) today,
including Java, C#, Go, and Swift, are developed by the largest tech companies
in the world, and as such I do not believe such a platform would be viable
without that kind of backing. Specifically, I believe that without Oracle
driving Java, OpenJDK would not be viable.

There are two specific reasons:
1) I do not see large, costly, multi-year initiatives like G1 happening.
2) Without the guidance/leadership that Sun/Oracle provide, I do not see new
features that change the language, like lambdas and try multi-catch, happening
in a sane way.

I expanded upon #2 by discussing my experience with standards like C++11, 14,
and 17, and attempting to take compiling, working lambda code from Linux GCC
to Microsoft Visual Studio and having it not compile. In my opinion, Java only
wins because as a platform it is very portable as both source and binary
code. Without leadership on that front I believe that over time the
language would suffer.

"It is very different IME.  Cassandra requires a JVM to function, this
is a hard dependency.  A driver is merely a means to make use of it."

LOL. Sure, a database without a driver is very useful. I mean, it sits there
flushing empty memtables and writing to its log file. You can run nodetool
ring and imagine where data would go if you could put data into it. Very
exciting stuff.


Re: Query

2016-12-29 Thread Edward Capriolo
You should start with understanding your needs. Once you understand your
needs you can pick the software that fits them. Starting with a software
stack is backwards.

On Thu, Dec 29, 2016 at 11:34 PM, Ben Slater 
wrote:

> I wasn’t familiar with Gizzard either so I thought I’d take a look. The
> first things on their github readme is:
> *NB: This project is currently not recommended as a base for new
> consumers.*
> (And no commits since 2013)
>
> So, Cassandra definitely looks like a better choice as your datastore for
> a new project.
>
> Cheers
> Ben
>
> On Fri, 30 Dec 2016 at 12:41 Manoj Khangaonkar 
> wrote:
>
>> I am not that familiar with gizzard but with gizzard + mysql , you have
>> multiple moving parts in the system that need to be managed separately. You'll
>> need the mysql expert for mysql and the gizzard expert to manage the
>> distributed part. It can be argued that long term this will have higher
>> administration cost
>>
>> Cassandra's value add is its simple peer to peer architecture that is
>> easy to manage - a single database solution that is distributed, scalable,
>> highly available etc. In other words, once you gain expertise cassandra,
>> you get everything in one package.
>>
>> regards
>>
>>
>>
>>
>>
>> On Thu, Dec 29, 2016 at 4:05 AM, Sikander Rafiq 
>> wrote:
>>
>> Hi,
>>
>> I'm exploring Cassandra for handling large data sets for mobile app, but
>> i'm not clear where it stands.
>>
>>
>> If we use MySQL as  underlying database and Gizzard for building custom
>> distributed databases (with arbitrary storage technology) and Memcached for
>> highly queried data, then where lies Cassandra?
>>
>>
>> As i have read that Twitter uses both Cassandra and Gizzard. Please
>> explain me where Cassandra will act.
>>
>>
>> Thanks in advance.
>>
>>
>> Regards,
>>
>> Sikander
>>
>>
>> Sent from Outlook 
>>
>>
>>
>>
>> --
>> http://khangaonkar.blogspot.com/
>>
>


Re: Why does Cassandra recommends Oracle JVM instead of OpenJDK?

2016-12-26 Thread Edward Capriolo
On Sat, Dec 24, 2016 at 5:58 AM, Kant Kodali <k...@peernova.com> wrote:

> @Edward Agreed JVM is awesome and it is a work of many smart people and
> this is obvious if one looks into the JDK code. But given Oracle history of
> business practices and other decisions it is a bit hard to convince oneself
> that everything is going to be OK and that they actually care about open
> source. Even the module system that they are trying to come up with is
> something that is motivated by the problems they have faced internally.
>
> To reiterate again just watch this video https://www.youtube.com/
> watch?v=9ei-rbULWoA
>
> My statements are not solely based on this video but I certainly would
> give good weight for James Gosling.
>
> I tend to think that Oracle has not closed Java because they know they
> can't get money from users: these days not many people are willing to
> pay even for distributed databases, so I don't think anyone would pay for a
> programming language. In short, let me end by saying Oracle just has a lot of
> self interest, but I really hope that I am wrong since I am a big fan of the JVM.
>
>
>
>
>
> On Fri, Dec 23, 2016 at 7:15 PM, Edward Capriolo <edlinuxg...@gmail.com>
> wrote:
>
>>
>> On Fri, Dec 23, 2016 at 6:01 AM, Kant Kodali <k...@peernova.com> wrote:
>>
>>> Java 9 Module system looks really interesting. I would be very curious
>>> to see how Cassandra would leverage that.
>>>
>>> On Thu, Dec 22, 2016 at 9:09 AM, Kant Kodali <k...@peernova.com> wrote:
>>>
>>>> I would agree with Eric with his following statement. In fact, I was
>>>> trying to say the same thing.
>>>>
>>>> "I don't really have any opinions on Oracle per say, but Cassandra is a
>>>> Free Software project and I would prefer that we not depend on
>>>> commercial software, (and that's kind of what we have here, an
>>>> implicit dependency)."
>>>>
>>>> On Thu, Dec 22, 2016 at 3:09 AM, Brice Dutheil <brice.duth...@gmail.com
>>>> > wrote:
>>>>
>>>>> Pretty much a non-story, it seems like.
>>>>>
>>>>> Clickbait imho. Search ‘The Register’ in this wikipedia page
>>>>> <https://en.wikipedia.org/wiki/Wikipedia:Potentially_unreliable_sources#News_media>
>>>>>
>>>>> @Ben Manes
>>>>>
>>>>> Agreed, OpenJDK and Oracle JDK are now pretty close, but there is
>>>>> still some differences in the VM code and third party dependencies like
>>>>> security libraries. Maybe that’s fine for some productions, but maybe not
>>>>> for everyone.
>>>>>
>>>>> Also another thing, while OpenJDK source is available to all, I don’t
>>>>> think all OpenJDK builds have been certified with the TCK. For example the
>>>>> Zulu OpenJDK is, as Azul have access to the TCK and certifies
>>>>> <https://www.azul.com/products/zulu/> the builds. Another example
>>>>> OpenJDK build installed on RHEL is certified
>>>>> <https://access.redhat.com/articles/1299013>. Canonical probably is
>>>>> running TCK compliance tests as well on their OpenJDK 8 since they are 
>>>>> listed
>>>>> on the signatories
>>>>> <http://openjdk.java.net/groups/conformance/JckAccess/jck-access.html>
>>>>> but not sure as I couldn’t find evidence on this; on this signatories list
>>>>> again there’s an individual – Emmanuel Bourg – who is related to
>>>>> Debian <https://lists.debian.org/debian-java/2015/01/msg00015.html> (
>>>>> linkedin <https://www.linkedin.com/in/ebourg>), but not sure again
>>>>> the TCK is passed for each build.
>>>>>
>>>>> Bad OpenJDK intermediary builds, i.e without TCK compliance tests, is
>>>>> a reality
>>>>> <https://github.com/docker-library/openjdk/commit/00a9c5c080f2a5fd1510bc0716db7afe06cbd017>
>>>>> .
>>>>>
>>>>> While the situation has enhanced over the past months I’ll still
>>>>> double check before using any OpenJDK builds.
>>>>> ​
>>>>>
>>>>> -- Brice
>>>>>
>>>>> On Wed, Dec 21, 2016 at 5:08 PM, Voytek Jarnot <
>>>>> voytek.jar...@gmail.com> wrote:
>>>>>
>>>>>> Reading that article the only conclusion I can reach (unless I'm
>>>>>> misreading) is that all 

Re: Why does Cassandra recommends Oracle JVM instead of OpenJDK?

2016-12-23 Thread Edward Capriolo
On Fri, Dec 23, 2016 at 6:01 AM, Kant Kodali  wrote:

> Java 9 Module system looks really interesting. I would be very curious to
> see how Cassandra would leverage that.
>
> On Thu, Dec 22, 2016 at 9:09 AM, Kant Kodali  wrote:
>
>> I would agree with Eric with his following statement. In fact, I was
>> trying to say the same thing.
>>
>> "I don't really have any opinions on Oracle per say, but Cassandra is a
>> Free Software project and I would prefer that we not depend on
>> commercial software, (and that's kind of what we have here, an
>> implicit dependency)."
>>
>> On Thu, Dec 22, 2016 at 3:09 AM, Brice Dutheil 
>> wrote:
>>
>>> Pretty much a non-story, it seems like.
>>>
>>> Clickbait imho. Search ‘The Register’ in this wikipedia page
>>> 
>>>
>>> @Ben Manes
>>>
>>> Agreed, OpenJDK and Oracle JDK are now pretty close, but there is still
>>> some differences in the VM code and third party dependencies like security
>>> libraries. Maybe that’s fine for some productions, but maybe not for
>>> everyone.
>>>
>>> Also another thing, while OpenJDK source is available to all, I don’t
>>> think all OpenJDK builds have been certified with the TCK. For example the
>>> Zulu OpenJDK is, as Azul have access to the TCK and certifies
>>>  the builds. Another example
>>> OpenJDK build installed on RHEL is certified
>>> . Canonical probably is
>>> running TCK compliance tests as well on their OpenJDK 8 since they are listed
>>> on the signatories
>>> 
>>> but not sure as I couldn’t find evidence on this; on this signatories list
>>> again there’s an individual – Emmanuel Bourg – who is related to Debian
>>>  (linkedin
>>> ), but not sure again the TCK is
>>> passed for each build.
>>>
>>> Bad OpenJDK intermediary builds, i.e without TCK compliance tests, is a
>>> reality
>>> 
>>> .
>>>
>>> While the situation has enhanced over the past months I’ll still double
>>> check before using any OpenJDK builds.
>>> ​
>>>
>>> -- Brice
>>>
>>> On Wed, Dec 21, 2016 at 5:08 PM, Voytek Jarnot 
>>> wrote:
>>>
 Reading that article the only conclusion I can reach (unless I'm
 misreading) is that all the stuff that was never free is still not free -
 the change is that Oracle may actually be interested in the fact that some
 are using non-free products for free.

 Pretty much a non-story, it seems like.

 On Tue, Dec 20, 2016 at 11:55 PM, Kant Kodali 
 wrote:

> Looking at this http://www.theregister.co
> .uk/2016/12/16/oracle_targets_java_users_non_compliance/?mt=
> 1481919461669 I don't know why Cassandra recommends Oracle JVM?
>
> JVM is a great piece of software but I would like to stay away from
> Oracle as much as possible. Oracle is just horrible the way they are
> dealing with Java in General.
>
>
>

>>>
>>
>
"I don't really have any opinions on Oracle per say, but Cassandra is a
Free Software project and I would prefer that we not depend on
commercial software, (and that's kind of what we have here, an
implicit dependency)."

We are a bit loose here with the terms "free" and "commercial". The Oracle JVM
is open source, it is free to use, and the trademark is owned by a company.

That is not much different than using a tool for Cassandra, like a driver
hosted on GitHub but made by a company.

The thing about a JVM is that, like a kernel, you want really smart, dedicated
people working on it. Oracle has moved the JVM forward since taking over
Sun. You can not just manage a JVM like, say, the FreeBSD port of x
maintained by 3 part-time dudes who all get paid to do something else.


Re: All subsequent CAS requests time out after heavy use of new CAS feature

2016-12-23 Thread Edward Capriolo
Anecdotally, CAS works differently than the typical Cassandra workload. If you
run a stress instance against 3 nodes on one host, you typically run
into CPU issues, but if you are doing a CAS workload you see things timing
out before you hit 100% CPU. It is a strange beast.
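
To make that concrete, here is a minimal client-side sketch of handling those
timeouts (a sketch only, assuming the DataStax Java driver 3.x; the locks
table, its columns, and the retry count are made up for illustration). The key
point is that a CAS WriteTimeoutException means the outcome is unknown, so the
client reads back at SERIAL before retrying:

import com.datastax.driver.core.*;
import com.datastax.driver.core.exceptions.WriteTimeoutException;

public class CasLockExample {
    // Hypothetical schema: CREATE TABLE locks (name text PRIMARY KEY, owner text);
    static boolean tryAcquire(Session session, String name, String owner) {
        Statement insert = new SimpleStatement(
                "INSERT INTO locks (name, owner) VALUES (?, ?) IF NOT EXISTS",
                name, owner)
                .setSerialConsistencyLevel(ConsistencyLevel.SERIAL);
        for (int attempt = 0; attempt < 3; attempt++) {
            try {
                // wasApplied() reflects the [applied] column of the CAS result.
                return session.execute(insert).wasApplied();
            } catch (WriteTimeoutException e) {
                // The proposal may or may not have been committed. A SERIAL read
                // completes any in-progress Paxos round and shows the current owner.
                Row row = session.execute(new SimpleStatement(
                        "SELECT owner FROM locks WHERE name = ?", name)
                        .setConsistencyLevel(ConsistencyLevel.SERIAL)).one();
                if (row != null) {
                    return owner.equals(row.getString("owner"));
                }
                // No row visible yet: safe to retry the conditional insert.
            }
        }
        return false;
    }
}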

On Fri, Dec 23, 2016 at 7:28 AM, horschi  wrote:

> Update: I replaced all quorum reads on that table with serial reads, and
> now these errors got less. Somehow quorum reads on CAS values cause most of
> these WTEs.
>
> Also I found two tickets on that topic:
> https://issues.apache.org/jira/browse/CASSANDRA-9328
> https://issues.apache.org/jira/browse/CASSANDRA-8672
>
> On Thu, Dec 15, 2016 at 3:14 PM, horschi  wrote:
>
>> Hi,
>>
>> I would like to warm up this old thread. I did some debugging and found
>> out that the timeouts are coming from StorageProxy.proposePaxos()
>> - callback.isFullyRefused() returns false and therefore triggers a
>> WriteTimeout.
>>
>> Looking at my ccm cluster logs, I can see that two replica nodes return
>> different results in their ProposeVerbHandler. In my opinion the
>> coordinator should not throw a Exception in such a case, but instead retry
>> the operation.
>>
>> What do the CAS/Paxos experts on this list say to this? Feel free to
>> instruct me to do further tests/code changes. I'd be glad to help.
>>
>> Log:
>>
>> node1/logs/system.log:WARN  [SharedPool-Worker-5] 2016-12-15 14:48:36,896
>> PaxosState.java:124 - Rejecting proposal for 
>> Commit(2d803540-c2cd-11e6-2e48-53a129c60cfc,
>> [MDS.Lock] key=locktest_ 1 columns=[[] | [value]]
>> node1/logs/system.log-Row: id=@ | value=) because
>> inProgress is now Commit(2d8146b0-c2cd-11e6-f996-e5c8d88a1da4,
>> [MDS.Lock] key=locktest_ 1 columns=[[] | [value]]
>> --
>> node1/logs/system.log:ERROR [SharedPool-Worker-12] 2016-12-15
>> 14:48:36,980 StorageProxy.java:506 - proposePaxos:
>> Commit(2d803540-c2cd-11e6-2e48-53a129c60cfc, [MDS.Lock] key=locktest_ 1
>> columns=[[] | [value]]
>> node1/logs/system.log-Row: id=@ | value=)//1//0
>> --
>> node2/logs/system.log:WARN  [SharedPool-Worker-7] 2016-12-15 14:48:36,969
>> PaxosState.java:117 - Accepting proposal: 
>> Commit(2d803540-c2cd-11e6-2e48-53a129c60cfc,
>> [MDS.Lock] key=locktest_ 1 columns=[[] | [value]]
>> node2/logs/system.log-Row: id=@ | value=)
>> --
>> node3/logs/system.log:WARN  [SharedPool-Worker-2] 2016-12-15 14:48:36,897
>> PaxosState.java:124 - Rejecting proposal for 
>> Commit(2d803540-c2cd-11e6-2e48-53a129c60cfc,
>> [MDS.Lock] key=locktest_ 1 columns=[[] | [value]]
>> node3/logs/system.log-Row: id=@ | value=) because
>> inProgress is now Commit(2d8146b0-c2cd-11e6-f996-e5c8d88a1da4,
>> [MDS.Lock] key=locktest_ 1 columns=[[] | [value]]
>>
>>
>> kind regards,
>> Christian
>>
>>
>> On Fri, Apr 15, 2016 at 8:27 PM, Denise Rogers  wrote:
>>
>>> My thinking was that, due to the size of the data, there may be I/O
>>> issues. But it sounds more like you're competing for locks and hit a
>>> deadlock issue.
>>>
>>> Regards,
>>> Denise
>>> Cell - (860)989-3431
>>>
>>> Sent from mi iPhone
>>>
>>> On Apr 15, 2016, at 9:00 AM, horschi  wrote:
>>>
>>> Hi Denise,
>>>
>>> in my case its a small blob I am writing (should be around 100 bytes):
>>>
>>>  CREATE TABLE "Lock" (
>>>  lockname varchar,
>>>  id varchar,
>>>  value blob,
>>>  PRIMARY KEY (lockname, id)
>>>  ) WITH COMPACT STORAGE
>>>  AND COMPRESSION = { 'sstable_compression' : 'SnappyCompressor',
>>> 'chunk_length_kb' : '8' };
>>>
>>> You ask because large values are known to cause issues? Anything special
>>> you have in mind?
>>>
>>> kind regards,
>>> Christian
>>>
>>>
>>>
>>>
>>> On Fri, Apr 15, 2016 at 2:42 PM, Denise Rogers  wrote:
>>>
 Also, what type of data were you reading/writing?

 Regards,
 Denise

 Sent from mi iPad

 On Apr 15, 2016, at 8:29 AM, horschi  wrote:

 Hi Jan,

 were you able to resolve your Problem?

 We are trying the same and also see a lot of WriteTimeouts:
 WriteTimeoutException: Cassandra timeout during write query at
 consistency SERIAL (2 replica were required but only 1 acknowledged the
 write)

 How many clients were competing for a lock in your case? In our case
 its only two :-(

 cheers,
 Christian


 On Tue, Sep 24, 2013 at 12:18 AM, Robert Coli 
 wrote:

> On Mon, Sep 16, 2013 at 9:09 AM, Jan Algermissen <
> jan.algermis...@nordsc.com> wrote:
>
>> I am experimenting with C* 2.0 ( and today's java-driver 2.0
>> snapshot) for implementing distributed locks.
>>
>
> [ and I'm experiencing the problem described in the subject ... ]
>
>
>> Any idea how to approach this problem?
>>
>
> 1) Upgrade to 2.0.1 release.
> 2) Try to 

Re: Advice in upgrade plan from 1.2.18 to 2.2.8

2016-12-22 Thread Edward Capriolo
Also, before you get started, make sure:
1) no one attempts to change the schema during the process
2) no one attempts to run a repair
3) no one attempts to join a node
4) no one attempts to remove/move nodes from the cluster

Each of these things triggers repair sessions and/or streams data, which does
not work in a mixed-version cluster.
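
As a quick sanity check for that, here is a minimal sketch (assuming the
DataStax Java driver 3.x; system.local and system.peers both expose a
release_version column) that verifies every node reports the same version
before allowing any operation that streams:

import com.datastax.driver.core.*;
import java.util.HashSet;
import java.util.Set;

public class VersionCheck {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        try (Session session = cluster.connect()) {
            Set<String> versions = new HashSet<>();
            // The coordinator's own version, then every other node's.
            versions.add(session.execute(
                    "SELECT release_version FROM system.local").one().getString(0));
            for (Row peer : session.execute(
                    "SELECT release_version FROM system.peers")) {
                versions.add(peer.getString(0));
            }
            if (versions.size() > 1) {
                System.out.println("Mixed-version cluster, do not stream: " + versions);
            } else {
                System.out.println("Homogeneous cluster on " + versions);
            }
        } finally {
            cluster.close();
        }
    }
}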

On Thu, Dec 22, 2016 at 3:07 PM, Aiman Parvaiz  wrote:

> Thanks Alain. This was extremely helpful, really grateful.
>
> Aiman
>
> On Dec 22, 2016, at 5:00 AM, Alain RODRIGUEZ  wrote:
>
> Hi,
>
> Here are some thoughts:
>
> running 1.2.18. I plan to upgrade them to 2.2.latest
>>
>
> Going 1 major release at the time is probably the safest way to go indeed.
>
>
>>1. Install 2.0.latest on one node at a time, start and wait for it to
>>join the ring.
>>2. Run upgradesstables on this node.
>>3. Repeat Step 1,2 on each node installing cassandra2.0 in a rolling
>>manner and running upgradesstables in parallel. (Please let me know if
>>    running upgradesstables in parallel is not right here. My cluster is not
>>under much load really)
>>
>>
> I would:
>
> - Upgrade one node, check for cluster health (monitoring, logs, nodetool
> commands), paying special attention to the 2.0 node.
> - If everything is ok, then go for more nodes, if using distinct racks I
> would go per rack; sequentially, node per node, all the nodes from
> DC1-rack1, then DC1-rack2, then DC1-rack3. Then move to the next DC if
> everything is fine.
> - Start the 'upgradesstables' when the cluster is completely and
> successfully running with the new version (2.0.17). It is perfectly fine to
> run this in parallel as the last part of the upgrade. As you guessed, it is
> good to keep monitoring the cluster load.
>
> 4. Now I will have both my DCs running 2.0.latest.
>
>
> Without really having any strong argument, I would let it run for "some
> time" like this, hours at least, maybe days. In any case, you will probably
> have some work to prepare before the next upgrade, so you will have time to
> check how the cluster is doing.
>
> 6. Do I need to run upgradesstables here again after the node has started
>> and joined? (I think yes, but seek advice. https://docs.datastax.
>> com/en/latest-upgrade/upgrade/cassandra/upgrdCassandra.html)
>
>
> Yes, every time you run a major upgrade. Anyway, nodetool upgradesstables
> will skip any sstable that do not need to be upgraded (as long as you don't
> add the option to force it), so it is probably better to run it when you
> have a doubt.
>
>
> As additional information, I would prepare, for each upgrade:
>
>
>- The new Cassandra configuration (cassandra.yaml and
>    cassandra-env.sh mainly, but also other configuration files)
>
>To do that, I use to merge the current file in use (your configuration
>on C* 1.2.18) and the Cassandra version file from github for the new
>version (i.e. https://github.com/apache/cassandra/tree/
>cassandra-2.0.17/conf).
>
>This allows you to
>   - Acknowledge and consider the new and removed configuration
>   settings
>   - Keep comments and default values in the configuration files up to
>   date
>   - Be fully exhaustive, and learn as you parse the files
>
>   - Make sure clients will still work with the new version (see the
>doc, do the tests)
>- Cassandra metrics changed in the latest versions, you might have to
>rework your dashboards. Anticipating the dashboard creation for new
>    versions would prevent you from losing metrics when you need them the 
> most.
>
>
> Finally keep in mind that you should not perform any streaming while
> running multiple version and as long as 'nodetool upgradesstables' is not
> completely done. Meaning you should not add, remove, replace, move or
> repair a node. Also, I would limit schema changes as much as possible while
> running multiple versions as it caused troubles in the past.
>
> During an upgrade, almost nothing else than the normal load due to the
> service and the upgrade itself should happen. We always try to keep this
> time window as short as possible.
>
> C*heers,
> ---
> Alain Rodriguez - @arodream - al...@thelastpickle.com
> France
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> 2016-12-21 20:36 GMT+01:00 Aiman Parvaiz :
>
>> Hi everyone,
>> I have 2 C* DCs with 12 nodes in each running 1.2.18. I plan to upgrade
>> them to 2.2.latest and wanted to run by you experts my plan.
>>
>>
>>1. Install 2.0.latest on one node at a time, start and wait for it to
>>join the ring.
>>2. Run upgradesstables on this node.
>>3. Repeat Step 1,2 on each node installing cassandra2.0 in a rolling
>>manner and running upgradesstables in parallel. (Please let me know if
>>    running upgradesstables in parallel is not right here. My cluster is not
>>under much load really)
>>4. Now I will have 

Re: Why does Cassandra recommends Oracle JVM instead of OpenJDK?

2016-12-21 Thread Edward Capriolo
On Wednesday, December 21, 2016, Kant Kodali  wrote:

> https://www.youtube.com/watch?v=9ei-rbULWoA
>
> On Wed, Dec 21, 2016 at 2:59 AM, Kant Kodali  > wrote:
>
>> https://www.elastic.co/guide/en/elasticsearch/guide/current/
>> _java_virtual_machine.html
>>
>> On Wed, Dec 21, 2016 at 2:58 AM, Kant Kodali > > wrote:
>>
>>> The fact is Oracle is horrible :)
>>>
>>>
>>> On Wed, Dec 21, 2016 at 2:54 AM, Brice Dutheil >> > wrote:
>>>
 Let's not debate opinion on the Oracle stewardship here, we certainly
 have different views that come from different experiences.

 Let's discuss facts instead :)

 -- Brice

 On Wed, Dec 21, 2016 at 11:34 AM, Kant Kodali > wrote:

> yeah well I don't think Oracle is treating Java the way Google is
> treating Go and I am not a big fan of Go mainly because I understand the
> JVM is far more robust than anything that is out there.
>
> "Oracle just doesn't understand open source" These are the words from
> James Gosling himself
>
> I do think its better to stay away from Oracle as we never know when
> they would switch open source to closed source. Given their history of
> practices their statements are not credible.
>
> I am pretty sure the community would take care of OpenJDK.
>
>
>
>
>
> On Wed, Dec 21, 2016 at 2:04 AM, Brice Dutheil <
> brice.duth...@gmail.com
> > wrote:
>
>> The problem described in this article is different than what you have
>> on your servers and I’ll add this article should be reaad with caution, 
>> as
>> The Register is known for sensationalism. The article itself has no
>> substantial proof or enough details. In my opinion this article is
>> clickbait.
>>
>> Anyway there’s several point to think of instead of just swicthing to
>> OpenJDK :
>>
>>-
>>
>>There is technical differences between Oracle JDK and openjdk.
>>Where there’s licensing issues some libraries are closed source in 
>> Hotspot
>>like font, rasterizer or cryptography and OpenJDK use open source
>>alternatives which leads to different bugs or performance. I believe 
>> they
>>also have minor differences in the hotspot code to plug in stuff like 
>> Java
>>Mission Control or Flight Recorder or hotpost specific options.
>>Also I believe that Oracle JDK is more tested or more up to date
>>than OpenJDK.
>>
>>So while OpenJDK is functionnaly the same as Oracle JDK it may
>>not have the same performance or the same bugs or the same security 
>> fixes.
>>(Unless are your ready to test that with your production servers and 
>> your
>>production data).
>>
>>I don’t know if datastax have released the details of their
>>configuration when they test Cassandra.
>>-
>>
>>There’s also a question of support. OpeJDK is for the community.
>>Oracle can offer support but maybe only for Oracle JDK.
>>
>>Twitter uses OpenJDK, but they have their own JVM support team.
>>Not sure everyone can afford that.
>>
>> As a side note I’ll add that Oracle is paying talented engineers to
>> work on the JVM to make it great.
>>
>> Cheers,
>> ​
>>
>> -- Brice
>>
>> On Wed, Dec 21, 2016 at 6:55 AM, Kant Kodali > > wrote:
>>
>>> Looking at this http://www.theregister.co
>>> .uk/2016/12/16/oracle_targets_java_users_non_compliance/?mt=
>>> 1481919461669 I don't know why Cassandra recommends Oracle JVM?
>>>
>>> JVM is a great piece of software but I would like to stay away from
>>> Oracle as much as possible. Oracle is just horrible the way they are
>>> dealing with Java in General.
>>>
>>>
>>>
>>
>

>>>
>>
>
Generally, a good decision is to balance between a platform you are familiar
with and a platform most commonly deployed in production.

I.e., even if I saw a talk from Facebook that says Cassandra is awesome on
Solaris x running on CoolThreads chips, if I was at a Windows/Intel
shop I might not pain myself with the burden.

Cassandra uses specific native/unsafe libraries not guaranteed to be
portable. E.g., once I was using a non-Sun JVM and the saved key caches would
not load.

As to Oracle not knowing open source, maybe not, but Sun had its own issues;
see the story about Apache Harmony and Sun being unwilling to certify the
Harmony JVM. What 

Re: Benefit of LOCAL_SERIAL consistency

2016-12-08 Thread Edward Capriolo
On Thu, Dec 8, 2016 at 5:10 AM, Sylvain Lebresne <sylv...@datastax.com>
wrote:

> > The reason you don't want to use SERIAL in multi-DC clusters
>
> I'm not a fan of blanket statements like that. There is a high cost to
> SERIAL
> consistency in multi-DC setups, but if you *need* global linearizability,
> then
> you have no choice and the latency may be acceptable for your use case.
> Take
> the example of using LWT to ensure no 2 user creates accounts with the same
> name in your system: it's something you don't want to screw up, but it's
> also
> something for which a high-ish latency is probably acceptable. I don't
> think
> users would get super pissed off because registering a new account on some
> service takes 500ms.
>
> So yes it's costly, as is most things that willingly depends on cross-DC
> latency, but I don't think that means it's never ever useful.
>
> > So, I am not sure about what is the good use case for LOCAL_SERIAL.
>
> Well, a good use case is when you're ok with operations within a
> datacenter to
> be linearizable, but can accept 2 operations in different datacenters to
> not be.
> Imagine a service that pins a given user to a DC on login for different
> reasons,
> that service might be fine using LOCAL_SERIAL for operations confined to a
> given user session since it knows it's DC local.
>
> So I think both SERIAL and LOCAL_SERIAL have their uses, though we
> absolutely
> agree they are not meant to be used together. And it's certainly worth
> trying to
> design your system in a way that make sure LOCAL_SERIAL is enough for you,
> if
> you can, since SERIAL is pretty costly. But that doesn't mean there isn't
> case
> where you care more about global linearizability than latency: engineering
> is
> all about trade-offs.
>
> > I am not sure what of the state of this is anymore but I was under the
> > impression the linearizability of lwt was in question. I never head it
> > specifically addressed.
>
> That's a pretty vague statement to make, let's not get into FUD. You
> "might" be
> thinking of a fairly old blog post by Aphyr that tested LWT in their very
> early
> days and they were bugs indeed, but those were fixed a long time ago. Since
> then, his tests and much more were performed
> (http://www.datastax.com/dev/blog/testing-apache-cassandra-with-jepsen)
> and no problem with linearizability that I know of has been found. Don't
> get me
> wrong, any code can have subtle bug and not finding problems doesn't
> guarantee
> there isn't one, but if someone has demonstrated legit problems with the
> linearizability of LWT, it's unknown to me and I'm watching this pretty
> carefully.
>
> I'll note to be complete that I'm not pretending the LWT implementation is
> perfect, it's not (it's slow for one), and using them correctly can be more
> challenging that it may sound at first (mostly because you need to handle
> query timeouts properly and that's not always simple, sometimes requiring
> a more complex data model that you'd want), but those are not break of
> linearizability.
>
> > https://issues.apache.org/jira/browse/CASSANDRA-6106
>
> That ticket has nothing to do with LWT. In fact, LWT is the one mechanism
> in
> Cassandra where this ticket has not impact whatsoever because the whole
> point of
> the mechanism is to ensure timestamps are assigned in a collision free
> manner.
>
>
> On Thu, Dec 8, 2016 at 8:32 AM, Hiroyuki Yamada <mogwa...@gmail.com>
> wrote:
>
>> Hi DuyHai,
>>
>> Thank you for the comments.
>> Yes, that's exactly what I mean.
>> (Your comment is very helpful to support my opinion.)
>>
>> As you said, SERIAL with multi-DCs incurs latency increase,
>> but it's a trade-off between latency and high availability because one
>> DC can be down from a disaster.
>> I don't think there is any way to achieve global linearizability
>> without latency increase, right ?
>>
>> > Edward
>> Thank you for the ticket.
>> I'll read it through.
>>
>> Thanks,
>> Hiro
>>
>> On Thu, Dec 8, 2016 at 12:01 AM, Edward Capriolo <edlinuxg...@gmail.com>
>> wrote:
>> >
>> >
>> > On Wed, Dec 7, 2016 at 8:25 AM, DuyHai Doan <doanduy...@gmail.com>
>> wrote:
>> >>
>> >> The reason you don't want to use SERIAL in multi-DC clusters is the
>> >> prohibitive cost of lightweight transaction (in term of latency),
>> especially
>> >> if your data centers are separated by continents. A ping from London
>> to New
>> >> York takes 52ms just by speed of light in optic cable. Sinc

Re: Batch size warnings

2016-12-07 Thread Edward Capriolo
I have been circling around a thought process over batches. Now that
Cassandra has aggregating functions, it might be possible to write a type of
record that has an END_OF_BATCH type marker, so the data can be suppressed
from view until it is all there.

I.e., you write something like a checksum record that an intelligent client
can use to tell if the rest of the batch is complete.
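
A minimal sketch of that idea (assuming the DataStax Java driver 3.x; the
events and batch_markers tables, and the use of a plain row count as the
'checksum', are made up for illustration):

import com.datastax.driver.core.*;
import com.datastax.driver.core.utils.UUIDs;
import java.util.UUID;

public class BatchMarkerExample {
    // Hypothetical schema:
    //   CREATE TABLE events (batch_id timeuuid, seq int, payload text,
    //                        PRIMARY KEY (batch_id, seq));
    //   CREATE TABLE batch_markers (batch_id timeuuid PRIMARY KEY, expected_rows int);
    static void writeBatch(Session session, String[] payloads) {
        UUID batchId = UUIDs.timeBased();
        BatchStatement batch = new BatchStatement(BatchStatement.Type.LOGGED);
        for (int i = 0; i < payloads.length; i++) {
            batch.add(new SimpleStatement(
                    "INSERT INTO events (batch_id, seq, payload) VALUES (?, ?, ?)",
                    batchId, i, payloads[i]));
        }
        // The END_OF_BATCH-style marker: it records how many rows a reader
        // should expect before treating the batch as complete.
        batch.add(new SimpleStatement(
                "INSERT INTO batch_markers (batch_id, expected_rows) VALUES (?, ?)",
                batchId, payloads.length));
        session.execute(batch);
    }

    static boolean isComplete(Session session, UUID batchId) {
        Row marker = session.execute(new SimpleStatement(
                "SELECT expected_rows FROM batch_markers WHERE batch_id = ?",
                batchId)).one();
        if (marker == null) return false; // no marker yet: suppress the batch from view
        long rows = session.execute(new SimpleStatement(
                "SELECT count(*) FROM events WHERE batch_id = ?",
                batchId)).one().getLong(0);
        return rows == marker.getInt("expected_rows");
    }
}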

On Wed, Dec 7, 2016 at 11:58 AM, Voytek Jarnot 
wrote:

> Been about a month since I gave up on it, but it was very much related to
> the stuff you're dealing with ... Basically Cassandra just stepping on its
> own er, tripping over its own feet streaming MVs.
>
> On Dec 7, 2016 10:45 AM, "Benjamin Roth"  wrote:
>
>> I meant the mv thing
>>
>> On 07.12.2016 17:27, "Voytek Jarnot" wrote:
>>
>>> Sure, about which part?
>>>
>>> default batch size warning is 5kb
>>> I've increased it to 30kb, and will need to increase to 40kb (8x default
>>> setting) to avoid WARN log messages about batch sizes.  I do realize it's
>>> just a WARNing, but may as well avoid those if I can configure it out.
>>> That said, having to increase it so substantially (and we're only dealing
>>> with 5 tables) is making me wonder if I'm not taking the correct approach
>>> in terms of using batches to guarantee atomicity.
>>>
>>> On Wed, Dec 7, 2016 at 10:13 AM, Benjamin Roth 
>>> wrote:
>>>
 Could you please be more specific?

 On 07.12.2016 17:10, "Voytek Jarnot" wrote:

> Should've mentioned - running 3.9.  Also - please do not recommend
> MVs: I tried, they're broken, we punted.
>
> On Wed, Dec 7, 2016 at 10:06 AM, Voytek Jarnot <
> voytek.jar...@gmail.com> wrote:
>
>> The low default value for batch_size_warn_threshold_in_kb is making
>> me wonder if I'm perhaps approaching the problem of atomicity in a
>> non-ideal fashion.
>>
>> With one data set duplicated/denormalized into 5 tables to support
>> queries, we use batches to ensure inserts make it to all or 0 tables.  
>> This
>> works fine, but I've had to bump the warn threshold and fail threshold
>> substantially (8x higher for the warn threshold).  This - in turn - makes
>> me wonder, with a default setting so low, if I'm not solving this problem
>> in the canonical/standard way.
>>
>> Mostly just looking for confirmation that we're not unintentionally
>> doing something weird...
>>
>
>
>>>


Re: Benefit of LOCAL_SERIAL consistency

2016-12-07 Thread Edward Capriolo
On Wed, Dec 7, 2016 at 8:25 AM, DuyHai Doan  wrote:

> The reason you don't want to use SERIAL in multi-DC clusters is the
> prohibitive cost of lightweight transaction (in term of latency),
> especially if your data centers are separated by continents. A ping from
> London to New York takes 52ms just by speed of light in optic cable. Since
> LightWeight Transaction involves 4 network round-trips, it means at least
> 200ms just for raw network transfer, not even taking into account the cost
> of processing the operation
>
> You're right to raise a warning about mixing LOCAL_SERIAL with SERIAL.
> LOCAL_SERIAL guarantees you linearizability inside a DC, SERIAL guarantees
> you linearizability across multiple DC.
>
> If I have 3 DCs with RF = 3 each (total 9 replicas) and I did an INSERT IF
> NOT EXISTS with LOCAL_SERIAL in DC1, then it's possible that a subsequent
> INSERT IF NOT EXISTS on the same record succeeds when using SERIAL because
> SERIAL on 9 replicas = at least 5 replicas. Those 5 replicas which respond
> can come from DC2 and DC3 and thus did not apply yet the previous INSERT...
>
> On Wed, Dec 7, 2016 at 2:14 PM, Hiroyuki Yamada 
> wrote:
>
>> Hi,
>>
>> I have been using lightweight transactions for several months now and
>> wondering what is the benefit of having LOCAL_SERIAL serial consistency
>> level.
>>
>> With SERIAL, it achieves global linearizability,
>> but with LOCAL_SERIAL, it only achieves DC-local linearizability,
>> which is missing the point of linearizability, I think.
>>
>> So, for example,
>> once when SERIAL is used,
>> we can't use LOCAL_SERIAL to achieve local linearizability
>> since data in local DC might not be updated yet to meet quorum.
>> And vice versa,
>> once when LOCAL_SERIAL is used,
>> we can't use SERIAL to achieve global linearizability
>> since data is not globally updated yet to meet quorum.
>>
>> So, it would be great if we can use LOCAL_SERIAL if possible and
>> use SERIAL only if local DC is down or unavailable,
>> but based on the example above, I think it is not possible, is it ?
>> So, I am not sure about what is the good use case for LOCAL_SERIAL.
>>
>> The only case that I can think of is having a cluster in one DC for
>> online transactions and
>> having another cluster in another DC for analytics purpose.
>> In this case, I think there is no big point of using SERIAL since data
>> for analytics sometimes doesn't have to be very correct/fresh and
>> data can be asynchronously replicated to analytics node. (so using
>> LOCAL_SERIAL for one DC makes sense.)
>>
>> Could anyone give me some thoughts about it ?
>>
>> Thanks,
>> Hiro
>>
>
>
You're right to raise a warning about mixing LOCAL_SERIAL with SERIAL.
LOCAL_SERIAL guarantees you linearizability inside a DC, SERIAL guarantees
you linearizability across multiple DC.

I am not sure what the state of this is anymore, but I was under the
impression that the linearizability of LWT was in question. I never heard it
specifically addressed.

https://issues.apache.org/jira/browse/CASSANDRA-6106

It's hard to follow 6106 because most of the tasks are closed 'fix later'
or closed 'not a problem'.
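
For reference, the choice between the two serial levels discussed above is
made per statement on the client side. A minimal sketch, assuming the DataStax
Java driver 3.x (the users table is made up for illustration):

import com.datastax.driver.core.*;

public class SerialLevelExample {
    // Hypothetical schema: CREATE TABLE users (name text PRIMARY KEY, email text);
    static boolean register(Session session, String name, String email,
                            boolean dcLocal) {
        Statement insert = new SimpleStatement(
                "INSERT INTO users (name, email) VALUES (?, ?) IF NOT EXISTS",
                name, email)
                // SERIAL runs Paxos across all DCs (global linearizability, at
                // cross-DC latency); LOCAL_SERIAL confines it to the local DC.
                .setSerialConsistencyLevel(dcLocal ? ConsistencyLevel.LOCAL_SERIAL
                                                   : ConsistencyLevel.SERIAL);
        return session.execute(insert).wasApplied();
    }
}

As the thread notes, mixing the two levels against the same rows forfeits the
guarantee either one provides on its own.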


Re: Why does `now()` produce different times within the same query?

2016-12-03 Thread Edward Capriolo
On Sat, Dec 3, 2016 at 11:01 AM, Edward Capriolo <edlinuxg...@gmail.com>
wrote:

>
>
> On Saturday, December 3, 2016, Edward Capriolo <edlinuxg...@gmail.com>
> wrote:
>
>>
>>
>> On Saturday, December 3, 2016, Jonathan Haddad <j...@jonhaddad.com> wrote:
>>
>>> That isn't what the original thread is about. The thread is about the
>>> timestamp portion of the UUID being different.
>>>
>>> Having UUID() return the same thing for all rows in a batch would be the
>>> unexpected thing virtually every time.
>>> On Sat, Dec 3, 2016 at 7:09 AM Edward Capriolo <edlinuxg...@gmail.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Friday, December 2, 2016, Jonathan Haddad <j...@jonhaddad.com> wrote:
>>>>
>>>>> This isn't about using the same UUID though. It's about the timestamp
>>>>> bits in the UUID.
>>>>>
>>>>> What the use case is for generating multiple UUIDs in a single row?
>>>>> Why do you need to extract the timestamp out of both?
>>>>> On Fri, Dec 2, 2016 at 10:24 AM Edward Capriolo <edlinuxg...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> On Thu, Dec 1, 2016 at 11:09 AM, Sylvain Lebresne <
>>>>>> sylv...@datastax.com> wrote:
>>>>>>
>>>>>>> On Thu, Dec 1, 2016 at 4:44 PM, Edward Capriolo <
>>>>>>> edlinuxg...@gmail.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> I am not sure you saw my reply on thread but I believe everyone's
>>>>>>>> needs can be met I will copy that here:
>>>>>>>>
>>>>>>>
>>>>>>> I saw it, but the real problem that was raised initially was not
>>>>>>> that of UDF and of allowing both behavior. It's a matter of people being
>>>>>>> confused by the behavior of a non-UDF function, now(), and suggesting it
>>>>>>> should be changed.
>>>>>>>
>>>>>>> The Hive idea is interesting I guess, and we can switch to
>>>>>>> discussing that, but it's a different problem really and I'm not a fond 
>>>>>>> of
>>>>>>> derailing threads. I will just note though that if we're not talking 
>>>>>>> about
>>>>>>> a confusion issue but rather how to get a timeuuid to be fixed within a
>>>>>>> statement, then there is much much more trivial solution: generate it
>>>>>>> client side. The `now()` function is a small convenience but there is
>>>>>>> nothing you cannot do without it client side, and that actually 
>>>>>>> basically
>>>>>>> stands for almost any use of (non aggregate) function in Cassandra
>>>>>>> currently.
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> "Food for thought: Hive's UDFs introduced an annotation
>>>>>>>> @UDFType(deterministic = false)
>>>>>>>>
>>>>>>>> http://dmtolpeko.com/2014/10/15/invoking-stateful-udf-at-map
>>>>>>>> -and-reduce-side-in-hive/
>>>>>>>>
>>>>>>>> The effect is the query planner can see when such a UDF is in use
>>>>>>>> and determine the value once at the start of a very long query."
>>>>>>>>
>>>>>>>> Essentially hive had a similar if not identical problem, during a
>>>>>>>> long running distributed process like map/reduce some users wanted the
>>>>>>>> semantics of:
>>>>>>>>
>>>>>>>> 1) Each call should have a new timestamps
>>>>>>>>
>>>>>>>> While other users wanted the semantics of:
>>>>>>>>
>>>>>>>> 2) Each call should generate the same timestamp
>>>>>>>>
>>>>>>>> The solution implemented was to add an annotation to udf such that
>>>>>>>> the query planner would pick up the annotation and act accordingly.
>>>>>>>>
>>>>>>>> (Here is a related issue https://issues.apache.or
>>>>>>>> g/jira/browse/HIVE-198

Re: Why does `now()` produce different times within the same query?

2016-12-03 Thread Edward Capriolo
On Saturday, December 3, 2016, Edward Capriolo <edlinuxg...@gmail.com>
wrote:

>
>
> On Saturday, December 3, 2016, Jonathan Haddad <j...@jonhaddad.com> wrote:
>
>> That isn't what the original thread is about. The thread is about the
>> timestamp portion of the UUID being different.
>>
>> Having UUID() return the same thing for all rows in a batch would be the
>> unexpected thing virtually every time.
>> On Sat, Dec 3, 2016 at 7:09 AM Edward Capriolo <edlinuxg...@gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Friday, December 2, 2016, Jonathan Haddad <j...@jonhaddad.com> wrote:
>>>
>>>> This isn't about using the same UUID though. It's about the timestamp
>>>> bits in the UUID.
>>>>
>>>> What the use case is for generating multiple UUIDs in a single row? Why
>>>> do you need to extract the timestamp out of both?
>>>> On Fri, Dec 2, 2016 at 10:24 AM Edward Capriolo <edlinuxg...@gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>> On Thu, Dec 1, 2016 at 11:09 AM, Sylvain Lebresne <
>>>>> sylv...@datastax.com> wrote:
>>>>>
>>>>>> On Thu, Dec 1, 2016 at 4:44 PM, Edward Capriolo <
>>>>>> edlinuxg...@gmail.com> wrote:
>>>>>>
>>>>>>>
>>>>>>> I am not sure you saw my reply on thread but I believe everyone's
>>>>>>> needs can be met I will copy that here:
>>>>>>>
>>>>>>
>>>>>> I saw it, but the real problem that was raised initially was not that
>>>>>> of UDF and of allowing both behavior. It's a matter of people being
>>>>>> confused by the behavior of a non-UDF function, now(), and suggesting it
>>>>>> should be changed.
>>>>>>
>>>>>> The Hive idea is interesting I guess, and we can switch to discussing
>>>>>> that, but it's a different problem really and I'm not a fond of derailing
>>>>>> threads. I will just note though that if we're not talking about a
>>>>>> confusion issue but rather how to get a timeuuid to be fixed within a
>>>>>> statement, then there is much much more trivial solution: generate it
>>>>>> client side. The `now()` function is a small convenience but there is
>>>>>> nothing you cannot do without it client side, and that actually basically
>>>>>> stands for almost any use of (non aggregate) function in Cassandra
>>>>>> currently.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> "Food for thought: Hive's UDFs introduced an annotation
>>>>>>> @UDFType(deterministic = false)
>>>>>>>
>>>>>>> http://dmtolpeko.com/2014/10/15/invoking-stateful-udf-at-map
>>>>>>> -and-reduce-side-in-hive/
>>>>>>>
>>>>>>> The effect is the query planner can see when such a UDF is in use
>>>>>>> and determine the value once at the start of a very long query."
>>>>>>>
>>>>>>> Essentially hive had a similar if not identical problem, during a
>>>>>>> long running distributed process like map/reduce some users wanted the
>>>>>>> semantics of:
>>>>>>>
>>>>>>> 1) Each call should have a new timestamps
>>>>>>>
>>>>>>> While other users wanted the semantics of:
>>>>>>>
>>>>>>> 2) Each call should generate the same timestamp
>>>>>>>
>>>>>>> The solution implemented was to add an annotation to udf such that
>>>>>>> the query planner would pick up the annotation and act accordingly.
>>>>>>>
>>>>>>> (Here is a related issue https://issues.apache.or
>>>>>>> g/jira/browse/HIVE-1986
>>>>>>>
>>>>>>> As a result you can essentially implement two UDFS
>>>>>>>
>>>>>>> @UDFType(deterministic = false)
>>>>>>> public class UDFNow
>>>>>>>
>>>>>>> and for the other people
>>>>>>>
>>>>>>> @UDFType(deterministic = true)
>>>>>>> public

Re: Why does `now()` produce different times within the same query?

2016-12-03 Thread Edward Capriolo
On Saturday, December 3, 2016, Jonathan Haddad <j...@jonhaddad.com> wrote:

> That isn't what the original thread is about. The thread is about the
> timestamp portion of the UUID being different.
>
> Having UUID() return the same thing for all rows in a batch would be the
> unexpected thing virtually every time.
> On Sat, Dec 3, 2016 at 7:09 AM Edward Capriolo <edlinuxg...@gmail.com> wrote:
>
>>
>>
>> On Friday, December 2, 2016, Jonathan Haddad <j...@jonhaddad.com> wrote:
>>
>>> This isn't about using the same UUID though. It's about the timestamp
>>> bits in the UUID.
>>>
>>> What the use case is for generating multiple UUIDs in a single row? Why
>>> do you need to extract the timestamp out of both?
>>> On Fri, Dec 2, 2016 at 10:24 AM Edward Capriolo <edlinuxg...@gmail.com>
>>> wrote:
>>>
>>>>
>>>> On Thu, Dec 1, 2016 at 11:09 AM, Sylvain Lebresne <sylv...@datastax.com
>>>> > wrote:
>>>>
>>>>> On Thu, Dec 1, 2016 at 4:44 PM, Edward Capriolo <edlinuxg...@gmail.com
>>>>> > wrote:
>>>>>
>>>>>>
>>>>>> I am not sure you saw my reply on thread but I believe everyone's
>>>>>> needs can be met I will copy that here:
>>>>>>
>>>>>
>>>>> I saw it, but the real problem that was raised initially was not that
>>>>> of UDF and of allowing both behavior. It's a matter of people being
>>>>> confused by the behavior of a non-UDF function, now(), and suggesting it
>>>>> should be changed.
>>>>>
>>>>> The Hive idea is interesting I guess, and we can switch to discussing
>>>>> that, but it's a different problem really and I'm not a fond of derailing
>>>>> threads. I will just note though that if we're not talking about a
>>>>> confusion issue but rather how to get a timeuuid to be fixed within a
>>>>> statement, then there is much much more trivial solution: generate it
>>>>> client side. The `now()` function is a small convenience but there is
>>>>> nothing you cannot do without it client side, and that actually basically
>>>>> stands for almost any use of (non aggregate) function in Cassandra
>>>>> currently.
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>> "Food for thought: Hive's UDFs introduced an annotation  
>>>>>> @UDFType(deterministic
>>>>>> = false)
>>>>>>
>>>>>> http://dmtolpeko.com/2014/10/15/invoking-stateful-udf-at-
>>>>>> map-and-reduce-side-in-hive/
>>>>>>
>>>>>> The effect is the query planner can see when such a UDF is in use and
>>>>>> determine the value once at the start of a very long query."
>>>>>>
>>>>>> Essentially hive had a similar if not identical problem, during a
>>>>>> long running distributed process like map/reduce some users wanted the
>>>>>> semantics of:
>>>>>>
>>>>>> 1) Each call should have a new timestamps
>>>>>>
>>>>>> While other users wanted the semantics of:
>>>>>>
>>>>>> 2) Each call should generate the same timestamp
>>>>>>
>>>>>> The solution implemented was to add an annotation to udf such that
>>>>>> the query planner would pick up the annotation and act accordingly.
>>>>>>
>>>>>> (Here is a related issue https://issues.apache.
>>>>>> org/jira/browse/HIVE-1986
>>>>>>
>>>>>> As a result you can essentially implement two UDFS
>>>>>>
>>>>>> @UDFType(deterministic = false)
>>>>>> public class UDFNow
>>>>>>
>>>>>> and for the other people
>>>>>>
>>>>>> @UDFType(deterministic = true)
>>>>>> public class UDFNowOnce extends UDFNow
>>>>>>
>>>>>> Both user cases are met in a sensible way.
>>>>>>
>>>>>
>>>>>
>>>> The `now()` function is a small convenience but there is nothing you
>>>> cannot do without it client side, and that actua

Re: Why does `now()` produce different times within the same query?

2016-12-03 Thread Edward Capriolo
On Friday, December 2, 2016, Jonathan Haddad <j...@jonhaddad.com> wrote:

> This isn't about using the same UUID though. It's about the timestamp bits
> in the UUID.
>
> What the use case is for generating multiple UUIDs in a single row? Why do
> you need to extract the timestamp out of both?
> On Fri, Dec 2, 2016 at 10:24 AM Edward Capriolo <edlinuxg...@gmail.com> wrote:
>
>>
>> On Thu, Dec 1, 2016 at 11:09 AM, Sylvain Lebresne <sylv...@datastax.com> wrote:
>>
>>> On Thu, Dec 1, 2016 at 4:44 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>>>
>>>>
>>>> I am not sure you saw my reply on thread but I believe everyone's needs
>>>> can be met I will copy that here:
>>>>
>>>
>>> I saw it, but the real problem that was raised initially was not that of
>>> UDF and of allowing both behavior. It's a matter of people being confused
>>> by the behavior of a non-UDF function, now(), and suggesting it should be
>>> changed.
>>>
>>> The Hive idea is interesting I guess, and we can switch to discussing
>>> that, but it's a different problem really and I'm not a fond of derailing
>>> threads. I will just note though that if we're not talking about a
>>> confusion issue but rather how to get a timeuuid to be fixed within a
>>> statement, then there is much much more trivial solution: generate it
>>> client side. The `now()` function is a small convenience but there is
>>> nothing you cannot do without it client side, and that actually basically
>>> stands for almost any use of (non aggregate) function in Cassandra
>>> currently.
>>>
>>>
>>>>
>>>>
>>>> "Food for thought: Hive's UDFs introduced an annotation  
>>>> @UDFType(deterministic
>>>> = false)
>>>>
>>>> http://dmtolpeko.com/2014/10/15/invoking-stateful-udf-at-
>>>> map-and-reduce-side-in-hive/
>>>>
>>>> The effect is the query planner can see when such a UDF is in use and
>>>> determine the value once at the start of a very long query."
>>>>
>>>> Essentially hive had a similar if not identical problem, during a long
>>>> running distributed process like map/reduce some users wanted the semantics
>>>> of:
>>>>
>>>> 1) Each call should have a new timestamps
>>>>
>>>> While other users wanted the semantics of:
>>>>
>>>> 2) Each call should generate the same timestamp
>>>>
>>>> The solution implemented was to add an annotation to udf such that the
>>>> query planner would pick up the annotation and act accordingly.
>>>>
>>>> (Here is a related issue https://issues.apache.
>>>> org/jira/browse/HIVE-1986
>>>>
>>>> As a result you can essentially implement two UDFS
>>>>
>>>> @UDFType(deterministic = false)
>>>> public class UDFNow
>>>>
>>>> and for the other people
>>>>
>>>> @UDFType(deterministic = true)
>>>> public class UDFNowOnce extends UDFNow
>>>>
>>>> Both user cases are met in a sensible way.
>>>>
>>>
>>>
>> The `now()` function is a small convenience but there is nothing you
>> cannot do without it client side, and that actually basically stands for
>> almost any use of (non aggregate) function in Cassandra currently.
>>
>> Casandra's changing philosophy over which entity should create such
>> information client/server/driver does not make this problem easy.
>>
>> If you take into account that you have users who do not understand all
>> the intricacy of uuid the problem is compounded. IE How does one generate a
>> UUID each c#, python, java etc? with the 47 random bits of bla bla. That is
>> not super easy information to find. Maybe you find a stack overflow post
>> that actually gives bad advice etc.
>>
>> Many times in Cassandra you are using a uuid because you do not have a
>> unique key in the insert and you wish to create one. If you are inserting
>> more then a single record using that same UUID and you do not want the
>> burden of wanting to do it yourself you would have to do write>>read>>write
>> which is an anti-pattern.
>>
>
Not multiple ids for a single row. The same id for multiple inserts in a
batch.

For example, let's say I have an application where my data has no unique key:

Table poke
Poker, pokee, time

Suppose I consume pokes from Kafka, build a batch of 30k, and insert them.
You probably want to denormalize into two tables:
Primary key (poker, time)
Primary key (pokee, time)

It makes sense that they all have the same uuid if you want it to be the
uuid of the batch. This would make it easy to correlate all the events.
Easy to delete them all as well. (See the sketch below.)

The "do it client side" argument is totally valid, but has been a
justification for not adding features, many of which are eventually added
anyway.
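
A minimal sketch of that pattern (assuming the DataStax Java driver 3.x; the
pokes_by_poker and pokes_by_pokee tables are made up): one timeuuid generated
client side and shared by every insert in the batch, so the denormalized rows
can be correlated, and deleted, together:

import com.datastax.driver.core.*;
import com.datastax.driver.core.utils.UUIDs;
import java.util.UUID;

public class PokeBatchExample {
    // Hypothetical schema:
    //   CREATE TABLE pokes_by_poker (poker text, time timeuuid, pokee text,
    //                                PRIMARY KEY (poker, time));
    //   CREATE TABLE pokes_by_pokee (pokee text, time timeuuid, poker text,
    //                                PRIMARY KEY (pokee, time));
    static void recordPoke(Session session, String poker, String pokee) {
        // One timeuuid generated client side and reused, instead of calling
        // now() twice and getting two different values.
        UUID time = UUIDs.timeBased();
        BatchStatement batch = new BatchStatement(BatchStatement.Type.LOGGED);
        batch.add(new SimpleStatement(
                "INSERT INTO pokes_by_poker (poker, time, pokee) VALUES (?, ?, ?)",
                poker, time, pokee));
        batch.add(new SimpleStatement(
                "INSERT INTO pokes_by_pokee (pokee, time, poker) VALUES (?, ?, ?)",
                pokee, time, poker));
        session.execute(batch);
    }
}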




-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.


Re: Why does `now()` produce different times within the same query?

2016-12-02 Thread Edward Capriolo
On Thu, Dec 1, 2016 at 11:09 AM, Sylvain Lebresne <sylv...@datastax.com>
wrote:

> On Thu, Dec 1, 2016 at 4:44 PM, Edward Capriolo <edlinuxg...@gmail.com>
> wrote:
>
>>
>> I am not sure you saw my reply on thread but I believe everyone's needs
>> can be met I will copy that here:
>>
>
> I saw it, but the real problem that was raised initially was not that of
> UDF and of allowing both behavior. It's a matter of people being confused
> by the behavior of a non-UDF function, now(), and suggesting it should be
> changed.
>
> The Hive idea is interesting I guess, and we can switch to discussing
> that, but it's a different problem really and I'm not a fond of derailing
> threads. I will just note though that if we're not talking about a
> confusion issue but rather how to get a timeuuid to be fixed within a
> statement, then there is much much more trivial solution: generate it
> client side. The `now()` function is a small convenience but there is
> nothing you cannot do without it client side, and that actually basically
> stands for almost any use of (non aggregate) function in Cassandra
> currently.
>
>
>>
>>
>> "Food for thought: Hive's UDFs introduced an annotation
>> @UDFType(deterministic = false)
>>
>> http://dmtolpeko.com/2014/10/15/invoking-stateful-udf-at-map-and-reduce-side-in-hive/
>>
>> The effect is the query planner can see when such a UDF is in use and
>> determine the value once at the start of a very long query."
>>
>> Essentially Hive had a similar if not identical problem: during a
>> long-running distributed process like map/reduce, some users wanted the
>> semantics of:
>>
>> 1) Each call should have a new timestamp
>>
>> While other users wanted the semantics of:
>>
>> 2) Each call should generate the same timestamp
>>
>> The solution implemented was to add an annotation to the UDF such that the
>> query planner would pick up the annotation and act accordingly.
>>
>> (Here is a related issue: https://issues.apache.org/jira/browse/HIVE-1986)
>>
>> As a result you can essentially implement two UDFs
>>
>> @UDFType(deterministic = false)
>> public class UDFNow
>>
>> and for the other people
>>
>> @UDFType(deterministic = true)
>> public class UDFNowOnce extends UDFNow
>>
>> Both use cases are met in a sensible way.
>>
>
>
The `now()` function is a small convenience but there is nothing you cannot
do without it client side, and that actually basically stands for almost
any use of (non aggregate) function in Cassandra currently.

Cassandra's changing philosophy over which entity should create such
information client/server/driver does not make this problem easy.

If you take into account that you have users who do not understand all the
intricacies of UUIDs, the problem is compounded. I.e., how does one generate
a timeuuid in C#, Python, Java, etc., with the 47 random bits and so on?
That is not easy information to find, and you may well find a Stack Overflow
post that actually gives bad advice.

Many times in Cassandra you are using a UUID because you do not have a
unique key in the insert and you wish to create one. If you are inserting
more than a single record using that same UUID, and you do not want the
burden of generating it yourself, you would have to do write >> read >>
write, which is an anti-pattern.
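For what it's worth, the DataStax Java driver ships a helper for exactly
this, so the "47 random bits" never have to be hand-rolled. A minimal sketch
(class names from the 3.x Java driver; other drivers have equivalents):

    import com.datastax.driver.core.utils.UUIDs;
    import java.util.UUID;

    public class TimeuuidDemo {
        public static void main(String[] args) {
            // Version-1 (time-based) UUID generated client side; the driver
            // fills in the clock-sequence and node bits per the UUID spec.
            UUID id = UUIDs.timeBased();
            // The embedded wall-clock time, as millis since the epoch.
            System.out.println(id + " @ " + UUIDs.unixTimestamp(id));
        }
    }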


Re: Why does `now()` produce different times within the same query?

2016-12-01 Thread Edward Capriolo
On Thu, Dec 1, 2016 at 11:09 AM, Sylvain Lebresne <sylv...@datastax.com>
wrote:

> On Thu, Dec 1, 2016 at 4:44 PM, Edward Capriolo <edlinuxg...@gmail.com>
> wrote:
>
>>
>> I am not sure you saw my reply on the thread, but I believe everyone's
>> needs can be met; I will copy that here:
>>
>
> I saw it, but the real problem that was raised initially was not that of
> UDFs and of allowing both behaviors. It's a matter of people being confused
> by the behavior of a non-UDF function, now(), and suggesting it should be
> changed.
>
> The Hive idea is interesting I guess, and we can switch to discussing
> that, but it's a different problem really and I'm not fond of derailing
> threads. I will just note though that if we're not talking about a
> confusion issue but rather how to get a timeuuid to be fixed within a
> statement, then there is a much more trivial solution: generate it
> client side. The `now()` function is a small convenience but there is
> nothing you cannot do without it client side, and that actually basically
> stands for almost any use of (non aggregate) function in Cassandra
> currently.
>
>
>>
>>
>> "Food for thought: Hive's UDFs introduced an annotation
>> @UDFType(deterministic = false)
>>
>> http://dmtolpeko.com/2014/10/15/invoking-stateful-udf-at-map-and-reduce-side-in-hive/
>>
>> The effect is the query planner can see when such a UDF is in use and
>> determine the value once at the start of a very long query."
>>
>> Essentially Hive had a similar if not identical problem: during a
>> long-running distributed process like map/reduce, some users wanted the
>> semantics of:
>>
>> 1) Each call should have a new timestamp
>>
>> While other users wanted the semantics of:
>>
>> 2) Each call should generate the same timestamp
>>
>> The solution implemented was to add an annotation to the UDF such that the
>> query planner would pick up the annotation and act accordingly.
>>
>> (Here is a related issue: https://issues.apache.org/jira/browse/HIVE-1986)
>>
>> As a result you can essentially implement two UDFs
>>
>> @UDFType(deterministic = false)
>> public class UDFNow
>>
>> and for the other people
>>
>> @UDFType(deterministic = true)
>> public class UDFNowOnce extends UDFNow
>>
>> Both use cases are met in a sensible way.
>>
>
>
I agree that changing the semantics of something already in existence is a
bad idea. What is there "now" (no pun intended) should keep working as is.

I will also point out that Presto addresses this issue with specific
functions:

https://prestodb.io/docs/current/functions/datetime.html

localtime -> time
    Returns the current time as of the start of the query.

localtimestamp -> timestamp
    Returns the current timestamp as of the start of the query.


Re: Why does `now()` produce different times within the same query?

2016-12-01 Thread Edward Capriolo
On Thu, Dec 1, 2016 at 4:06 AM, Sylvain Lebresne 
wrote:

> One can of course always open a JIRA, but I'm going to strongly disagree
> with a
> change here (outside of a documentation one that is).
>
> The now() function is a timeuuid generator, and it thus generates a unique
> timeuuid on every call, as specified by the timeuuid spec. I'll note that
> the document lists it under "Timeuuid functions", and has sentences like
> "the value returned by now() is guaranteed to be unique", so while I'm
> sure the
> documentation can be further clarified, I think it's pretty clear it's not
> the
> now() of SQL, and getting unique values on every call shouldn't be *that*
> surprising.
>
> Also, now() was primarily meant for use on timeuuid clustering columns for
> a
> time-series like table, something like:
>   CREATE TABLE ts (
> k int,
> t timeuuid,
> v text,
> PRIMARY KEY (k, t)
>   )
> and if you use it multiple times in a batch, this would look something
> like:
>   BEGIN BATCH
> INSERT INTO ts (k, t, v) VALUES (0, now(), 'foo');
> INSERT INTO ts (k, t, v) VALUES (0, now(), 'bar');
>   APPLY BATCH
> and you definitively want that to insert 2 "events", not just one.
>
> This is also why changing the behavior of this method *would* be a breaking
> change.
>
> Another reason this works the way it does is that functions in CQL are just
> that,
> functions. Each execution is unique and they have no notion of being
> executed in
> the same statement/batch/whatever. I actually think this is sensible,
> assuming
> one stops being obsessed with what other databases that aren't Apache
> Cassandra
> do.
>
> I will note that Ben seems to suggest keeping the return of now() unique
> across
> calls while keeping the time component equal, thus varying the rest of the
> uuid
> bytes. However:
>  - I'm starting to wonder what this would buy us. Why would someone be
> super
>confused by the time changing across calls (in a single
> statement/batch), but
>be totally not confused by the actual full return to not be equal? And
> how is
>that actually useful: you're having different result anyway and you're
>letting the server pick the timestamp in the first place, so you're
> probably
>not caring about milliseconds precision of that timestamp in the first
> place.
>  - This would basically be a violation of the timeuuid spec
>  - This would be a big pain in the code and make now() a special case
> among functions. I'm unconvinced special cases are making things easier
> in general.
>
> So I'm all for improving the documentation if this confuses users due to
> expectations (mistakenly) carried from prior experiences, and please
> feel free to open a JIRA for that. I'm a lot less in agreement that there
> is
> something wrong with the way the function behave in principle.
>
> > I can see why this issue has been largely ignored and hasn't had a
> chance for
> > the behaviour to be formally defined
>
> Don't make too many assumptions. The behavior is perfectly well defined:
> now()
> is a "normal" function and is evaluated whenever it's called according to
> the
> timeuuid spec (or as close to it as we can make it).
>
> On Thu, Dec 1, 2016 at 7:25 AM, Benjamin Roth 
> wrote:
>
>> Great comment. +1
>>
>> Am 01.12.2016 06:29 schrieb "Ben Bromhead" :
>>
>>> tl;dr +1 yup raise a jira to discuss how now() should behave in a single
>>> statement (and possible extend to batch statements).
>>>
>>> The values of now should be the same if you assume that now() works like
>>> it does in relational databases such as postgres or mysql, however at the
>>> moment it instead works like sysdate() in mysql. Given that CQL is supposed
>>> to be SQL like, I think the assumption around the behaviour of now() was a
>>> fair one to make.
>>>
>>> I definitely agree that raising a jira ticket would be a great place to
>>> discuss what the behaviour of now() should be for Cassandra. Personally I
>>> would be in favour of seeing the deterministic component (the actual time
>>> part) being the same across multiple calls in the one statement or multiple
>>> statements in a batch.
>>>
>>> Cassandra documentation does not make any claims as to how now() works
>>> within a single statement, and reading the code shows the intent is to
>>> work like sysdate() from MySQL rather than now(). One of the identified
>>> dangers of making cql similar to sql is that, while yes it aids adoption,
>>> users will find that SQL like things don't behave as expected. Of course as
>>> a user, one shouldn't have to read the source code to determine correct
>>> behaviour.
>>>
>>> Given that a timeuuid is made up of deterministic and (pseudo)
>>> non-deterministic components I can see why this issue has been largely
>>> ignored and hasn't had a chance for the behaviour to be formally defined
>>> (you would expect now to return the same time in the one statement despite

Re: Why does `now()` produce different times within the same query?

2016-11-30 Thread Edward Capriolo
On Wed, Nov 30, 2016 at 10:53 PM, Cody Yancey  wrote:

> This is not a bug, and in fact changing it would be a serious bug.
>
> False. Absolutely no consumer would be broken by a change to guarantee an
> identical time component that isn't broken already, for the simple reason
> your code already has to handle that case, as it is in fact the majority
> case RIGHT NOW. Users can hit this bug, in production, because unit tests
> might not have experienced it! The time component should be the time that the
> command was processed by the coordinator node.
>
>  would one expect a java/py/bash script that loops
>
> Individual Cassandra writes (which is what OP is referring to
> specifically) are not loops. They are in almost every case atomic
> operations that either succeed completely or fail completely. Allowing a
> single atomic operation to witness multiple times in these corner cases is
> not only surprising, as this thread demonstrates, it is also needlessly
> restricting to what developers can use the database for, and provides NO
> BENEFIT.
>
> Calling now PRIOR to initiating multiple inserts is in most cases
> exactly what one does...the ONLY practice is to set the value before
> initiating the sequence of calls
>
> Also false. Cassandra does not have a way of doing this on the coordinator
> node rather than the client device, and as I already showed, the client
> device is the wrong place to do it in situations where guaranteeing bounded
> clock-skew actually makes a difference one way or the other.
>
> Thanks,
> Cody
>
>
>
> On Wed, Nov 30, 2016 at 8:02 PM, daemeon reiydelle 
> wrote:
>
>> This is not a bug, and in fact changing it would be a serious bug.
>>
>> What it is is a wonderful case of bad coding: would one expect a
>> java/py/bash script that loops on a bunch of read/execute/update calls where
>> each iteration calls time to return the same exact time for the duration of
>> the execution of the code? Whether the code runs for 5 seconds or 5 hours?
>>
>> Every call to a system call is unique, including within C*. Calling now
>> PRIOR to initiating multiple inserts is in most cases exactly what one does
>> to assure unique time stamps FOR THE BATCH OF INSERTS. To get a nearly
>> identical system time as would be the uuid of the row, one tries to call
>> time as close to just before the insert as possible. Then repeat.
>>
>> You have a logic issue in your code. If you want the same value for a set
>> of calls, the ONLY practice is to set the value before initiating the
>> sequence of calls.
>>
>>
>>
>> *...*
>>
>>
>>
>> *Daemeon C.M. ReiydelleUSA (+1) 415.501.0198 <(415)%20501-0198>London
>> (+44) (0) 20 8144 9872 <+44%2020%208144%209872>*
>>
>> On Wed, Nov 30, 2016 at 6:16 PM, Cody Yancey  wrote:
>>
>>> Getting the same TimeUUID values might be a major problem. Getting two
>>> different TimeUUIDs that at least have the same time component would not
>>> be a major
>>> problem as this is the main case today. Getting different time components
>>> is actually the corner case, and it is a corner case that breaks
>>> Internet-of-Things applications. We can tightly control clock skew in our
>>> cluster. We most definitely CANNOT control clock skew on the thousands of
>>> sensors that write to our cluster.
>>>
>>> Thanks,
>>> Cody
>>>
>>> On Wed, Nov 30, 2016 at 5:33 PM, Robert Wille  wrote:
>>>
 In my opinion, this is not broken and “fixing” it would break existing
 code. Consider a batch that includes multiple inserts, each of which
 inserts the value returned by now(). Getting the same UUID for each insert
 would be a major problem.

 Cheers

 Robert


 On Nov 30, 2016, at 4:46 PM, Todd Fast 
 wrote:

 FWIW I'd suggest opening a bug--this behavior is certainly quite
 unexpected and more than just a documentation issue. In general I can't
 imagine any desirable properties of the current implementation, and there
 are likely a bunch of latent bugs sitting out there, so it should be fixed.

 Todd

 On Wed, Nov 30, 2016 at 12:37 PM Terry Liu  wrote:

> Sorry for my typo. Obviously, I meant:
> "It appears that a single query that calls Cassandra's`now()` time
> function *multiple times *may actually cause a query to write or
> return different times."
>
> Less of a surprise now that I realize more about the implementation,
> but I agree that more explicit documentation around when exactly the
> "execution" of each now() statement happens and what implications it has
> for the resulting timestamps would be helpful when running into this.
>
> Thanks for the quick responses!
>
> -Terry
>
>
>
> On Tue, Nov 29, 2016 at 2:45 PM, Marko Švaljek 
> wrote:
>
> every now() call in statement is under the hood 

Re: Schema Changes

2016-11-15 Thread Edward Capriolo
You can start here:

https://issues.apache.org/jira/browse/CASSANDRA-10699

And here:

http://stackoverflow.com/questions/20293897/cassandra-resolution-of-concurrent-schema-changes

In a nutshell, schema changes work best when issued serially, when all
nodes are up, and when all nodes are reachable. When these three conditions
are not met, a variety of behavior can be observed.
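A rough sketch of the usual guard (the same approach described in the quoted
message below): poll until system.local and system.peers report a single
schema_version. The system tables and Java driver calls are real; the helper
itself is hypothetical.

    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;
    import java.util.HashSet;
    import java.util.Set;
    import java.util.UUID;

    public class SchemaWait {
        // Hypothetical helper: true once every reachable node reports the
        // same schema_version, false if agreement is not reached in time.
        static boolean awaitSchemaAgreement(Session session, int attempts)
                throws InterruptedException {
            for (int i = 0; i < attempts; i++) {
                Set<UUID> versions = new HashSet<>();
                for (Row r : session.execute("SELECT schema_version FROM system.local"))
                    versions.add(r.getUUID("schema_version"));
                for (Row r : session.execute("SELECT schema_version FROM system.peers"))
                    versions.add(r.getUUID("schema_version"));
                if (versions.size() == 1) return true;
                Thread.sleep(1000); // give gossip time to propagate the change
            }
            return false;
        }
    }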

On Tue, Nov 15, 2016 at 1:04 PM, Josh Smith 
wrote:

> Would someone please explain how schema changes happen?
>
> Here are some of the ring details
>
> We have 5 nodes in 1 DC and 5 nodes in another DC across the country.
>
> Here is our problem, we have a tool which automates our schema creation.
> Our schema consists of 7 keyspaces with 21 tables in each keyspace, so a
> total of 147 tables are created at the initial provisioning.  During this
> schema creation we end up with system_schema keyspace corruption, we have
> found that it is due to schema version disagreement. To combat this we
> setup a wait until there is only one version in both system.local and
> system.peers tables.
>
> The way I understand it schema changes are made on the local node only;
> changes are then propagated through either Thrift or gossip; I could not
> find a definitive answer online as to which is the carrier. So if
> I make all of the schema changes to one node it should propagate the
> changes to the other nodes one at a time. This is how I used to think that
> schema changes are propagated but we still get schema disagreement when
> changing the schema only on one node. Is the only option to introduce a
> wait after every table creation?  Should we be looking at another table
> besides system.local and peers? Any help would be appreciated.
>
>
>
> Josh Smith
>


Re: Cannot mix counter and non counter columns in the same table

2016-11-01 Thread Edward Capriolo
Here is a solution that I have leveraged: ignore the counter's value and
use a multi-part column name (the clustering columns) to carry the real value.

For example:

create table stuff (
  rowkey text,
  column text,
  value text,
  counter_to_ignore counter,
  primary key (rowkey, column, value));
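Since counter tables reject INSERT, a "row" under this schema is
materialized by bumping the dummy counter; the key values below are made up:

UPDATE stuff SET counter_to_ignore = counter_to_ignore + 1
WHERE rowkey = 'user:42' AND column = 'email' AND value = 'a@example.com';

SELECT column, value FROM stuff WHERE rowkey = 'user:42';

The increment is never read back; it exists only to satisfy the rule that
every non-key column of a counter table be a counter, while the real data
rides in the clustering columns.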



On Tue, Nov 1, 2016 at 9:29 AM, Ali Akhtar  wrote:

> That's a terrible gotcha rule.
>
> On Tue, Nov 1, 2016 at 6:27 PM, Cody Yancey  wrote:
>
>> In your table schema, you have KEYS and you have VALUES. Your KEYS are
>> text, but they could be any non-counter type or compound thereof. KEYS
>> obviously cannot ever be counters.
>>
>> Your VALUES, however, must be either all counters or all non-counters.
>> The official example you posted conforms to this limitation.
>>
>> Thanks,
>> Cody
>>
>> On Nov 1, 2016 7:16 AM, "Ali Akhtar"  wrote:
>>
>>> I'm not referring to the primary key, just to other columns.
>>>
>>> My primary key is a text, and my table contains a mix of texts, ints,
>>> and timestamps.
>>>
>>> If I try to change one of the ints to a counter and run the create table
>>> query, I get the error ' Cannot mix counter and non counter columns in
>>> the same table'
>>>
>>>
>>> On Tue, Nov 1, 2016 at 6:11 PM, Cody Yancey  wrote:
>>>
 For counter tables, non-counter types are of course allowed in the
 primary key. Counters would be meaningless otherwise.

 Thanks,
 Cody

 On Nov 1, 2016 7:00 AM, "Ali Akhtar"  wrote:

> In the documentation for counters:
>
> https://docs.datastax.com/en/cql/3.1/cql/cql_using/use_counter_t.html
>
> The example table is created via:
>
> CREATE TABLE counterks.page_view_counts
>   (counter_value counter,
>   url_name varchar,
>   page_name varchar,
>   PRIMARY KEY (url_name, page_name)
> );
>
> Yet if I try to create a table with a mixture of texts, ints,
> timestamps, and counters, i get the error ' Cannot mix counter and non
> counter columns in the same table'
>
> Is that supposed to be allowed or not allowed, given that the official
> example contains a mix of counters and non-counters?
>

>>>
>


Re: how to get the size of the particular partition key belonging to an sstable ??

2016-10-28 Thread Edward Capriolo
There are actually multiple tickets for different size functions. Examples
include computing size of collections, number of rows, and physical sizes
server side.

I also have a patch to make the warn and info settable at runtime.

https://issues.apache.org/jira/browse/CASSANDRA-12661?filter=-1

It is an ask that has come up multiple times now.

On Fri, Oct 28, 2016 at 3:40 PM, Jeff Jirsa 
wrote:

> See also  https://issues.apache.org/jira/browse/CASSANDRA-12367
>
>
>
>
>
>
>
> *From: *Justin Cameron 
> *Reply-To: *"user@cassandra.apache.org" 
> *Date: *Friday, October 28, 2016 at 12:35 PM
> *To: *"user@cassandra.apache.org" 
> *Subject: *Re: how to get the size of the particular partition key
> belonging to an sstable ??
>
>
>
>
>
> nodetool cfhistograms / nodetool tablehistograms will also output
> partition size statistics for a given table:
> http://docs.datastax.com/en/cassandra/3.0/cassandra/tools/toolsTablehisto.html
>
>
>
>
>
> On Fri, 28 Oct 2016 at 12:32 Justin Cameron 
> wrote:
>
> If you're trying to determine this in order to diagnose wide row issues,
> then you can check your logs - Cassandra will log a warning for partitions
> > 100MB during compaction.
>
>
>
> See https://docs.datastax.com/en/cassandra/3.x/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__compaction_large_partition_warning_threshold_mb
>
>
>
>
>
> On Fri, 28 Oct 2016 at 01:49 Oleg Krayushkin  wrote:
>
> Hi,
>
>
>
> I guess it's about getting particular Partition Size on disk. If so, I
> would like to know this too.
>
>
>
> 2016-10-28 9:09 GMT+03:00 Vladimir Yudovin :
>
> Hi,
>
>
>
> >size of a particular partition key
>
> Can you please elucidate this? Key can be just number, or string, or
> several values.
>
>
>
> Best regards, Vladimir Yudovin,
>
>
> *Winguzone - Hosted Cloud Cassandra. Launch your cluster in minutes.*
>
>
>
>
>
>  On Thu, 27 Oct 2016 11:45:47 -0400, *Pranay akula* wrote 
>
>
>
> How can I get the size of a particular partition key belonging to an
> SSTable? Can we find it using the Index, Summary, or Statistics.db files?
> Does reading a hexdump of these files help?
>
>
>
>
>
>
>
> Thanks
>
> Pranay.
>
>
>
>
>
>
>
> --
>
>
>
> Oleg Krayushkin
>
> --
>
> *Justin Cameron*
>
> Senior Software Engineer | Instaclustr
>
>
>
>
>
>
> This email has been sent on behalf of Instaclustr Pty Ltd (Australia) and
> Instaclustr Inc (USA).
>
> This email and any attachments may contain confidential and legally
> privileged information.  If you are not the intended recipient, do not copy
> or disclose its content, but please reply to this email immediately and
> highlight the error to the sender and then immediately delete the message.
>
>
>
> --
>
> *Justin Cameron*
>
> Senior Software Engineer | Instaclustr
>
>
>
>
>
>
>
>
> CONFIDENTIALITY NOTE: This e-mail and any attachments are confidential and
> may be legally privileged. If you are not the intended recipient, do not
> disclose, copy, distribute, or use this email or any attachments. If you
> have received this in error please let the sender know and then delete the
> email and all attachments.
>


Re: Tools to manage repairs

2016-10-28 Thread Edward Capriolo
On Fri, Oct 28, 2016 at 11:21 AM, Vincent Rischmann <m...@vrischmann.me>
wrote:

> Doesn't paging help with this ? Also if we select a range via the cluster
> key we're never really selecting the full partition. Or is that wrong ?
>
>
> On Fri, Oct 28, 2016, at 05:00 PM, Edward Capriolo wrote:
>
> Big partitions are an anti-pattern here is why:
>
> First Cassandra is not an analytic datastore. Sure it has some UDFs and
> aggregate UDFs, but the true purpose of the data store is to satisfy point
> reads. Operations have strict timeouts:
>
> # How long the coordinator should wait for read operations to complete
> read_request_timeout_in_ms: 5000
>
> # How long the coordinator should wait for seq or index scans to complete
> range_request_timeout_in_ms: 10000
>
> This means you need to be able to satisfy the operation in 5 seconds.
> Which is not only the "think time" for 1 server, but if you are doing a
> quorum the operation has to complete and compare on 2 or more servers.
> Beyond these cutoffs are thread pools which fill up and start dropping
> requests once full.
>
> Something has to give, either functionality or physics. Particularly the
> physics of aggregating an ever-growing data set across N replicas in less
> than 5 seconds.  How many 2ms point reads will be blocked by 50 ms queries
> etc.
>
> I do not see the technical limitations of big partitions on disk as the
> only hurdle to climb here.
>
>
> On Fri, Oct 28, 2016 at 10:39 AM, Alexander Dejanovski <
> a...@thelastpickle.com> wrote:
>
> Hi Eric,
>
> that would be https://issues.apache.org/jira/browse/CASSANDRA-9754 by
> Michael Kjellman and https://issues.apache.org/jira/browse/CASSANDRA-11206 by
> Robert Stupp.
> If you haven't seen it yet, Robert's summit talk on big partitions is
> totally worth it :
> Video : https://www.youtube.com/watch?v=N3mGxgnUiRY
> Slides : http://www.slideshare.net/DataStax/myths-of-big-partitions-robert-stupp-datastax-cassandra-summit-2016
>
> Cheers,
>
>
> On Fri, Oct 28, 2016 at 4:09 PM Eric Evans <john.eric.ev...@gmail.com>
> wrote:
>
> On Thu, Oct 27, 2016 at 4:13 PM, Alexander Dejanovski
> <a...@thelastpickle.com> wrote:
> > A few patches are pushing the limits of partition sizes so we may soon be
> > more comfortable with big partitions.
>
> You don't happen to have Jira links to these handy, do you?
>
>
> --
> Eric Evans
> john.eric.ev...@gmail.com
>
>
> --
> -
> Alexander Dejanovski
> France
> @alexanderdeja
>
> Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>
>
>
"Doesn't paging help with this ? Also if we select a range via the cluster
key we're never really selecting the full partition. Or is that wrong ?"

What I am suggesting is that the data store has had this practical
limitation on partition size since inception. As a result, the common use
case is not to use it in such a way. For example, the compaction manager
may not be optimized for these cases; queries running across large
partitions may cause more contention or lots of young-gen garbage; queries
running across large partitions may occupy the slots of the read stage; etc.


http://mail-archives.apache.org/mod_mbox/cassandra-user/201602.mbox/%3CCAJjpQyTS2eaCcRBVa=zmm-hcbx5nf4ovc1enw+sffgwvngo...@mail.gmail.com%3E

I think there are possibly some more "little details" to discover. Not in a
bad way. I just do not think you can hand-wave it away, as if a specific
thing someone is working on now, or paging, solves it. If it were that easy
it would be solved by now :)


Re: Tools to manage repairs

2016-10-28 Thread Edward Capriolo
Big partitions are an anti-pattern here is why:

First Cassandra is not an analytic datastore. Sure it has some UDFs and
aggregate UDFs, but the true purpose of the data store is to satisfy point
reads. Operations have strict timeouts:

# How long the coordinator should wait for read operations to complete
read_request_timeout_in_ms: 5000

# How long the coordinator should wait for seq or index scans to complete
range_request_timeout_in_ms: 10000

This means you need to be able to satisfy the operation in 5 seconds. Which
is not only the "think time" for 1 server, but if you are doing a quorum
the operation has to complete and compare on 2 or more servers. Beyond
these cutoffs are thread pools which fill up and start dropping requests
once full.

Something has to give, either functionality or physics. Particularly the
physics of aggregating an ever-growing data set across N replicas in less
than 5 seconds.  How many 2ms point reads will be blocked by 50 ms queries
etc.

I do not see the technical limitations of big partitions on disk as the
only hurdle to climb here.




On Fri, Oct 28, 2016 at 10:39 AM, Alexander Dejanovski <
a...@thelastpickle.com> wrote:

> Hi Eric,
>
> that would be https://issues.apache.org/jira/browse/CASSANDRA-9754 by
> Michael Kjellman and https://issues.apache.org/jira/browse/CASSANDRA-11206 by
> Robert Stupp.
> If you haven't seen it yet, Robert's summit talk on big partitions is
> totally worth it :
> Video : https://www.youtube.com/watch?v=N3mGxgnUiRY
> Slides : http://www.slideshare.net/DataStax/myths-of-big-partitions-robert-stupp-datastax-cassandra-summit-2016
>
> Cheers,
>
>
> On Fri, Oct 28, 2016 at 4:09 PM Eric Evans 
> wrote:
>
>> On Thu, Oct 27, 2016 at 4:13 PM, Alexander Dejanovski
>>  wrote:
>> > A few patches are pushing the limits of partition sizes so we may soon
>> be
>> > more comfortable with big partitions.
>>
>> You don't happen to have Jira links to these handy, do you?
>>
>>
>> --
>> Eric Evans
>> john.eric.ev...@gmail.com
>>
> --
> -
> Alexander Dejanovski
> France
> @alexanderdeja
>
> Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>


Re: Cassandra failure during read query at consistency QUORUM (2 responses were required but only 0 replica responded, 2 failed)

2016-10-28 Thread Edward Capriolo
This looks like another case of an assert bubbling up through try/catch
blocks that do not catch AssertionError.

On Fri, Oct 28, 2016 at 6:30 AM, Denis Mikhaylov  wrote:

> Hi!
>
> We’re running Cassandra 3.9
>
> On the application side I see failed reads with this exception
> com.datastax.driver.core.exceptions.ReadFailureException: Cassandra
> failure during read query at consistency QUORUM (2 responses were required
> but only 0 replica responded, 2 failed)
>
> On the server side we see:
>
> WARN  [SharedPool-Worker-3] 2016-10-28 13:28:22,965
> AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread
> Thread[SharedPool-Worker-3,5,
> main]: {}
> java.lang.AssertionError: null
> at org.apache.cassandra.db.rows.BTreeRow.getCell(BTreeRow.java:212)
> ~[apache-cassandra-3.7.jar:3.7]
> at org.apache.cassandra.db.SinglePartitionReadCommand.
> canRemoveRow(SinglePartitionReadCommand.java:899)
> ~[apache-cassandra-3.7.jar:3.7]
> at org.apache.cassandra.db.SinglePartitionReadCommand.
> reduceFilter(SinglePartitionReadCommand.java:863)
> ~[apache-cassandra-3.7.jar:3.7]
> at org.apache.cassandra.db.SinglePartitionReadCommand.
> queryMemtableAndSSTablesInTimestampOrder(SinglePartitionReadCommand.java:748)
> ~[apache-cassan
> dra-3.7.jar:3.7]
> at org.apache.cassandra.db.SinglePartitionReadCommand.
> queryMemtableAndDiskInternal(SinglePartitionReadCommand.java:519)
> ~[apache-cassandra-3.7.jar:
> 3.7]
> at org.apache.cassandra.db.SinglePartitionReadCommand.
> queryMemtableAndDisk(SinglePartitionReadCommand.java:496)
> ~[apache-cassandra-3.7.jar:3.7]
> at org.apache.cassandra.db.SinglePartitionReadCommand.
> queryStorage(SinglePartitionReadCommand.java:358)
> ~[apache-cassandra-3.7.jar:3.7]
> at 
> org.apache.cassandra.db.ReadCommand.executeLocally(ReadCommand.java:366)
> ~[apache-cassandra-3.7.jar:3.7]
> at org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(
> ReadCommandVerbHandler.java:48) ~[apache-cassandra-3.7.jar:3.7]
> at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64)
> ~[apache-cassandra-3.7.jar:3.7]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> ~[na:1.8.0_102]
> at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorServ
> ice$FutureTask.run(AbstractLocalAwareExecutorService.java:164)
> ~[apache-cassandra-
> 3.7.jar:3.7]
> at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorServ
> ice$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136)
> [apache
> -cassandra-3.7.jar:3.7]
> at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105)
> [apache-cassandra-3.7.jar:3.7]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_102]
>
> It only affects a single table, sadly on both the test (3.9) and production
> (3.7) deployments of Cassandra.
>
> What could be the problem? Please help.


Re: How does the "batch" commit log sync works

2016-10-27 Thread Edward Capriolo
I mentioned during my Cassandra.yaml presentation at the summit that I
never saw anyone use these settings. Things that are off by default are
typically not covered well by tests. It sounds like it is not working.
Quick suggestion: go back in time maybe to a version like 1.2.X or 0.7 and
see if it behaves like the yaml suggests it should.

On Thu, Oct 27, 2016 at 11:48 PM, Hiroyuki Yamada 
wrote:

> Hello Satoshi and the community,
>
> I am also using commitlog_sync for durability, but I have never
> modified commitlog_sync_batch_window_in_ms parameter yet,
> so I wondered if it is working or not.
>
> As Satoshi said, I also changed commitlog_sync_batch_window_in_ms (to
> 10000) and restarted C* and
> issued some INSERT command.
> But, it actually returned immediately right after issuing.
>
> So, it seems like the parameter is not working correctly.
> Are we missing something ?
>
> Thanks,
> Hiro
>
> On Thu, Oct 27, 2016 at 5:58 PM, Satoshi Hikida 
> wrote:
> > Hi, all.
> >
> > I have a question about "batch" commit log sync behavior with C* version
> > 2.2.8.
> >
> > Here's what I have done:
> >
> > * set commitlog_sync to the "batch" mode as follows:
> >
> >> commitlog_sync: batch
> >> commitlog_sync_batch_window_in_ms: 10000
> >
> > * ran a script which inserts the data to a table
> > * prepared a disk dedicated to store the commit logs
> >
> > According to the DataStax document, I expected that fsync is done once
> in a
> > batch window (one fsync per 10sec in this case) and writes issued within
> > this batch window are blocked until fsync is completed.
> >
> > In my experiment, however, it seems that the write requests returned
> almost
> > immediately (within 300~400 ms).
> >
> > Am I misunderstanding something? If so, can someone give me any advice
> as
> > to the reason why C* behaves like this?
> >
> >
> > I referred to this document:
> > https://docs.datastax.com/en/cassandra/2.2/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__PerformanceTuningProps
> >
> > Regards,
> > Satoshi
> >
>


Re: Handle Leap Seconds with Cassandra

2016-10-27 Thread Edward Capriolo
Following https://issues.apache.org/jira/browse/CASSANDRA-9131. It is very
interesting to track how the timestamp has moved from the user, to the
server, then back to the user via the driver.

Next we will be accounting for the earth's slowing rotation as the ice caps
melt :)

https://www.uwgb.edu/dutchs/PSEUDOSC/IceCaps.HTM
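On the "how do people keep timestamps monotonic" question raised below:
there is no server-side trick; the usual workaround is client-supplied
timestamps from a generator that refuses to go backwards, roughly what newer
Java drivers do internally. A minimal sketch, assuming client-side
timestamps (e.g., via USING TIMESTAMP):

    import java.util.concurrent.atomic.AtomicLong;

    public class MonotonicMicros {
        private static final AtomicLong last = new AtomicLong();

        // Returns microseconds since the epoch, guaranteed strictly
        // increasing even if the wall clock steps backwards (leap second,
        // NTP correction).
        public static long next() {
            while (true) {
                long now = System.currentTimeMillis() * 1000;
                long prev = last.get();
                long candidate = Math.max(now, prev + 1);
                if (last.compareAndSet(prev, candidate)) return candidate;
            }
        }
    }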

On Thu, Oct 27, 2016 at 1:18 PM, Anuj Wadehra 
wrote:

> Hi Ben,
>
> Thanks for your reply. We don't use timestamps in the primary key. We rely
> on server-side timestamps generated by the coordinator. So no functions at
> the client side would help.
>
> Yes, drifts can create problems too. But even if you ensure that nodes are
> perfectly synced with NTP, you will surely mess up the order of updates
> during the leap second (interleaving). Some applications update the same
> column of the same row quickly (within a second), and reversing the order
> would corrupt the data.
>
> I am interested in learning how people relying on strict order of updates
> handle leap second scenario when clock goes back one second(same second is
> repeated). What kind of tricks people use  to ensure that server side
> timestamps are monotonic ?
>
> As per my understanding NTP slew mode may not be suitable for Cassandra as
> it may cause unpredictable drift amongst the Cassandra nodes. Ideas ??
>
>
> Thanks
> Anuj
>
>
>
> Sent from Yahoo Mail on Android
> 
>
> On Thu, 20 Oct, 2016 at 11:25 PM, Ben Bromhead
>  wrote:
> http://www.datastax.com/dev/blog/preparing-for-the-leap-second gives a
> pretty good overview
>
> If you are using a timestamp as part of your primary key, this is the
> situation where you could end up overwriting data. I would suggest using
> timeuuid instead which will ensure that you get different primary keys even
> for data inserted at the exact same timestamp.
>
> The blog post also suggests using certain monotonic timestamp classes in
> Java however these will not help you if you have multiple clients that may
> overwrite data.
>
> As for the interleaving or out of order problem, this is hard to address
> in Cassandra without resorting to external coordination or LWTs. If you are
> relying on a wall clock to guarantee order in a distributed system you will
> get yourself into trouble even without leap seconds (clock drift, NTP
> inaccuracy etc).
>
> On Thu, 20 Oct 2016 at 10:30 Anuj Wadehra  wrote:
>
>> Hi,
>>
>> I would like to know how you guys handle leap seconds with Cassandra.
>>
>> I am not bothered about the livelock issue as we are using appropriate
>> versions of Linux and Java. I am more interested in finding an optimum
>> answer for the following question:
>>
>> How do you handle wrong ordering of multiple writes (on the same row and
>> column) during the leap second? You may overwrite the new value with the
>> old one (disaster).
>>
>> And Downtime is no option :)
>>
>> I can see that CASSANDRA-9131 is still open..
>>
>> FYI..we are on 2.0.14 ..
>>
>>
>> Thanks
>> Anuj
>>
> --
> Ben Bromhead
> CTO | Instaclustr 
> +1 650 284 9692
> Managed Cassandra / Spark on AWS, Azure and Softlayer
>
>


Re: Error creating pool to /IP_ADDRESS33:9042 (Proving Cassandra's NO SINGLE point of failure)

2016-10-26 Thread Edward Capriolo
I would suggest you look some existing work
http://techblog.netflix.com/2014/07/revisiting-1-million-writes-per-second.html
and attempt to re-create those scenarios and methodologies for failing
nodes and seeing the performance impact.

This would yield faster and more easily verifiable results than starting
from scratch.

On Wed, Oct 26, 2016 at 6:41 AM, Rajesh Radhakrishnan <
rajesh.radhakrish...@phe.gov.uk> wrote:

> Hi Vladimir,
>
> Thank you for the response.
>
> Yes I added all the three node IPs while connecting to the cluster via
> driver.
>
> It's not a failed operation. While the script is running (it takes some
> time to read millions of rows), I intentionally put
> one node down to see how the script reacts.
>
> But entire script stops with timeout error. Why?
>
>
> Kind regards,
> Rajesh R
>
> --
> *From:* Vladimir Yudovin [vla...@winguzone.com]
> *Sent:* 24 October 2016 17:04
> *To:* user
> *Subject:* Re: Error creating pool to /IP_ADDRESS33:9042 (Proving
> Cassandra's NO SINGLE point of failure)
>
> Probably the Python driver can't retry the failed operation on a
> connection to another node. Do you provide all three IPs to the Python
> driver for connecting?
>
> Best regards, Vladimir Yudovin,
>
> *Winguzone - Hosted Cloud Cassandra. Launch your cluster in minutes.*
>
>
>  On Mon, 24 Oct 2016 07:48:05 -0400*Rajesh Radhakrishnan
> >*
> wrote 
>
> Hi,
>
> I have  3 nodes Cassandra cluster.
> Cassandra version : dsc-cassandra-2.1.5
> Python Cassandra Driver : 2.5.1
>
> Running the nodes in Red Hat virtual machines.
>
> Node ip info:
> Node 1: IP_ADDRESS219
> Node 2: IP_ADDRESS229
> Node 3: IP_ADDRESS230
>
>
> (IP_ADDRESS219 is masked for this email which represents something similar
> 123.321.123.219)
>
>
> Cassandra.yaml configuration details of node1:
>
> listen_address: IP_ADDRESS219
> broadcast_address: commented
> rpc_address: IP_ADDRESS219
> broadcast_rpc_address : commented
>
> The IP address of the node is ( using the ifconfig command ) IP_ADDRESS219.
>
> While the cluster is up and running , when I put Node 3 (IP_ADDRESS230)
> down, I was able to connect to CQLSH from IP_ADDRESS219 and IP_ADDRESS229.
>
> But while I was running a Python script which just reads data from
> Cassandra using the cassandra-python-driver, I intentionally stopped node 3
> (while the script was still running).
>
> Then the script comes to a halt with OperationTimedOut: errors={},
> last_host= IP_ADDRESS219.
>
> However if I run the script when node3 is already down, it runs and reads
> data.
>
> So during the reading operation, if any of the nodes in the cluster goes
> down, it affects the client operation??? Has anyone seen a similar situation?
>
> Here we are trying to establish or prove Cassandra's always on (NO single
> point of failure).  Do you know why this is happening? Thank you.
>
>
>
> Kind regards,
> Rajesh R
>
>
> **
> The information contained in the EMail and any attachments is confidential
> and intended solely and for the attention and use of the named
> addressee(s). It may not be disclosed to any other person without the
> express authority of Public Health England, or the intended recipient, or
> both. If you are not the intended recipient, you must not disclose, copy,
> distribute or retain this message or any part of it. This footnote also
> confirms that this EMail has been swept for computer viruses by
> Symantec.Cloud, but please re-sweep any attachments before opening or
> saving. http://www.gov.uk/PHE
> 
> **
>
>
>
>


Re: Keyspace/CF creation Timeouts

2016-10-25 Thread Edward Capriolo
I do not believe the ConsistencyLevel matters for schema changes. In recent
versions request_timeout_in_ms has been replaced by N variables which allow
different timeout values for different types of operations.

You seem to have both a lot of keyspaces and column families. It seems
likely that you have a large cluster since you have mentioned multiple data
centers. Many people have similar problems namely: the operation that
changes the schema will timeout at the client level, but the cluster will
eventually carry out the change.

Many people seem to be writing their schema changing code to issue the
request, ignore the response (in the case of timeout) and then use a
command like "describe schema|cluster" to confirm the change propagation.
Generally it is viewed as an annoying problem that people deal with.
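With the 3.x Java driver the confirm step does not even need "describe
schema"; the metadata API exposes the agreement check directly. A rough
sketch (the keyspace and table names are hypothetical):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class DdlWithConfirm {
        public static void main(String[] args) throws InterruptedException {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect()) {
                try {
                    session.execute("CREATE TABLE IF NOT EXISTS ks1.t1 (id int PRIMARY KEY)");
                } catch (Exception maybeTimeout) {
                    // The DDL may still be propagating; fall through and poll.
                }
                // Poll until all live hosts report the same schema version.
                while (!cluster.getMetadata().checkSchemaAgreement()) {
                    Thread.sleep(500);
                }
            }
        }
    }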

On Tue, Oct 25, 2016 at 4:41 PM, Jai Bheemsen Rao Dhanwada <
jaibheem...@gmail.com> wrote:

> 1. Yes, all nodes are up and running,
> 2. We are using the Local_QUORUM.
>
> On Tue, Oct 25, 2016 at 1:28 PM, Surbhi Gupta 
> wrote:
>
>> 1. Make sure all nodes are up and running while you are trying to create
>> the Keyspaces and Column Family.
>> 2. What is the write consistency level u r using?
>>
>>
>> On 25 October 2016 at 13:18, Jai Bheemsen Rao Dhanwada <
>> jaibheem...@gmail.com> wrote:
>>
>>> Hello All,
>>>
>>> I have recently started noticing timeouts while creating KS/CF. this is
>>> happening with increase in no.of keyspaces.
>>>
>>> Does anyone have an idea what to look for? as I don't see any error or
>>> exception in the logs.
>>> or is there some kind of parameter change required?
>>>
>>> C* Version: 2.1.16
>>> Total KS: 170
>>> Total CF: 500
>>> Total DC : 8
>>>
>>> request_timeout_in_ms: 1
>>>
>>
>>
>


Re: Thousands of SSTables generated in only one node

2016-10-25 Thread Edward Capriolo
I have not read the entire thread so sorry if this is already mentioned.
You should review your logs, a potential problem could be a corrupted
sstable.

In a situation like this you will notice that the system is repeatedly
trying to compact a given sstable. The compaction fails and based on the
heuristics it may successfully compact some other files, but ultimately
each time it attempts to do a compaction involving this sstable the process
fails and the number of files keeps growing.

Good luck,
Edward

On Tue, Oct 25, 2016 at 10:31 AM, DuyHai Doan  wrote:

> what are your disk hardware specs ?
>
> On Tue, Oct 25, 2016 at 8:47 AM, Lahiru Gamathige 
> wrote:
>
>> Hi Users,
>>
>> I have a single server codebase deployed with multiple environments
>> (staging, dev, etc.), but they all use a single Cassandra cluster;
>> keyspaces are prefixed with the environment name, so each environment has
>> its own keyspace to store data. I am using Cassandra 2.1.0 and using it to
>> store time-series data.
>>
>> I see thousands of SSTables on only one node, for one environment, and
>> that node is running out of memory because of this (I am guessing that's
>> the cause because I see lots of logs trying to compact that data). All the
>> other nodes, which serve the other environments too, work just fine, but
>> this one environment keeps having this issue.
>> Given that explanation I have two main questions.
>>
>> Anyone of you had the similar issue ? If so how did you solve it.
>>
>> If I want to clean only this keyspace from the full cluster what are the
>> steps I should be doing ?
>>
>> Do you think if I shut down the cluster and delete the folder for the
>> keyspace in all the nodes and restart the cluster would do the job ? Are
>> there any other steps I need to follow ?
>> (The reason I ask is that if I just truncate from CQL the data will still
>> be there, and there's seriously something wrong in that table, so I'm not
>> sure it will ever get cleaned up.)
>>
>> Thanks
>> Lahiru
>>
>
>


Re: Question on write failures logs show Uncaught exception on thread Thread[MutationStage-1,5,main]

2016-10-24 Thread Edward Capriolo
The driver will enforce a max batch size of 65k.
This is an issue in versions of Cassandra like 2.1.X. There are control
variables for the logged and unlogged batch size. You may also have to
tweak your commitlog size as well.

I demonstrate this here:
https://github.com/edwardcapriolo/ec/blob/master/src/test/java/Base/batch/BigBatches2_2_6_tweeked.java

Latest tick-tock version I tried worked out of the box.

The only drawback of batches is potential JVM pressure. I did some
permutations of memory settings with the tests above. You can get a feel
for rate + batch size and the JVM pressure it causes.
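For reference, the knobs involved in the quoted question below live in
cassandra.yaml; a sketch with illustrative values (not recommendations).
Note that the commit log rejects any single mutation larger than half a
segment by default, which is where the 16MiB limit in the error comes from:

    # cassandra.yaml -- illustrative values, not recommendations
    commitlog_segment_size_in_mb: 64    # max mutation defaults to half of this (32MiB)
    batch_size_warn_threshold_in_kb: 5  # warn on logged batches above this size
    batch_size_fail_threshold_in_kb: 50 # reject batches above this size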

On Mon, Oct 24, 2016 at 4:10 PM, George Webster  wrote:

> Hey cassandra users,
>
> When performing writes I have hit an issue where the server is unable to
> perform writes. The logs show:
>
> WARN  [MutationStage-1] 2016-10-24 22:05:52,592
> AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread
> Thread[MutationStage-1,5,main]: {}
> java.lang.IllegalArgumentException: Mutation of 16.011MiB is too large
> for the maximum size of 16.000MiB
> at org.apache.cassandra.db.commitlog.CommitLog.add(CommitLog.java:262)
> ~[apache-cassandra-3.9.jar:3.9]
> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:493)
> ~[apache-cassandra-3.9.jar:3.9]
> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:396)
> ~[apache-cassandra-3.9.jar:3.9]
> at org.apache.cassandra.db.Mutation.applyFuture(Mutation.java:215)
> ~[apache-cassandra-3.9.jar:3.9]
> at org.apache.cassandra.db.Mutation.applyFuture(Mutation.java:220)
> ~[apache-cassandra-3.9.jar:3.9]
> at 
> org.apache.cassandra.db.MutationVerbHandler.doVerb(MutationVerbHandler.java:69)
> ~[apache-cassandra-3.9.jar:3.9]
> at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64)
> ~[apache-cassandra-3.9.jar:3.9]
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> ~[na:1.8.0_101]
> at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorServ
> ice$FutureTask.run(AbstractLocalAwareExecutorService.java:164)
> ~[apache-cassandra-3.9.jar:3.9]
> at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorServ
> ice$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136)
> [apache-cassandra-3.9.jar:3.9]
> at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109)
> [apache-cassandra-3.9.jar:3.9]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101]
>
>
> Looking around on Google I found this guide
> https://support.datastax.com/hc/en-us/articles/207267063-Mutation-of-x-bytes-is-too-large-for-the-maxiumum-size-of-y-
> that states I can increase the commitlog_segment_size_in_mb to solve the
> problem.
>
> However, I wanted to ask if there are any drawbacks to doing so.
>
> Thanks you for your guidance.
>
> Respectfully,
> George
>


Re: Inconsistencies in materialized views

2016-10-17 Thread Edward Capriolo
https://issues.apache.org/jira/browse/CASSANDRA-11198

Which has problems "maybe" fixed by:

https://issues.apache.org/jira/browse/CASSANDRA-11475

Which has its own set of problems.

One of these patches was merged into 3.7, which tells you that you are
running a version (3.6) with known bugs. Also, as the feature is "new-ish",
you should be aware that "new-ish" major features usually take 4-6 versions
to solidify.



On Mon, Oct 17, 2016 at 3:19 AM, siddharth verma <
sidd.verma29.l...@gmail.com> wrote:

> Hi,
> We have a base table with ~300 million entries.
> And in a recent sanity-check activity, I saw approx ~33k entries (in one DC)
> which were in the materialized view, but not in the base table. (reads with
> quorum, DCAware)
> (I haven't done it the other way round yet, i.e. entries in base table but
> not in materialized view)
>
> Could someone suggest a possible cause for the same?
> We saw some glitches in cassandra cluster
> 1. node down.
> If this is the case, will repair fix the issue?
> 2. IOPS maxed out in one DC.
> 3. Another DC added with some glitches.
>
> Could someone suggest how could we replicate inconsistency between base
> table and materialized view. Any help would be appreciated.
>
> C* 3.6
> Regards
> SIddharth Verma
> (Visit https://github.com/siddv29/cfs for a high speed cassandra full
> table scan)
>


Re: Why does Cassandra need to have 2B column limit? why can't we have unlimited ?

2016-10-12 Thread Edward Capriolo
The "2 billion column limit" press clipping "puffery". This statement
seemingly became popular because highly traffic traffic-ed story, in which
a tech reporter embellished on a statement to make a splashy article.

The effect is something like this:
http://www.healthnewsreview.org/2012/08/iced-tea-kidney-stones-and-the-study-that-never-existed/

Iced tea does not cause kidney stones! Cassandra does not store rows with 2
billion columns! It is just not true.






On Wed, Oct 12, 2016 at 4:57 AM, Kant Kodali  wrote:

> Well, 1) I have not sent it to the PostgreSQL mailing lists; 2) I thought
> this is an open-ended question as it can involve ideas from everywhere,
> including the Cassandra Java driver mailing lists, so sorry if that
> bothered you for some reason.
>
> On Wed, Oct 12, 2016 at 1:41 AM, Dorian Hoxha 
> wrote:
>
>> Also, I'm not sure, but I don't think it's "cool" to write to multiple
>> lists in the same message. (based on postgresql mailing lists rules).
>> Example I'm not subscribed to those, and now the messages are separated.
>>
>> On Wed, Oct 12, 2016 at 10:37 AM, Dorian Hoxha 
>> wrote:
>>
>>> There are some issues working on larger partitions.
>>> HBase doesn't do what you say! You also have to be careful on HBase not
>>> to create large rows! But since they are globally-sorted, you can easily
>>> sort between them and create small rows.
>>>
>>> In my opinion, Cassandra people are wrong, in that they say "globally
>>> sorted is the devil!" while all fb/google/etc actually use globally-sorted
>>> most of the time! You have to be careful though (just like with random
>>> partition)
>>>
>>> Can you tell what rowkey1, page1, col(x) actually are ? Maybe there is a
>>> way.
>>> The most "recent", means there's a timestamp in there ?
>>>
>>> On Wed, Oct 12, 2016 at 9:58 AM, Kant Kodali  wrote:
>>>
 Hi All,

 I understand Cassandra can have a maximum of 2B rows per partition but
 in practice some people seem to suggest the magic number is 100K. why not
 create another partition/rowkey automatically (whenever we reach a safe
 limit that  we consider would be efficient)  with auto increment bigint  as
 a suffix appended to the new rowkey? so that the driver can return the new
 rowkey  indicating that there is a new partition and so on...Now I
 understand this would involve allowing partial row key searches which
 currently Cassandra wouldn't do (but I believe HBASE does) and thinking
 about token ranges and potentially many other things..

 My current problem is this

 I have a row key followed by bunch of columns (this is not time series
 data)
 and these columns can grow to any number so since I have 100K limit (or
 whatever the number is. say some limit) I want to break the partition into
 level/pages

 rowkey1, page1->col1, col2, col3..
 rowkey1, page2->col1, col2, col3..

 Now say my Cassandra DB is populated with data and say my application
 just got booted up and I want the most recent value of a certain partition,
 but I don't know which page it belongs to since my application just got
 booted up. How do I solve this in the most efficient way possible in
 Cassandra today? I understand I can create MVs or other tables that can hold
 some auxiliary data such as the number of pages per partition and so on, but
 that involves the maintenance cost of that other table which I cannot
 afford really because I have MV's, secondary indexes for other good
 reasons. so it would be great if someone can explain the best way possible
 as of today with Cassandra? By best way I mean is it possible with one
 request? If Yes, then how? If not, then what is the next best way to solve
 this?

 Thanks,
 kant

>>>
>>>
>>
>


Re: Question on Read Repair

2016-10-11 Thread Edward Capriolo
This is the theory, but not all of the practice. The failure detector's
heartbeats are a process happening outside the read.

Take for example a cluster with replication factor 3.
At time('1) the failure detector might read three nodes as UP.
A request "soon after '1", issued at time('2), might start a read process.
One of the three nodes may not respond within the read timeout window. Call
the end of the read timeout window time('3).
Note: anti-entropy read repair is set to happen on only a fraction of
requests.
Note: anti-entropy read repair is async, not guaranteed, and not retried
(might need a fact check, but I am fairly sure of this).
A read repair may be issued at time('4), moments after time('3).
Those read repairs could fail or pass as well.

The long and short of it is that the data may be repaired after a read at
ALL. There is no guarantee that it will be.



On Tue, Oct 11, 2016 at 1:29 PM, Jeff Jirsa 
wrote:

> If the failure detector knows that the node is down, it won’t attempt a
> read, because the consistency level can’t be satisfied – none of the other
> replicas will be repaired.
>
>
>
>
>
> *From: *Anubhav Kale 
> *Reply-To: *"user@cassandra.apache.org" 
> *Date: *Tuesday, October 11, 2016 at 10:24 AM
> *To: *"user@cassandra.apache.org" 
> *Subject: *Question on Read Repair
>
>
>
> Hello,
>
>
>
> This is more of a theory / concept question. I set CL=ALL and do a read.
> Say one replica was down, will the rest of the replicas get repaired as
> part of this ? (I am hoping the answer is yes).
>
>
>
> Thanks !
> 
> CONFIDENTIALITY NOTE: This e-mail and any attachments are confidential and
> may be legally privileged. If you are not the intended recipient, do not
> disclose, copy, distribute, or use this email or any attachments. If you
> have received this in error please let the sender know and then delete the
> email and all attachments.
>


Re: Running Cassandra in Integration Tests

2016-10-06 Thread Edward Capriolo
Check out https://github.com/edwardcapriolo/farsandra. It falls under the
realm of almost 100% pure Java (besides the fact that it uses some shell to
launch Cassandra).
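A sketch of what a test setup looks like, going from the project's README
at the time (method names may have drifted between versions, so treat this
as illustrative rather than authoritative):

    import io.teknek.farsandra.Farsandra;
    import java.util.Arrays;

    public class EmbeddedCassandraTest {
        public static void main(String[] args) throws Exception {
            Farsandra fs = new Farsandra();
            fs.withVersion("2.0.4");           // fetches and runs this C* version
            fs.withCleanInstanceOnStart(true); // throw away data between runs
            fs.withInstanceName("1");
            fs.withCreateConfigurationFiles(true);
            fs.withHost("localhost");
            fs.withSeeds(Arrays.asList("localhost"));
            fs.start();                        // launches the node out of process
            // ... point a driver at localhost and run assertions ...
            // teardown is left out here; check the project for the current API
        }
    }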

On Thu, Oct 6, 2016 at 7:08 PM, Ali Akhtar  wrote:

> Is it possible to create an isolated Cassandra instance which is run
> during integration tests and which disappears after tests have finished
> running? Then it's recreated the next time tests run (perhaps being
> populated with test data).
>
>  I'm using Java.
>
>
>


Re: Row cache not working

2016-10-03 Thread Edward Capriolo
Since the feature is off by default, the coverage might only be as
deep as the specific tests that exercise it.
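For anyone skimming the thread, the per-table knob being toggled below looks
like the following; the bounded form is the safer variant of the 'ALL'
setting warned about further down (table name taken from the quoted schema):

    ALTER TABLE test.reads
    WITH caching = {'keys': 'ALL', 'rows_per_partition': '100'};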

On Mon, Oct 3, 2016 at 4:54 PM, Jeff Jirsa 
wrote:

> Seems like it’s probably worth opening a jira issue to track it (either to
> confirm it’s a bug, or to be able to better explain if/that it’s working as
> intended – the row cache is probably missing because trace indicates the
> read isn’t cacheable, but I suspect it should be cacheable).
>
>
>
>
>
>
> Do note, though, that setting rows_per_partition to ALL can be very very
> very dangerous if you have very wide rows in any of your tables with row
> cache enabled.
>
>
>
>
>
>
>
> *From: *Abhinav Solan 
> *Reply-To: *"user@cassandra.apache.org" 
> *Date: *Monday, October 3, 2016 at 1:38 PM
> *To: *"user@cassandra.apache.org" 
> *Subject: *Re: Row cache not working
>
>
>
> It's Cassandra 3.0.7.
>
> I had to set caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}; only
> then does it work, I don't know why.
>
> If I set 'rows_per_partition':'1' then it does not work.
>
>
>
> Also wanted to ask one thing: if I set row_cache_save_period: 60, will this
> cache be refreshed automatically, or is it lazy, caching a row only when a
> fetch call is made?
>
>
>
> On Mon, Oct 3, 2016 at 1:31 PM Jeff Jirsa 
> wrote:
>
> Which version of Cassandra are you running (I can tell it’s newer than
> 2.1, but exact version would be useful)?
>
>
>
> *From: *Abhinav Solan 
> *Reply-To: *"user@cassandra.apache.org" 
> *Date: *Monday, October 3, 2016 at 11:35 AM
> *To: *"user@cassandra.apache.org" 
> *Subject: *Re: Row cache not working
>
>
>
> Hi, can anyone please help me with this
>
>
>
> Thanks,
>
> Abhinav
>
>
>
> On Fri, Sep 30, 2016 at 6:20 PM Abhinav Solan 
> wrote:
>
> Hi Everyone,
>
>
>
> My table looks like this -
>
> CREATE TABLE test.reads (
>
> svc_pt_id bigint,
>
> meas_type_id bigint,
>
> flags bigint,
>
> read_time timestamp,
>
> value double,
>
> PRIMARY KEY ((svc_pt_id, meas_type_id))
>
> ) WITH bloom_filter_fp_chance = 0.1
>
> AND caching = {'keys': 'ALL', 'rows_per_partition': '10'}
>
> AND comment = ''
>
> AND compaction = {'class': 'org.apache.cassandra.db.compaction.
> LeveledCompactionStrategy'}
>
> AND compression = {'chunk_length_in_kb': '64', 'class': '
> org.apache.cassandra.io.compress.LZ4Compressor'}
>
> AND crc_check_chance = 1.0
>
> AND dclocal_read_repair_chance = 0.1
>
> AND default_time_to_live = 0
>
> AND gc_grace_seconds = 864000
>
> AND max_index_interval = 2048
>
> AND memtable_flush_period_in_ms = 0
>
> AND min_index_interval = 128
>
> AND read_repair_chance = 0.0
>
> AND speculative_retry = '99PERCENTILE';
>
>
>
> Have set up the C* nodes with
>
> row_cache_size_in_mb: 1024
>
> row_cache_save_period: 14400
>
>
>
> and I am making this query
>
> select svc_pt_id, meas_type_id, read_time, value FROM
> cts_svc_pt_latest_int_read where svc_pt_id = -9941235 and meas_type_id =
> 146;
>
>
>
> with tracing on every time it says Row cache miss
>
>
>
> activity | timestamp | source | source_elapsed
> ---
> Execute CQL3 query | 2016-09-30 18:15:00.446000 | 192.168.199.75 | 0
> Parsing select svc_pt_id, meas_type_id, read_time, value FROM
> cts_svc_pt_latest_int_read where svc_pt_id = -9941235 and meas_type_id =
> 146; [SharedPool-Worker-1] | 2016-09-30 18:15:00.446000 | 192.168.199.75 | 111
> Preparing statement [SharedPool-Worker-1] | 2016-09-30 18:15:00.446000 | 192.168.199.75 | 209
> reading data from /192.168.170.186 [SharedPool-Worker-1] | 2016-09-30 18:15:00.446001 | 192.168.199.75 | 370
> Sending READ message to /192.168.170.186 [MessagingService-Outgoing-/192.168.170.186

Re: An extremely fast cassandra table full scan utility

2016-10-03 Thread Edward Capriolo
I undertook a similar effort a while ago.

https://issues.apache.org/jira/browse/CASSANDRA-7014

Other than the fact that it was closed with no comments, I can tell you
that other efforts I have made to embed things in Cassandra did not go
swimmingly, although at the time even ideas like Groovy UDFs were rejected.

On Mon, Oct 3, 2016 at 4:22 PM, Bhuvan Rawal  wrote:

> Hi Jonathan,
>
> If full scan is a regular requirement then setting up a spark cluster in
> locality with Cassandra nodes makes perfect sense. But supposing that it is
> a one off requirement, say a weekly or a fortnightly task, a spark cluster
> could be an added overhead with additional capacity, resource planning as
> far as operations / maintenance is concerned.
>
> So this could be thought of as a simple substitute for a single-threaded scan
> without additional effort to set up and maintain another technology.
>
> Regards,
> Bhuvan
>
> On Tue, Oct 4, 2016 at 1:37 AM, siddharth verma <
> sidd.verma29.l...@gmail.com> wrote:
>
>> Hi Jon,
>> It wan't allowed.
>> Moreover, if someone who isn't familiar with spark, and might be new to
>> map filter reduce etc. operations, could also use the utility for some
>> simple operations assuming a sequential scan of the cassandra table.
>>
>> Regards
>> Siddharth Verma
>>
>> On Tue, Oct 4, 2016 at 1:32 AM, Jonathan Haddad 
>> wrote:
>>
>>> Couldn't set it up as in couldn't get it working, or is it not allowed?
>>>
>>> On Mon, Oct 3, 2016 at 3:23 PM Siddharth Verma <
>>> verma.siddha...@snapdeal.com> wrote:
>>>
 Hi Jon,
 We couldn't set up a Spark cluster.

 For some use cases, a Spark cluster was required, but for some reason we
 couldn't create one. Hence, one may use this utility to iterate
 through the entire table at very high speed.

 Had to find a work around, that would be faster than paging on result
 set.

 Regards

 Siddharth Verma
 *Software Engineer I - CaMS*
 *M*: +91 9013689856, *T*: 011 22791596 *EXT*: 14697
 CA2125, 2nd Floor, ASF Centre-A, Jwala Mill Road,
 Udyog Vihar Phase - IV, Gurgaon-122016, INDIA
 Download Our App
 [image: A]
 
  [image:
 A]
 
  [image:
 W]
 

 On Tue, Oct 4, 2016 at 12:41 AM, Jonathan Haddad 
 wrote:

 It almost sounds like you're duplicating all the work of both spark and
 the connector. May I ask why you decided to not use the existing tools?

 On Mon, Oct 3, 2016 at 2:21 PM siddharth verma <
 sidd.verma29.l...@gmail.com> wrote:

 Hi DuyHai,
 Thanks for your reply.
 A few more features planned in the next one(if there is one) like,
 custom policy keeping in mind the replication of token range on
 specific nodes,
 fine graining the token range(for more speedup),
 and a few more.

 I think, as for fine-graining a token range:
 if one token range is split further into, say, 2-3 parts divided among
 threads, this would exploit the possible parallelism on a large scaled-out
 cluster.

 And, as you mentioned in the JIRA, streaming of requests would be of
 huge help with further splitting the range.

 Thanks once again for your valuable comments. :-)

 Regards,
 Siddharth Verma



>>
>
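For anyone who wants the flavor of such a utility without the full project,
a sequential token-range scan can be sketched with the DataStax Java driver
(3.x-era API). Assumptions: Murmur3Partitioner and a placeholder table ks.t
with partition key pk. Each unwrapped range could be handed to its own
thread for the parallel version discussed above.

import com.datastax.driver.core.*;

public class FullScan {
  public static void main(String[] args) {
    Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
    Session session = cluster.connect();
    PreparedStatement ps = session.prepare(
        "SELECT pk FROM ks.t WHERE token(pk) > ? AND token(pk) <= ?");
    long count = 0;
    // Walk every token range the cluster owns; unwrap() splits the range
    // that wraps around the ring into two contiguous ranges.
    for (TokenRange range : cluster.getMetadata().getTokenRanges()) {
      for (TokenRange r : range.unwrap()) {
        for (Row row : session.execute(ps.bind(
            r.getStart().getValue(), r.getEnd().getValue()))) {
          count++; // process the row here
        }
      }
    }
    System.out.println("scanned " + count + " rows");
    cluster.close();
  }
}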


Re: Row cache not working

2016-10-03 Thread Edward Capriolo
I was thinking about this issue. I was wondering on the dev side if it
would make sense to make a utility for the unit tests that could enable
tracing and then assert that a number of steps in the trace happened.

Something like:

setup()
runQuery("SELECT * FROM X")
Assertion.assertTrace("Preparing statement").then("Row cache
hit").then("Request complete");

This would be a pretty awesome way to verify things without mock/mockito.
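Client side, something similar can be approximated today with the Java
driver, since a traced query exposes its events. A rough sketch follows;
the fluent helper itself is hypothetical, not an existing API:

import com.datastax.driver.core.*;
import java.util.List;

// Hypothetical fluent helper in the spirit of the sketch above: asserts
// that the expected trace steps appear, in order, in a query's trace.
class TraceAssert {
  private final List<QueryTrace.Event> events;
  private int cursor = 0;

  TraceAssert(QueryTrace trace) { this.events = trace.getEvents(); }

  TraceAssert then(String expected) {
    while (cursor < events.size()) {
      if (events.get(cursor++).getDescription().contains(expected)) {
        return this; // found this step; keep the cursor to enforce ordering
      }
    }
    throw new AssertionError("trace step not found in order: " + expected);
  }

  static TraceAssert assertTrace(Session session, String cql) {
    ResultSet rs = session.execute(new SimpleStatement(cql).enableTracing());
    return new TraceAssert(rs.getExecutionInfo().getQueryTrace());
  }
}

Usage would look like:
TraceAssert.assertTrace(session, "SELECT * FROM X")
    .then("Preparing statement").then("Row cache hit");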



On Mon, Oct 3, 2016 at 2:35 PM, Abhinav Solan 
wrote:

> Hi, can anyone please help me with this
>
> Thanks,
> Abhinav
>
> On Fri, Sep 30, 2016 at 6:20 PM Abhinav Solan 
> wrote:
>
>> Hi Everyone,
>>
>> My table looks like this -
>> CREATE TABLE test.reads (
>> svc_pt_id bigint,
>> meas_type_id bigint,
>> flags bigint,
>> read_time timestamp,
>> value double,
>> PRIMARY KEY ((svc_pt_id, meas_type_id))
>> ) WITH bloom_filter_fp_chance = 0.1
>> AND caching = {'keys': 'ALL', 'rows_per_partition': '10'}
>> AND comment = ''
>> AND compaction = {'class': 'org.apache.cassandra.db.compaction.
>> LeveledCompactionStrategy'}
>> AND compression = {'chunk_length_in_kb': '64', 'class': '
>> org.apache.cassandra.io.compress.LZ4Compressor'}
>> AND crc_check_chance = 1.0
>> AND dclocal_read_repair_chance = 0.1
>> AND default_time_to_live = 0
>> AND gc_grace_seconds = 864000
>> AND max_index_interval = 2048
>> AND memtable_flush_period_in_ms = 0
>> AND min_index_interval = 128
>> AND read_repair_chance = 0.0
>> AND speculative_retry = '99PERCENTILE';
>>
>> Have set up the C* nodes with
>> row_cache_size_in_mb: 1024
>> row_cache_save_period: 14400
>>
>> and I am making this query
>> select svc_pt_id, meas_type_id, read_time, value FROM
>> cts_svc_pt_latest_int_read where svc_pt_id = -9941235 and meas_type_id =
>> 146;
>>
>> with tracing on every time it says Row cache miss
>>
>> activity | timestamp | source | source_elapsed
>> ---
>> Execute CQL3 query | 2016-09-30 18:15:00.446000 | 192.168.199.75 | 0
>> Parsing select svc_pt_id, meas_type_id, read_time, value FROM
>> cts_svc_pt_latest_int_read where svc_pt_id = -9941235 and meas_type_id =
>> 146; [SharedPool-Worker-1] | 2016-09-30 18:15:00.446000 | 192.168.199.75 | 111
>> Preparing statement [SharedPool-Worker-1] | 2016-09-30 18:15:00.446000 | 192.168.199.75 | 209
>> reading data from /192.168.170.186 [SharedPool-Worker-1] | 2016-09-30 18:15:00.446001 | 192.168.199.75 | 370
>> Sending READ message to /192.168.170.186 [MessagingService-Outgoing-/192.168.170.186] | 2016-09-30 18:15:00.446001 | 192.168.199.75 | 450
>> REQUEST_RESPONSE message received from /192.168.170.186 [MessagingService-Incoming-/192.168.170.186] | 2016-09-30 18:15:00.448000 | 192.168.199.75 | 2469
>> Processing response from /192.168.170.186 [SharedPool-Worker-8] | 2016-09-30 18:15:00.448000 | 192.168.199.75 | 2609
>> READ message received from /192.168.199.75 [MessagingService-Incoming-/192.168.199.75] | 2016-09-30 18:15:00.449000 | 192.168.170.186 | 75
>> Row cache miss [SharedPool-Worker-2] | 2016-09-30 18:15:00.449000 | 192.168.170.186 | 218
>> Fetching data but not populating cache as query does not query from the start of the partition [SharedPool-Worker-2] | 2016-09-30 18:15:00.449000 | 192.168.170.186 | 246
>> Executing single-partition query on cts_svc_pt_latest_int_read [SharedPool-Worker-2] | 2016-09-30 18:15:00.449000 | 192.168.170.186 | 259
>> Acquiring sstable references [SharedPool-Worker-2] | 2016-09-30 18:15:00.449001 | 192.168.170.186 | 281
>> Merging memtable contents [SharedPool-Worker-2] | 2016-09-30 18:15:00.449001 | 192.168.170.186 | 295
>> Merging data from sstable 8 [SharedPool-Worker-2] | 2016-09-30 18:15:00.449001 | 192.168.170.186 | 326
>> Key cache hit for sstable 8 [SharedPool-Worker-2] | 2016-09-30 18:15:00.449001 | 192.168.170.186 | 351
>> Merging data from sstable 7 [SharedPool-Worker-2] | 2016-09-30 18:15:00.449001 | 192.168.170.186 | 439
>> Key

Re: Cassandra data model right definition

2016-10-03 Thread Edward Capriolo
You know what, don't "go low" and pin the recent un-subscriber on me.

If you're so eager to deal with my pull requests, I would rather you review
this one:
https://issues.apache.org/jira/browse/CASSANDRA-10825





On Mon, Oct 3, 2016 at 1:04 PM, Benedict Elliott Smith <bened...@apache.org>
wrote:

> Nobody is disputing that the docs can and should be improved to avoid this
> misreading.  I've invited Ed to file a JIRA and/or pull request twice now.
>
> You are of course just as welcome to do this.  Perhaps you will actually
> do it, so we can all move on with our lives!
>
>
>
>
> On 3 October 2016 at 17:45, Peter Lin <wool...@gmail.com> wrote:
>
>> I've met clients that read the cassandra docs and then said in a big
>> meeting "it's just like relational database, it has tables just like
>> sqlserver/oracle."
>>
>> I'm not putting words in other people's mouth either, but I've heard that
>> said enough times to want to puke. Does the docs claim cassandra is
>> relational ? it absolutely doesn't make that claim, but the docs play
>> loosey goosey with terminology. End result is it confuses new users that
>> aren't experts, or technology managers that try to make a case for
>> cassandra.
>>
>> we can make all the excuses we want, but that doesn't change the fact the
>> docs aren't user friendly. writing great documentation is tough and most
>> developers hate it. It's cuz we suck at it. There I said it, "we SUCK as
>> writing user friendly documentation". As many people have pointed out, it's
>> not unique to Cassandra. 80% of the tech docs out there suck, starting with
>> IBM at the top.
>>
>> Saying the docs suck isn't an indictment of anyone, it's just the reality
>> of writing good documentation.
>>
>> On Mon, Oct 3, 2016 at 12:33 PM, Jonathan Haddad <j...@jonhaddad.com>
>> wrote:
>>
>>> Nobody is claiming Cassandra is a relational I'm not sure why that keeps
>>> coming up.
>>> On Mon, Oct 3, 2016 at 10:53 AM Edward Capriolo <edlinuxg...@gmail.com>
>>> wrote:
>>>
>>>> My original point can be summed up as:
>>>>
>>>> Do not define cassandra in terms SMILES & METAPHORS. Such words include
>>>> "like" and "close relative".
>>>>
>>>> For the specifics:
>>>>
>>>>
>>>> Any relational db could (and I'm sure one does!) allow for sparse
>>>> fields as well. MySQL can be backed by rocksdb now, does that make it not a
>>>> row store?
>>>>
>>>>
>>>> Lets draw some lines, a relational database is clearly defined.
>>>>
>>>> https://en.wikipedia.org/wiki/Edgar_F._Codd
>>>>
>>>> Codd's theorem <https://en.wikipedia.org/wiki/Codd%27s_theorem>, a
>>>> result proven in his seminal work on the relational model, equates the
>>>> expressive power of relational algebra
>>>> <https://en.wikipedia.org/wiki/Relational_algebra> and relational
>>>> calculus <https://en.wikipedia.org/wiki/Relational_calculus> (both of
>>>> which, lacking recursion, are strictly less powerful than first-order
>>>> logic <https://en.wikipedia.org/wiki/First-order_logic>).[*citation
>>>> needed <https://en.wikipedia.org/wiki/Wikipedia:Citation_needed>*]
>>>>
>>>> As the relational model started to become fashionable in the early
>>>> 1980s, Codd fought a sometimes bitter campaign to prevent the term being
>>>> misused by database vendors who had merely added a relational veneer to
>>>> older technology. As part of this campaign, he published his 12 rules
>>>> <https://en.wikipedia.org/wiki/Codd%27s_12_rules> to define what
>>>> constituted a relational database. This made his position in IBM
>>>> increasingly difficult, so he left to form his own consulting company with
>>>> Chris Date and others.
>>>>
>>>> Cassandra is not a relational database.
>>>>
>>>> I have attempted to illustrate that a "row store" is defined as
>>>> well. I do not believe Cassandra is a "row store".
>>>>
>>>>
>>>>
>>>> "Just because it uses log structured storage, sparse fields, and
>>>> semi-flexible collections doesn't disqualify it from calling it a "row
>>>> store""
>>>>
>>>> What is the definition of "row 

Re: Cassandra data model right definition

2016-10-03 Thread Edward Capriolo
My original point can be summed up as:

Do not define cassandra in terms SMILES & METAPHORS. Such words include
"like" and "close relative".

For the specifics:

Any relational db could (and I'm sure one does!) allow for sparse fields as
well. MySQL can be backed by rocksdb now, does that make it not a row store?

Lets draw some lines, a relational database is clearly defined.

https://en.wikipedia.org/wiki/Edgar_F._Codd

Codd's theorem <https://en.wikipedia.org/wiki/Codd%27s_theorem>, a result
proven in his seminal work on the relational model, equates the expressive
power of relational algebra
<https://en.wikipedia.org/wiki/Relational_algebra> and relational calculus
<https://en.wikipedia.org/wiki/Relational_calculus> (both of which, lacking
recursion, are strictly less powerful than first-order logic
<https://en.wikipedia.org/wiki/First-order_logic>).[*citation needed
<https://en.wikipedia.org/wiki/Wikipedia:Citation_needed>*]

As the relational model started to become fashionable in the early 1980s,
Codd fought a sometimes bitter campaign to prevent the term being misused
by database vendors who had merely added a relational veneer to older
technology. As part of this campaign, he published his 12 rules
<https://en.wikipedia.org/wiki/Codd%27s_12_rules> to define what
constituted a relational database. This made his position in IBM
increasingly difficult, so he left to form his own consulting company with
Chris Date and others.

Cassandra is not a relational database.

I have attempted to illustrate that a "row store" is defined as well. I
do not believe Cassandra is a "row store".

"Just because it uses log structured storage, sparse fields, and
semi-flexible collections doesn't disqualify it from calling it a "row
store""

What is the definition of "row store". Is it a logical construct or a
physical one?

Why isn't mongo DB a "row store"? I can drop a schema on top of mongo and
present it as rows and columns. It seems to pass the litmus test being
presented.

https://github.com/mongodb/mongo-hadoop/wiki/Hive-Usage





On Mon, Oct 3, 2016 at 10:02 AM, Jonathan Haddad <j...@jonhaddad.com> wrote:

> Sorry Ed, but you're really stretching here. A table in Cassandra is
> structured by a schema with the data for each row stored together in each
> data file. Just because it uses log structured storage, sparse fields, and
> semi-flexible collections doesn't disqualify it from calling it a "row
> store"
>
> Postgres added flexible storage through hstore, I don't hear anyone
> arguing that it needs to be renamed.
>
> Any relational db could (and I'm sure one does!) allow for sparse fields
> as well. MySQL can be backed by rocksdb now, does that make it not a row
> store?
>
> You're arguing that everything is wrong but you're not proposing an
> alternative, which is not productive.
> On Mon, Oct 3, 2016 at 9:40 AM Edward Capriolo <edlinuxg...@gmail.com>
> wrote:
>
>> Also every piece of technical information that describes a rowstore
>>
>> http://cs-www.cs.yale.edu/homes/dna/talks/abadi-sigmod08-slides.pdf
>> https://en.wikipedia.org/wiki/Column-oriented_DBMS#Row-oriented_systems
>>
>> Does it like this:
>>
>> 001:10,Smith,Joe,4;
>> 002:12,Jones,Mary,5;
>> 003:11,Johnson,Cathy,44000;
>> 004:22,Jones,Bob,55000;
>>
>>
>>
>> They never depict a scenario where the data looks like this on disk:
>>
>> 001:10,Smith
>>
>> 001:10,4;
>>
>> Which is much closer to how Cassandra *stores* it's data.
>>
>>
>>
>> On Fri, Sep 30, 2016 at 5:12 PM, Benedict Elliott Smith <
>> bened...@apache.org> wrote:
>>
>> Absolutely.  A "partitioned row store" is exactly what I would call it.
>> As it happens, our README thinks the same, which is fantastic.
>>
>> I thought I'd take a look at the rest of our cohort, and didn't get far
>> before disappointment.  HBase literally calls itself a "*column-oriented* 
>> store"
>> - which is so totally wrong it's simultaneously hilarious and tragic.
>>
>> I guess we can't blame the wider internet for misunderstanding/misnaming
>> us poor "wide column stores" if even one of the major examples doesn't know
>> what it, itself, is!
>>
>>
>>
>>
>> On 30 September 2016 at 21:47, Jonathan Haddad <j...@jonhaddad.com> wrote:
>>
>> +1000 to what Benedict says. I usually call it a "partitioned row store"
>> which usually needs some extra explanation but is more accurate than
>> "column family" or whatever other thrift era terminology people still use.
>> O

Re: Cassandra data model right definition

2016-10-01 Thread Edward Capriolo
https://github.com/apache/cassandra

Row store <http://wiki.apache.org/cassandra/DataModel> means that like
relational databases, Cassandra organizes data by rows and columns. The
Cassandra Query Language (CQL) is a close relative of SQL.

I generally do not know what to say about these high level
"oversimplifications" like "firewalls block hackers". Are there "firewalls"
or do they mean IP routers with layer 4 packet inspections and layer 3
Access Control Lists?

We say (and I catch myself doing it all the time) "like relational
databases" often, as if all relational databases work alike. A columnar
store like HP Vertica is a relational database. MySQL has different storage
engines; does MyISAM work like InnoDB?

Google Docs organizes data by rows and columns as well. You can wrap any
storage system in an API that makes it look like rows and columns.
Microsoft LINQ can enumerate your network cards and query them
(https://msdn.microsoft.com/en-us/library/bb308959.aspx); that really does
not make your network cards a "row store".

"Theoretically a row can have 2 billion columns, but in practice it
shouldn't have more than 100 million columns."
In practice (in my experience) the number is much lower than 100 million,
and if the data actually is deleted and re-added frequently, the number of
live columns (rows, whatever) you can use happily is even lower.


I believe on twitter (I am unable to find the tweet) someone was trying to
convince me Cassandra was a "columnar analytic database".  ROFL

I believe telling someone it "row store" "like a database", is not a good
idea. They might away content with that explanation. You are setting them
up to walk into an anti-pattern. Like a case where the user is attempting
to write and deleting 1 row and 1 column 6 billion times a day. Then you
end up explaining to them
http://stackoverflow.com/questions/21755286/what-exactly-happens-when-tombstone-limit-is-reached


and how the cassandra storage model is not "like a relational database".

On Fri, Sep 30, 2016 at 9:22 PM, Edward Capriolo <edlinuxg...@gmail.com>
wrote:

> I can iterate over JSON data stored in mongo and present it as a table
> with rows and columns. It does not make mongo a rowstore.
>
> On Fri, Sep 30, 2016 at 9:16 PM, Edward Capriolo <edlinuxg...@gmail.com>
> wrote:
>
>> The problem with calling it a row store:
>>
>> https://en.wikipedia.org/wiki/Row_(database)
>>
>> In the context of a relational database
>> <https://en.wikipedia.org/wiki/Relational_database>, a *row*—also called
>> a record <https://en.wikipedia.org/wiki/Record_(computer_science)> or
>> tuple <https://en.wikipedia.org/wiki/Tuple>—represents a single,
>> implicitly structured data <https://en.wikipedia.org/wiki/Data> item in
>> a table <https://en.wikipedia.org/wiki/Table_(database)>. In simple
>> terms, a database table can be thought of as consisting of *rows* and
>> columns <https://en.wikipedia.org/wiki/Column_(database)> or fields
>> <https://en.wikipedia.org/wiki/Field_(computer_science)>.[1]
>> <https://en.wikipedia.org/wiki/Row_(database)#cite_note-1> Each row in a
>> table represents a set of related data, and every row in the table has the
>> same structure.
>>
>> When you have static columns and rows with maps, and lists, it is hard to
>> argue that every row has the same structure. Physically at the storage
>> layer they do not have the same structure and logically when accessing the
>> data they barely have the same structure, as the static column is just
>> appearing inside each row it is actually not contained in.
>>
>> On Fri, Sep 30, 2016 at 4:47 PM, Jonathan Haddad <j...@jonhaddad.com>
>> wrote:
>>
>>> +1000 to what Benedict says. I usually call it a "partitioned row store"
>>> which usually needs some extra explanation but is more accurate than
>>> "column family" or whatever other thrift era terminology people still use.
>>> On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan <doanduy...@gmail.com>
>>> wrote:
>>>
>>>> I used to present Cassandra as a NoSQL datastore with "distributed"
>>>> table. This definition is closer to CQL and has some academic background
>>>> (distributed hash table).
>>>>
>>>>
>>>> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <
>>>> bened...@apache.org> wrote:
>>>>
>>>>> Cassandra is not a "wide column store" anymore.  It has a schema.
>>>>> Only thrift users no longer think they have a schema (though they do), a

Re: Cassandra data model right definition

2016-09-30 Thread Edward Capriolo
I can iterate over JSON data stored in mongo and present it as a table with
rows and columns. It does not make mongo a row store.

On Fri, Sep 30, 2016 at 9:16 PM, Edward Capriolo <edlinuxg...@gmail.com>
wrote:

> The problem with calling it a row store:
>
> https://en.wikipedia.org/wiki/Row_(database)
>
> In the context of a relational database
> <https://en.wikipedia.org/wiki/Relational_database>, a *row*—also called
> a record <https://en.wikipedia.org/wiki/Record_(computer_science)> or
> tuple <https://en.wikipedia.org/wiki/Tuple>—represents a single,
> implicitly structured data <https://en.wikipedia.org/wiki/Data> item in a
> table <https://en.wikipedia.org/wiki/Table_(database)>. In simple terms,
> a database table can be thought of as consisting of *rows* and columns
> <https://en.wikipedia.org/wiki/Column_(database)> or fields
> <https://en.wikipedia.org/wiki/Field_(computer_science)>.[1]
> <https://en.wikipedia.org/wiki/Row_(database)#cite_note-1> Each row in a
> table represents a set of related data, and every row in the table has the
> same structure.
>
> When you have static columns and rows with maps, and lists, it is hard to
> argue that every row has the same structure. Physically at the storage
> layer they do not have the same structure and logically when accessing the
> data they barely have the same structure, as the static column is just
> appearing inside each row it is actually not contained in.
>
> On Fri, Sep 30, 2016 at 4:47 PM, Jonathan Haddad <j...@jonhaddad.com>
> wrote:
>
>> +1000 to what Benedict says. I usually call it a "partitioned row store"
>> which usually needs some extra explanation but is more accurate than
>> "column family" or whatever other thrift era terminology people still use.
>> On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan <doanduy...@gmail.com> wrote:
>>
>>> I used to present Cassandra as a NoSQL datastore with "distributed"
>>> table. This definition is closer to CQL and has some academic background
>>> (distributed hash table).
>>>
>>>
>>> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <
>>> bened...@apache.org> wrote:
>>>
>>>> Cassandra is not a "wide column store" anymore.  It has a schema.  Only
>>>> thrift users no longer think they have a schema (though they do), and
>>>> thrift is being deprecated.
>>>>
>>>> I really wish everyone would kill the term "wide column store" with
>>>> fire.  It seems to have never meant anything beyond "schema-less,
>>>> row-oriented", and a "column store" means literally the opposite of this.
>>>>
>>>> Not only that, but people don't even seem to realise the term "column
>>>> store" existed long before "wide column store" and the latter is often
>>>> abbreviated to the former, as here: http://www.planetcassandra.org
>>>> /what-is-nosql/
>>>>
>>>> Since it no longer applies, let's all agree as a community to forget
>>>> this awful nomenclature ever existed.
>>>>
>>>>
>>>>
>>>> On 30 September 2016 at 18:09, Joaquin Casares <
>>>> joaq...@thelastpickle.com> wrote:
>>>>
>>>>> Hi Mehdi,
>>>>>
>>>>> I can help clarify a few things.
>>>>>
>>>>> As Carlos said, Cassandra is a Wide Column Store. Theoretically a row
>>>>> can have 2 billion columns, but in practice it shouldn't have more than 
>>>>> 100
>>>>> million columns.
>>>>>
>>>>> Cassandra partitions data to certain nodes based on the partition
>>>>> key(s), but does provide the option of setting zero or more clustering
>>>>> keys. Together, the partition key(s) and clustering key(s) form the 
>>>>> primary
>>>>> key.
>>>>>
>>>>> When writing to Cassandra, you will need to provide the full primary
>>>>> key, however, when reading from Cassandra, you only need to provide the
>>>>> full partition key.
>>>>>
>>>>> When you only provide the partition key for a read operation, you're
>>>>> able to return all columns that exist on that partition with low latency.
>>>>> These columns are displayed as "CQL rows" to make it easier to reason 
>>>>> about.
>>>>>
>>>>> Consider the schema:
>>>

Re: Cassandra data model right definition

2016-09-30 Thread Edward Capriolo
The problem with calling it a row store:

https://en.wikipedia.org/wiki/Row_(database)

In the context of a relational database, a *row*—also called a record or
tuple—represents a single, implicitly structured data item in a table. In
simple terms, a database table can be thought of as consisting of *rows*
and columns or fields. Each row in a table represents a set of related
data, and every row in the table has the same structure.

When you have static columns and rows with maps, and lists, it is hard to
argue that every row has the same structure. Physically at the storage
layer they do not have the same structure and logically when accessing the
data they barely have the same structure, as the static column is just
appearing inside each row it is actually not contained in.

On Fri, Sep 30, 2016 at 4:47 PM, Jonathan Haddad  wrote:

> +1000 to what Benedict says. I usually call it a "partitioned row store"
> which usually needs some extra explanation but is more accurate than
> "column family" or whatever other thrift era terminology people still use.
> On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan  wrote:
>
>> I used to present Cassandra as a NoSQL datastore with "distributed"
>> table. This definition is closer to CQL and has some academic background
>> (distributed hash table).
>>
>>
>> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <
>> bened...@apache.org> wrote:
>>
>>> Cassandra is not a "wide column store" anymore.  It has a schema.  Only
>>> thrift users no longer think they have a schema (though they do), and
>>> thrift is being deprecated.
>>>
>>> I really wish everyone would kill the term "wide column store" with
>>> fire.  It seems to have never meant anything beyond "schema-less,
>>> row-oriented", and a "column store" means literally the opposite of this.
>>>
>>> Not only that, but people don't even seem to realise the term "column
>>> store" existed long before "wide column store" and the latter is often
>>> abbreviated to the former, as here: http://www.planetcassandra.
>>> org/what-is-nosql/
>>>
>>> Since it no longer applies, let's all agree as a community to forget
>>> this awful nomenclature ever existed.
>>>
>>>
>>>
>>> On 30 September 2016 at 18:09, Joaquin Casares <
>>> joaq...@thelastpickle.com> wrote:
>>>
 Hi Mehdi,

 I can help clarify a few things.

 As Carlos said, Cassandra is a Wide Column Store. Theoretically a row
 can have 2 billion columns, but in practice it shouldn't have more than 100
 million columns.

 Cassandra partitions data to certain nodes based on the partition
 key(s), but does provide the option of setting zero or more clustering
 keys. Together, the partition key(s) and clustering key(s) form the primary
 key.

 When writing to Cassandra, you will need to provide the full primary
 key, however, when reading from Cassandra, you only need to provide the
 full partition key.

 When you only provide the partition key for a read operation, you're
 able to return all columns that exist on that partition with low latency.
 These columns are displayed as "CQL rows" to make it easier to reason 
 about.

 Consider the schema:

 CREATE TABLE foo (
   bar uuid,

   boz uuid,

   baz timeuuid,
   data1 text,

   data2 text,

   PRIMARY KEY ((bar, boz), baz)

 );


 When you write to Cassandra you will need to send bar, boz, and baz and
 optionally data*, if it's relevant for that CQL row. If you chose not to
 define a data* field for a particular CQL row, then nothing is stored nor
 allocated on disk. But I wouldn't consider that caveat to be "schema-less".

 However, all writes to the same bar/boz will end up on the same
 Cassandra replica set (a configurable number of nodes) and be stored on the
 same place(s) on disk within the SSTable(s). And on disk, each field that's
 not a partition key is stored as a column, including clustering keys (this
 is optimized in Cassandra 3+, but now we're getting deep into internals).

 In this way you can get fast responses for all activity for bar/boz
 either over time, or for a specific time, with roughly the same number of
 disk seeks, with varying lengths on the disk scans.

 Hope that helps!

 Joaquin Casares
 Consultant
 Austin, TX

 Apache Cassandra Consulting
 http://www.thelastpickle.com

 On Fri, Sep 30, 2016 at 11:40 AM, Carlos 

Re: Cassandra data model right definition

2016-09-30 Thread Edward Capriolo
Then:
Physically: a data store built as a log-structured merge of SSTables
(see https://cloud.google.com/bigtable/).
Now:
One of the changes made in Apache Cassandra 3.0 is a relatively
important refactor of the storage engine.
I say refactor because the basics have not changed: data is still inserted
in a memtable which get flushed over time to a sstable with compaction
baby-sitting the set of sstables on disk, and reads uses both memtable and
sstables to retrieve results. But the internal structure of the objects
manipulated in those phases has changed, and that entails a significant
amount of refactoring in the code. The principal motivation is that new
storage engine more directly manipulate the structure that is exposed
through CQL, and knowing that structure at the storage engine level has
many advantages: some features are easier to add and the engine has more
information to optimize.

http://www.datastax.com/2015/12/storage-engine-30

Then:
An RPC abstraction over the data, with methods like get_slice which
selected columns from a single 'row key'.
Now:
A query-based abstraction over the data, with queries like SELECT * FROM
table WHERE x=y, in which most language features work over single
'partitions'.
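For example, what get_slice expressed as "give me a slice of columns from
one row key" becomes a clustering-range predicate over one partition. A
sketch with the Java driver; the schema is a placeholder:

import com.datastax.driver.core.*;

class SliceExample {
  // CQL analogue of the old get_slice: a bounded slice of one partition.
  // Assumed schema: CREATE TABLE ks.events (pk int, ck int, v text,
  //                                         PRIMARY KEY (pk, ck))
  static void readSlice(Session session) {
    for (Row row : session.execute(
        "SELECT ck, v FROM ks.events WHERE pk = ? AND ck >= ? AND ck < ?",
        42, 0, 100)) {
      System.out.println(row.getInt("ck") + " -> " + row.getString("v"));
    }
  }
}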

And three(?) implementations of secondary-index-like things:
Secondary Indexes
Materialized Views
SasiIndex

These add query functionality, typically by storing an index (or
secondary form) in a way optimized for the given query pattern.






On Fri, Sep 30, 2016 at 1:52 PM, DuyHai Doan  wrote:

> I used to present Cassandra as a NoSQL datastore with "distributed" table.
> This definition is closer to CQL and has some academic background
> (distributed hash table).
>
>
> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <
> bened...@apache.org> wrote:
>
>> Cassandra is not a "wide column store" anymore.  It has a schema.  Only
>> thrift users no longer think they have a schema (though they do), and
>> thrift is being deprecated.
>>
>> I really wish everyone would kill the term "wide column store" with
>> fire.  It seems to have never meant anything beyond "schema-less,
>> row-oriented", and a "column store" means literally the opposite of this.
>>
>> Not only that, but people don't even seem to realise the term "column
>> store" existed long before "wide column store" and the latter is often
>> abbreviated to the former, as here: http://www.planetcassandra.org
>> /what-is-nosql/
>>
>> Since it no longer applies, let's all agree as a community to forget this
>> awful nomenclature ever existed.
>>
>>
>>
>> On 30 September 2016 at 18:09, Joaquin Casares > > wrote:
>>
>>> Hi Mehdi,
>>>
>>> I can help clarify a few things.
>>>
>>> As Carlos said, Cassandra is a Wide Column Store. Theoretically a row
>>> can have 2 billion columns, but in practice it shouldn't have more than 100
>>> million columns.
>>>
>>> Cassandra partitions data to certain nodes based on the partition
>>> key(s), but does provide the option of setting zero or more clustering
>>> keys. Together, the partition key(s) and clustering key(s) form the primary
>>> key.
>>>
>>> When writing to Cassandra, you will need to provide the full primary
>>> key, however, when reading from Cassandra, you only need to provide the
>>> full partition key.
>>>
>>> When you only provide the partition key for a read operation, you're
>>> able to return all columns that exist on that partition with low latency.
>>> These columns are displayed as "CQL rows" to make it easier to reason about.
>>>
>>> Consider the schema:
>>>
>>> CREATE TABLE foo (
>>>   bar uuid,
>>>
>>>   boz uuid,
>>>
>>>   baz timeuuid,
>>>   data1 text,
>>>
>>>   data2 text,
>>>
>>>   PRIMARY KEY ((bar, boz), baz)
>>>
>>> );
>>>
>>>
>>> When you write to Cassandra you will need to send bar, boz, and baz and
>>> optionally data*, if it's relevant for that CQL row. If you chose not to
>>> define a data* field for a particular CQL row, then nothing is stored nor
>>> allocated on disk. But I wouldn't consider that caveat to be "schema-less".
>>>
>>> However, all writes to the same bar/boz will end up on the same
>>> Cassandra replica set (a configurable number of nodes) and be stored on the
>>> same place(s) on disk within the SSTable(s). And on disk, each field that's
>>> not a partition key is stored as a column, including clustering keys (this
>>> is optimized in Cassandra 3+, but now we're getting deep into internals).
>>>
>>> In this way you can get fast responses for all activity for bar/boz
>>> either over time, or for a specific time, with roughly the same number of
>>> disk seeks, with varying lengths on the disk scans.
>>>
>>> Hope that helps!
>>>
>>> Joaquin Casares
>>> Consultant
>>> Austin, TX
>>>
>>> Apache Cassandra Consulting
>>> http://www.thelastpickle.com
>>>
>>> On Fri, Sep 30, 2016 at 11:40 AM, Carlos Alonso 
>>> wrote:
>>>
 Cassandra is a 

Re: Way to write to dc1 but keep data only in dc2

2016-09-29 Thread Edward Capriolo
You can do something like this, though your use of terminology like "queue"
does not really apply.

You can setup your keyspace with replication in only one data center.

CREATE KEYSPACE NTSkeyspace WITH REPLICATION = { 'class' :
'NetworkTopologyStrategy', 'dc2' : 3 };

This will make the NTSkeyspace live only in one data center. You can always
write to any Cassandra node, since nodes will transparently proxy the writes
to the proper place. You can configure your client to ONLY bind to specific
hosts or data centers (here, DC1).

You can use a write consistency level like ANY. If you use a consistency
level like ONE, it will cause the write to block anyway, waiting for
completion in the other data center.

Since you mentioned the words "like a queue", I would suggest an alternative:
write the data to a distributed commit log like Kafka. At that
point you can decouple the write systems, either through producer/consumer
or through a tool like Kafka's mirror maker.
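For the Cassandra-only variant, the client side with the Java driver
(3.x-era API) might look like the sketch below; the contact point, table,
and columns are placeholders:

import com.datastax.driver.core.*;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;

public class Dc1Writer {
  public static void main(String[] args) {
    // Pin coordinators to dc1; writes are transparently proxied to the
    // replicas in dc2 because NTSkeyspace replicates only there.
    Cluster cluster = Cluster.builder()
        .addContactPoint("10.0.0.1") // a dc1 node (placeholder)
        .withLoadBalancingPolicy(
            DCAwareRoundRobinPolicy.builder().withLocalDc("dc1").build())
        .build();
    Session session = cluster.connect();
    // ANY returns once the write is accepted somewhere, including as a
    // hint on the dc1 coordinator; ONE would block on a dc2 replica ack.
    session.execute(new SimpleStatement(
        "INSERT INTO NTSkeyspace.events (id, body) VALUES (1, 'x')")
        .setConsistencyLevel(ConsistencyLevel.ANY));
    cluster.close();
  }
}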


On Thu, Sep 29, 2016 at 5:24 PM, Dorian Hoxha 
wrote:

> I have dc1 and dc2.
> I want to keep a keyspace only on dc2.
> But I only have my app on dc1.
> And I want to write to dc1 (lower latency) which will not keep data
> locally but just push it to dc2.
> While reading will only work for dc2.
> Since my app is mostly write, my app ~will be faster while not having to
> deploy to the app to dc2 or write directly to dc2 with higher latency.
>
> dc1 would act like a queue or something and just push data + delete
> locally.
>
> Does this make sense ?
>
> Thank You
>


Re: TRUNCATE throws OperationTimedOut randomly

2016-09-28 Thread Edward Capriolo
Truncate does a few things (based on version)
  truncate takes snapshots
  truncate causes a flush
  in very old versions truncate causes a schema migration.

In newer versions, like Cassandra 3.4, you have this knob:

# How long the coordinator should wait for truncates to complete
# (This can be much longer, because unless auto_snapshot is disabled
# we need to flush first so we can snapshot before removing the data.)
truncate_request_timeout_in_ms: 60000


In older versions you cannot control when this call will time out; it is
fairly normal that it does!
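For Java clients there is also a client-side lever: newer drivers can raise
the per-statement read timeout so the driver does not give up before the
flush/snapshot/truncate completes. A sketch (3.x-era driver API; the
two-minute value is arbitrary):

import com.datastax.driver.core.*;

class SlowTruncate {
  static void truncate(Session session) {
    // Raise the client-side read timeout for just this statement.
    session.execute(new SimpleStatement("TRUNCATE test.mytable")
        .setReadTimeoutMillis(120000));
  }
}

Note that cqlsh has its own client-side timeout, which is why the
OperationTimedOut above can fire even though the server may eventually
finish the truncate.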


On Wed, Sep 28, 2016 at 12:50 PM, George Sigletos 
wrote:

> Hello,
>
> I keep executing a TRUNCATE command on an empty table and it throws
> OperationTimedOut randomly:
>
> cassandra@cqlsh> truncate test.mytable;
> OperationTimedOut: errors={}, last_host=cassiebeta-01
> cassandra@cqlsh> truncate test.mytable;
> OperationTimedOut: errors={}, last_host=cassiebeta-01
>
> Having a 3 node cluster running 2.1.14. No connectivity problems. Has
> anybody come across the same error?
>
> Thanks,
> George
>
>


Re: Reproducing exception in cassandra for testing failover scenarios

2016-09-24 Thread Edward Capriolo
You can also look at ccmbridge and farsandra.

Here is an example of bringing up an 8 node 3 datacenter cluster in a
single unit test using farsandra.

https://github.com/edwardcapriolo/ec/blob/master/src/test/java/Base/ThreeDcTest.java

On Sat, Sep 24, 2016 at 3:53 PM, Jonathan Haddad  wrote:

> Have you looked at Stubbed Cassandra, by Chris Batey?
> http://www.scassandra.org/
>
> On Sat, Sep 24, 2016 at 7:51 AM Bhuvan Rawal  wrote:
>
>> Hi,
>>
>> Is there a way to produce exceptions like NoHostAvailable, Timeout
>> exceptions , Not Enough replicas available, etc. locally so as to test the
>> failover code written at application level (paging state reinjection, or
>> retries).
>>
>> Thanks & Regards,
>> Bhuvan
>>
>


Re: Lightweight tx is good enough to handle counter?

2016-09-23 Thread Edward Capriolo
This might help you:

https://github.com/edwardcapriolo/ec/blob/master/src/test/java/Base/CompareAndSwapTest.java

It counts using LWTs with multiple threads.
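The core of it is a compare-and-swap retry loop. A distilled sketch
(placeholder keyspace/table; 3.x-era driver API; assumes the row was
pre-inserted with count = 0):

import com.datastax.driver.core.*;

class CasCounter {
  // Increment a non-counter int column safely under contention by looping
  // until our conditional update wins.
  // Assumed schema: CREATE TABLE ks.mytable (id int PRIMARY KEY, count int)
  static void increment(Session session) {
    while (true) {
      int current = session.execute(
          "SELECT count FROM ks.mytable WHERE id = 1").one().getInt("count");
      ResultSet rs = session.execute(
          "UPDATE ks.mytable SET count = ? WHERE id = 1 IF count = ?",
          current + 1, current);
      if (rs.wasApplied()) {
        return; // our CAS won; otherwise another thread raced us, so retry
      }
    }
  }
}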

On Fri, Sep 23, 2016 at 2:31 PM, Jaydeep Chovatia <
chovatia.jayd...@gmail.com> wrote:

> Since SERIAL consistency is not supported for batch updates, I used QUORUM
> for the operation.
>
> On Fri, Sep 23, 2016 at 11:23 AM, DuyHai Doan 
> wrote:
>
>> What is the consistency level used for the batch query ?
>>
>>
>> On Fri, Sep 23, 2016 at 8:19 PM, Jaydeep Chovatia <
>> chovatia.jayd...@gmail.com> wrote:
>>
>>> Ok.  But I am trying to understand a scenario under which this mis-match
>>> can occur with light-weight tx.
>>>
>>> On Fri, Sep 23, 2016 at 11:14 AM, DuyHai Doan 
>>> wrote:
>>>
 Lightweight transaction is not available for counters, for the simple
 reason that counters are not idempotent

 On Fri, Sep 23, 2016 at 8:10 PM, Jaydeep Chovatia <
 chovatia.jayd...@gmail.com> wrote:

> We have a following table:
>
> create table mytable {
>
> id int,
> count int static,
> rec_id int,
> primary key (id, rec_id)
>
> };
>
> The count in the table represents how many records (rec_id clustering
> columns) exists. So when we add new a new record we do it following way:
>
> UNLOGGED BATCH
> insert into mytable (id, rec_id) values (, );
> update mytable set count =  + 1 where id =  if count =
> ; //light-weight transaction
> APPLY BATCH
>
> Then we do following read query as QUORUM:
>
> select count, rec_id from mytable where id = ;
>
> Here we expect count to exactly match number of rows (number of
> clustering rec_id) returned. But under a stress we have observed that they
> do not match sometimes.
>
> Is this expected?
>
> Thanks,
> Jaydeep
>


>>>
>>
>


Re: Upgrading from Cassandra 2.1.12 to 3.0.9

2016-09-23 Thread Edward Capriolo
To be clear about the mixed versions: you do not want to do it, especially
if the versions are very far apart.

Typically you cannot run repair with mixed versions.
You cannot do schema changes with mixed versions.
Data files from new versions are not readable by old versions.

Basically you only want to have mixed versions while you are doing an
upgrade. You want to finish the upgrade across all nodes quickly (without
rushing).




On Fri, Sep 23, 2016 at 1:22 PM, Jonathan Haddad  wrote:

> Oh yeah, and to the second question, can you run a cluster with mixed
> versions, the answer is absolutely not in any sort of sane way.
>
> On Fri, Sep 23, 2016 at 10:01 AM SmartCat - Scott Hirleman <
> sc...@smartcat.io> wrote:
>
>> I think the TLP team are recommending the approach I would as well, which
>> is to spin up a new cluster and copy your data into it for testing
>> purposes. If your app isn't in production yet, playing around with 3.7 is
>> great, really helps the community as Jon said; the word "upgrading" will
>> set off many alarm bells because the connotation is you have a stable
>> application built and are looking to put it on pretty bleeding edge tech
>> that hasn't been well tested yet, which is usually a road to tears.
>>
>> On Fri, Sep 23, 2016 at 10:28 AM, Jonathan Haddad 
>> wrote:
>>
>>> I strongly recommend not upgrading to 3.7.  Here's my thoughts on Tick
>>> Tock releases, copy / pasted from a previous email I wrote on this ML:
>>>
>>> 3.7 falls under the Tick Tock release cycle, which is almost completely
>>> untested in production by experienced operators.  In the cases where it
>>> has
>>> been tested, there have been numerous bugs found which I (and I think
>>> most
>>> people on this list) consider to be show stoppers.  Additionally, the
>>> Tick
>>> Tock release cycle puts the operator in the uncomfortable position of
>>> having to decide between upgrading to a new version with new features
>>> (probably new bugs) or back porting bug fixes from future versions
>>> themselves.There will never be a 3.7.1 release which fixes bugs in
>>> 3.7
>>> without adding new features.
>>>
>>> https://github.com/apache/cassandra/blob/trunk/NEWS.txt
>>>
>>> For new projects I recommend starting with the recently released 3.0.9.
>>>
>>> Assuming the project changes it's policy on releases (all signs point to
>>> yes), then by the time 4.0 rolls out a lot of the features which have
>>> been
>>> released in the 3.x series will have matured a bit, so it's very possible
>>> 4.0 will stabilize faster than the usual 6 months it takes for a major
>>> release.
>>>
>>> All that said, there's nothing wrong with doing compatibility & smoke
>>> tests
>>> against the latest 3.x release as well as 3.0 and reporting bugs back to
>>> the Apache Cassandra JIRA, I'm sure it would be greatly appreciated.
>>>
>>> https://issues.apache.org/jira/secure/Dashboard.jspa
>>>
>>> Jon
>>>
>>>
>>>
>>> On Fri, Sep 23, 2016 at 9:00 AM Khaja, Raziuddin (NIH/NLM/NCBI) [C] <
>>> raziuddin.kh...@nih.gov> wrote:
>>>
 Thank you Joaquim for the advice.



 I seem to have sent this email with the wrong subject.  It should have
 been *Upgrading from Cassandra 2.1.12 to 3.7*, but too late now.



 The plan is to upgrade from 2.1.12 to 3.7 and to maintain a
 heterogeneous cluster only for a short time, while we observe how 3.7
 reacts to our client applications with traffic, then proceed with upgrading
 all DCs to 3.7.



 In our current installation we are using *memtable_allocation_type:
 offheap_objects*. Support for offheap_objects was removed in the 3.0.x
 branch and only added back in 3.4+, so an upgrade to 3.0.9 will not be
 possible for me unless I change this parameter.

 Still looking to hear from others about upgrade experiences, problems
 etc.

 -Razi



 *From: *Joaquin Casares 
 *Reply-To: *"user@cassandra.apache.org" 
 *Date: *Friday, September 23, 2016 at 11:41 AM
 *To: *"user@cassandra.apache.org" 
 *Cc: *"Khaja, Raziuddin (NIH/NLM/NCBI) [C]" 
 *Subject: *Re: Upgrading from Cassandra 2.1.12 to 3.0.9



 Hello Razi,



 Since you were using a highly stable version of 2.1.x, you may want to
 stick with using 3.0.9. 3.7 has introduced many great features, but has not
 been as heavily tested in production as 3.0.9.



 Running heterogenous clusters, even when using the same major version
 (e.g. 3.0.8 and 3.0.9), is never recommended. Running a cluster that spans
 major releases, for longer than the timespan of a routine upgrade, is
 strongly not advised.



 Hope that helps!


 Joaquin Casares

 Consultant

 Austin, TX



 

Re: Partition size

2016-09-12 Thread Edward Capriolo
In US English it is also debatable which words are profane.

https://simple.wikipedia.org/wiki/Profanity
Different words can be profanity to different people, and what words are
thought of as profanity in English can change over time.

Suggestion:
https://www.youtube.com/watch?v=L0MK7qz13bU

On Mon, Sep 12, 2016 at 9:36 AM, Benedict Elliott Smith  wrote:

> The guidelines stipulate no "excessive or unnecessary" profanity.  Perhaps
> you also decide what qualifies as necessary or non-excessive?
>
> To summarise my view of this entire discussion: policing users is just...
> mind boggling. Well worthy of profanity.
>
>
>
>
>
> On 12 September 2016 at 14:16, Mark Thomas  wrote:
>
>> On 12/09/2016 12:51, Benedict Elliott Smith wrote:
>>
>> Please tone down your language. There is no need for profanity.
>>
>> Now is probably a good time to remind everyone of the Apache Code of
>> Conduct:
>> http://www.apache.org/foundation/policies/conduct.html
>>
>>
>> > (a link to 3rd party docs in response to a question when an
>> > equivalent link to project hosted docs was available)
>> >
>> >
>> > No, it wasn't.  Or at least the link you sent was not remotely the same
>> > as the link in the email you responded to, which was about how to
>> > understand your partition sizes - not the configuration parameter.
>> > Possibly you responded to the wrong email.
>>
>> I did respond to the wrong e-mail. I apologise for any confusion caused.
>> I intended to respond to this message:
>>
>> https://lists.apache.org/thread.html/6a68da3467b1fe8fe96c1be
>> de135d329419b78bf3cc3912e727304db@%3Cuser.cassandra.apache.org%3E
>>
>> rather than this one:
>>
>> https://lists.apache.org/thread.html/39a47ddf3cdecf6a196967b
>> a679c30d65279a2afc05a2588e8c69bac@%3Cuser.cassandra.apache.org%3E
>>
>> I must have clicked on the wrong message in the thread as I moved
>> between windows.
>>
>> > Any member of a project community (contributor, committer or PMC
>> > member)
>> >
>> >
>> > Right.  But policing /users/ (which Mark most certainly is) is just
>> > douchebaggery.  Users should feel free to participate with the resources
>> > /they know best /without fear of reprisal.  All of your statement
>> > suggests this shit belongs on the dev list.
>>
>> Users are as much part of the community as anyone else.
>>
>> > Or are we really suggesting that anyone discussing things on the user
>> > list must be 100% conversant with the "official" docs before they can
>> > make any kind of posting to the list?  Or otherwise they can expect to
>> > be attacked by other community members?
>>
>> I am not saying that at all. I am saying that, unless there is a good
>> reason, links to documentation - particularly reference documentation -
>> should be to the official Apache hosts docs in preference to links to a
>> third party.
>>
>> > Talk about chilling.  I do not see this promoting engagement - who wants
>> > to help other users out if this is what they can expect in return?  A
>> > public shaming?
>>
>> My response was not to Mark, but to the community as a whole. It was not
>> intended as either a reprimand or a shaming. If Mark feels differently,
>> then I apologise. My intention was to make a simple request to the
>> community as a whole to reference the official documentation in
>> preference to 3rd party docs unless there was a good reason.
>>
>> > Linking to third party docs, blogs, etc is fairly common but they
>> > tend to be linked by the OP in the form of "I've followed the
>> > instructions I found here and it doesn't work".
>> >
>> >
>> > Bullshit. Try a simple google
>> > search: site:https://mail-archives.apache.org/mod_mbox/cassandra-user/
>> > thelastpickle.com/blog 
>> >
>> > There are 500 results.  For just one external resource.  I don't recall
>> > a single one of these resulting in a reprimand.  Try the first three
>> > links from the search - they do not fit /any/ of your characterisations
>> > of "normal" - but they do fit mine.
>>
>> None of which, according to Google, have been made since I joined the
>> list in August. The past is the past and I don't see how a review of any
>> of those posts helps the project.
>>
>> There are also ~1500 references to docs.datastax.com. I don't think
>> reviewing those posts would help either.
>>
>> I'll note that the search didn't turn up this post (probably because of
>> the combined delay in mail-archives.a.o updating and Google indexing the
>> site):
>>
>> https://lists.apache.org/thread.html/7f60b641c40e5e7ba9c7c5c
>> 90eee47a94e5ce8690450c7617adc4a41@%3Cuser.cassandra.apache.org%3E
>>
>> That is a good example of the "more involved" question I referred to
>> previously. Hopefully, some of that information will find its way into
>> the architecture section of the official docs.
>>
>> > Perhaps you can link the history of projects attacking users for their
>> > email content?
>>
>> I did say 

Re: select query returns wrong value if use DESC option

2014-03-13 Thread Edward Capriolo
Consider filing a JIRA. CQL is the standard interface to Cassandra;
everything there is heavily tested.
On Thursday, March 13, 2014, Katsutoshi Nagaoka nagapad.0...@gmail.com
wrote:
 Hi.

 I am using Cassandra 2.0.6 version. There is a case that select query
returns wrong value if use DESC option. My test procedure is as follows:

 --
 cqlsh:test> CREATE TABLE mytable (key int, range int, PRIMARY KEY (key, range));
 cqlsh:test> INSERT INTO mytable (key, range) VALUES (0, 0);
 cqlsh:test> SELECT * FROM mytable WHERE key = 0 AND range = 0;

  key | range
 -----+-------
    0 |     0

 (1 rows)

 cqlsh:test> SELECT * FROM mytable WHERE key = 0 AND range = 0 ORDER BY range ASC;

  key | range
 -----+-------
    0 |     0

 (1 rows)

 cqlsh:test> SELECT * FROM mytable WHERE key = 0 AND range = 0 ORDER BY range DESC;

 (0 rows)
 --

 Why does it return 0 rows when using the DESC option? I expected the same 1
row as the return value of the other queries. Does anyone have a similar issue?

 Thanks,
 Katsutoshi

-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.


Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-12 Thread Edward Capriolo
 and underlying CFs.  If
 anything, I think we could add some DESCRIBE information, which would help
 users with this, along the lines of:
 (https://issues.apache.org/jira/browse/CASSANDRA-6676)

 CQL does open up the *opportunity* for users to articulate more complex
 queries using more familiar syntax.  (including future things such as
 joins, grouping, etc.)   To me, that is exciting, and again -- one of the
 reasons we are leaning on it.

 my two cents,
 brian

 ---

 Brian O'Neill

 Chief Technology Officer


 *Health Market Science*

 *The Science of Better Results*

 2700 Horizon Drive * King of Prussia, PA * 19406

 M: 215.588.6024 * @boneill42 http://www.twitter.com/boneill42  *

 healthmarketscience.com


 This information transmitted in this email message is for the intended
 recipient only and may contain confidential and/or privileged material. If
 you received this email in error and are not the intended recipient, or the
 person responsible to deliver it to the intended recipient, please contact
 the sender at the email above and delete this email and any attachments and
 destroy any copies thereof. Any review, retransmission, dissemination,
 copying or other use of, or taking any action in reliance upon, this
 information by persons or entities other than the intended recipient is
 strictly prohibited.




 From: Peter Lin wool...@gmail.com
 Reply-To: user@cassandra.apache.org
 Date: Wednesday, March 12, 2014 at 8:44 AM
 To: user@cassandra.apache.org user@cassandra.apache.org
 Subject: Re: Proposal: freeze Thrift starting with 2.1.0


 yes, I was looking at intravert last nite.

 For the kinds of reports my customers ask us to do, joins and subqueries
 are important. Having tried to do a simple join in PIG, the level of pain
 is  high. I'm a masochist, so I don't mind breaking a simple join into
 multiple MR tasks, though I do find myself asking why the hell does it
 need to be so painful in PIG? Many of my friends say what is this crap!
 or this is better than writing sql queries to run reports?

 Plus, using ETL techniques to extract summaries only works for cases
 where the data is small enough. Once it gets beyond a certain size, it's
 not practical, which means we're back to crappy reporting languages that
 make life painful. Lots of big healthcare companies have thousands of MOLAP
 cubes on dozens of mainframes. The old OLTP - DW/OLAP creates it's own set
 of management headaches.

 being able to report directly on the raw data avoids many of the issues,
 but that's my bias perspective.




 On Wed, Mar 12, 2014 at 8:15 AM, DuyHai Doan doanduy...@gmail.comwrote:

 I would love to see Cassandra get to the point where users can define
 complex queries with subqueries, like, group by and joins -- Did you have
 a look at Intravert? I think it does union & intersection on server side
 for you. Not sure about join though..


 On Wed, Mar 12, 2014 at 12:44 PM, Peter Lin wool...@gmail.com wrote:


 Hi Ed,

 I agree Solr is deeply integrated into DSE. I've looked at Solandra in
 the past and studied the code.

 My understanding is DSE uses Cassandra for storage and the user has
 both API available. I do think it can be integrated further to make
 moderate to complex queries easier and probably faster. That's why we 
 built
 our own JPA-like object query API. I would love to see Cassandra get to 
 the
 point where users can define complex queries with subqueries, like, group
 by and joins. Clearly lots of people want these features and even google
 built their own tools to do these types of queries.

 I see lots of people trying to improve this with Presto, Impala,
 drill, etc. To me, it's a natural progression as NoSql databases mature.
 For most people, at some point you want to be able to report/analyze the
 data. Today some people use MapReduce to summarize the data and ETL it 
 into
 a relational database or OLAP database for reporting. Even though I don't
 need CAS or atomic batch for what I do in cassandra today, I'm sure in the
 future it will be handy. From my experience in the financial and insurance
 sector, features like CAS and select for update are important for the
 kinds of transactions they handle. I'm bias, these kinds of features are
 useful and good addition to cassandra.

 These are interesting times in database land!




 On Tue, Mar 11, 2014 at 10:57 PM, Edward Capriolo 
 edlinuxg...@gmail.com wrote:

 Peter,
 Solr is deeply integrated into DSE. Seemingly this can not
 efficiently be done client side (CQL/Thrift whatever) but the Solandra
 approach was to embed Solr in Cassandra. I think that is actually the
 future of client dev, allowing users to embed custom server-side logic into
 their own API.
 there own API.

 Things like this take a while. Back in the day no one wanted
 cassandra to be heavy-weight and rejected ideas like read-before write
 operations. The common advice was do them client side. Now in the case 
 of
 collections sometimes they do read-before-write and it is the stuff

Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-12 Thread Edward Capriolo




 From: Peter Lin wool...@gmail.com
 Reply-To: user@cassandra.apache.org
 Date: Wednesday, March 12, 2014 at 8:44 AM
 To: user@cassandra.apache.org user@cassandra.apache.org
 Subject: Re: Proposal: freeze Thrift starting with 2.1.0


 yes, I was looking at intravert last nite.

 For the kinds of reports my customers ask us to do, joins and subqueries
 are important. Having tried to do a simple join in PIG, the level of pain
 is high. I'm a masochist, so I don't mind breaking a simple join into
 multiple MR tasks, though I do find myself asking why the hell it needs
 to be so painful in PIG. Many of my friends say "what is this crap!"
 or "is this better than writing sql queries to run reports?"

 Plus, using ETL techniques to extract summaries only works for cases
 where the data is small enough. Once it gets beyond a certain size, it's
 not practical, which means we're back to crappy reporting languages that
 make life painful. Lots of big healthcare companies have thousands of MOLAP
 cubes on dozens of mainframes. The old OLTP-to-DW/OLAP pipeline creates its
 own set of management headaches.

 Being able to report directly on the raw data avoids many of the issues,
 but that's my biased perspective.




 On Wed, Mar 12, 2014 at 8:15 AM, DuyHai Doan doanduy...@gmail.com wrote:

 I would love to see Cassandra get to the point where users can define
 complex queries with subqueries, like, group by and joins -- Did you have
 a look at Intravert? I think it does union and intersection on the server
 side for you. Not sure about joins though...


 On Wed, Mar 12, 2014 at 12:44 PM, Peter Lin wool...@gmail.com wrote:


 Hi Ed,

 I agree Solr is deeply integrated into DSE. I've looked at Solandra in
 the past and studied the code.

 My understanding is DSE uses Cassandra for storage and the user has
 both APIs available. I do think it can be integrated further to make
 moderate to complex queries easier and probably faster. That's why we built
 our own JPA-like object query API. I would love to see Cassandra get to the
 point where users can define complex queries with subqueries, like, group
 by and joins. Clearly lots of people want these features, and even Google
 built its own tools to do these types of queries.

 I see lots of people trying to improve this with Presto, Impala, drill,
 etc. To me, it's a natural progression as NoSql databases mature. For most
 people, at some point you want to be able to report/analyze the data. Today
 some people use MapReduce to summarize the data and ETL it into a
 relational database or OLAP database for reporting. Even though I don't
 need CAS or atomic batch for what I do in cassandra today, I'm sure in the
 future it will be handy. From my experience in the financial and insurance
 sector, features like CAS and select for update are important for the
 kinds of transactions they handle. I'm biased; these kinds of features are
 useful and a good addition to cassandra.

 These are interesting times in database land!




 On Tue, Mar 11, 2014 at 10:57 PM, Edward Capriolo 
 edlinuxg...@gmail.com wrote:

 Peter,
 Solr is deeply integrated into DSE. Seemingly this cannot efficiently
 be done client side (CQL/Thrift, whatever), but the Solandra approach was to
 embed Solr in Cassandra. I think that is actually the future of client dev:
 allowing users to embed custom server-side logic into their own API.

 Things like this take a while. Back in the day no one wanted cassandra
 to be heavyweight, and ideas like read-before-write operations were
 rejected. The common advice was "do them client side." Now in the case of
 collections sometimes they do read-before-write, and it is the stuff users
 want.



 On Tue, Mar 11, 2014 at 10:07 PM, Peter Lin wool...@gmail.com wrote:


 I'll give you a concrete example.

 One of the things we often need to do is a keyword search on
 unstructured text. What we did in our tooling is we combined Solr with
 Cassandra, but we put an Object API in front of it. The API is inspired by
 JPA, but designed specifically to fit our needs.

 The user can do queries with like "%blah%", and behind the scenes we
 issue a query to Solr to find the keys and then query Cassandra for the
 records.

 With plain Cassandra, the developer has to manually do all of this
 stuff and integrate Solr. Then they have to know which system to query and
 in what order. Our tooling lets the user define the schema in a modeler.
 Once

Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-12 Thread Edward Capriolo
This brainstorming idea has already been -1'ed in JIRA. ROFL.


On Wed, Mar 12, 2014 at 12:26 PM, Tupshin Harper tups...@tupshin.com wrote:

 OK, so I'm greatly encouraged by the level of interest in this. I went
 ahead and created https://issues.apache.org/jira/browse/CASSANDRA-6846,
 and will be starting to look into what the interface would have to look
 like. Anybody feel free to continue the discussion here, email me
 privately, or comment on ticket with your thoughts.

 -Tupshin


 On Wed, Mar 12, 2014 at 12:21 PM, Peter Lin wool...@gmail.com wrote:


 @Tupshin
 LOL, there's always enough rope to hang oneself. I agree it's badly
 needed for folks that really do need more messy queries. I was just
 discussing a similar concept with a co-worker and going over the pros/cons
 of various approaches to realizing the goal. I'm still digging into Presto.
 I saw some people are working on support for Cassandra in Presto.



 On Wed, Mar 12, 2014 at 12:15 PM, Tupshin Harper tups...@tupshin.com wrote:

 Peter,

 I didn't specifically call it out, but the interface I just proposed in
 my last email would be very much aimed at making complex queries less
 painful and more efficient, by providing a deep integration mechanism to
 host that code. It's very much an "enough rope to hang ourselves"
 approach, but badly needed, IMO.

 -Tupshin
 On Mar 12, 2014 12:12 PM, Peter Lin wool...@gmail.com wrote:


 @Nate
 I don't want to change the separation of components in cassandra. My
 ultimate goal is to make writing complex queries less painful and more
 efficient. How that becomes reality is anyone's guess; there are different
 ways to get there. I also like having a pluggable transport layer, which is
 why I feel sad every time I hear people say "thrift is dead" or "thrift is
 frozen beyond 2.1" or "don't use thrift." When people ask me what to learn
 with Cassandra, I say both Thrift and CQL. Not everyone has time to read
 the native protocol spec or dive into cassandra code, but clearly some
 people do and enjoy it. I understand some people don't want the burden of
 maintaining Thrift, and that's totally valid. It's up to those that want to
 keep Thrift to make sure patches and enhancements are well tested and
 solid.





 On Wed, Mar 12, 2014 at 11:52 AM, Nate McCall 
 n...@thelastpickle.com wrote:

 IME/O one of the best things about Cassandra was the separation of
 (and I'm over-simplifying a bit, but still):

 - The transport/API layer
 - The Datacenter layer
 - The Storage layer


  I don't think we're well-served by the construction kit approach.
  It's difficult enough to evaluate NoSQL without deciding if you should
  run CQLSandra or Hectorsandra or Intravertandra etc.

 In tree, or even documented, I agree completely. I've never argued
 CQL3 is not the best approach for new users.

 But I've been around long enough that I know precisely what I want to
 do sometimes and any general purpose API will get in the way of that.

 I would like the transport/API layer to at least remain pluggable
 (hackable if you will) in its current form. I really just want to be
 able to create my own *Daemon - as I can now - and go on my merry way
 without having to modify any internals. Much like with compaction
 strategies and SSTable components.

 Do you intend to change this current behavior of allowing a custom
 transport without code modification? (as opposed to changing the daemon
 class in a script?).
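
 A rough sketch of what that looks like today (only activate() below comes
 from the real CassandraDaemon; the custom transport hook is hypothetical):

 import org.apache.cassandra.service.CassandraDaemon;

 public class MyDaemon {
     public static void main(String[] args) {
         // Boot gossip, storage, and the stock transports in-process.
         CassandraDaemon daemon = new CassandraDaemon();
         daemon.activate();
         // Then bind your own transport/API layer next to it.
         startMyCustomTransport(); // hypothetical custom RPC layer
     }

     private static void startMyCustomTransport() {
         // e.g. a server that calls into the internals directly
     }
 }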







Re:

2014-03-12 Thread Edward Capriolo
That is too much RAM for Cassandra; make the heap 6 GB to 10 GB.

The uneven perf could be because your requests do not shard evenly.
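
If the data model is not the problem, a token-aware client policy is worth a
try. A minimal sketch, assuming the DataStax Java driver 2.x (keyspace and
table names here are made up):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.policies.RoundRobinPolicy;
import com.datastax.driver.core.policies.TokenAwarePolicy;

public class TokenAwareClient {
    public static void main(String[] args) {
        // Wrap round robin in a token-aware policy: the driver hashes the
        // partition key and routes each request to a replica that owns it.
        Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")
                .withLoadBalancingPolicy(new TokenAwarePolicy(new RoundRobinPolicy()))
                .build();
        Session session = cluster.connect("my_keyspace");
        session.execute("SELECT * FROM my_table WHERE id = 42");
        cluster.close();
    }
}

Note this only changes routing. If the hot column family's keys hash to the
same three replicas, no client-side policy will even out the load.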

On Wednesday, March 12, 2014, Batranut Bogdan batra...@yahoo.com wrote:
 Hello all,

 The environment:

 I have a 6 node Cassandra cluster. On each node I have:
 - 32 G RAM
 - 24 G RAM for Cassandra
 - ~150 - 200 MB/s disk speed
 - tomcat 6 with axis2 webservice that uses the datastax java driver to make
 asynch reads / writes
 - replication factor for the keyspace is 3

 All nodes in the same data center
 The clients that read / write are in the same datacenter so network is
 Gigabit.

 Writes are performed via exposed methods from the Axis2 WS. The Cassandra
 Java driver uses the round robin load balancing policy, so all the nodes in the
 cluster should be hit with write requests under heavy write or read load
 from multiple clients.

 I am monitoring all nodes with JConsole from another box.

 The problem:

 When writing to a particular column family, only 3 nodes have high CPU
 load, ~80-99%. The remaining 3 are at ~2-10% CPU. During writes, reads
 time out.

 I need more speed for both writes and reads. The fact that 3 nodes
 barely have CPU activity leads me to think that the full potential of C*
 is not being used.

 I am running out of ideas...

 If further details about the environment are needed, I can provide them.


 Thank you very much.

-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.


Re: How expensive are additional keyspaces?

2014-03-11 Thread Edward Capriolo
The biggest expense of them is that you need to be authenticated to a
keyspace to perform an operation. Thus connection pools are bound to
keyspaces. Switching a keyspace is an RPC operation. In the thrift client,
if you have 100 keyspaces you need 100 connection pools; that starts to be a
pain very quickly.

I suggest keeping everything in one keyspace unless you really need
different replication factors and/or network replication settings per
keyspace.
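
To make the cost concrete, a minimal sketch against the raw Thrift API (host,
port, and keyspace names are made up):

import org.apache.cassandra.thrift.Cassandra;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class KeyspaceBoundConnection {
    public static void main(String[] args) throws Exception {
        TTransport transport = new TFramedTransport(new TSocket("127.0.0.1", 9160));
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
        transport.open();
        // set_keyspace is per-connection state: every later call on this
        // connection is scoped to ks_one until it is called again.
        client.set_keyspace("ks_one");
        // ... reads/writes against ks_one ...
        client.set_keyspace("ks_two"); // an extra RPC round trip just to switch
        transport.close();
    }
}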


On Tue, Mar 11, 2014 at 10:17 AM, Martin Meyer elreydet...@gmail.com wrote:

 Hey all -

 My company is working on introducing a configuration service system to
 provide config data to several of our applications, to be backed by
 Cassandra. We're already using Cassandra for other services, and at
 the moment our pending design just puts all the new tables (9 of them,
 I believe) in one of our pre-existing keyspaces.

 I've got a few questions about keyspaces that I'm hoping for input on.
 Some Google hunting didn't turn up obvious answers, at least not for
 recent versions of Cassandra.

 1) What trade offs are being made by using a new keyspace versus
 re-purposing an existing one (that is in active use by another
 application)? Organization is the obvious answer, I'm looking for any
 technical reasons.

 2) Is there any per-keyspace overhead incurred by the cluster?

 3) Does it impact on-disk layout at all for tables to be in a
 different keyspace from others? Is any sort of file fragmentation
 potentially introduced just by doing this in a new keyspace as opposed
 to an existing one?

 4) Does it add any metadata overhead to the system keyspace?

 5) Why might we *not* want to make a separate keyspace for this?

 6) Does anyone have experience with creating additional keyspaces to
 the point that Cassandra can no longer handle it? Note that we're
 *not* planning to do this, I'm just curious.

 Cheers,
 Martin



Re: How expensive are additional keyspaces?

2014-03-11 Thread Edward Capriolo
The mathematical overhead is one thing. I would guess if you tried some
design with 10,000 keyspaces and then ran into a bug/performance
problem, the first thing someone would say to you is "WTF, why do you have
that many keyspaces?" :) Don't let that be you.


On Tue, Mar 11, 2014 at 11:38 AM, Jeremiah D Jordan 
jeremiah.jor...@gmail.com wrote:

 Also, in terms of overhead, server side the overhead is pretty much all at
 the Column Family (CF)/Table level, so 100 keyspaces, 1 CF each, is the
 same as 1 keyspace, 100 CF's.

 -Jeremiah

 On Mar 11, 2014, at 10:36 AM, Jeremiah D Jordan jeremiah.jor...@gmail.com
 wrote:

 The use of more than one keyspace is not uncommon.  Using 100's of them
 is.  That being said, different keyspaces let you specify different
 replication and different authentication.  If you are not going to be doing
 one of those things, then there really is no point to multiple keyspaces.
  If you do want to do one of those things, then go for it, make multiple
 keyspaces.


 -Jeremiah

 On Mar 11, 2014, at 10:17 AM, Edward Capriolo edlinuxg...@gmail.com
 wrote:

 I am not sure. As stated the only benefit of multiple keyspaces is if you
 need:

 1) different replication per keyspace
 2) different multiple data center configurations per keyspace

 Unless you have one of these cases you do not need to do this. I would
 always tackle this problem at the application level using something like:


 http://hector-client.github.io/hector/build/html/content/virtual_keyspaces.html

 Client issues aside, it is not a very common case, and I would advise
 against uncommon setups.
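
 The idea behind those virtual keyspaces boils down to key prefixing. A
 minimal sketch of the pattern (not Hector's actual API, just what it does):

 import java.nio.ByteBuffer;
 import java.nio.charset.Charset;

 // Many logical "keyspaces" share one physical keyspace and one connection
 // pool by prefixing every row key with a tenant name.
 public class VirtualKeyspace {
     private static final Charset UTF8 = Charset.forName("UTF-8");
     private final String prefix;

     public VirtualKeyspace(String tenant) {
         this.prefix = tenant + ":";
     }

     // Wraps a logical key into the physical, tenant-prefixed key.
     public ByteBuffer toPhysicalKey(String logicalKey) {
         return ByteBuffer.wrap((prefix + logicalKey).getBytes(UTF8));
     }
 }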



 On Tue, Mar 11, 2014 at 11:08 AM, Keith Wright kwri...@nanigans.com wrote:

 Does this hold true for the native protocol?  I've noticed that you can
 create a session object in the datastax driver without specifying a
 keyspace and so long as you include the keyspace in all queries instead of
 just table name, it works fine.  In that case, I assume there's only one
 connection pool for all keyspaces.

 From: Edward Capriolo edlinuxg...@gmail.com
 Reply-To: user@cassandra.apache.org user@cassandra.apache.org
 Date: Tuesday, March 11, 2014 at 11:05 AM
 To: user@cassandra.apache.org user@cassandra.apache.org
 Subject: Re: How expensive are additional keyspaces?

 The biggest expense of them is that you need to be authenticated to a
 keyspace to perform an operation. Thus connection pools are bound to
 keyspaces. Switching a keyspace is an RPC operation. In the thrift client,
 if you have 100 keyspaces you need 100 connection pools; that starts to be a
 pain very quickly.

 I suggest keeping everything in one keyspace unless you really need
 different replication factors and/or network replication settings per
 keyspace.


 On Tue, Mar 11, 2014 at 10:17 AM, Martin Meyer elreydet...@gmail.com wrote:

 Hey all -

 My company is working on introducing a configuration service system to
 provide config data to several of our applications, to be backed by
 Cassandra. We're already using Cassandra for other services, and at
 the moment our pending design just puts all the new tables (9 of them,
 I believe) in one of our pre-existing keyspaces.

 I've got a few questions about keyspaces that I'm hoping for input on.
 Some Google hunting didn't turn up obvious answers, at least not for
 recent versions of Cassandra.

 1) What trade offs are being made by using a new keyspace versus
 re-purposing an existing one (that is in active use by another
 application)? Organization is the obvious answer, I'm looking for any
 technical reasons.

 2) Is there any per-keyspace overhead incurred by the cluster?

 3) Does it impact on-disk layout at all for tables to be in a
 different keyspace from others? Is any sort of file fragmentation
 potentially introduced just by doing this in a new keyspace as opposed
 to an existing one?

 4) Does it add any metadata overhead to the system keyspace?

 5) Why might we *not* want to make a separate keyspace for this?

 6) Does anyone have experience with creating additional keyspaces to
 the point that Cassandra can no longer handle it? Note that we're
 *not* planning to do this, I'm just curious.

 Cheers,
 Martin








Re: How expensive are additional keyspaces?

2014-03-11 Thread Edward Capriolo
This mistake is not a thrift limitation. In 0.6.X you could switch
keyspaces without calling setKeyspace(String); methods specified the
keyspace in every operation. This mirrors the StorageProxy class. In
0.7.X setKeyspace() was created and the keyspace was removed from all these
thrift methods. I really dislike that change personally :)

If someone was so motivated, they could pretty easily (a couple days work)
add new methods to thrift that do not have this limitation.




On Tue, Mar 11, 2014 at 11:39 AM, Jonathan Ellis jbel...@gmail.com wrote:

 That is correct.  Another place where the mistakes of Thrift informed
 our development of the native protocol.

 On Tue, Mar 11, 2014 at 10:08 AM, Keith Wright kwri...@nanigans.com
 wrote:
  Does this hold true for the native protocol?  I've noticed that you can
  create a session object in the datastax driver without specifying a
 keyspace
  and so long as you include the keyspace in all queries instead of just
 table
  name, it works fine.  In that case, I assume there's only one connection
  pool for all keyspaces.
 
  From: Edward Capriolo edlinuxg...@gmail.com
  Reply-To: user@cassandra.apache.org user@cassandra.apache.org
  Date: Tuesday, March 11, 2014 at 11:05 AM
  To: user@cassandra.apache.org user@cassandra.apache.org
  Subject: Re: How expensive are additional keyspaces?
 
  The biggest expense of them is that you need to be authenticated to a
  keyspace to perform an operation. Thus connection pools are bound to
  keyspaces. Switching a keyspace is an RPC operation. In the thrift
 client,
  If you have 100 keyspaces you need 100 connection pools that starts to
 be a
  pain very quickly.
 
  I suggest keeping everything in one keyspace unless you really need
  different replication factors and/or network replication settings per
  keyspace.
 
 
  On Tue, Mar 11, 2014 at 10:17 AM, Martin Meyer elreydet...@gmail.com
  wrote:
 
  Hey all -
 
  My company is working on introducing a configuration service system to
  provide config data to several of our applications, to be backed by
  Cassandra. We're already using Cassandra for other services, and at
  the moment our pending design just puts all the new tables (9 of them,
  I believe) in one of our pre-existing keyspaces.
 
  I've got a few questions about keyspaces that I'm hoping for input on.
  Some Google hunting didn't turn up obvious answers, at least not for
  recent versions of Cassandra.
 
  1) What trade offs are being made by using a new keyspace versus
  re-purposing an existing one (that is in active use by another
  application)? Organization is the obvious answer, I'm looking for any
  technical reasons.
 
  2) Is there any per-keyspace overhead incurred by the cluster?
 
  3) Does it impact on-disk layout at all for tables to be in a
  different keyspace from others? Is any sort of file fragmentation
  potentially introduced just by doing this in a new keyspace as opposed
  to an existing one?
 
  4) Does it add any metadata overhead to the system keyspace?
 
  5) Why might we *not* want to make a separate keyspace for this?
 
  6) Does anyone have experience with creating additional keyspaces to
  the point that Cassandra can no longer handle it? Note that we're
  *not* planning to do this, I'm just curious.
 
  Cheers,
  Martin
 
 



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder, http://www.datastax.com
 @spyced



Re: How expensive are additional keyspaces?

2014-03-11 Thread Edward Capriolo
So in the 0.6.X days a signature of a get looked something like this:

get(String keyspace, ColumnPath cp, String rowkey)

Besides changes from String to ByteBuffer, the keyspace was pulled out of
the arguments.

I think the better, more flexible way to do this would be:

struct GetRequest {
   1: optional string keyspace,
   2: required binary row_key,
   3: optional ColumnPath column_path
}

ColumnOrSuperColumn get(1: required GetRequest request)

This would put some burden on clients to make builder objects instead of
calling methods, but it would make the API easier to evolve, I think.
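
A hypothetical Java-side analogue, to make the trade-off concrete: a builder
with optional fields can grow a fourth field later without breaking callers.

public final class GetRequest {
    private final String keyspace;   // optional: null means connection default
    private final byte[] rowKey;     // required
    private final String columnPath; // optional

    private GetRequest(Builder b) {
        this.keyspace = b.keyspace;
        this.rowKey = b.rowKey;
        this.columnPath = b.columnPath;
    }

    public static final class Builder {
        private String keyspace;
        private byte[] rowKey;
        private String columnPath;

        public Builder keyspace(String ks) { this.keyspace = ks; return this; }
        public Builder rowKey(byte[] key) { this.rowKey = key; return this; }
        public Builder columnPath(String path) { this.columnPath = path; return this; }

        public GetRequest build() {
            if (rowKey == null)
                throw new IllegalStateException("rowKey is required");
            return new GetRequest(this);
        }
    }
}

Usage would be new GetRequest.Builder().keyspace("ks1").rowKey(key).build().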

However it is hard for me to justify making a second copy of each method
for this small use case. Otherwise I would take that up.




On Tue, Mar 11, 2014 at 12:07 PM, Peter Lin wool...@gmail.com wrote:


 if I have time this summer, I may work on that, since I like having thrift.


 On Tue, Mar 11, 2014 at 12:05 PM, Edward Capriolo 
 edlinuxg...@gmail.com wrote:

 This mistake is not a thrift limitation. In 0.6.X you could switch
 keyspaces without calling setKeyspace(String) methods specified the
 keyspace in every operation. This is mirrors the StorageProxy class. In
 0.7.X setKeyspace() was created and the keyspace was removed from all these
 thrift methods. I really dislike that change personally :)

 If someone was so motivated, they could pretty easily (a couple days
 work) add new methods to thrift that do not have this limitation.




 On Tue, Mar 11, 2014 at 11:39 AM, Jonathan Ellis jbel...@gmail.com wrote:

 That is correct.  Another place where the mistakes of Thrift informed
 our development of the native protocol.

 On Tue, Mar 11, 2014 at 10:08 AM, Keith Wright kwri...@nanigans.com
 wrote:
  Does this hold true for the native protocol?  I've noticed that you
 can
  create a session object in the datastax driver without specifying a
 keyspace
  and so long as you include the keyspace in all queries instead of just
 table
  name, it works fine.  In that case, I assume there's only one
 connection
  pool for all keyspaces.
 
  From: Edward Capriolo edlinuxg...@gmail.com
  Reply-To: user@cassandra.apache.org user@cassandra.apache.org
  Date: Tuesday, March 11, 2014 at 11:05 AM
  To: user@cassandra.apache.org user@cassandra.apache.org
  Subject: Re: How expensive are additional keyspaces?
 
  The biggest expense of them is that you need to be authenticated to a
  keyspace to perform an operation. Thus connection pools are bound to
  keyspaces. Switching a keyspace is an RPC operation. In the thrift
 client,
  If you have 100 keyspaces you need 100 connection pools that starts to
 be a
  pain very quickly.
 
  I suggest keeping everything in one keyspace unless you really need
  different replication factors and/or network replication settings per
  keyspace.
 
 
  On Tue, Mar 11, 2014 at 10:17 AM, Martin Meyer elreydet...@gmail.com
  wrote:
 
  Hey all -
 
  My company is working on introducing a configuration service system to
  provide config data to several of our applications, to be backed by
  Cassandra. We're already using Cassandra for other services, and at
  the moment our pending design just puts all the new tables (9 of them,
  I believe) in one of our pre-existing keyspaces.
 
  I've got a few questions about keyspaces that I'm hoping for input on.
  Some Google hunting didn't turn up obvious answers, at least not for
  recent versions of Cassandra.
 
  1) What trade offs are being made by using a new keyspace versus
  re-purposing an existing one (that is in active use by another
  application)? Organization is the obvious answer, I'm looking for any
  technical reasons.
 
  2) Is there any per-keyspace overhead incurred by the cluster?
 
  3) Does it impact on-disk layout at all for tables to be in a
  different keyspace from others? Is any sort of file fragmentation
  potentially introduced just by doing this in a new keyspace as opposed
  to an existing one?
 
  4) Does it add any metadata overhead to the system keyspace?
 
  5) Why might we *not* want to make a separate keyspace for this?
 
  6) Does anyone have experience with creating additional keyspaces to
  the point that Cassandra can no longer handle it? Note that we're
  *not* planning to do this, I'm just curious.
 
  Cheers,
  Martin
 
 



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder, http://www.datastax.com
 @spyced






Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-11 Thread Edward Capriolo
Peter,

My advice. Do not bother. I have become very active recently in attempting
to add features to thrift. I had 4 open tickets I was actively working on.
(I even found two bugs in Cassandra in the process).

People were aware of this and still called this vote. Several committers
have voted +1, and my -1 vote is non-binding. It is a clear message:
The committers are unwilling to accept new thrift features even if said
features are contributed by others.

Edward



On Tue, Mar 11, 2014 at 5:51 PM, Peter Lin wool...@gmail.com wrote:


 My biased opinion: even though some members of Cassandra development want to
 abandon Thrift, I see benefits in continuing to improve it.

 The great thing about open source is that as long as some people want to
 keep working on it and improve it, it can happen. I plan to do my best to
 keep Thrift going, since it gives me fine-grained control that I want and
 need. If the ultimate goal of Cassandra is to be as close to SQL as
 practical, my biased take is: use a NewSQL database that gives you the full
 power of subqueries, like, exists and disjunction.

 When customers ask me which database to choose and they really want
 a relational model, I tell them to use NewSql. I love that Cassandra sits
 between NoSql and NewSql. There are things I do in Cassandra today that are
 much harder in NewSql or NoSql document databases. NewSql databases can
 scale to similar sizes, so the "big" part of "big data" won't be a
 significant advantage forever. Looking at some of the recent NewSql
 performance numbers, it's clear the gap is closing.

 peter



 On Tue, Mar 11, 2014 at 3:59 PM, Tyler Hobbs ty...@datastax.com wrote:


 On Tue, Mar 11, 2014 at 2:41 PM, Shao-Chuan Wang 
 shaochuan.w...@bloomreach.com wrote:


 So, does anyone know how to do the equivalent of describing the splits and
 describing the local ring using the native protocol?


 For a ring description, you would do something like "select peer, tokens
 from system.peers".  I'm not sure about describe_splits().
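
 Something like this with the Java driver (a sketch; the local node's own
 tokens come from system.local the same way):

 import java.net.InetAddress;
 import java.util.Set;
 import com.datastax.driver.core.Cluster;
 import com.datastax.driver.core.Row;
 import com.datastax.driver.core.Session;

 public class RingDescription {
     public static void main(String[] args) {
         Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
         Session session = cluster.connect();
         // Each peer row carries the tokens that node owns.
         for (Row row : session.execute("SELECT peer, tokens FROM system.peers")) {
             InetAddress peer = row.getInet("peer");
             Set<String> tokens = row.getSet("tokens", String.class);
             System.out.println(peer + " owns " + tokens.size() + " tokens");
         }
         cluster.close();
     }
 }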



 Also, cqlsh uses the python client, which talks via the thrift protocol
 too. Does that mean it will be migrated to the native protocol soon as well?


 Yes: https://issues.apache.org/jira/browse/CASSANDRA-6307


 --
 Tyler Hobbs
 DataStax http://datastax.com/





Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-11 Thread Edward Capriolo
one of the things I'd like to see happen is for Cassandra to support
queries with disjunction, exists, subqueries, joins and like. In theory CQL
could support these features in the future. Cassandra would need a new
query compiler and query planner. I don't see how the current design could
do these things without a significant redesign/enhancement. In a past life,
I implemented an inference rule engine, so I've spent over a decade studying
and implementing query optimizers. All of these things can be done, it's
just a matter of people finding the time to do it.

I see what you're saying. CQL started as a way to make slices easier, but it
is not even a query language; retrofitting these things is going to be very
hard.


On Tue, Mar 11, 2014 at 7:45 PM, Peter Lin wool...@gmail.com wrote:


 I have no problem maintaining my own fork :) or joining others in forking
 cassandra.

 I'd be happy to work with you or anyone else to add features to thrift.
 That's the great thing about open source. Each person can scratch a
 technical itch and do what they love. I see lots of potential in Cassandra,
 and much of it includes improving thrift to make it happen. Some of the
 features in theory could be done in CQL, but not with the current design.

 one of the things I'd like to see happen is for Cassandra to support
 queries with disjunction, exists, subqueries, joins and like. In theory CQL
 could support these features in the future. Cassandra would need a new
 query compiler and query planner. I don't see how the current design could
 do these things without a significant redesign/enhancement. In a past life,
 I implemented an inference rule engine, so I've spent over a decade studying
 and implementing query optimizers. All of these things can be done, it's
 just a matter of people finding the time to do it.




 On Tue, Mar 11, 2014 at 6:17 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

 Peter,

 My advice. Do not bother. I have become very active recently in
 attempting to add features to thrift. I had 4 open tickets I was actively
 working on. (I even found two bugs in Cassandra in the process).

 People were aware of this and still called this vote. Several committers
 have voted +1, and my -1 vote is non-binding. It is a clear
 message: The committers are unwilling to accept new thrift features even if
 said features are contributed by others.

 Edward



 On Tue, Mar 11, 2014 at 5:51 PM, Peter Lin wool...@gmail.com wrote:


 My biased opinion: even though some members of Cassandra development want to
 abandon Thrift, I see benefits in continuing to improve it.

 The great thing about open source is that as long as some people want to
 keep working on it and improve it, it can happen. I plan to do my best to
 keep Thrift going, since it gives me fine-grained control that I want and
 need. If the ultimate goal of Cassandra is to be as close to SQL as
 practical, my biased take is: use a NewSQL database that gives you the full
 power of subqueries, like, exists and disjunction.

 When customers ask me which database to choose and they really want
 a relational model, I tell them to use NewSql. I love that Cassandra sits
 between NoSql and NewSql. There are things I do in Cassandra today that are
 much harder in NewSql or NoSql document databases. NewSql databases can
 scale to similar sizes, so the "big" part of "big data" won't be a
 significant advantage forever. Looking at some of the recent NewSql
 performance numbers, it's clear the gap is closing.

 peter



 On Tue, Mar 11, 2014 at 3:59 PM, Tyler Hobbs ty...@datastax.com wrote:


 On Tue, Mar 11, 2014 at 2:41 PM, Shao-Chuan Wang 
 shaochuan.w...@bloomreach.com wrote:


 So, does anyone know how to do the equivalent of describing the splits and
 describing the local ring using the native protocol?


 For a ring description, you would do something like "select peer,
 tokens from system.peers".  I'm not sure about describe_splits().



 Also, cqlsh uses the python client, which talks via the thrift protocol
 too. Does that mean it will be migrated to the native protocol soon as
 well?


 Yes: https://issues.apache.org/jira/browse/CASSANDRA-6307


 --
 Tyler Hobbs
 DataStax http://datastax.com/







Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-11 Thread Edward Capriolo
If I were to run a fork, it would do one thing:

Cassandra is a highly scalable, eventually consistent, distributed,
structured key-value store. Cassandra brings together the distributed
systems technologies from Dynamo
(http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf)
and the data model from Google's BigTable
(http://research.google.com/archive/bigtable-osdi06.pdf). Like Dynamo,
Cassandra is eventually consistent
(http://www.allthingsdistributed.com/2008/12/eventually_consistent.html).
Like BigTable (http://wiki.apache.org/cassandra/BigTable), Cassandra
provides a ColumnFamily-based (http://wiki.apache.org/cassandra/ColumnFamily)
data model richer than typical key/value systems.

I would provide an interface to access ColumnFamily-based data models. In
other words, I would provide the Cassandra 0.8 API :)
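
For reference, 0.8-era ColumnFamily access looked roughly like this over
Thrift (a sketch; the keyspace, column family, and key names are made up):

import java.nio.ByteBuffer;
import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnOrSuperColumn;
import org.apache.cassandra.thrift.ColumnPath;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class ColumnFamilyGet {
    public static void main(String[] args) throws Exception {
        TTransport transport = new TFramedTransport(new TSocket("127.0.0.1", 9160));
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
        transport.open();
        client.set_keyspace("Keyspace1");
        // A single column is addressed by (row key, column family, column).
        ColumnPath path = new ColumnPath("Standard1");
        path.setColumn(ByteBuffer.wrap("name".getBytes("UTF-8")));
        ColumnOrSuperColumn result = client.get(
                ByteBuffer.wrap("row1".getBytes("UTF-8")), path, ConsistencyLevel.ONE);
        System.out.println(new String(result.getColumn().getValue(), "UTF-8"));
        transport.close();
    }
}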


On Tue, Mar 11, 2014 at 9:54 PM, Steven A Robenalt srobe...@stanford.edu wrote:

 I should add that I'm not trying to ignite a flame war. Just trying to
 understand your intentions.


 On Tue, Mar 11, 2014 at 6:50 PM, Steven A Robenalt 
 srobe...@stanford.eduwrote:

 Okay, I'm officially lost on this thread. If you plan on forking
 Cassandra to preserve and continue to enhance the Thrift interface, you
 would also want to add a bunch of relational features to CQL as part of
 that same fork?


 On Tue, Mar 11, 2014 at 6:20 PM, Edward Capriolo 
 edlinuxg...@gmail.com wrote:

 one of the things I'd like to see happen is for Cassandra to support
 queries with disjunction, exists, subqueries, joins and like. In theory CQL
 could support these features in the future. Cassandra would need a new
 query compiler and query planner. I don't see how the current design could
 do these things without a significant redesign/enhancement. In a past life,
 I implemented an inference rule engine, so I've spent over a decade studying
 and implementing query optimizers. All of these things can be done, it's
 just a matter of people finding the time to do it.

 I see what you're saying. CQL started as a way to make slices easier, but it
 is not even a query language; retrofitting these things is going to be very
 hard.



 On Tue, Mar 11, 2014 at 7:45 PM, Peter Lin wool...@gmail.com wrote:


 I have no problem maintaining my own fork :) or joining others in forking
 cassandra.

 I'd be happy to work with you or anyone else to add features to thrift.
 That's the great thing about open source. Each person can scratch a
 technical itch and do what they love. I see lots of potential for Cassandra
 and many of them include improving thrift to make it happen. Some of the
 features in theory could be done in CQL, but not with the current design.

 one of the things I'd like to see happen is for Cassandra to support
 queries with disjunction, exists, subqueries, joins and like. In theory CQL
 could support these features in the future. Cassandra would need a new
 query compiler and query planner. I don't see how the current design could
 do these things without a significant redesign/enhancement. In a past life,
 I implemented an inference rule engine, so I've spent over a decade studying
 and implementing query optimizers. All of these things can be done, it's
 just a matter of people finding the time to do it.




 On Tue, Mar 11, 2014 at 6:17 PM, Edward Capriolo edlinuxg...@gmail.com
  wrote:

 Peter,

 My advice. Do not bother. I have become very active recently in
 attempting to add features to thrift. I had 4 open tickets I was actively
 working on. (I even found two bugs in Cassandra in the process).

 People were aware of this and still called this vote. Several committers
 have voted +1, and my -1 vote is non-binding. It is a clear
 message: The committers are unwilling to accept new thrift features even 
 if
 said features are contributed by others.

 Edward



 On Tue, Mar 11, 2014 at 5:51 PM, Peter Lin wool...@gmail.com wrote:


 My biased opinion: even though some members of Cassandra development want
 to abandon Thrift, I see benefits in continuing to improve it.

 The great thing about open source is that as long as some people want
 to keep working on it and improve it, it can happen. I plan to do my best
 to keep Thrift going, since it gives me fine-grained control that I want
 and need. If the ultimate goal of Cassandra is to be as close to SQL as
 practical, my biased take is: use a NewSQL database that gives you the full
 power of subqueries, like, exists and disjunction.

 When customers ask me which database to choose and they really want
 a relational model, I tell them to use NewSql. I love that Cassandra sits
 between NoSql and NewSql. There are things I do in Cassandra today that are
 much harder in NewSql or NoSql document databases. NewSql databases can
 scale to similar sizes, so the "big" part of "big data" won't be a
 significant advantage forever. Looking at some of the recent NewSql
 performance numbers, it's clear the gap is closing.

 peter



 On Tue, Mar 11, 2014 at 3:59 PM, Tyler Hobbs ty

Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-11 Thread Edward Capriolo
Peter,
Solr is deeply integrated into DSE. Seemingly this cannot efficiently be
done client side (CQL/Thrift, whatever), but the Solandra approach was to
embed Solr in Cassandra. I think that is actually the future of client dev:
allowing users to embed custom server-side logic into their own API.

Things like this take a while. Back in the day no one wanted cassandra to
be heavyweight, and ideas like read-before-write operations were rejected.
The common advice was "do them client side." Now in the case of collections
sometimes they do read-before-write, and it is the stuff users want.


On Tue, Mar 11, 2014 at 10:07 PM, Peter Lin wool...@gmail.com wrote:


 I'll give you a concrete example.

 One of the things we often need to do is a keyword search on
 unstructured text. What we did in our tooling is we combined Solr with
 Cassandra, but we put an Object API in front of it. The API is inspired by
 JPA, but designed specifically to fit our needs.

 The user can do queries with like "%blah%", and behind the scenes we issue a
 query to Solr to find the keys and then query Cassandra for the records.
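
 In outline, the two-step lookup is just this (a sketch assuming SolrJ 4.x
 and the DataStax Java driver; the core, keyspace, table, and field names
 are made up):

 import java.util.ArrayList;
 import java.util.List;
 import org.apache.solr.client.solrj.SolrQuery;
 import org.apache.solr.client.solrj.impl.HttpSolrServer;
 import org.apache.solr.client.solrj.response.QueryResponse;
 import org.apache.solr.common.SolrDocument;
 import com.datastax.driver.core.Cluster;
 import com.datastax.driver.core.Session;

 public class KeywordSearch {
     public static void main(String[] args) throws Exception {
         // Step 1: the keyword search (like "%blah%") runs in Solr and
         // returns only the row keys.
         HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/records");
         QueryResponse resp = solr.query(new SolrQuery("body:*blah*"));
         List<String> keys = new ArrayList<String>();
         for (SolrDocument doc : resp.getResults()) {
             keys.add((String) doc.getFieldValue("id"));
         }
         // Step 2: the full records come from Cassandra by key.
         Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
         Session session = cluster.connect("app");
         for (String key : keys) {
             session.execute("SELECT * FROM records WHERE id = ?", key);
         }
         cluster.close();
         solr.shutdown();
     }
 }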

 With plain Cassandra, the developer has to manually do all of this stuff
 and integrate Solr. Then they have to know which system to query and in
 what order. Our tooling lets the user define the schema in a modeler. Once
 the model is done, it compiles the classes, configuration files, data
 access objects and unit tests.

 When the application makes a call, our query classes handle the details
 behind the scenes. I know lots of people would like to see Solr integrated
 more deeply into Cassandra and CQL. I hope it happens in the future. If
 DataStax accepts my talk, we will be showing our temporal database and
 modeler in September.




 On Tue, Mar 11, 2014 at 9:54 PM, Steven A Robenalt 
 srobe...@stanford.edu wrote:

 I should add that I'm not trying to ignite a flame war. Just trying to
 understand your intentions.


 On Tue, Mar 11, 2014 at 6:50 PM, Steven A Robenalt srobe...@stanford.edu
  wrote:

 Okay, I'm officially lost on this thread. If you plan on forking
 Cassandra to preserve and continue to enhance the Thrift interface, you
 would also want to add a bunch of relational features to CQL as part of
 that same fork?


 On Tue, Mar 11, 2014 at 6:20 PM, Edward Capriolo 
 edlinuxg...@gmail.com wrote:

 one of the things I'd like to see happen is for Cassandra to support
 queries with disjunction, exists, subqueries, joins and like. In theory CQL
 could support these features in the future. Cassandra would need a new
 query compiler and query planner. I don't see how the current design could
 do these things without a significant redesign/enhancement. In a past life,
 I implemented an inference rule engine, so I've spent over a decade studying
 and implementing query optimizers. All of these things can be done, it's
 just a matter of people finding the time to do it.

 I see what you're saying. CQL started as a way to make slices easier, but
 it is not even a query language; retrofitting these things is going to be
 very hard.



 On Tue, Mar 11, 2014 at 7:45 PM, Peter Lin wool...@gmail.com wrote:


 I have no problem maintaining my own fork :) or joining others in forking
 cassandra.

 I'd be happy to work with you or anyone else to add features to
 thrift. That's the great thing about open source. Each person can scratch 
 a
 technical itch and do what they love. I see lots of potential for 
 Cassandra
 and many of them include improving thrift to make it happen. Some of the
 features in theory could be done in CQL, but not with the current 
 design.

 one of the things I'd like to see happen is for Cassandra to support
 queries with disjunction, exists, subqueries, joins and like. In theory CQL
 could support these features in the future. Cassandra would need a new
 query compiler and query planner. I don't see how the current design could
 do these things without a significant redesign/enhancement. In a past 
 life,
 I implemented an inference rule engine, so I've spent over a decade studying
 and implementing query optimizers. All of these things can be done, it's
 just a matter of people finding the time to do it.




 On Tue, Mar 11, 2014 at 6:17 PM, Edward Capriolo 
 edlinuxg...@gmail.com wrote:

 Peter,

 My advice. Do not bother. I have become very active recently in
 attempting to add features to thrift. I had 4 open tickets I was actively
 working on. (I even found two bugs in Cassandra in the process).

 People were aware of this and still called this vote. Several committers
 have voted +1, and my -1 vote is non-binding. It is a clear
 message: The committers are unwilling to accept new thrift features even 
 if
 said features are contributed by others.

 Edward



 On Tue, Mar 11, 2014 at 5:51 PM, Peter Lin wool...@gmail.com wrote:


 My biased opinion: even though some members of Cassandra development want
 to abandon Thrift, I see benefits in continuing to improve it.

 The great thing about open source
