Re: thread architecture of Cassandra

2017-03-22 Thread Tyler Hobbs
It looks like your attachment didn't make it -- I'm not sure that the
mailing list supports them.  Maybe try posting it somewhere and linking it?

On Wed, Mar 22, 2017 at 3:10 PM, 杨苏立 Yang Su Li <yangs...@gmail.com> wrote:

> Hi,
>
> I am a graduate student working on scheduling on storage systems, and we
> are interested in how different threads in Cassandra interact with each
> other and how it might affect scheduling.
>
> I have written down my understanding on how Cassandra works based on its
> current thread architecture (attached). I am wondering if the developers of
> Cassandra could take a look at it and let me know if anything is incorrect
> or inaccurate, or if I have missed anything.
>
> Thanks a lot for your help!
>
> Suli
>
>
> --
> Suli Yang
>
> Department of Physics
> University of Wisconsin Madison
>
> 4257 Chamberlin Hall
> Madison WI 53703
>



-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: [VOTE] Release Apache Cassandra 3.0.12

2017-03-07 Thread Tyler Hobbs
+1

On Tue, Mar 7, 2017 at 10:15 AM, Michael Shuler <mich...@pbandjelly.org>
wrote:

> I propose the following artifacts for release as 3.0.12.
>
> This release addresses a possible 2.1->3.0 upgrade issue[3], along with
> a few fixes committed since 3.0.11.
>
> sha1: 50560aaf0f2d395271ade59ba9b900a84cae70f1
> Git:
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/3.0.12-tentative
> Artifacts:
> https://repository.apache.org/content/repositories/orgapachecassandra-1141/org/apache/cassandra/apache-cassandra/3.0.12/
> Staging repository:
> https://repository.apache.org/content/repositories/orgapachecassandra-1141/
>
> The Debian packages are available here: http://people.apache.org/~mshuler
>
> The vote will be open for 72 hours (longer if needed).
>
> [1]: (CHANGES.txt) https://goo.gl/RtBFVA
> [2]: (NEWS.txt) https://goo.gl/GGI0aq
> [3]: https://issues.apache.org/jira/browse/CASSANDRA-13294
>
>


-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: [VOTE] Release Apache Cassandra 2.2.9

2017-02-17 Thread Tyler Hobbs
+1

On Wed, Feb 15, 2017 at 7:16 PM, Michael Shuler <mich...@pbandjelly.org>
wrote:

> I propose the following artifacts for release as 2.2.9.
>
> sha1: 70a08f1c35091a36f7d9cc4816259210c2185267
> Git:
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/2.2.9-tentative
> Artifacts:
> https://repository.apache.org/content/repositories/orgapachecassandra-1139/org/apache/cassandra/apache-cassandra/2.2.9/
> Staging repository:
> https://repository.apache.org/content/repositories/orgapachecassandra-1139/
>
> The Debian packages are available here: http://people.apache.org/~mshuler
>
> The vote will be open for 72 hours (longer if needed).
>
> [1]: (CHANGES.txt) https://goo.gl/AYblr5
> [2]: (NEWS.txt) https://goo.gl/gIXxgR
>
> --
> Kind regards,
> Michael Shuler
>
>


-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: [VOTE] Release Apache Cassandra 2.1.17

2017-02-17 Thread Tyler Hobbs
+1

On Wed, Feb 15, 2017 at 7:16 PM, Michael Shuler <mich...@pbandjelly.org>
wrote:

> I propose the following artifacts for release as 2.1.17.
>
> sha1: 943db2488c8b62e1fbe03b132102f0e579c9ae17
> Git:
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/2.1.17-tentative
> Artifacts:
> https://repository.apache.org/content/repositories/orgapachecassandra-1140/org/apache/cassandra/apache-cassandra/2.1.17/
> Staging repository:
> https://repository.apache.org/content/repositories/orgapachecassandra-1140/
>
> The Debian packages are available here: http://people.apache.org/~mshuler
>
> The vote will be open for 72 hours (longer if needed).
>
> [1]: (CHANGES.txt) https://goo.gl/17RivH
> [2]: (NEWS.txt) https://goo.gl/axKXys
>
> --
> Kind regards,
> Michael Shuler
>
>


-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: [VOTE] Release Apache Cassandra 3.0.11

2017-02-17 Thread Tyler Hobbs
+1

On Wed, Feb 15, 2017 at 7:15 PM, Michael Shuler <mich...@pbandjelly.org>
wrote:

> I propose the following artifacts for release as 3.0.11.
>
> sha1: 338226e042a22242645ab54a372c7c1459e78a01
> Git:
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/3.0.11-tentative
> Artifacts:
> https://repository.apache.org/content/repositories/orgapachecassandra-1138/org/apache/cassandra/apache-cassandra/3.0.11/
> Staging repository:
> https://repository.apache.org/content/repositories/orgapachecassandra-1138/
>
> The Debian packages are available here: http://people.apache.org/~mshuler
>
> The vote will be open for 72 hours (longer if needed).
>
> [1]: (CHANGES.txt) https://goo.gl/ztSHQJ
> [2]: (NEWS.txt) https://goo.gl/nrengr
>
> --
> Kind regards,
> Michael Shuler
>
>


-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: If reading from materialized view with a consistency level of quorum am I guaranteed to have the most recent view?

2017-02-10 Thread Tyler Hobbs
On Fri, Feb 10, 2017 at 1:13 PM, Benjamin Roth <benjamin.r...@jaumo.com>
wrote:

> Thanks a lot for that post. If I read the code right, then there is one
> case missing in your post.
> According to StorageProxy.mutateMV, local updates are NOT put into a batch
> and are instantly applied locally. So a batch is only created if remote
> mutations have to be applied and only for those mutations.
>

I can confirm that your understanding is correct here.  If the "paired" MV
replica happens to be the same node (which is guaranteed if and only if the
partition key is the same), the mutation is immediately applied locally.
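A minimal sketch of the routing described above, written in Python for brevity. The function name, its arguments, and the index-based pairing rule (the paired view replica is assumed to be the one at the same position as this node in the base replica list) are simplifying assumptions for illustration. This is not the actual `StorageProxy.mutateMV` code.

```python
def route_view_mutations(local, base_replicas, view_updates):
    """Split view mutations into ones applied locally now and ones batched for remote replicas.

    Hypothetical, simplified model: `base_replicas` is the replica list of the base
    partition, and each entry of `view_updates` is (mutation, view_replicas). The paired
    view replica is assumed to be the one at the same index as `local` in `base_replicas`.
    """
    position = base_replicas.index(local)
    apply_now = []     # paired replica is this node: applied immediately, no batch
    remote_batch = []  # (paired replica, mutation) pairs that would go into a batch
    for mutation, view_replicas in view_updates:
        paired = view_replicas[position]
        if paired == local:
            apply_now.append(mutation)
        else:
            remote_batch.append((paired, mutation))
    return apply_now, remote_batch


# Same partition key in base and view -> same replica list -> paired replica is local.
local_ops, batched = route_view_mutations(
    "node1",
    ["node1", "node2", "node3"],
    [("update_same_key", ["node1", "node2", "node3"]),
     ("update_new_key", ["node3", "node1", "node2"])],
)
print(local_ops)  # ['update_same_key']
print(batched)    # [('node3', 'update_new_key')]
```

Only the second update ends up in the batch, which matches the behavior confirmed above: a batch is created only for mutations whose paired view replica is remote.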


-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: copy to stdout fails in cqlsh

2017-01-31 Thread Tyler Hobbs
_run()
>   File "/usr/lib/python2.7/dist-packages/cqlshlib/copyutil.py", line 1544, in inner_run
>     self.start_request(token_range, info)
>   File "/usr/lib/python2.7/dist-packages/cqlshlib/copyutil.py", line 1573, in start_request
>     metadata = session.cluster.metadata.keyspaces[self.ks].tables[self.table]
> KeyError: 'spielplatz_5'
> Child process 14957 died with exit code 1
> Child process 14961 died with exit code 1
> Child process 14968 died with exit code 1
> Exported 5 ranges out of 513 total ranges, some records might be missing
> Processed: 0 rows; Rate:   0 rows/s; Avg. rate:   0 rows/s
> 0 rows exported to 0 files in 0.311 seconds.
>
>
>
>
> Any hints?
>
>
> cheers,
>  Michael
>
>


-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Batch read requests to one physical host?

2016-10-19 Thread Tyler Hobbs
There's a similar ticket focusing on range reads and secondary index
queries, but the work for these could be done together:
https://issues.apache.org/jira/browse/CASSANDRA-10414

On Tue, Oct 18, 2016 at 5:59 PM, Dikang Gu <dikan...@gmail.com> wrote:

> Hi there,
>
> We have a couple of use cases that do fanout reads for their data, meaning
> one single read request from the client contains multiple keys which live on
> different physical hosts. (I know it's not the recommended way to access C*.)
>
> Right now, the coordinator issues separate read commands even
> though they will go to the same physical host, which I think is causing a
> lot of overhead.
>
> I'm wondering whether it's valuable to provide a new read command, so that the
> coordinator can batch the reads for one datanode and send them in one
> message, and the datanode returns the results for all keys belonging to it?
>
> Any similar ideas before?
>
>
> --
> Dikang
>
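To make the proposal concrete, here is a minimal Python sketch of the coordinator-side grouping: keys are bucketed by the host chosen to serve them, so each host receives one message instead of one per key. The `replica_for` callable, the toy hash, and the host addresses are hypothetical stand-ins for the coordinator's real token-ring and replica-selection logic.

```python
from collections import defaultdict


def group_reads_by_host(keys, replica_for):
    """Return {host: [keys]} so that each host can be sent a single batched read command.

    `replica_for` is a hypothetical callable mapping a partition key to the host chosen
    to serve it (a real coordinator would derive this from the token ring and snitch).
    """
    batches = defaultdict(list)
    for key in keys:
        batches[replica_for(key)].append(key)
    return dict(batches)


hosts = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]


def toy_replica_for(key):
    """Toy placement: pick a host from a stable sum of the key's bytes (illustration only)."""
    return hosts[sum(key.encode()) % len(hosts)]


batches = group_reads_by_host(["a", "b", "c", "d"], toy_replica_for)
# Four single-key reads become at most three messages, one per host.
print(batches)
```

The grouping itself is cheap; the open design questions in the thread are about the new message type and how the replica executes and returns a multi-key read.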



-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: [VOTE] Release Apache Cassandra 2.1.16

2016-10-06 Thread Tyler Hobbs
+1

On Wed, Oct 5, 2016 at 6:09 PM, Michael Shuler <mich...@pbandjelly.org>
wrote:

> I propose the following artifacts for release as 2.1.16.
>
> sha1: 87034cd05964e64c6c925597279865a40a8c152f
> Git:
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/2.1.16-tentative
> Artifacts:
> https://repository.apache.org/content/repositories/orgapachecassandra-1129/org/apache/cassandra/apache-cassandra/2.1.16/
> Staging repository:
> https://repository.apache.org/content/repositories/orgapachecassandra-1129/
>
> The Debian packages are available here: http://people.apache.org/~mshuler
>
> The vote will be open for 72 hours (longer if needed).
>
> [1]: (CHANGES.txt) https://goo.gl/xc7jn6
> [2]: (NEWS.txt) https://goo.gl/O0C3Gb
>



-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: [VOTE] Accept dtests Donation Into Project

2016-09-30 Thread Tyler Hobbs
+1

On Fri, Sep 30, 2016 at 1:51 PM, Nate McCall <zzn...@apache.org> wrote:

> I propose we begin the process of accepting the contribution of the
> dtest codebase (https://github.com/riptano/cassandra-dtest) into the
> project.
>
> Background discussion thread here:
> https://lists.apache.org/thread.html/840fd900fb7f6568bfa008d122d4375b708c1f7f1b5929018118d5d5@%3Cdev.cassandra.apache.org%3E
>
> Note: It won't be immediate as there are some steps to follow [0] for
> accepting outside code contributions.
>
> The vote will be open for 72 hours.
>
> -Nate
>
> [0] http://incubator.apache.org/ip-clearance/ip-clearance-template.html
>



-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: [VOTE] Release Apache Cassandra 3.9

2016-09-27 Thread Tyler Hobbs
+1

On Mon, Sep 26, 2016 at 10:12 AM, Michael Shuler <mich...@pbandjelly.org>
wrote:

> I propose the following artifacts for release as 3.9.
>
> sha1: c1fa21458777b51a9b21795330ed6f298103b436
> Git:
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/3.9-tentative
> Artifacts:
> https://repository.apache.org/content/repositories/orgapachecassandra-1127/org/apache/cassandra/apache-cassandra/3.9/
> Staging repository:
> https://repository.apache.org/content/repositories/orgapachecassandra-1127/
>
> The Debian packages are available here: http://people.apache.org/~mshuler
>
> The vote will be open for 72 hours (longer if needed).
>
> [1]: (CHANGES.txt) https://goo.gl/TEMHqi
> [2]: (NEWS.txt) https://goo.gl/1w6Ec1
>
> --
> Kind regards,
> Michael Shuler
>



-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: [VOTE] Release Apache Cassandra 3.8

2016-09-27 Thread Tyler Hobbs
+1

On Mon, Sep 26, 2016 at 9:52 AM, Michael Shuler <mich...@pbandjelly.org>
wrote:

> I propose the following artifacts for release as 3.8.
>
> sha1: ce609d19fd130e16184d9e6d37ffee4a1ebad607
> Git:
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/3.8-tentative
> Artifacts:
> https://repository.apache.org/content/repositories/orgapachecassandra-1126/org/apache/cassandra/apache-cassandra/3.8/
> Staging repository:
> https://repository.apache.org/content/repositories/orgapachecassandra-1126/
>
> The Debian packages are available here: http://people.apache.org/~mshuler
>
> The vote will be open for 72 hours (longer if needed).
>
> [1]: (CHANGES.txt) https://goo.gl/b80Qe2
> [2]: (NEWS.txt) https://goo.gl/Aen2iN
>
> --
> Kind regards,
> Michael
>



-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Proposal - 3.5.1

2016-09-15 Thread Tyler Hobbs
On Thu, Sep 15, 2016 at 2:22 PM, Benedict Elliott Smith <bened...@apache.org>
wrote:

> Feature releases don't have to be on the same cadence as bug fixes. They're
> naturally different beasts.
>

With the exception of critical bug fixes (which can warrant an immediate
release), I think keeping a regular cadence makes us less likely to slip
and fall behind on releases.


>
> Why not stick with monthly feature releases, but mark every third (or
> sixth) as a supported release that gets quarterly updates for 2-3 quarters?
>

That's also a good idea.

-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Proposal - 3.5.1

2016-09-15 Thread Tyler Hobbs
xpected error deserializing mutation; saved to
> > > > >>>> /var/folders/0l/g2p6cnyd5kx_1wkl83nd3y4rgn/T/mutation6313332720566971713dat.
> > > > >>>> This may be caused by replaying a mutation against a table with the same
> > > > >>>> name but incompatible schema.  Exception follows:
> > > > >>>> org.apache.cassandra.serializers.MarshalException: Expected 4 byte long for
> > > > >>>> date (0)
> > > > >>>>
> > > > >>>> I mean.. come on.  It's an easy fix.  It cleanly merges against 3.5 (and
> > > > >>>> probably the other releases) and requires very little investment from
> > > > >>>> anyone.
> > > > >>>>
> > > > >>>>
> > > > >>>> On Wed, Sep 14, 2016 at 9:40 PM Jeff Jirsa <jeff.ji...@crowdstrike.com>
> > > > >>>> wrote:
> > > > >>>>
> > > > >>>>> We did 3.1.1 and 3.2.1, so there’s SOME precedent for emergency fixes,
> > > > >>>>> but we certainly didn’t/won’t go back and cut new releases from every
> > > > >>>>> branch for every critical bug in future releases, so I think we need to
> > > > >>>>> draw the line somewhere. If it’s fixed in 3.7 and 3.0.x (x >= 6), it seems
> > > > >>>>> like you’ve got options (either stay on the tick and go up to 3.7, or bail
> > > > >>>>> down to 3.0.x)
> > > > >>>>>
> > > > >>>>> Perhaps, though, this highlights the fact that tick/tock may not be the
> > > > >>>>> best option long term. We’ve tried it for a year, perhaps we should instead
> > > > >>>>> discuss whether or not it should continue, or if there’s another process
> > > > >>>>> that gives us a better way to get useful patches into versions people are
> > > > >>>>> willing to run in production.
> > > > >>>>>
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> On 9/14/16, 8:55 PM, "Jonathan Haddad" <j...@jonhaddad.com> wrote:
> > > > >>>>>
> > > > >>>>>> Common sense is what prevents someone from upgrading to yet another
> > > > >>>>>> completely unknown version with new features which have probably broken
> > > > >>>>>> even more stuff that nobody is aware of.  The folks I'm helping right
> > > > >>>>>> now deployed 3.5 when they got started because http://cassandra.apache.org
> > > > >>>>>> suggests it's acceptable for production.  It turns out using 4 of the
> > > > >>>>>> built-in datatypes of the database results in the server being unable to
> > > > >>>>>> restart without clearing out the commit logs and running a repair.  That
> > > > >>>>>> screams critical to me.  You shouldn't even be able to install 3.5 without
> > > > >>>>>> the patch I've supplied - that bug is a ticking time bomb for anyone that
> > > > >>>>>> installs it.
> > > > >>>>>>
> > > > >>>>>> On Wed, Sep 14, 2016 at 8:12 PM Michael Shuler <mich...@pbandjelly.org>
> > > > >>>>>> wrote:
> > > > >>>>>>
> > > > >>>>>>> What's preventing the use of the 3.6 or 3.7 releases where this bug is
> > > > >>>>>>> already fixed? This is also fixed in the 3.0.6/7/8 releases.
> > > > >>>>>>>
> > > > >>>>>>> Michael
> > > > >>>>>>>
> > > > >>>>>>> On 09/14/2016 08:30 PM, Jonathan Haddad wrote:
> > > > >>>>>>>> Unfortunately CASSANDRA-11618 was fixed in 3.6 but was not back ported to
> > > > >>>>>>>> 3.5 as well, and it makes Cassandra effectively unusable if someone is
> > > > >>>>>>>> using any of the 4 types affected in any of their schema.
> > > > >>>>>>>>
> > > > >>>>>>>> I have cherry picked & merged the patch back to here and will put it in a
> > > > >>>>>>>> JIRA as well tonight, I just wanted to get the ball rolling asap on this.
> > > > >>>>>>>>
> > > > >>>>>>>> https://github.com/rustyrazorblade/cassandra/tree/fix_commitlog_exception
> > > > >>>>>>>>
> > > > >>>>>>>> Jon
> > > > >>>>>>>>
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>
> > > > >>>>
> > > > >>
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > http://twitter.com/tjake
> >
>



-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: [VOTE] Release Apache Cassandra 3.0.9

2016-09-15 Thread Tyler Hobbs
+1

On Thu, Sep 15, 2016 at 1:57 PM, Jake Luciani <j...@apache.org> wrote:

> I propose the following artifacts for release as 3.0.9.
>
> sha1: d600f51ee1a3eb7b30ce3c409129567b70c22012
> Git:
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/3.0.9-tentative
> Artifacts:
> https://repository.apache.org/content/repositories/orgapachecassandra-1124/org/apache/cassandra/apache-cassandra/3.0.9/
> Staging repository:
> https://repository.apache.org/content/repositories/orgapachecassandra-1124/
>
> The artifacts as well as the Debian package are also available here:
> http://people.apache.org/~jake
>
> The vote will be open for 72 hours (longer if needed).
>
> [1]: https://goo.gl/JKkE05 (CHANGES.txt)
> [2]: https://goo.gl/Hi8X71 (NEWS.txt)
>



-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: CQL Collections appear slow

2016-09-01 Thread Tyler Hobbs
On Wed, Aug 31, 2016 at 11:56 PM, Ben Frank <b...@airlust.com> wrote:

> Interestingly it's still dog slow while (presumably) doing the
> deserialization in python, so although the trace reports good results it's
> still taking ~3 seconds to load data into python wall clock time.
>

If you're not doing this already, I suggest building the Python driver with
Cython (check the driver docs).  That makes serialization and
deserialization much more efficient.


-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: CQL Collections appear slow

2016-08-31 Thread Tyler Hobbs
The map version of the schema needs to deserialize, serialize, and then
deserialize about 85 times more cells, if your average map has 85
elements.  I would assume that's where most of the performance slowdown is
coming from.  If you can take the time to run that through a profiler, that
would be useful to see if there is some unexpected inefficiency.
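Here is the back-of-the-envelope arithmetic, using the numbers from the original report (about 15K rows in the partition, about 85 map entries per row). The modeling assumption is the one stated above: each element of a non-frozen map is its own cell, while the blob schema has one cell per row.

```python
ROWS_PER_PARTITION = 15_000  # from the report: ~15K rows in the partition
MAP_ENTRIES_PER_ROW = 85     # from the report: ~85 entries per map


def cells_to_process(rows, entries_per_row, one_cell_per_row):
    """Cells the read path handles: one per row for a single blob, one per map entry otherwise."""
    return rows if one_cell_per_row else rows * entries_per_row


map_cells = cells_to_process(ROWS_PER_PARTITION, MAP_ENTRIES_PER_ROW, one_cell_per_row=False)
blob_cells = cells_to_process(ROWS_PER_PARTITION, MAP_ENTRIES_PER_ROW, one_cell_per_row=True)
print(map_cells, blob_cells, map_cells // blob_cells)  # 1275000 15000 85
```

Each of those roughly 1.3 million cells then goes through the deserialize, serialize, deserialize steps listed above, so the per-cell overhead is multiplied accordingly.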

I'll also point out that you could use a frozen map (e.g. frozen<map<text,
float>>) and you'd probably get performance that's somewhere in the middle
of the other two approaches.
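A frozen collection is stored as a single serialized value instead of one cell per entry. The Python sketch below packs a `map<text, float>` into one blob using a layout modeled on the native protocol (v3+) collection encoding: a 4-byte entry count, then each key and value as a 4-byte length followed by its bytes, with floats as 4-byte big-endian IEEE 754. Treat the exact byte layout as an assumption. The point is that the whole map becomes one value, so reading it costs one cell, but updating a single entry means rewriting the entire map.

```python
import struct


def serialize_frozen_map(entries):
    """Serialize a dict[str, float] into one blob: [int n], then n x ([bytes] key, [bytes] value)."""
    out = [struct.pack(">i", len(entries))]
    for key, value in entries.items():
        key_bytes = key.encode("utf-8")
        value_bytes = struct.pack(">f", value)  # CQL float is a 32-bit IEEE 754 value
        out.append(struct.pack(">i", len(key_bytes)) + key_bytes)
        out.append(struct.pack(">i", len(value_bytes)) + value_bytes)
    return b"".join(out)


def deserialize_frozen_map(blob):
    """Inverse of serialize_frozen_map."""
    (count,) = struct.unpack_from(">i", blob, 0)
    offset = 4
    result = {}
    for _ in range(count):
        (key_len,) = struct.unpack_from(">i", blob, offset)
        offset += 4
        key = blob[offset:offset + key_len].decode("utf-8")
        offset += key_len
        (value_len,) = struct.unpack_from(">i", blob, offset)
        offset += 4
        (value,) = struct.unpack_from(">f", blob, offset)
        offset += value_len
        result[key] = value
    return result


props = {"temp": 0.5, "rate": 2.0}
blob = serialize_frozen_map(props)
print(len(blob))                              # 36
print(deserialize_frozen_map(blob) == props)  # True
```

In practice the driver does this encoding for you when the column is declared `frozen<map<text, float>>`; the sketch only shows why it behaves like the single-blob schema.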

On Tue, Aug 30, 2016 at 8:00 PM, Ben Frank <b...@airlust.com> wrote:

> Hi all, I posted this question on stackoverflow - I'm having an issue with
> CQL collections, anyone got any insight here?
>
> (http://stackoverflow.com/questions/39218180/cql-collections-appear-slow)
>
> I'm playing around with storing data in cassandra and I'm finding a
> significant performance problem with CQL collections. I started with this
> schema:
>
> CREATE TABLE TEST (
>   date DATE,
>   tranche TEXT,
>   id INT,
>   properties MAP<TEXT,FLOAT>,
>   PRIMARY KEY ((date,tranche), id))
>
> if I run a query for all data in this partition
>
> SELECT * FROM TEST where date = '2016-08-26' and tranche = 'third'
>
> tracing reports it takes ~1.3 seconds to load 15K rows. There are about 85
> entries in the map. Wall clock time from python is ~5 seconds. This seems
> really slow to load just one 'partition'
>
> So I tried this schema instead and used MessagePack to store the entire
> map in a single cell
>
> CREATE TABLE TEST (
>   date DATE,
>   tranche TEXT,
>   id INT,
>   properties blob,
>   PRIMARY KEY ((date,tranche), id))
>
> Now the same query takes ~60ms (as reported by tracing) and ~500ms wall
> clock time (again using python)
>
> I get that there's more to do with the MAP version, but this seems like an
> unexpected performance degradation.
>
> One oddity I noticed while testing this was that in both cases tracing
> reported it was returning 15K cells (which corresponds to the number of
> rows). I'd expect this in the second schema, but my understanding was that
> each element in a map was stored in its own cell in current versions of
> cassandra, so a bit surprised by this.
>
> I'm using version 3.7 of cassandra and the datastax python drivers. Anyone
> got any insight into what's happening here?
>
> -Ben
>



-- 
Tyler Hobbs
DataStax <http://datastax.com/>


CASSANDRA-10993 Approaches

2016-08-17 Thread Tyler Hobbs
In the spirit of the recent thread about discussing large changes on the
Dev ML, I'd like to talk about CASSANDRA-10993, the first step in the
"thread per core" work.

The goal of 10993 is to transform the read and write paths into an
event-driven model powered by event loops.  This means that each request
can be handled on a single thread (although typically broken up into
multiple steps, depending on I/O and locking) and the old mutation and read
thread pools can be removed.  So far, we've prototyped this with a couple
of approaches:

The first approach models each request as a state machine (or composition
of state machines).  For example, a single write request is encapsulated in
a WriteTask object which moves through a series of states as portions of
the write complete (allocating a commitlog segment, syncing the commitlog,
receiving responses from remote replicas).  These state transitions are
triggered by Events that are emitted by, e.g., the
CommitlogSegmentManager.  The event loop that manages tasks, events,
timeouts, and scheduling is custom and is (currently) closely tied to a
Netty event loop.  Here are a couple of example classes to take a look at:

WriteTask:
https://github.com/thobbs/cassandra/blob/CASSANDRA-10993-WIP/src/java/org/apache/cassandra/poc/WriteTask.java
EventLoop:
https://github.com/thobbs/cassandra/blob/CASSANDRA-10993-WIP/src/java/org/apache/cassandra/poc/EventLoop.java
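For readers unfamiliar with the pattern, here is a heavily simplified Python sketch of the state-machine style: a task advances through states as events arrive, and a single-threaded loop dispatches those events in order. The state names, event names, and the `WriteTask`/`EventLoop` shapes are illustrative assumptions. They do not mirror the Java classes linked above.

```python
from collections import deque
from enum import Enum, auto


class State(Enum):
    ALLOCATING_SEGMENT = auto()
    SYNCING_COMMITLOG = auto()
    AWAITING_REPLICAS = auto()
    COMPLETE = auto()


class WriteTask:
    """Toy write request modeled as a state machine; each matching event moves it forward."""

    def __init__(self, remote_acks_needed):
        self.state = State.ALLOCATING_SEGMENT
        self.remote_acks_needed = remote_acks_needed

    def handle(self, event):
        if self.state is State.ALLOCATING_SEGMENT and event == "segment_allocated":
            self.state = State.SYNCING_COMMITLOG
        elif self.state is State.SYNCING_COMMITLOG and event == "commitlog_synced":
            self.state = State.AWAITING_REPLICAS if self.remote_acks_needed else State.COMPLETE
        elif self.state is State.AWAITING_REPLICAS and event == "replica_ack":
            self.remote_acks_needed -= 1
            if self.remote_acks_needed == 0:
                self.state = State.COMPLETE
        # Anything else is ignored here, including replica acks that arrive before the
        # commitlog sync; a real write path runs local and remote work in parallel.


class EventLoop:
    """Single-threaded loop: pops (task, event) pairs and dispatches them in order."""

    def __init__(self):
        self.queue = deque()

    def emit(self, task, event):
        self.queue.append((task, event))

    def run(self):
        while self.queue:
            task, event = self.queue.popleft()
            task.handle(event)


loop = EventLoop()
task = WriteTask(remote_acks_needed=2)
for event in ["segment_allocated", "commitlog_synced", "replica_ack", "replica_ack"]:
    loop.emit(task, event)
loop.run()
print(task.state.name)  # COMPLETE
```

The comment about early acks is exactly the difficulty raised later in this message: one linear state machine cannot naturally express the local and remote write paths progressing in parallel.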

The second approach utilizes RxJava and the Observable pattern.  Where we
would wait for emitted events in the state machine approach, we instead
depend on an Observable to "push" the data/result we're awaiting.
Scheduling is handled by an Rx scheduler (which is customizable).  The code
changes required for this are, overall, less intrusive.  Here's a quick
example of what this looks like for high-level operations:
https://github.com/thobbs/cassandra/blob/rxjava-rebase/src/java/org/apache/cassandra/service/StorageProxy.java#L1724-L1732
.
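The push-based style can be illustrated with a tiny hand-rolled observable in Python. This is not RxJava's API: the `Observable` class, its `map` operator, and the simulated `read_partition` below are assumptions made only to show the shape. The consumer subscribes, and the producer pushes the result when it is ready, instead of a task waiting for an emitted event.

```python
class Observable:
    """Minimal push-based source: subscribers receive a value, or an error, when it is emitted."""

    def __init__(self, on_subscribe):
        # on_subscribe(on_next, on_error) starts the work and pushes results to the callbacks.
        self._on_subscribe = on_subscribe

    def subscribe(self, on_next, on_error=None):
        def default_on_error(exc):
            raise exc
        self._on_subscribe(on_next, on_error or default_on_error)

    def map(self, fn):
        """Return a new Observable whose values are fn applied to this one's values."""
        def on_subscribe(on_next, on_error):
            def forward(value):
                try:
                    result = fn(value)
                except Exception as exc:  # route errors downstream instead of raising
                    on_error(exc)
                    return
                on_next(result)
            self.subscribe(forward, on_error)
        return Observable(on_subscribe)


def read_partition(key):
    """Hypothetical local read that pushes its rows when 'I/O' completes (here, immediately)."""
    return Observable(lambda on_next, on_error: on_next([f"{key}:row1", f"{key}:row2"]))


results = []
read_partition("k1").map(len).subscribe(results.append)
print(results)  # [2]
```

Even this toy shows the trade-off discussed below: composition and error routing come almost for free, but the order of execution is implicit in how callbacks are chained.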

So far we've benchmarked both approaches on in-memory reads to get an idea
of the upper-bound performance of both approaches.  Throughput appears to
be very similar with both branches.

There are a few considerations up for debate as to which approach we should
go with that I would appreciate input on.

First, performance.  There are concerns that going with Rx (or something
similar) may limit the peak performance we can eventually attain in a
couple of ways.  First, we don't have as much control over the event loop,
scheduling, and chunking of tasks.  With the state machine approach, we're
writing all of this, so it's totally under our control.  With Rx, a lot of
things are customizable or already have decent tools, but this may come up
short in critical ways.  Second, the overhead of the Observable machinery
may become significant as other bottlenecks are removed.  Of course,
WriteTask et al have their own overhead, but once again, we have more
control there.

The second consideration is code style and ease of understanding.  I think
both of these approaches have downsides in different areas.  The state
machines are very explicit (an upside), but also very verbose and somewhat
disjointed.  Most of the complex operations in Cassandra can't cleanly be
represented as a single state machine, because they're logically multiple
state machines operating in parallel (e.g. the local write path and the
remote write path in WriteTask).  After working on the prototypes, I've
found the state machines to be harder to logically follow than I had
hoped.  Perhaps we could come up with better abstractions and patterns for
this, but that's the current state of things.  On the Rx side, the downside
is that the behavior is much less explicit.  Additionally, some find it
more difficult to mentally follow the flow of execution.  Based on my past
work with a large Twisted Python codebase, I'll agree that it's tough to
get used to, but not unmanageable with experience and good coding patterns.

A third consideration is code reuse.  A big advantage of Rx is that it
comes with many tools for transforming Observables, handling multiple
Observables, error handling, and tracing.  With the state machine approach,
we would need to write equivalents for these from scratch.  This is a
non-trivial amount of work that might make the project take significantly
longer to complete.  Combining this with the fact that the Rx approach would be
less invasive, it seems like we would have an easier time introducing
incremental changes to the code base rather than having a big-bang commit.

If I can boil these concerns down to one tradeoff, it's this: do we want to
expend more effort and have more explicit code and complete control, or do
we want to piggyback on the Rx work, give up some control, and (hopefully)
get to the next, deeper optimizations sooner?

Thanks for any input on this topic.


-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: [VOTE] Release Apache Cassandra 3.8

2016-07-28 Thread Tyler Hobbs
g a whole bunch of test failures and hoping they don't
>>> > >>> mean anything, because we just had that thread about getting more
>>> > >>> rigorous about tests, not less.
>>> > >>>
>>> > >>> So I would recommend we go ahead and fix this before releasing,
>>> > >>> and to avoid a super compressed 3.9 window either retarget 3.8
>>> > >>> for August, or 3.9 for September.
>>> > >>>
>>> > >>> On Thu, Jul 21, 2016 at 9:58 AM, Aleksey Yeschenko
>>> > >>> <alek...@apache.org> wrote:
>>> > >>>
>>> > >>>> What we’d usually do is revert the offending ticket and push it
>>> > >>>> to the next release, if this indeed were significant enough.
>>> > >>>>
>>> > >>>> So option 4 would be to revert CDC fast (painful) and ship.
>>> > >>>> Option 5 would be to quickly fix the issue, retag, and revote,
>>> > >>>> with 3.9 still following up on schedule. Option 6 would be to
>>> > >>>> ignore the calendar entirely. Fix or revert the issue
>>> > >>>> eventually, and release 3.8 then. Have 3.9 and 3.0.9 out at
>>> > >>>> whatever time we decide to, and go back to monthly cycles from
>>> > >>>> there on.
>>> > >>>>
>>> > >>>> TBH I don’t think anybody is even going to notice, or care. So
>>> > >>>> I’m fine with 1, 4, 5, 6, but not reverting my +1 so far.
>>> > >>>>
>>> > >>>> -- AY
>>> > >>>>
>>> > >>>> On 21 July 2016 at 14:46:17, Sylvain Lebresne
>>> > >>>> (sylv...@datastax.com) wrote:
>>> > >>>>
>>> > >>>> On Thu, Jul 21, 2016 at 3:21 PM, Jonathan Ellis
>>> > >>>> <jbel...@gmail.com> wrote:
>>> > >>>>
>>> > >>>>> I see the alternatives as:
>>> > >>>>>
>>> > >>>>> 1. Release this as 3.8
>>> > >>>>> 2. Skip 3.8 and release 3.9 next month on schedule
>>> > >>>>> 3. Skip this month and release 3.8 next month instead
>>> > >>>>>
>>> > >>>>
>>> > >>>> I've hopefully made it clear I don't really like 1. I'm totally
>>> > >>>> fine with either 2 or 3 though (with a very very small preference
>>> > >>>> for 3. because I suspect skipping a release might confuse a few
>>> > >>>> users, but also knowing that 2. has the small advantage of keeping
>>> > >>>> the 3.0.x and 3.x versions released more or less in lockstep).
>>> > >>>>
>>> > >>>>
>>> > >>>>
>>> > >>>>>
>>> > >>>>> On Thu, Jul 21, 2016 at 8:19 AM, Aleksey Yeschenko
>>> > >>>>> <alek...@apache.org> wrote:
>>> > >>>>>
>>> > >>>>>> I still think the issue is minor enough, and with 3.8 being
>>> > >>>>>> extremely delayed, and being a non-odd release, at that,
>>> > >>>>>> we’d be better off just pushing it.
>>> > >>>>>>
>>> > >>>>>> Also, I know we’ve been easy on -1s when voting on
>>> > >>>>>> releases, but I want to remind people in general that release
>>> > >>>>>> votes can not be vetoed and only require a majority of binding
>>> > >>>>>> votes, http://www.apache.org/foundation/voting.html#ReleaseVotes
>>> > >>>>>>
>>> > >>>>>>
>>> > >>>>>> -- AY
>>> > >>>>>>
>>> > >>>>>> On 21 July 2016 at 08:57:22, Sylvain Lebresne
>>> > >>>>>> (sylv...@datastax.com) wrote:
>>> > >>>>>>
>>> > >>>>>> Sorry but I'm (binding) -1 on this because of
>>> > >>>>>> https://issues.apache.org/jira/browse/CASSANDRA-12236.
>>> > >>>>>>
>>> > >>>>>> I disagree that knowingly releasing a version that will
>>> > >>>>>> temporarily break in-flight queries during upgrade, even if it's
>>> > >>>>>> for a very short time-frame until re-connection, is ok. I'll note
>>> > >>>>>> in particular that in the test report, there is 74! failures in
>>> > >>>>>> the upgrade tests (for reference the 3.7 test report had only 2
>>> > >>>>>> upgrade tests failure both with open tickets). Given that we have
>>> > >>>>>> a known problem during upgrade, I don't really buy the "We are
>>> > >>>>>> assuming these are due to a recent downsize in instance size that
>>> > >>>>>> these tests run on" and that suggest to me the problem is not too
>>> > >>>>>> minor.
>>> > >>>>>>
>>> > >>>>>>
>>> > >>>>>> On Thu, Jul 21, 2016 at 6:18 AM, Dave Brosius
>>> > >>>>>> <dbros...@mebigfatguy.com> wrote:
>>> > >>>>>>
>>> > >>>>>>> +1
>>> > >>>>>>>
>>> > >>>>>>>
>>> > >>>>>>> On 07/20/2016 05:48 PM, Michael Shuler wrote:
>>> > >>>>>>>
>>> > >>>>>>>> I propose the following artifacts for release as 3.8.
>>> > >>>>>>>>
>>> > >>>>>>>>
>>> > >>>>>>>> sha1: c3ded0551f538f7845602b27d53240cd8129265c
>>> > >>>>>>>> Git:
>>> > >>>>>>>> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/3.8-tentative
>>> > >>>>>>>> Artifacts:
>>> > >>>>>>>> https://repository.apache.org/content/repositories/orgapachecassandra-1123/org/apache/cassandra/apache-cassandra/3.8/
>>> > >>>>>>>> Staging repository:
>>> > >>>>>>>> https://repository.apache.org/content/repositories/orgapachecassandra-1123/
>>> > >>>>>>>>
>>> > >>>>>>>> The debian packages are available here:
>>> > >>>>>>>> http://people.apache.org/~mshuler/
>>> > >>>>>>>>
>>> > >>>>>>>> The vote will be open for 72 hours (longer if needed).
>>> > >>>>>>>>
>>> > >>>>>>>> [1]: http://goo.gl/oGNH0i (CHANGES.txt)
>>> > >>>>>>>> [2]: http://goo.gl/KjMtUn (NEWS.txt)
>>> > >>>>>>>> [3]: https://goo.gl/TxVLKo (3.8 Test Summary)
>>> > >>>>>>>>
>>> > >>>>>>>>
>>> > >>>>>>>
>>> > >>>>>>
>>> > >>>>>
>>> > >>>>>
>>> > >>>>>
>>> > >>>>> -- Jonathan Ellis Project Chair, Apache Cassandra co-founder,
>>> > >>>>> http://www.datastax.com @spyced
>>> > >>>>>
>>> > >>>>
>>> > >>>
>>> > >>>
>>> > >>>
>>> > >>
>>> > >>
>>> > >
>>> >
>>> >
>>>
>>>
>>> --
>>> http://twitter.com/tjake
>>>
>>>


-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: SSTable generation numbers

2016-06-28 Thread Tyler Hobbs
32-bit integer overflow is the only scenario where a single node would wrap
around.
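To see what wrapping around means in practice, the Python sketch below emulates Java's 32-bit signed `int` arithmetic, since Python integers never overflow on their own. Treating the generation counter as a plain Java `int` incremented by one per new sstable is the assumption here; `next_generation` is a hypothetical helper, not Cassandra code.

```python
INT32_MAX = 2**31 - 1
INT32_MIN = -(2**31)


def to_int32(value):
    """Wrap an arbitrary Python int into Java's signed 32-bit range (two's complement)."""
    return (value + 2**31) % 2**32 - 2**31


def next_generation(current):
    """Hypothetical counter increment with Java int overflow semantics."""
    return to_int32(current + 1)


print(next_generation(41))         # 42
print(next_generation(INT32_MAX))  # -2147483648, i.e. the counter has wrapped around
```

For scale, even at one new sstable per second, 2**31 increments take roughly 68 years, which is why this is not a practical concern on a single node.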

However, when copying sstables from one node to another, there can easily
be conflicts, so this is something to be careful about.
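One way to avoid such conflicts when copying sstables into another node's data directory is to renumber the incoming generations above the highest one already present, keeping all components of one sstable together. The Python helper below is a hypothetical sketch, not a Cassandra tool. It assumes 2.0/2.1-style file names of the form `<keyspace>-<table>-<version>-<generation>-<Component>.db` (for example `ks-cf-ka-5-Data.db`), where the generation is the second-to-last hyphen-separated field.

```python
def generation_of(filename):
    """Extract the generation from '<ks>-<table>-<version>-<generation>-<Component>.db'."""
    return int(filename.split("-")[-2])


def plan_renames(existing, incoming):
    """Map each incoming filename to a new name whose generation does not clash with `existing`.

    All components sharing an incoming generation (Data, Index, ...) get the same new generation.
    """
    next_free = max((generation_of(name) for name in existing), default=0) + 1
    new_generation = {}
    for gen in sorted({generation_of(name) for name in incoming}):
        new_generation[gen] = next_free
        next_free += 1

    renames = {}
    for name in incoming:
        parts = name.split("-")
        parts[-2] = str(new_generation[int(parts[-2])])
        renames[name] = "-".join(parts)
    return renames


existing = ["ks-cf-ka-1-Data.db", "ks-cf-ka-2-Data.db", "ks-cf-ka-2-Index.db"]
incoming = ["ks-cf-ka-2-Data.db", "ks-cf-ka-2-Index.db", "ks-cf-ka-3-Data.db"]
for old, new in plan_renames(existing, incoming).items():
    print(old, "->", new)
```

Splitting on `-` is safe in this naming scheme because keyspace and table names may only contain letters, digits and underscores.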

On Mon, Jun 27, 2016 at 8:10 PM, Rajath Subramanyam <rajat...@gmail.com>
wrote:

> Hello Cassandra-dev,
>
> Are there any scenarios in which the generation numbers of SSTables (i.e.
> ksname-cfname--Data.db) can wrap around, without the admin
> dropping and re-creating the CF with the same name?
>
> I believe that the answer must be version-agnostic, but in case it matters,
> I am specifically asking this question for C* 2.0/2.1.
>
> Thanks in advance for your help.
>
> - Rajath
> --------
> Rajath Subramanyam
>



-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: [VOTE] Release Apache Cassandra 3.6

2016-05-12 Thread Tyler Hobbs
Based on CASSANDRA-11613 (and CASSANDRA-11760), I'm changing my vote to a
(non-binding) -1.  There is a legit regression in upgrading non-frozen UDTs
that needs to be fixed before releasing 3.6.

On Thu, May 12, 2016 at 12:44 PM, Philip Thompson <
philip.thomp...@datastax.com> wrote:

> I've updated the TE report with the results of the upgrade testing we did.
> We experienced a higher than expected number of test failures, which
> prompted the filing of:
>
> CASSANDRA-11760
> CASSANDRA-11763
> CASSANDRA-11765
> CASSANDRA-11767
>
> Two errors were related to handling the legacy hint format after upgrades
> to 3.6 from either the 2.1 or 2.2 series. This should not affect upgrades
> from 3.0.x, or new 3.6 clusters. 11760 is an issue handling UDTs in
> mixed-version 2.2.5 / 3.6 clusters.
>
> These four bugs were accompanied by other existing, known upgrade failures.
> We do not suspect that these four issues are 3.6 regressions, as we have
> tested this release with a large number of new upgrade tests that were not
> run on previous tick-tock releases. I am re-running these tests against
> 3.4, to confirm that suspicion.
>
> On Tue, May 10, 2016 at 9:54 PM, Jake Luciani <j...@apache.org> wrote:
>
> > I propose the following artifacts for release as 3.6.
> >
> > sha1: c17cbe1875a974a00822ffbfad716abde363c8da
> > Git:
> >
> >
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/3.6-tentative
> > Artifacts:
> >
> >
> https://repository.apache.org/content/repositories/orgapachecassandra-1112/org/apache/cassandra/apache-cassandra/3.6/
> > Staging repository:
> >
> https://repository.apache.org/content/repositories/orgapachecassandra-1112/
> >
> > The artifacts as well as the debian package are also available here:
> > http://people.apache.org/~jake
> >
> > The vote will be open for 72 hours (longer if needed).
> >
> > [1]: http://goo.gl/Yv15Qz (CHANGES.txt)
> > [2]: http://goo.gl/VyR9EG (NEWS.txt)
> > [3]: https://goo.gl/raz8ok (DataStax QA Report)
> >
>



-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Using writetime in CAS Lightweight transactions

2016-05-11 Thread Tyler Hobbs
On Wed, May 11, 2016 at 10:22 AM, DuyHai Doan <doanduy...@gmail.com> wrote:

> It is not (yet) possible to use functions in LWT predicates. LWT only
> supports = and != plus IF (NOT) EXISTS right now
>

You're correct about functions not being supported, but we do actually
support >, >=, <, <=, and IN operators (see
https://issues.apache.org/jira/browse/CASSANDRA-6839).

-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: [VOTE] Release Apache Cassandra 3.0.6

2016-05-11 Thread Tyler Hobbs
+1

On Tue, May 10, 2016 at 8:54 PM, Jake Luciani <j...@apache.org> wrote:

> I propose the following artifacts for release as 3.0.6.
>
> sha1: 52447873a361647a5e80c547adea9cf5ee85254a
> Git:
>
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/3.0.6-tentative
> Artifacts:
>
> https://repository.apache.org/content/repositories/orgapachecassandra-1110/org/apache/cassandra/apache-cassandra/3.0.6/
> Staging repository:
> https://repository.apache.org/content/repositories/orgapachecassandra-1110/
>
> The artifacts as well as the debian package are also available here:
> http://people.apache.org/~jake
>
> The vote will be open for 72 hours (longer if needed).
>
> [1]: http://goo.gl/IiNyVb (CHANGES.txt)
> [2]: http://goo.gl/ZAr03L (NEWS.txt)
> [3]: https://goo.gl/2jPtss (DataStax QA Report)
>



-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: [VOTE] Release Apache Cassandra 3.6

2016-05-11 Thread Tyler Hobbs
+1

On Tue, May 10, 2016 at 8:54 PM, Jake Luciani <j...@apache.org> wrote:

> I propose the following artifacts for release as 3.6.
>
> sha1: c17cbe1875a974a00822ffbfad716abde363c8da
> Git:
>
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/3.6-tentative
> Artifacts:
>
> https://repository.apache.org/content/repositories/orgapachecassandra-1112/org/apache/cassandra/apache-cassandra/3.6/
> Staging repository:
> https://repository.apache.org/content/repositories/orgapachecassandra-1112/
>
> The artifacts as well as the debian package are also available here:
> http://people.apache.org/~jake
>
> The vote will be open for 72 hours (longer if needed).
>
> [1]: http://goo.gl/Yv15Qz (CHANGES.txt)
> [2]: http://goo.gl/VyR9EG (NEWS.txt)
> [3]: https://goo.gl/raz8ok (DataStax QA Report)
>



-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Statistics.db file in Cassandra 3.0

2016-02-10 Thread Tyler Hobbs
Take a look at MetadataSerializer and the MetadataComponent subclasses.

On Tue, Feb 9, 2016 at 4:34 PM, Rajath Subramanyam <rajat...@gmail.com>
wrote:

> Hello Cassandra-Dev,
>
> I have noticed that in Cassandra 3.0 there is a new file in the
> // called
> ma-<gen#>-big-Statistics.db.
>
> What does this file contain? Is it compressed? How do I read it?
>
> Thanks in advance for sharing some information on this.
>
> - Rajath
> --------
> Rajath Subramanyam
>



-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: SSTable format in C* 2.2 and later

2016-01-19 Thread Tyler Hobbs
Primarily, CASSANDRA-8099.  If you look at the Version class in
o.a.c.io.sstable.format.big.BigFormat, there are comments that list the
different sstable versions along with what changes went into those.  You
can look at git blame to see what the related jira tickets are.
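As a rough illustration of how the version string shows up in file names, the sketch below parses a 2.2+/3.0-style component name such as ma-<gen#>-big-Statistics.db and compares version strings. The regular expression, the record, and the assumption that version strings order lexicographically (for example "la" before "ma") are illustrative only; the authoritative list of versions and what changed in each is the comments in BigFormat's Version class mentioned above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SSTableNameSketch {
    // Assumed 2.2+/3.0-style name: <version>-<generation>-<format>-<component>.db
    // (older releases also prefixed keyspace and table names, which this ignores).
    private static final Pattern NAME =
            Pattern.compile("^([a-z]{2})-(\\d+)-([a-z]+)-(\\w+)\\.db$");

    record Parsed(String version, int generation, String format, String component) {}

    static Parsed parse(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) {
            throw new IllegalArgumentException("unrecognized sstable file name: " + fileName);
        }
        return new Parsed(m.group(1), Integer.parseInt(m.group(2)), m.group(3), m.group(4));
    }

    // Assumption: two-letter version strings compare lexicographically.
    static boolean isAtLeast(String version, String minimum) {
        return version.compareTo(minimum) >= 0;
    }

    public static void main(String[] args) {
        Parsed p = parse("ma-5-big-Statistics.db");
        System.out.println(p); // Parsed[version=ma, generation=5, format=big, component=Statistics]
        System.out.println(isAtLeast(p.version(), "la")); // true
    }
}
```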

On Mon, Jan 18, 2016 at 7:48 PM, Rajath Subramanyam <rajat...@gmail.com>
wrote:

> Hello Cassandra-dev community,
>
> Does anyone know the JIRAs that affected the change in the SSTable format
> for C* 2.2 and later ?
>
> Thanks in advance.
>
> - Rajath
> ----
> Rajath Subramanyam
>



-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Repair when a replica is Down

2016-01-19 Thread Tyler Hobbs
On Fri, Jan 15, 2016 at 12:06 PM, Anuj Wadehra <anujw_2...@yahoo.co.in>
wrote:

> Increase the gc grace period temporarily. Then we should have capacity
> planning to accommodate the extra storage needed for extra gc grace that may
> be needed in case of node failure scenarios.


I would do this.  Nodes that are down for longer than gc_grace_seconds
should not re-enter the cluster, because they may contain data that has
been deleted and the tombstone has already been purged (repairing doesn't
change this).  Bringing them back up will result in "zombie" data.
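The rule of thumb above can be expressed as a tiny check. This is a minimal sketch with hypothetical helper names; it only encodes "down for gc_grace_seconds or longer means do not rejoin" and does not account for when the other replicas were last repaired. 864000 seconds (10 days) is Cassandra's default gc_grace_seconds.

```java
import java.time.Duration;

public class GcGraceCheck {
    // A node down for gc_grace_seconds or longer should not rejoin: tombstones for
    // data it still holds may already have been purged on the other replicas.
    // Conservatively, a downtime exactly equal to gc_grace counts as too long.
    static boolean mayRejoin(Duration downtime, Duration gcGrace) {
        return downtime.compareTo(gcGrace) < 0;
    }

    // How much of the gc_grace window is left (zero once it has elapsed).
    static Duration remaining(Duration downtime, Duration gcGrace) {
        Duration left = gcGrace.minus(downtime);
        return left.isNegative() ? Duration.ZERO : left;
    }

    public static void main(String[] args) {
        Duration gcGrace = Duration.ofSeconds(864000); // default gc_grace_seconds (10 days)
        System.out.println(mayRejoin(Duration.ofDays(4), gcGrace)); // true
        System.out.println(remaining(Duration.ofDays(4), gcGrace).toDays()); // 6
        System.out.println(mayRejoin(Duration.ofDays(11), gcGrace)); // false
    }
}
```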

Also, I do think that the user mailing list is a better place for the first
round of this conversation.

-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Repair when a replica is Down

2016-01-19 Thread Tyler Hobbs
On Tue, Jan 19, 2016 at 10:44 AM, Anuj Wadehra <anujw_2...@yahoo.co.in>
wrote:

>
> Consider a scenario where I have a 20-node cluster, RF=5, read/write
> QUORUM, gc grace period=20. My cluster is fault tolerant and can afford
> 2 node failures. Suddenly, one node goes down due to some hardware issue.
> It's been 10 days since my node went down, none of the 19 nodes have been
> repaired, and now it's decision time. I am not sure how soon the issue will
> be fixed (maybe in 8 days, before gc grace expires), so I shouldn't remove
> the node early and add it back, as that would cause unnecessary streaming.
> At the same time, if I don't remove the failed node, my entire system's
> health would be in question, and it would be a panic situation, as no data
> has been repaired in the last 10 days and gc grace is approaching. I need
> sufficient time to repair 19 nodes.
>
> What looked like a fault-tolerant system that can afford 2 node failures
> required urgent attention and manual decision making when a single node
> went down. Why can't we just go ahead and repair the remaining replicas if
> some replicas are down? If the failed node comes up before the gc grace
> period expires, we would run repair to fix inconsistencies; otherwise we
> would discard its data and bootstrap it. I think that would be a really
> robust fault-tolerant system.
>

That makes sense.  It seems like having the option to ignore down replicas
during repair could be at least somewhat helpful, although it may be tricky
to decide how this should interact with incremental repairs.  If there
isn't a jira ticket for this already, can you open one with the scenario
above?


-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Reconciling expiring cells and tombstones

2015-06-17 Thread Tyler Hobbs

 Why does Cassandra consistently prefer tombstones to other kinds of cells?


It's primarily to have deterministic conflict resolution.  I don't recall
any specific conversations about preferring tombstones over expiring cells, and
the original ticket (https://issues.apache.org/jira/browse/CASSANDRA-699)
doesn't mention anything.

By modifying this behavior in this particular case, do we risk hitting
 bizarre corner cases?


I think this would be safe.  The only problem I can think of would happen
while your cluster has some patched nodes and some unpatched nodes.  If you
had any nodes that had both an expiring cell and a tombstone with the same
timestamp, then patched replicas would return different results/digests
than the unpatched nodes.  However, unless you're mixing TTLs with deletes,
that's not too likely to happen.  Maybe repair combined with clock skew
could result in that, but not much else.
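To see why the tie-break matters during a rolling upgrade, here is a minimal Java sketch of timestamp-based reconciliation with the "tombstone wins ties" rule, next to the conditional variant proposed in the quoted message. The Cell record and both methods are hypothetical simplifications, not Cassandra's Cell, ExpiringCell, or DeletedCell classes. The point it shows: the unpatched rule gives every replica the same answer, while the patched rule depends on the local clock, so patched and unpatched replicas can disagree.

```java
public class ReconcileSketch {
    // Hypothetical simplified cell. For an expiring cell, localDeletionTime is when
    // it expires; for a tombstone, when the deletion was recorded (both in seconds).
    record Cell(long timestamp, boolean tombstone, int localDeletionTime) {
        boolean isLive(int nowInSeconds) {
            return !tombstone && nowInSeconds < localDeletionTime;
        }
    }

    // Unpatched rule: the higher timestamp wins; on a tie the tombstone always wins,
    // so every replica makes the same choice regardless of its local clock.
    static Cell reconcile(Cell a, Cell b) {
        if (a.timestamp() != b.timestamp()) {
            return a.timestamp() > b.timestamp() ? a : b;
        }
        if (a.tombstone()) return a;
        if (b.tombstone()) return b;
        return a; // simplified: Cassandra applies further tie-breaks between live cells
    }

    // Patched rule: on a tie, keep the expiring cell if this node's clock says it
    // is still live. The result now depends on nowInSeconds.
    static Cell reconcilePatched(Cell a, Cell b, int nowInSeconds) {
        if (a.timestamp() == b.timestamp()) {
            if (b.tombstone() && a.isLive(nowInSeconds)) return a;
            if (a.tombstone() && b.isLive(nowInSeconds)) return b;
        }
        return reconcile(a, b);
    }

    public static void main(String[] args) {
        Cell expiring = new Cell(1000L, false, 500);
        Cell tombstone = new Cell(1000L, true, 400);
        // An unpatched node and a patched node whose clock reads 450 disagree:
        System.out.println(reconcile(expiring, tombstone).tombstone()); // true
        System.out.println(reconcilePatched(expiring, tombstone, 450).tombstone()); // false
    }
}
```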


On Wed, Jun 17, 2015 at 10:05 AM, Josef Lindman Hörnlund jo...@appdata.biz
wrote:


 Hello Sam,

 This is not answering your direct question but if you worry about clock
 skew take a look at this great two-part blogpost:


 https://blog.logentries.com/2014/03/synchronizing-clocks-in-a-cassandra-cluster-pt-1-the-problem/
 https://blog.logentries.com/2014/03/synchronizing-clocks-in-a-cassandra-cluster-pt-2-solutions/


 Josef Lindman Hörnlund
 Chief Data Scientist
 AppData
 jo...@appdata.biz




  On 16 Jun 2015, at 20:45, Sam Klock skl...@akamai.com wrote:
 
  Hi folks,
 
  I have a question about a design choice on how expiring cells are
  reconciled with tombstones.  For two cells with the same timestamp, if
  one is expiring and one is a tombstone, Cassandra *always* prefers the
  tombstone.  This matches its behavior for normal/non-expiring cells, but
  the folks in my organization worry about what it may imply for nodes
  experiencing clock skew.  Specifically, we're concerned about scenarios
  like the following:
 
  1) An expiring cell is committed via some node with a non-skewed clock.
  2) Another replica for that cell experiences forward clock skew and
  decides that the cell is expired.  It eventually runs a compaction that
  converts the cell to a tombstone.
  3) The tombstone propagates to other nodes via, e.g., node repair.
  4) The other nodes all eventually run their own compactions.  Because of
  the reconciliation logic, the expiring cell is purged on all of the
  replicas, leaving behind only the tombstone.
 
  If the cell should have still been live at (4), the reconciliation logic
  will result in it being prematurely purged.  We have confirmed this
  behavior experimentally.
 
  My organization may be more concerned about clock skew than the larger
  community, so I don't think we're inclined to propose a patch at this
  time.  But to account for this kind of scenario we would like to patch
  our internal version of Cassandra to conditionally prefer expiring cells
  to tombstones if the node believes they should still be live; i.e., in
  reconcile() in *ExpiringCell.java, instead of:
 
      if (cell instanceof DeletedCell)
          return cell;

  use:

      if (cell instanceof DeletedCell)
          return isLive() ? this : cell;
 
  Before we do so, however, we'd like to understand the rationale for the
  existing behavior and the risks of making changes to it.  Why does
  Cassandra consistently prefer tombstones to other kinds of cells?  By
  modifying this behavior in this particular case, do we risk hitting
  bizarre corner cases?
 
  Thanks,
  SK




-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: Parsing SSTables containing CQL values

2015-05-26 Thread Tyler Hobbs
I would start by looking at sstable2json.  It may be simplest for you to
run sstable2json and then process the resulting json.  If that's not
adequate, modifying the sstable2json code is probably your best bet.

On Mon, May 25, 2015 at 11:12 AM, Malcolm Matalka malc...@spotify.com
wrote:

 Hello,

 For efficiency reasons I am trying to parse the raw SSTable files in
 order to transform them into another format.  I understand this is
 like poking a sleeping beast and there aren't many guarantees around
 this but I'm asking if anyone has any pointers to make this possible?
 In a search I have stumbled upon FullContact's SSTable parser, but it
 does not parse the complicated data structures that CQL supports.  In
 attempting to reverse engineer how Cassandra handles the actual data
 there are a few cases that are unclear and I'm concerned that my
 attempts to interpret them will result in a fragile result.

 Are there any suggestions?  Existing libraries?  Tips on how Cassandra
 parses the data itself?  Pointers into the code to read?  SSTable
 design doc?

 Thanks,
 /Malcolm




-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: Parsing SSTables containing CQL values

2015-05-26 Thread Tyler Hobbs
Trying to parse and export an sstable at a higher, CQL level with the
current codebase is going to be pretty tough.  Handling static columns,
collections (multi-cell columns), and the four minor variants of sstable
formats (sparse vs dense, composite vs simple) is not easy.  If you want to
handle things at a CQL level, you should probably go through the normal
read path.

With that said, CASSANDRA-8099 will substantially change the format of
sstables to more closely match CQL, making this more feasible.

On Tue, May 26, 2015 at 8:49 AM, Malcolm Matalka malc...@spotify.com
wrote:

 Thanks Tyler,

 The problem with sstable2json is that it does not support the CQL
 types as far as I can see, and there isn't any indication of how to modify
 it to do that.  It seems like the CQL things are a layer above the
 SSTable.

 2015-05-26 15:44 GMT+02:00 Tyler Hobbs ty...@datastax.com:
  I would start by looking at sstable2json.  It may be simplest for you to
  run sstable2json and then process the resulting json.  If that's not
  adequate, modifying the sstable2json code is probably your best bet.
 
  On Mon, May 25, 2015 at 11:12 AM, Malcolm Matalka malc...@spotify.com
  wrote:
 
  Hello,
 
  For efficiency reasons I am trying to parse the raw SSTable files in
  order to transform them into another format.  I understand this is
  like poking a sleeping beast and there aren't many guarantees around
  this but I'm asking if anyone has any pointers to make this possible?
  In a search I have stumbled upon FullContact's SSTable parser, but it
  does not parse the complicated data structures that CQL supports.  In
  attempting to reverse engineer how Cassandra handles the actual data
  there are a few cases that are unclear and I'm concerned that my
  attempts to interpret them will result in a fragile result.
 
  Are there any suggestions?  Existing libraries?  Tips on how Cassandra
  parses the data itself?  Pointers into the code to read?  SSTable
  design doc?
 
  Thanks,
  /Malcolm
 
 
 
 
  --
  Tyler Hobbs
  DataStax http://datastax.com/




-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: OpsCenter with client to node encryption

2015-05-26 Thread Tyler Hobbs
Hi Jan,

The dev mailing list is for the development of Cassandra only.  The most
appropriate place for a question about OpsCenter is probably a
StackOverflow post tagged with datastax-opscenter.

On Tue, May 26, 2015 at 5:59 AM, Jan Kesten j...@dg6obo.de wrote:

 Hi all,

 I am trying to set up internode and client encryption on Cassandra. I set
 up a small CA, generated the certificates, distributed them and configured
 the nodes to use them.

 Internode encryption worked straightforwardly, as did cqlsh after I added --ssl.

 But I am not able to set up OpsCenter (running 5.1.1). Two issues:

 1. I added the CA file path, for me /etc/opscenter/cassandra_ca.pem, as
 asked. I can't save the cluster until I add a keystore, even though I did not
 enable client verification, and I can't find any documentation on which
 keystore is meant here. Since OpsCenter is written in Python, these are
 obviously not the JKS keystores from Cassandra.

 I guess it is meant this way: the individual nodes present their
 certificates to OpsCenter, which verifies them against the CA store.

 2. Trying to connect gives me an error in opscenterd.log:

 2015-05-26 10:34:27+ []  INFO: Using SSL when checking thrift
 connection: /etc/opscenter/cassandra_ca.pem, client_pem=None,
 client_key=None,
 validate=True
 2015-05-26 10:34:27+ []  INFO: Starting factory
 opscenterd.ThriftService.NoReconnectCassandraClientFactory instance at
 0x7fa490ff97a0
 2015-05-26 10:34:27+ [] Unhandled Error
 Traceback (most recent call last):
   File /usr/share/opscenter/lib/py-debian/2.7/amd64/twisted/python/log.py, line 84, in callWithLogger
     return callWithContext({system: lp}, func, *args, **kw)
   File /usr/share/opscenter/lib/py-debian/2.7/amd64/twisted/python/log.py, line 69, in callWithContext
     return context.call({ILogContext: newCtx}, func, *args, **kw)
   File /usr/share/opscenter/lib/py-debian/2.7/amd64/twisted/python/context.py, line 59, in callWithContext
     return self.currentContext().callWithContext(ctx, func, *args, **kw)
   File /usr/share/opscenter/lib/py-debian/2.7/amd64/twisted/python/context.py, line 37, in callWithContext
     return func(*args,**kw)
 --- exception caught here ---
   File /usr/share/opscenter/lib/py-debian/2.7/amd64/twisted/internet/epollreactor.py, line 220, in _doReadOrWrite
     why = selectable.doWrite()
   File /usr/share/opscenter/lib/py-debian/2.7/amd64/twisted/internet/tcp.py, line 664, in doConnect
     self._connectDone()
   File /usr/share/opscenter/lib/py-debian/2.7/amd64/twisted/internet/ssl.py, line 160, in _connectDone
     self.startTLS(self.ctxFactory)
   File /usr/share/opscenter/lib/py-debian/2.7/amd64/twisted/internet/tcp.py, line 561, in startTLS
     if Connection.startTLS(self, ctx, client):
   File /usr/share/opscenter/lib/py-debian/2.7/amd64/twisted/internet/tcp.py, line 402, in startTLS
     self.socket = SSL.Connection(ctx.getContext(), self.socket)
   File /usr/lib/python2.7/dist-packages/opscenterd/SslUtils.py, line 54, in getContext

   File /usr/lib/python2.7/dist-packages/OpenSSL/SSL.py, line 303, in load_verify_locations
     raise TypeError(cafile must be None or a byte string)
 exceptions.TypeError: cafile must be None or a byte string

 2015-05-26 10:34:27+ []  INFO: twisted.internet.ssl.Connector
 instance at 0x7fa490ff9a70 will retry in 2 seconds
 2015-05-26 10:34:27+ []  INFO: Unhandled error in Deferred:
 2015-05-26 10:34:27+ [] Unhandled Error
 Traceback (most recent call last):
 Failure: twisted.internet.error.ConnectError: An error occurred
 while connecting: [Failure instance: Traceback (failure with no frames):
 type 'exceptions.TypeError': cafile must be None or a byte string
 ].

 Any hints about this?

 Thanks in advance,
 Jan




-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: cqlsh client side filtering

2015-05-07 Thread Tyler Hobbs
On Thu, May 7, 2015 at 10:42 AM, Jens Rantil jens.ran...@tink.se wrote:


 Are there any plans (or JIRA issue) for adding client-side filtering to
 cqlsh? It would hugely improve our experiences with it when debugging etc.
 I wouldn't be against adding some kind of auto LIMIT or warning when using
 it as I understand users could use it as an anti-pattern, too.


There are general plans to increase the types of filtering that Cassandra
can do server-side, but CASSANDRA-8099 is necessary for a lot of that work.

We prefer not to support things in cqlsh that can't be done through normal
cql queries (outside of basic admin-type operations).  What sort of API are
you envisioning?


-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: March 2015 QA retrospective

2015-04-09 Thread Tyler Hobbs
 User visible metric not accurate, but only in one config. Possible to guess
 correct FP ratio and validate while exploring config space?

 CASSANDRA-8532 (Marcus Eriksson): Fix calculation of expected write size during compaction
 https://issues.apache.org/jira/browse/CASSANDRA-8532
   Did this manifest as a user visible issue, could we have tested for that?

 CASSANDRA-8537 (Marcus Eriksson): ConcurrentModificationException while executing 'nodetool cleanup'
 https://issues.apache.org/jira/browse/CASSANDRA-8537
   Nodetool cleanup not tested before release

 CASSANDRA-8562 (Marcus Eriksson): Fix checking available disk space before compaction starts
 https://issues.apache.org/jira/browse/CASSANDRA-8562
   Is there a user visible negative impact, could it have been tested for?

 CASSANDRA-8580 (Marcus Eriksson): AssertionErrors after activating unchecked_tombstone_compaction with leveled compaction
 https://issues.apache.org/jira/browse/CASSANDRA-8580
   How could this have been reproduced before release? No regression test

 CASSANDRA-8623 (Marcus Eriksson): sstablesplit fails *randomly* with Data component is missing
 https://issues.apache.org/jira/browse/CASSANDRA-8623
   Feature not tested before release? No regression test

 CASSANDRA-8635 (Marcus Eriksson): STCS cold sstable omission does not handle overwrites without reads
 https://issues.apache.org/jira/browse/CASSANDRA-8635
   If this workload is a challenge for certain kinds of optimizations we should test it if we think it could happen again.

 CASSANDRA-7538 (Sam Tunnicliffe): Truncate of a CF should also delete Paxos CF
 https://issues.apache.org/jira/browse/CASSANDRA-7538
   Truncate not tested with PAXOS, what else?

 CASSANDRA-8280 (Sam Tunnicliffe): Cassandra crashing on inserting data over 64K into indexed strings
 https://issues.apache.org/jira/browse/CASSANDRA-8280
   Added tests are good example, could focusing on testing all access paths and boundary conditions per access path have prevented this

 CASSANDRA-8370 (Sam Tunnicliffe): cqlsh doesn't handle LIST statements correctly
 https://issues.apache.org/jira/browse/CASSANDRA-8370
   cqlsh untested functionality, no regression test?

 CASSANDRA-7801 (Sylvain Lebresne): A successful INSERT with CAS does not always store data in the DB after a DELETE
 https://issues.apache.org/jira/browse/CASSANDRA-7801
   Multiple access paths for data not tested together

 CASSANDRA-8558 (Sylvain Lebresne): deleted row still can be selected out
 https://issues.apache.org/jira/browse/CASSANDRA-8558
   Validate that deleted data stays deleted under * conditions (big matrix of interactions here with different configurations, streaming, repair, cleanup, scrub). Deleted data coming back shows up a lot.

 CASSANDRA-8332 (T Jake Luciani): Null pointer after droping keyspace
 https://issues.apache.org/jira/browse/CASSANDRA-8332
   Add/drop keyspace not tested under load, with server logs checked for errors

 CASSANDRA-7910 (Tyler Hobbs): wildcard prepared statements are incorrect after a column is added to the table
 https://issues.apache.org/jira/browse/CASSANDRA-7910
   Alter table not tested concurrently with ?

 CASSANDRA-8264 (Tyler Hobbs): Problems with multicolumn relations and COMPACT STORAGE
 https://issues.apache.org/jira/browse/CASSANDRA-8264
   How can we catch interactions like compact storage not being covered by the test

 CASSANDRA-8286 (Tyler Hobbs): Regression in ORDER BY
 https://issues.apache.org/jira/browse/CASSANDRA-8286
   There were tests that failed in some versions, but not all? Did this not ship?

 CASSANDRA-8288 (Tyler Hobbs): cqlsh describe needs to show 'sstable_compression': ''
 https://issues.apache.org/jira/browse/CASSANDRA-8288
   Roundtrip test for describe schema?

 CASSANDRA-8302 (Tyler Hobbs): Filtering for CONTAINS (KEY) on frozen collection clustering columns within a partition does not work
 https://issues.apache.org/jira/browse/CASSANDRA-8302
   More untested combinations, could we have spotted that there was an interaction and tested it? Or did this not ship?

 CASSANDRA-8408 (Tyler Hobbs): limit appears to replace page size under certain conditions
 https://issues.apache.org/jira/browse/CASSANDRA-8408
   No test that validates that paging returns the expected number of results? Another of the genre of queries we support but don't test all the combinations

 CASSANDRA-8410 (Tyler Hobbs): Select with many IN values on clustering columns can result in a StackOverflowError
 https://issues.apache.org/jira/browse/CASSANDRA-8410
   Another missing boundary conditions test, test maximum size in clause against *

 CASSANDRA-8451 (Tyler Hobbs): NPE when writetime() or ttl() are nested inside function call
 https://issues.apache.org/jira/browse/CASSANDRA-8451
   Is this testable? Can we check that functions compose correctly or validate that they are inherently composable. No regression test.

 CASSANDRA-8490 (Tyler Hobbs): DISTINCT queries with LIMITs or paging are incorrect when partitions are deleted
 https://issues.apache.org/jira/browse/CASSANDRA-8490

Re: [discuss] Modernization of Cassandra build system

2015-03-31 Thread Tyler Hobbs
Hi Łukasz,

I'm not very familiar with the build system, but I'll try to respond.

The Serializer dependencies on org.apache.cassandra.transport are almost
certainly uses of Server.CURRENT_VERSION and Server.VERSION_3.  These are
constants that represent the native protocol version in use, which affects
how certain types are serialized.  These constants could easily be moved.
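To illustrate why those version constants matter, the sketch below encodes a list collection in the native protocol's collection format, where protocol v1/v2 write the element count and each element length as a 2-byte [short] and v3 switches to 4-byte [int]s. Passing the version in as a plain parameter is one hypothetical way to drop the dependency on org.apache.cassandra.transport; the class and method names are illustrative and are not Cassandra's ListSerializer.

```java
import java.nio.ByteBuffer;
import java.util.List;

public class CollectionEncodingSketch {
    // Hypothetical decoupling: the protocol version is a parameter rather than
    // a constant read from a transport-layer class.
    static ByteBuffer serializeList(List<ByteBuffer> elements, int protocolVersion) {
        boolean useInts = protocolVersion >= 3; // v3+ uses [int] sizes; v1/v2 use [short]
        int sizeWidth = useInts ? 4 : 2;
        int total = sizeWidth;
        for (ByteBuffer e : elements) {
            total += sizeWidth + e.remaining();
        }
        ByteBuffer out = ByteBuffer.allocate(total);
        writeSize(out, elements.size(), useInts);
        for (ByteBuffer e : elements) {
            writeSize(out, e.remaining(), useInts);
            out.put(e.duplicate()); // duplicate() leaves the caller's buffer position untouched
        }
        out.flip();
        return out;
    }

    private static void writeSize(ByteBuffer out, int size, boolean useInts) {
        if (useInts) {
            out.putInt(size);
        } else if (size > 0xFFFF) {
            throw new IllegalArgumentException("size " + size + " does not fit in an unsigned short");
        } else {
            out.putShort((short) size);
        }
    }

    public static void main(String[] args) {
        List<ByteBuffer> elems = List.of(ByteBuffer.wrap(new byte[] {1, 2}), ByteBuffer.wrap(new byte[] {3}));
        System.out.println(serializeList(elems, 2).remaining()); // 2 + (2 + 2) + (2 + 1) = 9
        System.out.println(serializeList(elems, 3).remaining()); // 4 + (4 + 2) + (4 + 1) = 15
    }
}
```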

The o.a.c.marshal dependency in MapSerializer is on AbstractType, but could
easily be replaced with java.util.Comparator.

In any case, I'm not necessarily opposed to improving the build system to
make these errors more apparent.  Would your proposal still allow us to
build with ant (and just change the way those artifacts are built)?

On Tue, Mar 24, 2015 at 7:58 PM, Łukasz Dywicki l...@code-house.org wrote:

 Dear Cassandra committers and development process followers,
 I would like to raise an important topic: the build process of Cassandra.
 From the community's point of view I am an external user; however, I have
 been working around various projects close to Cassandra over the past year
 or more. What worries me a lot is how Cassandra publishes artifacts and how
 many problems are reported due to that.

 First of all, I want to note that I am not a born enemy of Ant itself; I
 have never used it. I am also aware of problems with custom builds made with
 Maven. However, I don't really want to discuss any particular replacement;
 I just want to note that the Cassandra JIRA project contains about 116 issues
 related somehow to Maven (http://bit.ly/1GRoXl5, project=CASSANDRA,
 text ~ maven). Depending on the point of view, it might be a lot or a
 little. By simple statistics it is around 21 issues a year, or almost 2
 issues a month, many of them breaking maintenance/major releases from the
 user's point of view. On the other hand, it's not bad considering how the
 project is being built.

 The current structure has a very big disadvantage: ONE source root for
 multiple artifacts published in Maven repositories, with classes copied into
 jars AFTER they are compiled. Obviously the Ant copy task doesn't follow
 import statements and does not include dependent classes. For example, just
 by relocating tests and extracting the clientutil jar on the master branch
 into a separate source root, I found a bug where ListSerializer depends on
 the org.apache.cassandra.transport package. Moreover, clientutil
 (MapSerializer) depends on the org.apache.cassandra.db.marshal package,
 which means it cannot be used without cassandra-all present on the
 classpath.
 Luckily for Cassandra, CQL as a new interface reduces Thrift and clientutil
 usage, reducing the number of issues reported around these; however, this
 just hides the real problem described in the previous paragraph. I found a
 handy tool and made a graph of circular dependencies in cassandra-all.jar.
 The resulting graph can be found here: http://grab.by/FRnO. As you can see,
 this graph has multiple levels and untangling it is not a simple task.
 I am afraid the current way of building and packaging Cassandra can cause
 huge hiccups when it comes to code refactorings, because the entire
 codebase will become a house of cards.
 Restructuring the project into smaller pieces is also beneficial for the
 community, since fixing bugs in smaller units is definitely easier.

 At the end of this mail I would like to propose moving the Cassandra build
 system forward, regardless of which tool is chosen for it. Personally, I can
 volunteer for the Maven-related changes to extract cassandra-thrift,
 cassandra-clientutil and cassandra-all into a regular Maven build. It might
 be seen as a switch from one big XML file to a couple of smaller ones. :-)
 All this depends on the Cassandra developers' decision whether to divide the
 source roots or not.

 Kind regards,
 Łukasz Dywicki
 —
 l...@code-house.org
 Twitter: ldywicki
 Blog: http://dywicki.pl
 Code-House - http://code-house.org




-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: Cassandra-dtest versioning proposal

2015-01-09 Thread Tyler Hobbs
On Thu, Jan 8, 2015 at 2:23 PM, Philip Thompson 
philip.thomp...@datastax.com wrote:

 I expect the benefits to grow as we make more radical changes
 to cassandra-dtest for cassandra 3.0.


What kinds of changes are you planning?  Perhaps we can come up with good
alternatives.


-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: [jira] [Comment Edited] (CASSANDRA-8245) Cassandra nodes periodically die in 2-DC configuration

2014-12-30 Thread Tyler Hobbs
On Tue, Dec 30, 2014 at 1:37 PM, Donald Smith (JIRA) j...@apache.org
wrote:


 What's odd is that the cassandra process continues running despite the
 OutOfMemory exception.  You'd expect it to exit.

 Prior to getting OutOfMemory, I notice that such nodes are slow in
 responding to commands and queries (e.g., jmx).


OOMs are handled in a better (more consistent) way with:
https://issues.apache.org/jira/browse/CASSANDRA-7507.  That ticket may
explain a few things for you.


-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: incrementally acquiring tokens

2014-12-16 Thread Tyler Hobbs
On Sat, Dec 13, 2014 at 11:33 AM, Jonathan Haddad j...@jonhaddad.com wrote:


 As a potential approach, would it be possible for a node to incrementally
 acquire tokens, and as a result incrementally stream?  You could have a
 node serving requests after acquiring 1 token, and it would gradually take
 ownership of more and more of the ring as it bootstraps.


Yes, it's definitely theoretically possible.  The main hurdle is adding
support for bootstrapping a single token to gossip.  Gossip is
unfortunately fairly inflexible, so this would probably be a little tricky,
and would most likely have to target 3.1.


-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: JIRA Label for Client-Impacting Changes/Decisions

2014-12-16 Thread Tyler Hobbs
I'm personally in favor of using a label.  Besides myself, Sylvain,
Benjamin, and Aleksey are probably the most likely to be keeping track of
this.  Any objections or alternatives from you guys?

On Fri, Dec 12, 2014 at 5:41 PM, Adam Holmberg adam.holmb...@datastax.com
wrote:

 As a Cassandra driver developer, I'm looking for a good way to keep track
 of client-impacting changes and decisions in Cassandra. I'm aware of super
 tasks like CASSANDRA-8043
 https://issues.apache.org/jira/browse/CASSANDRA-8043 (native v4), and
 others (schema migration
 https://issues.apache.org/jira/browse/CASSANDRA-6038, schema
 modernization
 https://issues.apache.org/jira/browse/CASSANDRA-6717) via ad hoc
 communication.

 Beyond feature changes, there is a class of decisions that might not change
 existing functionality, but imposes certain limitations on clients using
 new features. As an example, this week I learned of a decision to serialize
 collections in v3 encoding, regardless of protocol:
 https://issues.apache.org/jira/browse/CASSANDRA-8438

 Presently, there is no aggregate view of new features, changes, and
 decisions on issues that might impact client integration.

 In the discussion on CASSANDRA-8438 it was suggested that we might use
 labels to tag issues with implications to client integrators, so I wanted
 to float the idea here -- would maintainers be amenable to labeling
 client-impacting issues in JIRA?


 Adam




-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: unable to get saved data after successfully calling batch_mutate

2014-12-16 Thread Tyler Hobbs
The user mailing list (u...@cassandra.apache.org) is a better place for
this.  The dev mailing list is for committers and others working on
Cassandra itself.

On Fri, Dec 12, 2014 at 5:02 PM, cong.ling cong.l...@happyelements.com
wrote:

 Hi,
 I ran into an issue where Cassandra can't save some data, but can save others.
 I call batch_mutate to save a user's (uid 51122) data, and it returns success,
 but when I read it back, it returns empty.
 For another user (uid 22), the same operation returns the data.
 It's very strange. Could you help me with this problem?


 This issue happened after we restored our data to the cluster, but we
 forgot to restore statistic.db at that time. Is that the main reason?


 Could you help me on this issue?
 Thx


 Regards,
 Cong Ling




-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: JIRA Label for Client-Impacting Changes/Decisions

2014-12-16 Thread Tyler Hobbs
+1 on starting a new label.  Feel free to start labelling.

Other tickets that should be labelled:
* https://issues.apache.org/jira/browse/CASSANDRA-7536
* https://issues.apache.org/jira/browse/CASSANDRA-7523

Note: I also created https://issues.apache.org/jira/browse/CASSANDRA-8495
to document type serialization formats in the native protocol spec.
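For reference while that spec section is pending: in native protocol v3 the serialized form of a collection uses a 4-byte `[int]` for the element count and for each element's length, where v2 used a 2-byte `[short]`. A minimal sketch of encoding a `map<text, text>` both ways, assuming UTF-8 keys and values and no nulls; `MapCodec` and its methods are illustrative names, not Cassandra or driver APIs.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class MapCodec {
    // v2: [short] count, then each key and value as [short] length + bytes.
    // v3: [int] count, then each key and value as [int] length + bytes.
    static byte[] serialize(Map<String, String> map, int protocolVersion) {
        boolean v3 = protocolVersion >= 3;
        int sizeLen = v3 ? 4 : 2;                    // width of counts and lengths
        List<byte[]> parts = new ArrayList<>();
        int total = sizeLen;                         // room for the element count
        for (Map.Entry<String, String> e : map.entrySet()) {
            byte[] k = e.getKey().getBytes(StandardCharsets.UTF_8);
            byte[] v = e.getValue().getBytes(StandardCharsets.UTF_8);
            parts.add(k);
            parts.add(v);
            total += 2 * sizeLen + k.length + v.length;
        }
        ByteBuffer out = ByteBuffer.allocate(total); // big-endian by default
        putSize(out, map.size(), v3);
        for (byte[] p : parts) {
            putSize(out, p.length, v3);
            out.put(p);
        }
        return out.array();
    }

    private static void putSize(ByteBuffer out, int n, boolean v3) {
        if (v3) {
            out.putInt(n);
        } else {
            if (n > 0xFFFF) throw new IllegalArgumentException("too large for a v2 [short]: " + n);
            out.putShort((short) n);
        }
    }
}
```

For `{'1': 'value1'}` this yields 13 bytes under v2 and 19 under v3, which is why a client must know which encoding applies to nested collections.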

On Tue, Dec 16, 2014 at 2:44 PM, Adam Holmberg adam.holmb...@datastax.com
wrote:

 I think the issue that brought this up was more about collection
 serialization. Since collections existed before v3, it was surprising to
 see outer collections serialized with one format and inner with another.

 I'm not questioning that decision, nor do I want to operate in
 'undocumented territory'. What I do want to do is find a good way to know
 when these assumptions are made. I was hoping a label would help in being
 applied across different categories of decisions that impact clients.

 I didn't hear any strong opposition to trying the label. Is anyone allowed
 to create them? If so, I'll start applying it to the things I know about.
 Are there other issues that haven't been mentioned previously in this
 thread?

 Thanks,
 Adam

 On Tue, Dec 16, 2014 at 12:56 PM, Sylvain Lebresne sylv...@datastax.com
 wrote:

  We already try to use labels (though we definitely haven't always done it
  in the past): everything for protocol v4 should have a protocolv4 label,
  and I'm all for continuing to stick to it. I'm fine having an additional
  driver-impacting tag for stuff that is likely to need special driver
  handling (as it's probably not limited to protocol stuff). And I'm not
  strongly against a section in the NEWS file, though we already have a
  changelog in the spec itself and it kind of feels like a better place.
 
  But honestly, regarding the issue that raised this, it's not so much that
  we made a decision to do something new for protocol v2: UDTs are just not
  natively supported by protocol v2 (for the simple reason that they didn't
  exist when v2 was done). As far as v2 is concerned, UDTs are an opaque
  custom type. If you want to support UDTs in your client, you should
  officially support v3. You're free to try to support UDTs in v2
  nonetheless, and some drivers do, but you're in undocumented territory.
 
 
  On Tue, Dec 16, 2014 at 6:20 PM, Aleksey Yeschenko alek...@apache.org
  wrote:
 
   A label works for me.
  
   But we also need a separate .txt file, or a section in NEWS.txt, for those
   who can’t, or don’t want to follow the JIRA. Can’t realistically expect
   people to do that.
  
   --
   AY
  
 



-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: Reads requiring response from both servers.

2014-11-18 Thread Tyler Hobbs
On Tue, Nov 18, 2014 at 4:46 AM, Jacob Rhoden jacob.rho...@me.com wrote:

 I was going to report a bug, but first figured I should check it's not
 “working as expected”. Is it just me, or is it wrong that the following
 query needs to talk to both nodes to build a response?


It seems to be working as expected to me, unless I'm missing something.




 Why is this a bug? It seems that this behaviour of needing a response from
 both nodes only exists if you don’t query with a clustering key, or a key,
 when RF=2. However, you can change this behaviour by, for example, changing
 the table from “primary key (uuid)” to “primary key ((a), uuid)” where the
 value of a always equals 'a' (so you can query “where a='a'”), at which
 point Cassandra decides it only needs results from one node.


Can you clarify what you mean?  It sounds like you're saying if I specify
a partition key, it only needs to query one node, which is also expected
behavior (assuming a consistency level of ONE).

By the way, this type of question is better suited for the user mailing
list than the dev mailing list.

-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: Test coverage

2014-09-25 Thread Tyler Hobbs
I've updated the wiki page.  Let me know if that answers your questions.
Thanks!

On Thu, Sep 25, 2014 at 2:12 AM, Heiko Braun ike.br...@googlemail.com
wrote:



 Hi everybody,

 what's the proper way to execute the tests? The wiki page [1] seems to be
 outdated, test/system doesn't exist anymore in trunk. Do we simply use
 'ant test'? Does it cover everything needed?

 [1] http://wiki.apache.org/cassandra/HowToContribute

 Regards, Heiko




-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: Cluster config .

2014-09-18 Thread Tyler Hobbs




-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: ISSUE: Cassandra Custom Secondary Index reading CQLType collection values

2014-08-27 Thread Tyler Hobbs
Maps (and other collections) are stored in multiple cells, one cell per
item.  To get the map keys, use cell.name().collectionElement() and
deserialize with the type for the keys (UTF8Type.instance).  For values,
use cell.value() and use the value type (also UTF8Type.instance, here).
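Because each map entry is its own cell, `cell.value()` holds one serialized value rather than a whole serialized map, which is consistent with the "Not enough bytes to read a map" error in the quoted code. A minimal sketch of rebuilding the map from per-element cells; `ElementCell` is a hypothetical stand-in (in 2.1 its key would come from `cell.name().collectionElement()` and its value from `cell.value()`), and plain UTF-8 decoding plays the role of `UTF8Type.instance` for `text`. No Cassandra classes are used.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one map-element cell.
final class ElementCell {
    final ByteBuffer key;    // the map key (collection element of the cell name)
    final ByteBuffer value;  // the map value (the cell value)
    ElementCell(ByteBuffer key, ByteBuffer value) { this.key = key; this.value = value; }
}

final class MapCells {
    // Decode UTF-8 without moving the caller's buffer position (like duplicate()).
    static String utf8(ByteBuffer bb) {
        ByteBuffer d = bb.duplicate();
        byte[] bytes = new byte[d.remaining()];
        d.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // One map entry per cell: decode key and value separately.
    static Map<String, String> rebuild(List<ElementCell> cells) {
        Map<String, String> result = new LinkedHashMap<>();
        for (ElementCell c : cells) {
            result.put(utf8(c.key), utf8(c.value));
        }
        return result;
    }
}
```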


On Wed, Aug 27, 2014 at 12:54 PM, Arindam Bose arindambos...@gmail.com
wrote:

 Added few more information:

 I need some help in reading CQL3 Collection type values while getting a
 callback to the Custom Secondary Index (cassandra 2.1.0).

 Column Family:

 CREATE TABLE IF NOT EXISTS test1(
 id text,
 mymap map<text,text>,
 PRIMARY KEY(id)
 )

 Added value:

 Insert into test(id, mymap) values ('1', {'1':'value1'});

 Then in my custom class I am trying to read mymap Cell.value() and
 deserialize the ByteBuffer value to get the full map content as below:

 To get the row data based on a rowKey in callback:

 DecoratedKey dkey = StorageService.getPartitioner().decorateKey(rowKey);
 QueryFilter qf = QueryFilter.getIdentityFilter(dkey,
         baseCfs.metadata.cfName,
         Calendar.getInstance().getTimeInMillis());
 ColumnFamily cf = baseCfs.getColumnFamily(qf);

 for (Cell cell : cf)
 {
     if ( cell name is mymap )
     {
         LOGGER.debug("for column mymap");

         Map<String, String> mymap = MapType.getInstance(UTF8Type.instance,
                 UTF8Type.instance).compose(cell.value().duplicate());
         for (String key : mymap.keySet())
         {
             LOGGER.debug("mymapkey [{}]: [{}]", key, mymap.get(key));
         }
     }
 }

 But I am not getting the values persisted in the map. It says *Not
 enough bytes to read a map*

 Is there anyone who can help?



 Regards,
 Arindam Bose



 On Wed, Aug 27, 2014 at 11:21 AM, Arindam Bose arindambos...@gmail.com
 wrote:

  Hello,
 
  I need some help in reading CQL3 Collection type values while getting a
  callback to the Custom Secondary Index.
 
  Column Family:
 
  CREATE TABLE IF NOT EXISTS test1(
  id text,
  mymap map<text,text>,
  PRIMARY KEY(id)
  )
 
  Added value:
 
  Insert into test(id, mymap) values ('1', {'1':'value1'});
 
  Then in my custom class I am trying to read mymap Cell.value() and
  deserialize the ByteBuffer value to get the full map content.
 
  But I am not getting the values as persisted in the mymap Column within
  Cassandra.
 
  Is there anyone who can help?
 
 
  Regards,
  Arindam Bose
 




-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: ISSUE: Cassandra Custom Secondary Index reading CQLType collection values

2014-08-27 Thread Tyler Hobbs
On Wed, Aug 27, 2014 at 3:28 PM, Arindam Bose arindambos...@gmail.com
wrote:


 Also, the code as below, is returning me a list of cells but there is only
 1 cell matching the name mymap. What is the missing piece in here?


The cell names are composites.  They look like (clustering_column_1,
clustering_column_2, ..., clustering_column_n, mymap, map_key).  To make
that work with your current code, you could check whether
cell.name().cql3ColumnName(baseCfs.metadata).toString().equals("mymap") is
true.

However, it would be more efficient to just limit your slice to that
collection.
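To illustrate why a slice is cheaper: for `test1` (no clustering columns) each cell name is a composite `(mymap, map_key)`, so all of the map's cells form one contiguous run in sorted order. A hypothetical sketch with composite names modeled as lists of strings: binary-search to the first name with the `mymap` prefix and stop at the first name without it, instead of testing every cell. Real Cassandra compares each component with the table's comparator; plain string ordering here is an assumption for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class CompositeSlice {
    // Component-by-component comparison; a prefix sorts before its extensions.
    static int compare(List<String> a, List<String> b) {
        int n = Math.min(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            int c = a.get(i).compareTo(b.get(i));
            if (c != 0) return c;
        }
        return Integer.compare(a.size(), b.size());
    }

    static boolean hasPrefix(List<String> name, List<String> prefix) {
        return name.size() >= prefix.size() && name.subList(0, prefix.size()).equals(prefix);
    }

    // Names must already be sorted by compare(); returns the contiguous run under prefix.
    static List<List<String>> slice(List<List<String>> sortedNames, List<String> prefix) {
        int idx = Collections.binarySearch(sortedNames, prefix, CompositeSlice::compare);
        int start = idx >= 0 ? idx : -idx - 1;   // first name >= prefix
        List<List<String>> out = new ArrayList<>();
        for (int i = start; i < sortedNames.size() && hasPrefix(sortedNames.get(i), prefix); i++) {
            out.add(sortedNames.get(i));
        }
        return out;
    }
}
```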

-- 
Tyler Hobbs
DataStax


Re: how to generate log entries for columns getting deleted due to TTL expiration

2014-08-22 Thread Tyler Hobbs
I would use LazilyCompactedRow (probably in getReduced()).


On Fri, Aug 22, 2014 at 6:10 AM, Gaurav Bhatnagar gbhatna...@gmail.com
wrote:

 Hi,
 I have stored the following data structure in Cassandra:

 RowKey: 119551747098

 => (name=c:per:@batchId, value=ad1, timestamp=1408345109805011,
 ttl=1436489)

 => (name=c:per:@currency, value=USD, timestamp=1408345109805009,
 ttl=1436489)

 => (name=c:per:@decimalValue, value=2, timestamp=1408345109805003,
 ttl=1436489)


 Here, RowKey 119551747098 is a numeric serial number for the data.


 These columns expire when the TTL for each column is reached.

 I want to generate a log entry containing the RowKey whenever a column
 under that RowKey is deleted due to TTL expiration.


 Where in the code can I put log entries so that the RowKey corresponding to
 deleted columns gets printed in the log file?


 Regards,

 Gaurav




-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: Getting to RC1

2014-05-21 Thread Tyler Hobbs
On Wed, May 21, 2014 at 11:34 AM, Jonathan Ellis jbel...@gmail.com wrote:

 Which of those could someone else help with?


I'll take CASSANDRA-7120 (and CASSANDRA-7267, if needed).


-- 
Tyler Hobbs
DataStax http://datastax.com/


CQL unit tests vs dtests

2014-05-20 Thread Tyler Hobbs
Sylvain and I have been having a discussion about testing CQL in unit tests
vs dtests.  I'd like to hear if there are any other opinions on the topic.

We currently only test CQL queries through dtests.  I'd like to start
adding unit tests that exercise CQL where it makes sense.  To me, dtests
make sense when:
- Multiple nodes are needed
- Nodes need to be shut down, replaced, etc.
- We specifically want end-to-end testing

When we don't need those, I'd like to use unit tests because:
- They're typically quicker to run (especially with an IDE)
- Unit tests tend to be run earlier and more often than dtests
- There are fewer moving parts to break (no ccm or dtest machinery)
- It's easier to use a debugger

But Sylvain makes some good points about keeping all CQL tests in the
dtests:
- All of the related tests are in one place
- Python tends to be more concise and easier to read and write (especially
for tests)
- dtests are always fully end-to-end

I agree that Python can be nicer to work with, but Java hasn't been too bad
in my experience[1].  And we do need end-to-end tests, just not on every
test case.

Does anybody else have an opinion on starting to use unit tests for some
CQL testing vs keeping everything in dtests?

[1]
https://github.com/thobbs/cassandra/blob/CASSANDRA-6875-2.0/test/unit/org/apache/cassandra/cql3/MultiColumnRelationTest.java
-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: NPE in conditional updates w/ collections in 2.0.7

2014-05-16 Thread Tyler Hobbs
Hi Brian,

Thanks for the report.  This looks like
https://issues.apache.org/jira/browse/CASSANDRA-7155, which should be fixed
shortly.


On Thu, May 15, 2014 at 3:23 PM, Brian O'Neill b...@alumni.brown.edu wrote:


 OK, we've got some hyper data modeling going on, taking advantage of all
 the latest toys in CQL 2.  And we ran into some trouble using maps within
 conditional updates.  Specifically, when testing to see if a key exists in
 a map (with =null?), we encounter an NPE server-side.  We believe this
 worked in 2.0.4.

 With this schema:
 CREATE TABLE progress (
 key text,
 count int,
 partitions map<text, timestamp>,
 primary key (key)
 );

 When executing the following:
 cqlsh:hms> UPDATE foo SET count=4962 WHERE key='PA' IF
 partitions['a']=null;

  [applied]
 ---
  False

 cqlsh:hms> UPDATE foo SET count=4962 WHERE key='PA';
 cqlsh:hms> UPDATE foo SET count=4962 WHERE key='PA' IF
 partitions['a']=null;
 TSocket read 0 bytes

 We see the following NPE server-side:
 ERROR [Native-Transport-Requests:13353] 2014-05-15 15:10:00,154 QueryMessage.java (line 131) Unexpected error during query
 java.lang.NullPointerException
     at org.apache.cassandra.cql3.ColumnCondition$WithVariables.collectionAppliesTo(ColumnCondition.java:168)
     at org.apache.cassandra.cql3.ColumnCondition$WithVariables.appliesTo(ColumnCondition.java:142)
     at org.apache.cassandra.cql3.statements.CQL3CasConditions$ColumnsConditions.appliesTo(CQL3CasConditions.java:197)
     at org.apache.cassandra.cql3.statements.CQL3CasConditions.appliesTo(CQL3CasConditions.java:108)

 Is there a better way to test for existence of a key?
 Or is this a bug?  (Regardless, we may want to protect against the NPE)
 Or am I missing something entirely?

 -brian

 ---
 Brian O'Neill
 Chief Technology Officer


 Health Market Science
 The Science of Better Results
 2700 Horizon Drive • King of Prussia, PA • 19406
 M: 215.588.6024 • @boneill42 http://www.twitter.com/boneill42 •
 healthmarketscience.com








-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: Replacing thrift calls in Hadoop input-split calculation with Java driver calls.

2014-04-01 Thread Tyler Hobbs
 maybe there is something I don't know about the native protocol yet.

 So, does anyone know how to describe the splits and the local ring
 using the native protocol?

 Also, cqlsh uses the Python client, which talks via the Thrift protocol
 too. Does this mean it will be migrated to the native protocol soon as
 well?

 Comments, pointers, suggestions are much appreciated.

 Many thanks,

 Shao-Chuan
   
  
 




-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: [VOTE] Release Apache Cassandra 1.2.16

2014-03-25 Thread Tyler Hobbs
-1

I'm seeing a regression from 1.2.15 on secondary index queries (through
Thrift) with a LongType key validator.  Specifically, this test in pycassa
is failing against 1.2.16-tentative:
https://github.com/pycassa/pycassa/blob/master/tests/test_autopacking.py#L793

I haven't looked into the issue deeply yet to see what's going on.


On Mon, Mar 24, 2014 at 12:24 PM, Sylvain Lebresne sylv...@datastax.com wrote:

 I propose the following artifacts for release as 1.2.16.

 sha1: 05fcfa2be4eba2cd6daeee62d943f48c45f42668
 Git:

 http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/1.2.16-tentative
 Artifacts:

 https://repository.apache.org/content/repositories/orgapachecassandra-1008/org/apache/cassandra/apache-cassandra/1.2.16/
 Staging repository:
 https://repository.apache.org/content/repositories/orgapachecassandra-1008/

 The artifacts as well as the debian package are also available here:
 http://people.apache.org/~slebresne/

 The vote will be open for 72 hours (longer if needed).

 [1]: http://goo.gl/Cgiimu (CHANGES.txt)
 [2]: http://goo.gl/gvKkBm (NEWS.txt)




-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: [VOTE] Release Apache Cassandra 1.2.16

2014-03-25 Thread Tyler Hobbs
Created https://issues.apache.org/jira/browse/CASSANDRA-6924 to investigate.


On Tue, Mar 25, 2014 at 1:50 PM, Tyler Hobbs ty...@datastax.com wrote:

 After digging a bit, the regression is that data inserted immediately
 after secondary index creation may never get indexed.

 The operation order goes like this:
 1) create CF
 2) create secondary index
 3) insert data
 4) query secondary index

 If I add a short sleep in between steps 2 and 3, the data gets indexed and
 the query is successful.

 If I only add a sleep in between steps 3 and 4, some of the data is never
 indexed and the query will return incomplete results.  This appears to be
 the case even if the sleep is relatively long (30s), which makes me think
 the data may never get indexed.


 --
 Tyler Hobbs
 DataStax http://datastax.com/




-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: [VOTE] Release Apache Cassandra 2.0.6

2014-03-07 Thread Tyler Hobbs
+1


On Fri, Mar 7, 2014 at 9:52 AM, Sylvain Lebresne sylv...@datastax.com wrote:

 We're starting to have a really big changelog so I propose the following
 artifacts for release as 2.0.6.

 sha1: 656edc529db59f0002a7fb0eed93339071fd3974
 Git:

 http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/2.0.6-tentative
 Artifacts:

 https://repository.apache.org/content/repositories/orgapachecassandra-1007/org/apache/cassandra/apache-cassandra/2.0.6/
 Staging repository:
 https://repository.apache.org/content/repositories/orgapachecassandra-1007/

 The artifacts as well as the debian package are also available here:
 http://people.apache.org/~slebresne/

 The vote will be open for 72 hours (longer if needed).

 [1]: http://goo.gl/dpGEUu (CHANGES.txt)
 [2]: http://goo.gl/6RPO71 (NEWS.txt)




-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: How should clients handle the user defined types in 2.1?

2014-03-04 Thread Tyler Hobbs
On Sat, Mar 1, 2014 at 5:01 AM, Theo Hultberg t...@iconara.net wrote:

 Mikhail, thanks, but I meant the reverse of that. Say the user creates a
 prepared statement where one of the columns is a custom type, how do you
 serialize the arguments to the prepared statement? Do you accept anything
 and let C* complain, or do you make a best effort to shoehorn the object
 the user passed into something that looks like the custom type?


Just to be clear, by custom type, you still mean a user-defined type,
correct?

At least in the python driver, it's treated the same as any other
(parameterized) type.  For each Cassandra type (UTF8Type, Int32Type, etc),
the driver will accept values of one or more types.  If the value for any
of the subtypes doesn't match, the driver will raise an exception.

If you're actually talking about custom types and not user-defined types,
I'll explain what the python driver does.  If the typestring (e.g.
org.apache.cassandra.db.marshal.MyType) isn't recognized, the driver will
expect a binary string that it can pass directly to Cassandra for values of
that type.  If the user wants to add driver-level support for it (to enable
converting a python object to a binary string and vice-versa), they can
subclass cassandra.cqltypes.CassandraType and define a serialize() and
deserialize() method.  The only condition is that the python classname must
match the typestring from cassandra, so for
org.apache.cassandra.db.marshal.MyType, the user will create a
MyType(CassandraType) class.
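For readers porting this idea to another driver, the lookup amounts to: take the last component of the typestring, use a registered codec with that name if there is one, and otherwise require raw bytes. A rough Java analog of that rule; `CustomCodec` and `CustomTypeRegistry` are hypothetical names and are not part of the python driver or any Java driver.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical codec, analogous to serialize()/deserialize() on a CassandraType subclass.
interface CustomCodec {
    ByteBuffer serialize(Object value);
    Object deserialize(ByteBuffer bytes);
}

final class CustomTypeRegistry {
    private final Map<String, CustomCodec> codecs = new HashMap<>();

    void register(String simpleName, CustomCodec codec) { codecs.put(simpleName, codec); }

    // "org.apache.cassandra.db.marshal.MyType" -> "MyType"
    static String simpleName(String typestring) {
        return typestring.substring(typestring.lastIndexOf('.') + 1);
    }

    // Unrecognized types: the caller must already supply the binary form.
    ByteBuffer serialize(String typestring, Object value) {
        CustomCodec codec = codecs.get(simpleName(typestring));
        if (codec != null) return codec.serialize(value);
        if (value instanceof ByteBuffer) return (ByteBuffer) value;
        if (value instanceof byte[]) return ByteBuffer.wrap((byte[]) value);
        throw new IllegalArgumentException("no codec for " + typestring + "; pass raw bytes");
    }

    // Unrecognized types come back as raw bytes.
    Object deserialize(String typestring, ByteBuffer bytes) {
        CustomCodec codec = codecs.get(simpleName(typestring));
        return codec != null ? codec.deserialize(bytes) : bytes.duplicate();
    }
}
```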

-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: Issue with SizeTieredCompactionStrategy in 2.0.3

2013-12-13 Thread Tyler Hobbs
Hi Graham, thanks for reporting this.  I should be able to get it fixed
shortly.

As far as how it's off by default, the default max_cold_reads_ratio is 0.0,
so filterColdSSTables() won't actually filter anything.  I can add a check
to skip that function if maxColdReadsRatio is 0.0 in the 6483 patch.
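A hypothetical model of cold-SSTable filtering with the proposed guard, to make the "off by default" behaviour concrete. It assumes SSTables are ranked coldest first by reads per second and that the coldest are dropped only while their cumulative reads stay strictly below `maxColdReadsRatio` times the total; with a ratio of 0.0 the guard returns the input before any of that runs. `SSTableStub` and `ColdFilter` are illustrative names; this is a sketch of the idea, not the code of `filterColdSSTables()` or the 6483 patch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for an SSTable with a read-rate statistic.
final class SSTableStub {
    final String name;
    final double readsPerSec;
    SSTableStub(String name, double readsPerSec) { this.name = name; this.readsPerSec = readsPerSec; }
}

final class ColdFilter {
    static List<SSTableStub> filterCold(List<SSTableStub> sstables, double maxColdReadsRatio) {
        // Proposed short-circuit: a ratio of 0.0 means the feature is off.
        if (maxColdReadsRatio == 0.0) return sstables;

        List<SSTableStub> sorted = new ArrayList<>(sstables);
        sorted.sort(Comparator.comparingDouble(s -> s.readsPerSec)); // coldest first
        double total = 0;
        for (SSTableStub s : sorted) total += s.readsPerSec;

        double budget = maxColdReadsRatio * total;
        double cumulative = 0;
        int dropped = 0;
        for (SSTableStub s : sorted) {
            if (cumulative + s.readsPerSec >= budget) break; // dropping s would exceed the budget
            cumulative += s.readsPerSec;
            dropped++;
        }
        return new ArrayList<>(sorted.subList(dropped, sorted.size()));
    }
}
```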


On Thu, Dec 12, 2013 at 7:00 PM, graham sanderson gra...@vast.com wrote:

 I just created https://issues.apache.org/jira/browse/CASSANDRA-6483 for
 an issue that seems to have been introduced by
 https://issues.apache.org/jira/browse/CASSANDRA-6109

 Note that the latter feature claims to be “off” by default; however, it
 isn’t immediately clear to me from the patch how that “off” is implemented,
 and whether it is supposed to go down that code path even when “off”.

 I’d be interested if anyone can shed some light on that. I’m happy to fix
 the issue (but would love to “turn it off” in the interim, as it spams our
 ERROR monitoring; I’m guessing there is no other actual downside, since it
 just fails a subset of compaction runs).

 Thanks,

 Graham.




-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: Named bind variables and one-off queries

2013-11-20 Thread Tyler Hobbs
On Wed, Nov 20, 2013 at 10:13 AM, Mathieu D'Amours math...@damours.org wrote:


 Since C* expects a list of values for bind variables in queries, in
 non-prepared queries the driver has no direct information about the
 expected order of named variables. Is there a way to reliably predict the
 order of variable values C* will expect?


Your question isn't clear to me.  Are you concerned with prepared
statements or non-prepared statements?  What are you concerned about or
what are you trying to accomplish?


-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: Data serialization from clients

2013-11-19 Thread Tyler Hobbs
On Tue, Nov 19, 2013 at 12:15 PM, Mathieu D'Amours math...@damours.org wrote:

 I just want to make sure: the bound values provided along with QUERY or
 EXECUTE requests coming from clients should be encoded in accordance with
 the serializers found in org.apache.cassandra.serializers.*, right?


That's correct.
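As a concrete illustration: a CQL `int` is 4 big-endian bytes, `bigint` is 8, `boolean` is 1, and `text` is its UTF-8 bytes, and each value is then framed in the request as a `[bytes]` (a 4-byte length followed by the payload, with -1 meaning null). A minimal sketch of that encoding for a v2-style value list, a `[short]` count followed by the values; `BoundValues` is an illustrative name and only a few types are covered.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class BoundValues {
    // Serialize one value the way the matching CQL type expects it.
    static ByteBuffer serialize(Object v) {
        if (v == null) return null;                          // framed as length -1
        if (v instanceof Integer) {                          // int: 4 bytes, big-endian
            ByteBuffer b = ByteBuffer.allocate(4);
            b.putInt(0, (Integer) v);
            return b;
        }
        if (v instanceof Long) {                             // bigint: 8 bytes, big-endian
            ByteBuffer b = ByteBuffer.allocate(8);
            b.putLong(0, (Long) v);
            return b;
        }
        if (v instanceof Boolean) {                          // boolean: 1 byte
            return ByteBuffer.wrap(new byte[] { (byte) ((Boolean) v ? 1 : 0) });
        }
        if (v instanceof String) {                           // text/varchar: UTF-8 bytes
            return ByteBuffer.wrap(((String) v).getBytes(StandardCharsets.UTF_8));
        }
        throw new IllegalArgumentException("unsupported type: " + v.getClass());
    }

    // [short] n followed by n [bytes] values.
    static ByteBuffer encodeValues(List<?> values) {
        List<ByteBuffer> parts = new ArrayList<>();
        int size = 2;
        for (Object v : values) {
            ByteBuffer p = serialize(v);
            parts.add(p);
            size += 4 + (p == null ? 0 : p.remaining());
        }
        ByteBuffer out = ByteBuffer.allocate(size);
        out.putShort((short) values.size());
        for (ByteBuffer p : parts) {
            if (p == null) { out.putInt(-1); continue; }
            out.putInt(p.remaining());
            out.put(p.duplicate());
        }
        out.flip();
        return out;
    }
}
```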


-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: CQL 3 Binary protocol spec mismatch

2013-08-15 Thread Tyler Hobbs
Can you also describe the query you're running and paste the actual entire
binary response (from the header to the end)?

By the way, are you writing a new client?


On Wed, Aug 14, 2013 at 6:37 PM, Mosfeq Rashid mosfeq-cassan...@madrose.net
 wrote:


 Thanks for the quick response.

 This is what I am getting:

 <row-count><value-col1-len><value-col1><value-col2-len><value-col2><value-col1-len><value-col1>...

 As you see, there is no metadata before the row-count.

 All the other messages I have tried, like create, error, etc., are sending
 responses as expected by the spec.

 By the way, I am running Ubuntu 13.04 and the binary distribution.

 --
 Mosfeq





-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: CQL 3 Binary protocol spec mismatch

2013-08-14 Thread Tyler Hobbs
Can you provide more details on exactly what's being returned?  As far as I
know, ResultMessages of type ROWS should always start with metadata, and
I haven't seen a case where it's missing in 1.2.
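For anyone checking a client against the v1 spec, a minimal sketch of reading the start of a RESULT body of kind ROWS: the `[int]` kind (0x0002), then the metadata (`[int]` flags, `[int]` column count, a global keyspace/table spec when flag 0x0001 is set, and one column spec per column), and only then the `[int]` row count. It assumes the 8-byte frame header has already been consumed and rejects any flag other than 0x0001; `RowsResultReader` is an illustrative name.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class RowsResultReader {
    static final int KIND_ROWS = 0x0002;
    static final int FLAG_GLOBAL_TABLES_SPEC = 0x0001;

    // [string]: a [short] length followed by that many UTF-8 bytes.
    static String readString(ByteBuffer in) {
        int len = in.getShort() & 0xFFFF;
        byte[] bytes = new byte[len];
        in.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Column type [option]: [short] id, plus extra data for custom and collection types.
    static void skipTypeOption(ByteBuffer in) {
        int id = in.getShort() & 0xFFFF;
        switch (id) {
            case 0x0000: readString(in); break;                          // custom: class name
            case 0x0020: case 0x0022: skipTypeOption(in); break;        // list, set: element type
            case 0x0021: skipTypeOption(in); skipTypeOption(in); break; // map: key and value types
            default: break;                                              // native types: no value
        }
    }

    // Returns the column names and leaves the buffer positioned at the row count.
    static List<String> readMetadata(ByteBuffer body) {
        int kind = body.getInt();
        if (kind != KIND_ROWS) throw new IllegalStateException("not a ROWS result: " + kind);
        int flags = body.getInt();
        if ((flags & ~FLAG_GLOBAL_TABLES_SPEC) != 0)
            throw new UnsupportedOperationException("flags beyond v1: " + flags);
        int columnCount = body.getInt();
        boolean global = (flags & FLAG_GLOBAL_TABLES_SPEC) != 0;
        if (global) { readString(body); readString(body); }            // keyspace, table
        List<String> names = new ArrayList<>();
        for (int i = 0; i < columnCount; i++) {
            if (!global) { readString(body); readString(body); }       // per-column keyspace, table
            names.add(readString(body));
            skipTypeOption(body);
        }
        return names;
    }
}
```

If a client skips straight to what it believes is the row count without consuming these fields, the bytes it reads will not line up with the spec.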


On Wed, Aug 14, 2013 at 6:05 PM, Mosfeq Rashid mosfeq-cassan...@madrose.net
 wrote:


 The response to the Select query is supposed to have metadata before row
 data is provided.  But in case of C* 1.2.5 and 1.2.8, I only get the row
 data.  As far as I understand binary protocol, the message should have all
 the data to parse it.  Does anyone know what I am missing?

 --
 Mosfeq




-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: CQL vs Thrift

2013-07-17 Thread Tyler Hobbs
Hi, I'm the maintainer of pycassa and the DataStax python-driver.  I just
broke some fingers, so I will be brief.

Regarding performance, the python driver is brand new and still has some
issues to be worked out around performance (C extension, locking and
signaling).  How you use it has a big impact, though; see the benchmarks/
dir.  Some are on par with or better than pycassa for single-threaded rates
with fewer connections.

You can use all Thrift CFs through CQL3.  Some cql3 support may be
backported to pycassa to ease the transition, but I have done no work there
so far.

I'll leave it to somebody else to comment on adding collections, etc to
Thrift.




On Wed, Jul 17, 2013 at 3:18 PM, Vladimir Prudnikov
v.prudni...@gmail.com wrote:

 Hi all,

 This may not be the right place to ask, but I thought developers could
 answer my questions better than users.

 It looks clear that Cassandra dev team concentrates on CQL rather than
 Thrift interface. I'm considering using Cassandra as a storage for my
 current project which will replace MySQL. I still have problem choosing
 between Thrift (Pycassa) vs CQL (cqlengine, python-driver).

 Personally, after using pycassa in a test project, I fell in love with it. I'd
 prefer to use pycassa rather than python-driver, cqlengine or write raw
 queries.

 1) What's going on with Thrift interface and pycassa? I read somewhere that
 it will be for backward compatibility, but does it mean that new features
 will not be added to the Thrift interface hence will not be available with
 pycassa? For example collections
 http://www.datastax.com/dev/blog/cql3_collections.

 2) Currently, column families created using CQL are not visible through the
 Thrift interface and vice versa. If I start with pycassa and in future I
 decide to use CQL (due to lack of new features) will it be possible to use
 these CFs? Or convert them so they become visible and accessible using CQL?

 3) Also I've done some basic tests (pycassa vs. cqlengine, no prepared
 statements) and seems like pycassa performs almost 2 times better which
 makes it more preferable. It was simple inserts of couple thousands rows.

 Do I have to put up with all this and start using CQL?

 Thanks,
 --
 Vladimir Prudnikov




-- 
Tyler Hobbs
DataStax http://datastax.com/


TimeUUIDType Comparison

2012-04-16 Thread Tyler Hobbs
I just discovered an issue with how TimeUUIDs are compared in Cassandra
that was affecting pycassa and probably affects other clients.

To allow for easy slicing of rows with TimeUUIDType comparators, pycassa
lets you supply a timestamp for the column start and end.  To make sure
that all columns within that time range were collected, regardless of the
random portions of the UUIDs, I was filling the non-timestamp bytes with
0x00 bytes or 0xff bytes to make the lowest or highest possible TimeUUID,
respectively, with the same timestamp.

It turns out that the comparison that Cassandra does to break ties when the
timestamp components match performs a signed byte comparison on each byte.
This means 0x00 is not the lowest possible value, 0x80 is.  Likewise, 0x7f
is the actual highest byte value, not 0xff.
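To see the effect, note that a Java `byte` is signed: 0x80 reads as -128 and 0x7f as 127. A minimal sketch of a signed byte-by-byte tie-break next to an unsigned one, plus the fill bytes a client would use for the lowest and highest candidates; `SignedTieBreak` is an illustrative helper modeling only the tie-break over the non-timestamp bytes, not Cassandra's `TimeUUIDType` code.

```java
import java.util.Arrays;

final class SignedTieBreak {
    // Byte-by-byte comparison treating each byte as signed (-128..127).
    static int signedCompare(byte[] a, byte[] b) {
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
            int c = Byte.compare(a[i], b[i]);
            if (c != 0) return c;
        }
        return Integer.compare(a.length, b.length);
    }

    // The same comparison treating each byte as unsigned (0..255).
    static int unsignedCompare(byte[] a, byte[] b) {
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
            int c = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (c != 0) return c;
        }
        return Integer.compare(a.length, b.length);
    }

    // Fill the non-timestamp bytes with one value: 0x80 for the lowest, 0x7f for the highest.
    static byte[] fill(int len, byte value) {
        byte[] out = new byte[len];
        Arrays.fill(out, value);
        return out;
    }
}
```

Under the signed comparison every byte value sorts at or above 0x80 and at or below 0x7f, while a 0x00 fill sorts above 0x80; that is why a 0x00/0xff range can miss columns.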

So, if your client does something similar, this commit may be a useful
reference for you in order to get the correct behavior:
https://github.com/pycassa/pycassa/commit/7df88df4b533c193f029834541be16ff2e4b75f5

-- 
Tyler Hobbs
DataStax http://datastax.com/


0.8 Thrift API Changes

2011-04-14 Thread Tyler Hobbs
As a heads-up, the Thrift API for Cassandra 0.8 has several changes, one of
which is backwards-incompatible:

KsDefs no longer have a replication_factor attribute; it has been moved into
the KsDef's strategy_options.
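To illustrate the backwards-incompatible change, here is a minimal Python sketch that migrates a 0.7-style keyspace definition by moving replication_factor into strategy_options. The keyspace is modeled as a plain dict rather than the generated Thrift KsDef class, and both the dict layout and the helper name migrate_ksdef_to_08 are hypothetical. It assumes SimpleStrategy, which reads the factor from a replication_factor strategy option, and stores the value as a string because Thrift's strategy_options is a string-to-string map.

```python
import copy


def migrate_ksdef_to_08(ksdef: dict) -> dict:
    """Move a 0.7-style top-level replication_factor into strategy_options.

    `ksdef` is a plain dict standing in for a Thrift KsDef (hypothetical layout).
    The input is left unchanged; a migrated copy is returned.
    """
    migrated = copy.deepcopy(ksdef)
    rf = migrated.pop("replication_factor", None)
    options = dict(migrated.get("strategy_options") or {})
    if rf is not None:
        # strategy_options is a string-to-string map, so store the factor as text.
        options.setdefault("replication_factor", str(rf))
    migrated["strategy_options"] = options
    return migrated


old_ksdef = {
    "name": "Keyspace1",
    "strategy_class": "org.apache.cassandra.locator.SimpleStrategy",
    "replication_factor": 3,
    "cf_defs": [],
}
new_ksdef = migrate_ksdef_to_08(old_ksdef)
print(new_ksdef["strategy_options"])  # {'replication_factor': '3'}
```

Other strategies (for example NetworkTopologyStrategy, which takes per-data-center counts) would need their own options, so treat this only as a sketch of the SimpleStrategy case.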

The other changes include new operations and types for counters as well as
several new attributes for CfDef.

Feel free to email me if you have any questions about the changes.

-- 
Tyler Hobbs
Software Engineer, DataStax http://datastax.com/
Maintainer of the pycassa http://github.com/pycassa/pycassa Cassandra
Python client library


Re: State Of: CQL

2011-03-20 Thread Tyler Hobbs
YesQL is the only one that's made me laugh out loud so far.  I'm a fan of
that if we want to keep it light-hearted.

I think CassQL and Castle are both reasonable.  'seepless' has a great idea
behind it, but it sounds a lot like 'sleepless'.

On Sun, Mar 20, 2011 at 11:06 AM, Jake Luciani jak...@gmail.com wrote:

 I for one still like YesQL

 On Sun, Mar 20, 2011 at 8:29 AM, Gary Dusbabek gdusba...@gmail.com
 wrote:

  Everybody is right.  The CQL-SQL naming ambiguity is a problem.  We
  need to do something about this before it gets out of hand.
 
  I've been thinking about alternatives all weekend.  Here's one thing I
  came up with that I think will do nicely.
 
  Using our thrift API (the *old* way of doing things) had a tendency to
  let low level API paradigms code seep and leak all over application
  logic.  But we're not going to have that problem using CQL.  So I
  thought seepless would be a good name because your data code would
  stop seeping.
 
  Then I realized that it didn't boil down to a cool acronym or even
  have a symbol in it.  In grand fashion, I added a plus to the end of
  seepless to arrive at seepless+.  I think it has a nice ring and
  will fit easily into Cassandra discussions:
 
  A great way to use Cassandra is write queries using seepless+.
  We've got seepless+ drivers for several languages including java and
  python.
  We're not using thrift anymore; we write all of our queries in seepless+
  now.
 
  Anyway, I'll keep thinking to see if I can come up with something
  better.  I'm full of ideas this weekend.
 
  Gary.
 
 
  On Fri, Mar 18, 2011 at 14:54, Eric Evans eev...@rackspace.com wrote:
  
   With 3 weeks and change until the branch-and-feature-freeze, I thought
   I'd take a few moments to update everyone on the current state of CQL.
  
   Goals and Progress[1]
   ---------------------
   The overarching goal, of course, is to create a compelling replacement
   for the RPC interface, one that is less baroque, comparable in
   performance, and stable across Cassandra release versions.
  
   The goals for Cassandra 0.8 are to meet or exceed the point of minimum
   usability.  That is to say, a significant number of users/applications
   can make use of it.  I believe we're on track to achieve that.
  
   Already complete:
   * Complete data manipulation (SELECT, UPDATE, DELETE, TRUNCATE ...)
   * Partial DDL, enough to create a schema (ALTER is missing).
   * Drivers for Python (including Twisted), and Java (JDBC).
   * Language documentation (doc/cql/CQL.html)
  
   Remaining for 0.8:
   * Support for typed keys[2].
   * Tests, tests, and more tests.
  
  
   What comes next (after 0.8)
   ---------------------------
  
   * Benchmarking and optimization
   * Completion of DDL (ALTER ...).
   * Prepared statements
   * Custom, line protocol (no more Thrift).
   * ... ?
  
  
   What you can do
   ---------------
  
   * Play/test/experiment, and file bug reports.  The Python driver's
   interactive interpreter is a good place to start (drivers/py/cqlsh).
   * Write system tests (test/system/test_cql.py).
   * Write language drivers.
   * Write documentation.
   * Pick up unclaimed tickets tagged cql[3].
   * Port libraries and applications (and file bug reports).
  
   Thoughts, comments, questions?
  
   [1]: https://issues.apache.org/jira/browse/CASSANDRA-1703
   [2]: https://issues.apache.org/jira/browse/CASSANDRA-2311
   [3]: http://goo.gl/cSPlc
  
   --
   Eric Evans
   eev...@rackspace.com
  
 



 --
 http://twitter.com/tjake




-- 
Tyler Hobbs
Software Engineer, DataStax http://datastax.com/
Maintainer of the pycassa http://github.com/pycassa/pycassa Cassandra
Python client library


Re: [VOTE] 0.7.3 take #2

2011-03-07 Thread Tyler Hobbs
On Thu, Mar 3, 2011 at 11:24 AM, Shotaro Kamio kamios...@gmail.com wrote:

 Hi,

 I'm testing 0.7.3 take #2 with a multi-node installation.

 I tried to read data using Hector with quorum consistency after
 writing millions of records into Cassandra. It looked OK at first, but
 after a while the client received many TimedOutExceptions. When I
 looked at the Cassandra logs, I found the following error recorded on
 many of the nodes.
 Has anyone seen this error?

 ERROR [RequestResponseStage:67] 2011-03-04 00:10:32,062 AbstractCassandraDaemon.java (line 114) Fatal exception in thread Thread[RequestResponseStage:67,5,main]
 java.lang.AssertionError
     at org.apache.cassandra.service.ReadCallback.response(ReadCallback.java:127)
     at org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:49)
     at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
     at java.lang.Thread.run(Thread.java:662)


 Thanks,
 Shotaro


Hi Shotaro,

I'm encountering the same issue, and I've opened
CASSANDRA-2282 (https://issues.apache.org/jira/browse/CASSANDRA-2282).
Could you comment on the ticket to document what conditions produced this
issue for you?

-- 
Tyler Hobbs
Software Engineer, DataStax http://datastax.com/
Maintainer of the pycassa http://github.com/pycassa/pycassa Cassandra
Python client library


Re: Reducing confusion around client libraries

2010-12-03 Thread Tyler Hobbs
Personally, I like the Mongo drivers page:
http://www.mongodb.org/display/DOCS/Drivers

I like the clear distinction between preferred and alternative clients
without a lot of clutter about release dates and supported versions.  How do
we make that distinction, though?  A "supported by Riptano" section is one
option, but that doesn't even encompass all of the preferred clients.  I
don't know that we have enough active users and maintainers in every
language to put the clients up for a democratic vote.

Are client maintainers willing to voluntarily place their clients into
either the official list or the community list?  Perhaps all clients
should be considered community supported unless selected by, say, the
Cassandra committers as being both up to quality standards and the current
best client for that language.

- Tyler


On Fri, Dec 3, 2010 at 12:18 PM, Nate McCall n...@riptano.com wrote:

 On Fri, Dec 3, 2010 at 11:59 AM, Paul Brown paulrbr...@gmail.com wrote:
 
  One way that this could be accomplished with a relatively even hand is to
 ensure that the relative liveliness of the client libraries is apparent on
 the page, e.g., a most recent release date, the target language (and
 potentially any additional decoration like Spring or Rails or...), and a
 list of versions of Cassandra supported.
 

 I agree with Paul - I think some additional feature and project
 activity comparison is the way to go near term.

 Nothing against ASF, but we Hector folk are happy with Github and Google
 groups.



Re: Reducing confusion around client libraries

2010-12-03 Thread Tyler Hobbs
+1 Daniel.

I find the wiki to be completely unnavigable, and cleaner and clearer
documentation about the clients (including a possible per-language page)
might solve 50% of the problem.

- Tyler

On Fri, Dec 3, 2010 at 3:10 PM, Daniel Lundin d...@eintr.org wrote:

 On Fri, Dec 3, 2010 at 10:07 PM, Daniel Lundin d...@eintr.org wrote:
  ... for each language/library would work (like the mongodb language
  centers, but with ..

 .. but with fewer levels of hierarchy, perhaps.

 /d



PHP Client

2010-10-23 Thread Tyler Hobbs
Hello all,

I've been working for a while now on putting together a PHP client that
works with Cassandra 0.7.  It's at a decent state now, so I would like to
start getting feedback from PHP users out there.

It's available on github here: http://github.com/thobbs/phpcassa

and the API documentation can be found here:
http://thobbs.github.com/phpcassa/

It's compatible with the current trunk (or RC, as it so happens).  The
client itself is based heavily on pycassa (http://github.com/pycassa/pycassa).

I welcome any and all feedback, especially negative :)

- Tyler


Re: Status of LT LTE without EQ op for 0.7 release

2010-10-12 Thread Tyler Hobbs
I'm not completely sure I follow your scheme, but it's fairly easy to support
GT, LT, etc. with your own index.

Use a row for your index where the column names are the data values
you want to index.  If you set the comparator type (in your example, this
would be LongType), you can perform a LT or GT query just by getting a
slice of the index columns.  Store the original data row keys as the column
values, and you're there.
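A minimal in-memory Python sketch of this index-row approach: column names hold the indexed values, kept sorted the way a LongType comparator would order them, and column values hold the original row keys, so LT, LTE, GT and GTE each become a contiguous slice. The IndexRow class is hypothetical and does not call any Cassandra client API; against a real cluster you would instead request a column slice of the index row.

```python
import bisect


class IndexRow:
    """In-memory stand-in for one index row (hypothetical helper).

    Column names are the indexed values, kept sorted as a LongType comparator
    would order them; column values are the row keys of the original data.
    """

    def __init__(self):
        self._names = []   # sorted column names (indexed values)
        self._values = []  # row keys, parallel to _names

    def insert(self, indexed_value: int, row_key: str) -> None:
        i = bisect.bisect_left(self._names, indexed_value)
        if i < len(self._names) and self._names[i] == indexed_value:
            self._values[i] = row_key  # same column name: the write overwrites
        else:
            self._names.insert(i, indexed_value)
            self._values.insert(i, row_key)

    # Each comparison is just a contiguous slice of the sorted columns.
    def lt(self, x):
        return self._values[:bisect.bisect_left(self._names, x)]

    def lte(self, x):
        return self._values[:bisect.bisect_right(self._names, x)]

    def gt(self, x):
        return self._values[bisect.bisect_right(self._names, x):]

    def gte(self, x):
        return self._values[bisect.bisect_left(self._names, x):]


index = IndexRow()
for row_key, age in [("alice", 34), ("bob", 27), ("carol", 41)]:
    index.insert(age, row_key)
print(index.lt(34))   # ['bob']
print(index.gte(34))  # ['alice', 'carol']
```

As with real columns, writing a column name that already exists overwrites it, so with exactly this layout two rows sharing an indexed value would collide; making each column name unique (for example, pairing the value with the row key) is one way around that.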

- Tyler

On Tue, Oct 12, 2010 at 9:33 PM, Todd Nine t...@spidertracks.com wrote:

 Thanks Jonathan,

 A follow-up question: will it be possible to migrate existing indexes
 in a future release, as part of the upgrade path, to support LT and LTE
 ops without an EQ?  In the meantime, in my DataNucleus plugin, I was
 thinking I could do something like the following.  It's not efficient
 for space, but it will work and should hopefully be relatively efficient
 for querying.


 LT and LTE ops can be thought of in terms of distance from the MAX value
 of any given data type.  For instance, if I had a data type, ubershort,
 which goes from -200 to 200, I could say that an expression of <= 0 is
 really a distance >= 200 from the maximum.  I could use this equation to
 calculate a distance value and persist it in a column named
 colName_reverse, which would effectively give me a reverse index.


 Then the value would simply be

 storedValue = MAXVALUE-userVal.

 From there, whenever the user issues a < or <= query, I would simply
 translate the value via the above equation, and < becomes > and <=
 becomes >=.  Aside from the extra storage space, do you see any other
 problems with this approach for a 0.7-compatible version of my plugin?

 Thanks,
 Todd
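Todd's reverse-index translation can be sketched in Python as follows. The ubershort maximum of 200 comes from his example; the function and table names are hypothetical. The sketch only checks the arithmetic: with storedValue = MAXVALUE - userVal, v < x holds exactly when the stored value of v is greater than the stored value of x, so < maps to > and <= maps to >= on the reverse column.

```python
import operator

# Hypothetical type from the example: ubershort spans -200..200.
UBERSHORT_MAX = 200

# Python comparison functions for each operator symbol.
OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}

# How each comparison flips when both sides are replaced by (MAX - value).
FLIP = {"<": ">", "<=": ">=", ">": "<", ">=": "<="}


def reverse_value(user_val, max_value=UBERSHORT_MAX):
    """Value stored in the hypothetical colName_reverse column."""
    return max_value - user_val


def translate(op, user_val, max_value=UBERSHORT_MAX):
    """Rewrite `column op user_val` as an equivalent query on the reverse column."""
    return FLIP[op], reverse_value(user_val, max_value)


op, bound = translate("<=", 0)
print(op, bound)  # >= 200
```

This matches the worked case above: an expression of <= 0 becomes >= 200 on the reverse column.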





 On Wed, 2010-10-13 at 14:00 +1300, Todd Nine wrote:

  Fair enough!
 
 
  Thanks Jonathan.
 
 
  todd
 
  On Tue, 2010-10-12 at 18:47 -0500, Jonathan Ellis wrote:
 
   On Tue, Oct 12, 2010 at 6:34 PM, Todd Nine t...@spidertracks.com
 wrote:
Currently there is only indexing for LT and LTE expressions when an EQ
operator is present.  Will it be possible to use the LT and LTE ops
without an EQ by the 0.7.0 release?
  
   No.
  
 If not, which of the following
would be more efficient?
   
1. Creating a dummy column of 1 byte that is indexed.
  
   This is basically the same as doing a full range scan, only less
 efficient.
  
2. Use my previous indexing scheme of 2 Super CFs for longs and strings
to get my < and <= operations, where I use the following scheme.
  
   I'm not sure I follow but if it's better than doing a full range scan
   then it is better than 1. :)