Re: Frozen types in Cassandra.

2017-03-07 Thread Tyler Hobbs
On Sun, Mar 5, 2017 at 11:53 PM, anuja jain <anujaja...@gmail.com> wrote:

> Is there a difference between creating a column of type
> frozen<list<double>> and frozen<list_double>, where list_double is a UDT
> containing a frozen<list<double>> field?
>

Yes, there is a difference in serialization format: the first will be
serialized directly as a list, the second will be serialized as a
single-field UDT containing a list.

Additionally, the second form supports altering the type by adding fields
to the UDT.  This can't be done with the first form.  If you don't need
this capability, I recommend going with the simpler option of
frozen<list<double>>.
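For concreteness, a sketch of the two options (the keyspace and table names
are made up):

CREATE TYPE ks.list_double (
    vals frozen<list<double>>
);

-- Option 1: serialized directly as a list; simplest, but not alterable
CREATE TABLE ks.t1 (
    id int PRIMARY KEY,
    vals frozen<list<double>>
);

-- Option 2: serialized as a single-field UDT wrapping the list;
-- fields can be added to list_double later with ALTER TYPE
CREATE TABLE ks.t2 (
    id int PRIMARY KEY,
    vals frozen<list_double>
);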


> Also, how does one create a Solr index on such columns?
>

I have no idea, sorry.

-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Frozen type support

2017-01-24 Thread Tyler Hobbs
There are no plans to remove support for frozen types.  I don't expect that
would ever happen.

On Tue, Jan 24, 2017 at 9:38 AM, Ahmed Eljami <ahmed.elj...@gmail.com>
wrote:

> Hi,
>
> I would like to know if the frozen type will no longer be supported in
> future versions of Cassandra?
>
>
> Thx.
> Ahmed
>
>
>


-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Import failure when using the python cassandra-driver

2016-10-26 Thread Tyler Hobbs
This was fixed by the 3.7.1 release of the python driver:
https://groups.google.com/a/lists.datastax.com/forum/#!topic/python-driver-user/1UbvYc_h9KQ

On Wed, Oct 26, 2016 at 4:35 AM, Stefano Ortolani <ostef...@gmail.com>
wrote:

> Did you try the workaround they posted (aka, downgrading Cython)?
>
> Cheers,
> Stefano
>
> On Wed, Oct 26, 2016 at 10:01 AM, Zao Liu <zao...@gmail.com> wrote:
> > Same happen to my ubuntu boxes.
> >
> >   File
> > "/home/jasonl/.pex/install/cassandra_driver-3.7.0-cp27-none-linux_x86_64.whl.ebfb31ab99650d53ad134e0b312c7494296cdd2b/cassandra_driver-3.7.0-cp27-none-linux_x86_64.whl/cassandra/cqlengine/connection.py",
> > line 20, in <module>
> >
> > from cassandra.cluster import Cluster, _NOT_SET, NoHostAvailable,
> > UserTypeDoesNotExist
> >
> > ImportError:
> > /home/jasonl/.pex/install/cassandra_driver-3.7.0-cp27-none-linux_x86_64.whl.ebfb31ab99650d53ad134e0b312c7494296cdd2b/cassandra_driver-3.7.0-cp27-none-linux_x86_64.whl/cassandra/cluster.so:
> > undefined symbol: PyException_Check
> >
> >
> > And there is someone asked the same question in stack overflow:
> >
> > http://stackoverflow.com/questions/40251893/datastax-python-cassandra-driver-build-fails-on-ubuntu#
> >
> >
> >
> > On Wed, Oct 26, 2016 at 1:49 AM, Zao Liu <zao...@gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> Suddenly I started to get the following errors when using python
> >> cassandra-driver 3.7.0 on my MacBook Pro running OS X El Capitan. I
> >> tried to reinstall the package and all the dependencies, unfortunately
> >> no luck. I was able to run it a few days earlier. I really can't recall
> >> what I changed that could cause this.
> >>
> >>   File
> >> "/Library/Python/2.7/site-packages/cassandra/cqlengine/connection.py",
> >> line 20, in <module>
> >> from cassandra.cluster import Cluster, _NOT_SET, NoHostAvailable,
> >> UserTypeDoesNotExist
> >> ImportError:
> >> dlopen(/Library/Python/2.7/site-packages/cassandra/cluster.so, 2):
> >> Symbol not found: _PyException_Check
> >>   Referenced from: /Library/Python/2.7/site-packages/cassandra/cluster.so
> >>   Expected in: flat namespace
> >>  in /Library/Python/2.7/site-packages/cassandra/cluster.so
> >>
> >> Thanks,
> >> Jason
> >>
> >>
> >
>



-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Cannot set TTL in COPY command

2016-10-26 Thread Tyler Hobbs
On Wed, Oct 26, 2016 at 10:07 AM, techpyaasa . <techpya...@gmail.com> wrote:

> Can some one please tell me how to set TTL using COPY command?


It looks like you're using Cassandra 2.0.  I don't think COPY supports the
TTL option until at least 2.1.
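On versions where it is supported, the option goes in the WITH clause of
COPY FROM. A sketch with made-up keyspace, table, and file names (run HELP
COPY in your cqlsh to confirm the option exists in your version):

COPY ks.tbl (id, value) FROM 'data.csv' WITH TTL = 86400;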


-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: CASSANDRA-5376: CQL IN clause on last key not working when schema includes set,list or map

2016-09-15 Thread Tyler Hobbs
That ticket was just to improve the error message.  From the comments on
the ticket:

"Unfortunately, handling collections is slightly harder than what
CASSANDRA-5230 <https://issues.apache.org/jira/browse/CASSANDRA-5230> aimed
for, because we can't do a name query. So this will have to wait for
CASSANDRA-4762 <https://issues.apache.org/jira/browse/CASSANDRA-4762>. In
the meantime, we should obviously not throw an assertion error so attaching
a patch to improve validation."

However, it seems like this would be possible to support in Cassandra 3.x.
We probably just need to remove the check and verify that it actually
works.  Can you open a new JIRA ticket for this?

On Thu, Sep 15, 2016 at 12:49 PM, Samba <saas...@gmail.com> wrote:

> any update on this issue?
>
> the quoted JIRA issue (CASSANDRA-5376) is resolved as fixed in 1.2.4 but
> it is still not possible (even in 3.7)  to use IN operator in queries that
> fetch collection columns.
>
> is the fix only to report better error message that this is not possible
> or was it fixed then but the issue resurfaced in regression?
>
> could you please confirm one way or the other?
>
> Thanks and Regards,
> Samba
>
>
> On Tue, Sep 6, 2016 at 6:34 PM, Samba <saas...@gmail.com> wrote:
>
>> Hi,
>>
>> "CASSANDRA-5376: CQL IN clause on last key not working when schema
>> includes set,list or map"
>>
>> is marked resolved in 1.2.4 but i still see the issue (not an Assertion
>> Error, but an query validation message)
>>
>> was the issue resolved only to report proper error message or was it
>> fixed to support retrieving collections when query contains IN clause of
>> partition/cluster (last) columns?
>>
>> If it was fixed properly to support retrieving collections with IN
>> clause, then is it a bug in 3.7 release that i get the same message?
>>
>> Could you please explain, if it not fixed as intended, if there are plans
>> to support this in future?
>>
>> Thanks & Regards,
>> Samba
>>
>
>


-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: race condition for quorum consistency

2016-09-14 Thread Tyler Hobbs
On Wed, Sep 14, 2016 at 3:49 PM, Nicolas Douillet <
nicolas.douil...@gmail.com> wrote:

> - during read requests, cassandra will ask one node for the data and the
> other nodes involved in the CL for a digest; if the digests do not match,
> it will ask those nodes for the entire data, handle the merge, and finally
> send those nodes a background repair. Your write may have succeeded during
> this time.


This is very good info, but as a minor correction, the repair here will
happen in the foreground before the response is returned to the client.
So, at least from a single client's perspective, you get monotonic reads.


-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: ServerError: An unexpected error occurred server side; in cassandra java driver

2016-09-01 Thread Tyler Hobbs
There should be a corresponding error and stacktrace in your cassandra logs
on 10.0.230.25.  Please find that and post it, if you can.

On Thu, Sep 1, 2016 at 7:23 AM, Siddharth Verma <
verma.siddha...@snapdeal.com> wrote:

> Debugged the issue a little.
> AbstractFuture.get() throws java.util.concurrent.ExecutionException;
> in Uninterruptibles.getUninterruptibly, interrupted gets set to true,
> which triggers Thread.interrupt();
> thus in DefaultResultSetFuture,
> (ResultSet) Uninterruptibles.getUninterruptibly(this)
> throws the exception.
>
> If someone who might have faced a similar issue could provide his/her
> views.
>
> Thanks
> Siddharth
>



-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: How to create a TupleType/TupleValue in a UDF

2016-08-19 Thread Tyler Hobbs
On Thu, Aug 18, 2016 at 12:57 PM, Drew Kutcharian <d...@venarc.com> wrote:

> I’m running 3.0.8, so it probably wasn’t fixed? ;)
>

Hmm, would you mind opening a new JIRA ticket about that and linking it to
CASSANDRA-11033?


>
> The CodecNotFoundException is very random, when I get it, if I re-run the
> same exact query then it works! I’ll see if I can reproduce it more
> consistently.
>

Thanks.  If you can reproduce, please go ahead and open a ticket for that
as well.


> BTW, is there a way to get the CodecRegistry and the ProtocolVersion from
> the UDF environment so I don’t have to create them?
>

At least in 3.0.8, I don't think so.  It's worth pointing out
https://issues.apache.org/jira/browse/CASSANDRA-10818, which makes it much
easier to create tuples and UDTs in 3.6+.  Check out the bottom of the UDF
section of the docs for some examples and details:
http://cassandra.apache.org/doc/latest/cql/functions.html#user-defined-functions
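For instance, something along these lines should work in 3.6+ (a rough
sketch based on the UDFContext API from CASSANDRA-10818; the function name
is made up, and the exact method names are worth double-checking against
the docs above):

CREATE OR REPLACE FUNCTION make_window(ts1 timestamp, ts2 timestamp)
RETURNS NULL ON NULL INPUT
RETURNS tuple<timestamp, timestamp>
LANGUAGE java
AS $$
    // udfContext is injected into Java UDF bodies in 3.6+ and already
    // knows the protocol version and codec registry.
    TupleValue t = udfContext.newReturnTupleValue();
    t.setTimestamp(0, ts1);
    t.setTimestamp(1, ts2);
    return t;
$$;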


-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: How to create a TupleType/TupleValue in a UDF

2016-08-18 Thread Tyler Hobbs
The logback-related error is due to
https://issues.apache.org/jira/browse/CASSANDRA-11033, which is fixed in
3.0.4 and 3.4.

I'm not sure about the CodecNotFoundException, can you reproduce that one
reliably?

On Thu, Aug 18, 2016 at 10:52 AM, Drew Kutcharian <d...@venarc.com> wrote:

> Hi All,
>
> I have a UDF/UDA that returns a map of date -> TupleValue.
>
> CREATE OR REPLACE FUNCTION min_max_by_timestamps_udf(state map<date,
> frozen<tuple<timestamp, timestamp>>>, flake blob)
> RETURNS NULL ON NULL INPUT
> RETURNS map<date, frozen<tuple<timestamp, timestamp>>>
> LANGUAGE java
>
> CREATE OR REPLACE AGGREGATE min_max_by_timestamps(blob)
> SFUNC min_max_by_timestamps_udf
> STYPE map<date, frozen<tuple<timestamp, timestamp>>>
> INITCOND {};
>
> I’ve been using the following syntax to build the TupleType/TupleValue in
> my UDF:
>
> TupleType tupleType = TupleType.of(
>     com.datastax.driver.core.ProtocolVersion.NEWEST_SUPPORTED,
>     CodecRegistry.DEFAULT_INSTANCE,
>     DataType.timestamp(), DataType.timestamp());
> tupleType.newValue(new java.util.Date(timestamp), new java.util.Date(timestamp));
>
> But “randomly" I get errors like the following:
> FunctionFailure: code=1400 [User Defined Function failure]
> message="execution of ’testdb.min_max_by_timestamps_udf[map<date,
> frozen<tuple<timestamp, timestamp>>>, blob]' failed: 
> java.security.AccessControlException:
> access denied ("java.io.FilePermission" "/etc/cassandra/logback.xml"
> "read”)"
>
> Or CodecNotFoundException for Cassandra not being able to find a codec for
> "map<date, frozen<tuple<timestamp, timestamp>>>”.
>
> Is this a bug or I’m doing something wrong?
>
>
> Thanks,
>
> Drew
>



-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: migrating from 2.1.2 to 3.0.8 log errors

2016-08-10 Thread Tyler Hobbs
That just means that a client/driver disconnected.  Those log messages are
supposed to be suppressed, but perhaps that stopped working in 3.x due to
another change.

On Wed, Aug 10, 2016 at 10:33 AM, Adil <adil.cha...@gmail.com> wrote:

> Hi guys,
> We have migrated our cluster (5 nodes in DC1 and 5 nodes in DC2) from
> cassandra 2.1.2 to 3.0.8, all seems fine, executing nodetool status shows
> all nodes UN, but in each node's log there is this log error continuously:
> java.io.IOException: Error while read(...): Connection reset by peer
> at io.netty.channel.epoll.Native.readAddress(Native Method) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at io.netty.channel.epoll.EpollSocketChannel$EpollSocketUnsafe.doReadBytes(EpollSocketChannel.java:675) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at io.netty.channel.epoll.EpollSocketChannel$EpollSocketUnsafe.epollInReady(EpollSocketChannel.java:714) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:326) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:264) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101]
>
> we have installed java-8_101
>
> any idea what would be the problem?
>
> thanks
>
> Adil
>
>


-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: (C)* stable version after 3.5

2016-07-13 Thread Tyler Hobbs
On Wed, Jul 13, 2016 at 11:32 AM, Anuj Wadehra <anujw_2...@yahoo.co.in>
wrote:

> Why do you think that skipping 2.2 is not recommended when NEWS.txt
> suggests otherwise? Can you elaborate?


We test upgrading from 2.1 -> 3.x and upgrading from 2.2 -> 3.x
equivalently.  There should not be a difference in terms of how well the
upgrade is supported.


-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: C* 2.2.7 ?

2016-06-29 Thread Tyler Hobbs
2.2.7 just got tentatively tagged yesterday.  So, there should be a vote on
releasing it shortly.

On Wed, Jun 29, 2016 at 8:24 AM, Dominik Keil <dominik.k...@movilizer.com>
wrote:

> +1
>
> there are some bug fixes we might be, or know we are, affected by, and the
> changelog has become quite large already. Mind voting on 2.2.7 soon?
>
>
> Am 21.06.2016 um 15:31 schrieb horschi:
>
> Hi,
>
> are there any plans to release 2.2.7 any time soon?
>
> kind regards,
> Christian
>
>
> --
> *Dominik Keil*
> Movilizer GmbH
> movilizer.com




-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Read operation can read uncommitted data?

2016-06-28 Thread Tyler Hobbs
Reads at CL.SERIAL will complete any in-progress paxos writes, so the
behavior you're seeing is expected.

On Mon, Jun 27, 2016 at 1:55 AM, Yuji Ito <y...@imagine-orb.com> wrote:

> Hi,
>
> I'm testing Cassandra CAS operation.
>
> Can a read operation read uncommitted data which is being updated by CAS
> in the following case?
>
> I use Cassandra 2.2.6.
> There are 3 nodes (A, B and C) in a cluster.
> Replication factor of keyspace is 3.
> CAS operation on node A starts to update row X (updating the value in row
> from 0 to 1).
>
> 1. prepare/promise phase succeeds on node A
> 2. node C is down
> 3. read/results phase in node A sends read requests to node B and C and
> waits for read responses from them.
> 4. (unrelated) read operation (CL: SERIAL) reads the same row X and gets
> the value "1" in the row!!
> 5. read/results phase fails by ReadTimeoutException caused by failure of
> node C
>
> Thanks,
> Yuji Ito
>



-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Adding column to materialized view

2016-06-28 Thread Tyler Hobbs
This is expected.  It's something we plan to support, but it hasn't been
done yet: https://issues.apache.org/jira/browse/CASSANDRA-9736

On Mon, Jun 27, 2016 at 4:25 PM, Jason J. W. Williams <
jasonjwwilli...@gmail.com> wrote:

> Hey Guys,
>
> Running Cassandra 3.0.5. Needed to add a column to a materialized view,
> but ALTER MATERIALIZED VIEW doesn't seem to allow that. So we ended up
> dropping the view and recreating it. Is that expected or did I miss
> something in the docs?
>
> -J
>



-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Token Ring Question

2016-06-24 Thread Tyler Hobbs
On Fri, Jun 24, 2016 at 2:31 PM, Anubhav Kale <anubhav.k...@microsoft.com>
wrote:

> So, can someone educate me on how token aware policies in drivers really
> work ? It appears that it’s quite possible that the data may live on nodes
> that don’t own the tokens for it. By “own” I mean the ownership as defined
> in system.local / peers and is fed back to drivers.
>

The tokens in system.local/peers are accurate.  Combined with the
replication settings for a keyspace, drivers can accurately determine which
nodes are replicas for a given partition.

Even if the driver's calculation is incorrect for some reason, token-aware
routing is just an optimization.  Nothing will break if a query is sent to
a node that's not a replica.


-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Installing Cassandra from Tarball

2016-06-14 Thread Tyler Hobbs
On Mon, Jun 13, 2016 at 11:49 AM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:

>
> WARN  15:41:58 Cassandra server running in degraded mode. Is swap
>> disabled? : true,  Address space adequate? : true,  nofile limit adequate?
>> : false, nproc limit adequate? : false
>>
> You need to disable swap in order to avoid this message, using swap space
> can have serious performance implications. Make sure you disable fstab
> entry as well for swap partition.
>

It looks like swap is actually disabled, but the nofile and nproc limits
are too low.
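For reference, those limits are usually raised in /etc/security/limits.conf.
A sketch assuming Cassandra runs as the user "cassandra" (see the Cassandra
documentation for the recommended values):

# /etc/security/limits.conf
cassandra  -  memlock  unlimited
cassandra  -  nofile   100000
cassandra  -  nproc    32768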


-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: select query on entire primary key returning more than one row in result

2016-06-14 Thread Tyler Hobbs
Is 'id' your partition key? I'm not familiar with the stratio indexes, but
it looks like the primary key columns are both indexed.  Perhaps this is
related?

On Tue, Jun 14, 2016 at 1:25 AM, Atul Saroha <atul.sar...@snapdeal.com>
wrote:

> After further debugging, this issue appears to be in the in-memory
> memtable, as doing nodetool flush + compact resolves it. And there is no
> batch write used for the table which is showing the issue.
> Table properties:
>
> WITH CLUSTERING ORDER BY (f_name ASC)
>> AND bloom_filter_fp_chance = 0.01
>> AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>> AND comment = ''
>> AND compaction = {'class':
>> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
>> 'max_threshold': '32', 'min_threshold': '4'}
>> AND compression = {'chunk_length_in_kb': '64', 'class':
>> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>> AND crc_check_chance = 1.0
>> AND dclocal_read_repair_chance = 0.1
>> AND default_time_to_live = 0
>> AND gc_grace_seconds = 864000
>> AND max_index_interval = 2048
>> AND memtable_flush_period_in_ms = 0
>> AND min_index_interval = 128
>> AND read_repair_chance = 0.0
>> AND speculative_retry = '99PERCENTILE';
>> CREATE CUSTOM INDEX nbf_index ON nbf () USING
>> 'com.stratio.cassandra.lucene.Index' WITH OPTIONS = {'refresh_seconds':
>> '1', 'schema': '{
>> fields : {
>> id  : {type : "bigint"},
>> f_d_name : {
>> type   : "string",
>> indexed: true,
>> sorted : false,
>> validated  : true,
>> case_sensitive : false
>> }
>> }
>> }'};
>>
>
>
>
> -
> Atul Saroha
> *Lead Software Engineer*
> *M*: +91 8447784271 *T*: +91 124-415-6069 *EXT*: 12369
> Plot # 362, ASF Centre - Tower A, Udyog Vihar,
>  Phase -4, Sector 18, Gurgaon, Haryana 122016, INDIA
>
> On Mon, Jun 13, 2016 at 11:11 PM, Siddharth Verma <
> verma.siddha...@snapdeal.com> wrote:
>
>> No, all rows were not the same.
>> Querying only on the partition key gives 20 rows.
>> In the erroneous result, while querying on partition key and clustering
>> key, we got 16 of those 20 rows.
>>
>> And for "*tombstone_threshold"* there isn't any entry at column family
>> level.
>>
>> Thanks,
>> Siddharth Verma
>>
>>
>>
>


-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Tick Tock version numbers

2016-06-14 Thread Tyler Hobbs
On Mon, Jun 13, 2016 at 11:59 AM, Francisco Reyes <li...@natserv.net> wrote:

>
>
> Can I upgrade them to 3.6 from 3.2? Or is it advisable to upgrade to each
> intermediary version?
>

You can (and should) upgrade directly to 3.6 or 3.7.  The 3.7 release is
just 3.6 + bugfixes.


>
> Based on what I have gather seems like it is matter of:
> bring node down
> install new version
> bring up
> run nodetool upgradesstables -a
>

For upgrades within the 3.x line, you don't need to run upgradesstables.
Other than that, this is correct.


-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Lightweight Transactions during datacenter outage

2016-06-07 Thread Tyler Hobbs
You can set the serial_consistency_level to LOCAL_SERIAL to tolerate a DC
failure:
http://datastax.github.io/python-driver/api/cassandra/query.html#cassandra.query.Statement.serial_consistency_level.
It defaults to SERIAL, which ignores DCs.
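For example, with the Python driver (a minimal sketch against the test
table from the question; the contact points are placeholders):

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(['10.0.0.1', '10.0.0.2'])
session = cluster.connect('devtest')

# LOCAL_SERIAL keeps the paxos round within the local DC, so the LWT
# can still make progress while the remote DC is down.
statement = SimpleStatement(
    "INSERT INTO test (k1, k2) VALUES (1, now()) IF NOT EXISTS",
    serial_consistency_level=ConsistencyLevel.LOCAL_SERIAL)
session.execute(statement)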

On Tue, Jun 7, 2016 at 12:26 PM, Jeronimo de A. Barros <
jeronimo.bar...@gmail.com> wrote:

> Hi,
>
> I have a cluster spreaded among 2 datacenters (DC1 and DC2), two server on
> each DC and I have a keyspace with NetworkTopologyStrategy (DC1:2 and
> DC2:2) with the following table:
>
> CREATE TABLE test (
>   k1 int,
>   k2 timeuuid,
>   PRIMARY KEY ((k1), k2)
> ) WITH CLUSTERING ORDER BY (k2 DESC)
>
> During a datacenter outage, as soon as a datacenter goes offline, I get
> this error during a lightweight transaction:
>
> cqlsh:devtest> insert into test (k1,k2) values(1,now()) if not exists;
> Request did not complete within rpc_timeout.
>
>
> And a short time after the on-line DC verify the second DC is off-line:
>
> cqlsh:devtest> insert into test (k1,k2) values(1,now()) if not exists;
> Unable to complete request: one or more nodes were unavailable.
>
>
> So, my question is: Is there any way to keep lightweight transactions
> working during a datacenter outage using the C* Python driver 2.7.2 ?
>
> I was thinking about catch the exception and do a simple insert (without
> "IF") when the error occur, but having the lightweight transactions working
> even during a DC outage/split would be nice.
>
> Thanks in advance for any help/hints.
>
> Best regards, Jero
>



-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Token Ring Question

2016-06-03 Thread Tyler Hobbs
There really is only one token ring, but conceptually it's easiest to think
of it like multiple rings, as OpsCenter shows it.  The only difference is
that every token has to be unique across the whole cluster.

> Now, if the token for a particular write falls in the “primary range” of a
> node living in DC2, does the code check for such conditions and instead put
> it on some node in DC1?
>

Yes.  It will continue searching around the token ring until it hits a
token that belongs to a node in the correct datacenter.

> What is the true meaning of “primary” token range in such scenarios?
>

There's not really any such thing as a "primary token range", it's just a
convenient idea for some tools.  In reality, it's just the replica that
owns the first (clockwise) token.  I'm not sure what you're really asking,
though -- what are you concerned about?


On Wed, Jun 1, 2016 at 2:40 PM, Anubhav Kale <anubhav.k...@microsoft.com>
wrote:

> Hello,
>
>
>
> I recently learnt that regardless of number of Data Centers, there is
> really only one token ring across all nodes. (I was under the impression
> that there is one per DC like how Datastax Ops Center would show it).
>
>
>
> Suppose we have 4 v-nodes, and 2 DCs (2 nodes in each DC) and a key space
> is set to replicate in only one DC – say DC1.
>
>
>
> Now, if the token for a particular write falls in the “primary range” of a
> node living in DC2, does the code check for such conditions and instead put
> it on some node in DC1 ? What is the true meaning of “primary” token range
> in such scenarios ?
>
>
>
> Is this how things works roughly speaking or am I missing something ?
>
>
>
> Thanks !
>



-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Blob or columns

2016-06-03 Thread Tyler Hobbs
On Fri, Jun 3, 2016 at 10:43 AM, Abhinav Solan <abhinav.so...@gmail.com>
wrote:

> Should we store this inconsequential data as a blob or JSON in one column,
> or create separate columns for it? Which one would be the preferred way
> here?


A blob will be more compact and require less server and driver resources
for serialization and deserialization.  Since you don't need to update
anything in the blob individually, I recommend going with that.


-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Get clustering column in Custom cassandra trigger

2016-05-26 Thread Tyler Hobbs
Try:

unfilteredRowIterator.next().clustering().toString(update.metadata())

To get the raw values, you can use:

unfilteredRowIterator.next().clustering().getRawValues()

On Thu, May 26, 2016 at 7:25 AM, Siddharth Verma <
verma.siddha...@snapdeal.com> wrote:

> Hi Sam,
> Sorry, I couldn't understand.
>
> I am already using
> UnfilteredRowIterator unfilteredRowIterator =
>     partition.unfilteredIterator();
>
> while (unfilteredRowIterator.hasNext()) {
>     next.append(unfilteredRowIterator.next().toString() + "\001");
> }
>
> Is there another way to access it?
>
>


-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Setting bloom_filter_fp_chance < 0.01

2016-05-26 Thread Tyler Hobbs
On Thu, May 26, 2016 at 4:36 AM, Adarsh Kumar <adarsh0...@gmail.com> wrote:

>
> 1). Is there any other way to configure the number of buckets along with
> bloom_filter_fp_chance, to avoid this exception?
>

No, it's hard coded, although we could theoretically hard code it to
support a higher number of buckets.


> 2). If this validation is hard coded, then why is it even allowed to set a
> value of bloom_filter_fp_chance that can prevent SSTable generation?
>

You're right, we should be validating this upfront when the probability is
set.  Can you open a ticket here for that?
https://issues.apache.org/jira/browse/CASSANDRA


-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Internal Handling of Map Updates

2016-05-25 Thread Tyler Hobbs
If you replace an entire collection, whether it's a map, set, or list, a
range tombstone will be inserted followed by the new collection.  If you
only update a single element, no tombstones are generated.
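A quick sketch of the difference, using a hypothetical table t with a
map<text, text> column m:

-- Full replacement: a range tombstone covering the old map is written
-- before the new value.
UPDATE t SET m = {'last_modified': '2016-05-21'} WHERE id = 1;

-- Single-element update: no tombstone is generated.
UPDATE t SET m['last_modified'] = '2016-05-21' WHERE id = 1;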

On Wed, May 25, 2016 at 9:48 AM, Matthias Niehoff <
matthias.nieh...@codecentric.de> wrote:

> Hi,
>
> we have a table with a map field. We do not delete anything in this table,
> but we do updates on the values including the map field (most of the time
> a new value for an existing key, rarely adding new keys). We now encounter
> a huge amount of tombstones for this table.
>
> We used sstable2json to take a look into the sstables:
>
>
> {"key": "Betty_StoreCatalogLines:7",
>
>  "cells": [["276-1-6MPQ0RI-276110031802001001:","",1463820040628001],
>
>["276-1-6MPQ0RI-276110031802001001:last_modified","2016-05-21 
> 08:40Z",1463820040628001],
>
>
> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463040069753999,"t",1463040069],
>
>
> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463120708590002,"t",1463120708],
>
>
> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463145700735007,"t",1463145700],
>
>
> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463157430862000,"t",1463157430],
>
>
> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463164595291002,"t",1463164595],
>
> . . .
>
>   
> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463820040628000,"t",1463820040],
>
>
> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:62657474795f73746f72655f636174616c6f675f6c696e6573","0154d265c6b0",1463820040628001],
>
>
> ["276-1-6MPQ0RI-276110031802001001:payload","{\"payload\":{\"Article Id\":\"276110031802001001\",\"Row Id\":\"1-6MPQ0RI\",\"Article #\":\"31802001001\",\"Quote Item Id\":\"1-6MPWPVC\",\"Country Code\":\"276\"}}",1463820040628001]
>
>
>
> Looking at the SSTables it seems like every update of a value in a map
> breaks down to a delete and an insert in the corresponding SSTable (see
> all the tombstone flags "t" in the extract of sstable2json above).
>
> We are using Cassandra 2.2.5.
>
> Can you confirm this behavior?
>
> Thanks!
> --
> Matthias Niehoff | IT-Consultant | Agile Software Factory  | Consulting
> codecentric AG | Zeppelinstr 2 | 76185 Karlsruhe | Deutschland
> tel: +49 (0) 721.9595-681 | fax: +49 (0) 721.9595-666 | mobil: +49 (0)
> 172.1702676
> www.codecentric.de | blog.codecentric.de | www.meettheexperts.de |
> www.more4fi.de
>



-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Low cardinality secondary index behaviour

2016-05-12 Thread Tyler Hobbs
On Tue, May 10, 2016 at 6:41 AM, Atul Saroha <atul.sar...@snapdeal.com>
wrote:

> I have concern over using secondary index on field with low cardinality.
> Lets say I have few billion rows and each row can be classified in 1000
> category. Lets say we have 50 node cluster.
>
> Now we want to fetch data for a single category using secondary index over
> a category. And query is paginated too with fetch size property say 5000.
>
> Since a query on a secondary index works as a scatter-gather approach by
> the coordinator node, would it lead to out-of-memory errors on the
> coordinator, or too many timeout errors?
>

Paging will prevent the coordinator from using excessive memory.  With the
type of data that you described, timeouts shouldn't be huge problem because
it will only take a few token ranges (assuming you're using vnodes) to get
enough matching rows to hit the page size.


>
> How does pagination (token level data fetch) behave in scatter and
> gatherer approach?
>

Secondary index queries fetch token ranges in sequential order [1],
starting with the minimum token.  When you fetch a new page, it resumes
from the last token (and primary key) that it returned in the previous page.

[1] As an optimization, multiple token ranges will be fetched in parallel
based on estimates of how many token ranges it will take to fill the page.


>
> Secondly, what if we create an inverted table with the category as the
> partition key? This will lead to lots of data on a single node. It might
> lead to a hot-shard issue and performance issues when fetching data from a
> single node, as a single partition would have millions of rows.
>
> How should we tackle such a low-cardinality index in Cassandra?


The data distribution that you described sounds like a reasonable fit for
secondary indexes.  However, I would also take into account how frequently
you run this query and how fast you need it to be.  Even ignoring the
scatter-gather aspects of a secondary index query, they are still expensive
because they fetch many non-contiguous rows from an SSTable.  If you need
to run this query very frequently, that may add too much load to your
cluster, and some sort of inverted table approach may be more appropriate.
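If you do adopt the inverted-table approach, the usual way around the
hot-partition problem described above is to add a bucket to the partition
key (a sketch with hypothetical names):

CREATE TABLE items_by_category (
    category text,
    bucket int,            -- e.g. hash(item_id) % 32
    item_id bigint,
    PRIMARY KEY ((category, bucket), item_id)
);

Readers then query all buckets for a category (sequentially or in
parallel), which spreads one category's millions of rows across 32
partitions instead of one.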

-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Cassandra 3.0.6 Release?

2016-05-10 Thread Tyler Hobbs
On Mon, May 9, 2016 at 2:48 PM, Drew Kutcharian <d...@venarc.com> wrote:

>
>
> What’s the 3.0.6 release date? Seems like the code has been frozen for a
> few days now. I ask because I want to install Cassandra on Ubuntu 16.04 and
> CASSANDRA-10853 is blocking it.
>

We've been holding it up to sync it with the 3.6 release.  There were a
couple of bugs in the first 3.6-tentative tag that forced us to re-roll and
restart test runs.  The release vote for 3.0.6 and 3.6 should start within
the next couple of days, and takes 72 hours to complete.


-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Discrepancy while paging through table, and static column updated inbetween

2016-04-19 Thread Tyler Hobbs
This sounds similar to https://issues.apache.org/jira/browse/CASSANDRA-10010,
but that only affected 2.x.  Can you open a Jira ticket with your table
schema, the problematic query, and the details you posted here?

On Tue, Apr 19, 2016 at 10:25 AM, Siddharth Verma <
verma.siddha...@snapdeal.com> wrote:

> Hi,
>
> We are using cassandra(dsc3.0.3) on production.
>
> For some purpose, we were doing a full table scan (setPagingState and
> getPagingState used on ResultSet in java program), and there has been some
> discrepancy when we ran the same job multiple times.
> Each time some new data was added to the output, and some was left out.
>
> Side Note 1 :
> Table structure
> col1, col2, col3, col4, col5, col6
> Primary key(col1, col2)
> col5 is static column
> col6 static column. Used to explicitly store updated time when col5 changed
>
>
> Sample Data
> 1,A,AA,AAA,STATIC,T1
> 1,B,BB,BBB,STATIC,T1
> 1,C,CC,CCC,STATIC,T1
> 1,D,DD,DDD,STATIC,T1
>
> For some key, sometime col6 was updated while the job was running, so some
> values were not printed for that partition key.
>
> Side Note 2 :
> we did -> select col6, writetime(col6) from ... where col1=... and col2=...
> For the data that was missed out to make sure that particular entry wasn't
> added later.
>
>
> Side Note 3:
> The above scenario that some col6 was updated while job was running,
> therefore some entry for that partition key was ignored, is an assumption
> from our end.
> We can't understand why some entries were not printed in the table scan.
>
>


-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Proper use of COUNT

2016-04-19 Thread Tyler Hobbs
On Tue, Apr 19, 2016 at 9:51 AM, Jack Krupansky <jack.krupan...@gmail.com>
wrote:

>
> 1. Another clarification: All of the aggregate functions, AVG, SUM, MIN,
> MAX are in exactly the same boat as COUNT, right?
>

Yes.


>
> 2. Is the paging for COUNT, et al, done within the coordinator node?
>

Yes.


>
> 3. Does dedupe on the coordinator node consume memory proportional to the
> number of rows on all nodes? I mean, you can't dedupe using only partition
> keys of the coordinator node, right? What I'm wondering is if the usability
> of COUNT (et al) is memory limited as well as time.
>

Deduping (i.e. normal conflict resolution) happens per-page, so in the
worst case the memory requirements for the coordinator are RF * page size.




-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Compaction Error When upgrading from 2.1.9 to 3.0.2

2016-04-14 Thread Tyler Hobbs
On Thu, Apr 14, 2016 at 2:08 PM, Anthony Verslues <
anthony.versl...@mezocliq.com> wrote:

> It was an older upgrade plan so I went ahead and tried to upgrade to 3.0.5
> and I ran into the same error.
>

Okay, good to know.  Please include that info in the ticket when you open
it.


>
>
> Do you know what would cause this error? Is it something to do with
> tombstoned or deleted rows?
>
>
>

I'm not sure, I haven't looked into it too deeply yet.  From the stacktrace
it looks related to reading the static columns of a row.


-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Leak Detected while bootstrap

2016-04-13 Thread Tyler Hobbs
This looks like it might be
https://issues.apache.org/jira/browse/CASSANDRA-11374.  Can you comment on
that ticket and share your logs leading up to the error?

On Wed, Apr 13, 2016 at 3:37 PM, Anubhav Kale <anubhav.k...@microsoft.com>
wrote:

> Hello,
>
>
>
> Since we upgraded to Cassandra 2.1.12, we are noticing that * below*
> happens when we are trying to bootstrap nodes, and the process just gets
> stuck. Restarting the process / VM does not help. Our nodes are around ~300
> GB and run on local SSDs and we haven’t seen this problem on older versions
> (specifically 2.1.9).
>
>
>
> Is this a known issue / any workarounds ?
>
>
>
> *ERROR [Reference-Reaper:1] 2016-04-13 20:33:53,394  Ref.java:179 - LEAK
> DETECTED: a reference
> (org.apache.cassandra.utils.concurrent.Ref$State@15e611a3) to class
> org.apache.cassandra.utils.concurrent.WrappedSharedCloseable$1@203187780:[[OffHeapBitSet]]
> was not released before the reference was garbage collected*
>
>
>
> Thanks !
>



-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Compaction Error When upgrading from 2.1.9 to 3.0.2

2016-04-13 Thread Tyler Hobbs
Can you open a ticket here with your schema and the stacktrace?
https://issues.apache.org/jira/browse/CASSANDRA

I'm also curious why you're not upgrading to 3.0.5 instead of 3.0.2.

On Wed, Apr 13, 2016 at 4:37 PM, Anthony Verslues <
anthony.versl...@mezocliq.com> wrote:

> I got this compaction error when running 'nodetool upgradesstables -a'
> while upgrading from 2.1.9 to 3.0.2. According to documentation this
> upgrade should work.
>
>
>
> Would upgrading to another intermediate version help?
>
>
>
>
>
> This is the line number:
> https://github.com/apache/cassandra/blob/cassandra-3.0.2/src/java/org/apache/cassandra/db/LegacyLayout.java#L1124
>
>
>
>
>
> error: null
>
> -- StackTrace --
>
> java.lang.AssertionError
>
> at
> org.apache.cassandra.db.LegacyLayout$CellGrouper.addCell(LegacyLayout.java:1124)
>
> at
> org.apache.cassandra.db.LegacyLayout$CellGrouper.addAtom(LegacyLayout.java:1099)
>
> at
> org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer$UnfilteredIterator.readRow(UnfilteredDeserializer.java:444)
>
> at
> org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer$UnfilteredIterator.hasNext(UnfilteredDeserializer.java:423)
>
> at
> org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer.hasNext(UnfilteredDeserializer.java:289)
>
> at
> org.apache.cassandra.io.sstable.SSTableSimpleIterator$OldFormatIterator.readStaticRow(SSTableSimpleIterator.java:134)
>
> at
> org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:57)
>
> at
> org.apache.cassandra.io.sstable.format.big.BigTableScanner$KeyScanningIterator$1.initializeIterator(BigTableScanner.java:329)
>
> at
> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.maybeInit(LazilyInitializedUnfilteredRowIterator.java:48)
>
> at
> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.isReverseOrder(LazilyInitializedUnfilteredRowIterator.java:65)
>
> at
> org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$1.reduce(UnfilteredPartitionIterators.java:109)
>
> at
> org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$1.reduce(UnfilteredPartitionIterators.java:100)
>
> at
> org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:206)
>
> at
> org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:159)
>
> at
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
>
> at
> org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$2.hasNext(UnfilteredPartitionIterators.java:150)
>
> at
> org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:72)
>
> at
> org.apache.cassandra.db.compaction.CompactionIterator.hasNext(CompactionIterator.java:226)
>
> at
> org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:177)
>
> at
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>
> at
> org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:78)
>
> at
> org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
>
> at
> org.apache.cassandra.db.compaction.CompactionManager$8.runMayThrow(CompactionManager.java:572)
>
> at
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>
> at java.lang.Thread.run(Thread.java:745)
>



-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Adding Options to Create Statements...

2016-04-01 Thread Tyler Hobbs
I'm not sure which driver you're referring to, but if it's the java driver,
it has its own mailing list that may be more helpful:
https://groups.google.com/a/lists.datastax.com/forum/#!forum/java-driver-user

On Thu, Mar 31, 2016 at 4:40 PM, James Carman <ja...@carmanconsulting.com>
wrote:

> No thoughts? Would an upgrade of the driver "fix" this?
>
> On Wed, Mar 30, 2016 at 10:42 AM James Carman <ja...@carmanconsulting.com>
> wrote:
>
>> I am trying to perform the following operation:
>>
>> public Create createCreate() {
>>   Create create = SchemaBuilder.createTable("foo")
>>       .addPartitionColumn("bar", varchar())
>>       .addClusteringColumn("baz", varchar());
>>   if (descending) {
>>     create.withOptions().clusteringOrder("baz", Direction.DESC);
>>   }
>>   return create;
>> }
>>
>> I don't want to have to return the Create.Options object from this method
>> (as I may need to add other columns).  Is there a way to have the options
>> "decorate" the Create directly without having to return the Create.Options?
>>
>>


-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Thrift composite partition key to cql migration

2016-03-31 Thread Tyler Hobbs
Also, can you paste the results of the relevant portions of "SELECT * FROM
system.schema_columns" and "SELECT * FROM system.schema_columnfamilies"?

On Thu, Mar 31, 2016 at 2:35 PM, Tyler Hobbs <ty...@datastax.com> wrote:

> In the Thrift schema, is the key_validation_class actually set to
> CompositeType(UTF8Type, UTF8Type), or is it just BytesType?  What Cassandra
> version?
>
> On Wed, Mar 30, 2016 at 4:44 PM, Jan Kesten <j.kes...@enercast.de> wrote:
>
>> Hi,
>>
>> while migrating the remainder of thrift operations in my application I
>> came across a point where I can't find a good hint.
>>
>> In our old code we used a composite with two strings as row / partition
>> key and a similar composite as column key like this:
>>
>> public Composite rowKey() {
>> final Composite composite = new Composite();
>> composite.addComponent(key1, StringSerializer.get());
>> composite.addComponent(key2, StringSerializer.get());
>> return composite;
>> }
>>
>> public Composite columnKey() {
>> final Composite composite = new Composite();
>> composite.addComponent(key3, StringSerializer.get());
>> composite.addComponent(key4, StringSerializer.get());
>> return composite;
>> }
>>
>> In cql this columnfamiliy looks like this:
>>
>> CREATE TABLE foo.bar (
>> key blob,
>> column1 text,
>> column2 text,
>> value blob,
>> PRIMARY KEY (key, column1, column2)
>> )
>>
>> For the columns key3 and key4 became column1 and column2 - but the old
>> rowkey is presented as blob (I can put it into a hex editor and see that
>> key1 and key2 values are in there).
>>
>> Any pointers to handle this or is this a known issue? I am using now
>> DataStax Java driver for CQL, old connector used thrift. Is there any way
>> to get key1 and key2 back apart from completely rewriting the table? This is
>> what I had expected it to be:
>>
>> CREATE TABLE foo.bar (
>> key1 text,
>> key2 text,
>> column1 text,
>> column2 text,
>> value blob,
>> PRIMARY KEY ((key1, key2), column1, column2)
>> )
>>
>> Cheers,
>> Jan
>>
>
>
>
> --
> Tyler Hobbs
> DataStax <http://datastax.com/>
>



-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Thrift composite partition key to cql migration

2016-03-31 Thread Tyler Hobbs
In the Thrift schema, is the key_validation_class actually set to
CompositeType(UTF8Type, UTF8Type), or is it just BytesType?  What Cassandra
version?

On Wed, Mar 30, 2016 at 4:44 PM, Jan Kesten <j.kes...@enercast.de> wrote:

> Hi,
>
> while migrating the remainder of thrift operations in my application I came
> across a point where I can't find a good hint.
>
> In our old code we used a composite with two strings as row / partition
> key and a similar composite as column key like this:
>
> public Composite rowKey() {
> final Composite composite = new Composite();
> composite.addComponent(key1, StringSerializer.get());
> composite.addComponent(key2, StringSerializer.get());
> return composite;
> }
>
> public Composite columnKey() {
> final Composite composite = new Composite();
> composite.addComponent(key3, StringSerializer.get());
> composite.addComponent(key4, StringSerializer.get());
> return composite;
> }
>
> In cql this columnfamiliy looks like this:
>
> CREATE TABLE foo.bar (
> key blob,
> column1 text,
> column2 text,
> value blob,
> PRIMARY KEY (key, column1, column2)
> )
>
> For the columns key3 and key4 became column1 and column2 - but the old
> rowkey is presented as blob (I can put it into a hex editor and see that
> key1 and key2 values are in there).
>
> Any pointers to handle this or is this a known issue? I am using now
> DataStax Java driver for CQL, old connector used thrift. Is there any way
> to get key1 and key2 back apart from completly rewriting the table? This is
> what I had expected it to be:
>
> CREATE TABLE foo.bar (
> key1 text,
> key2 text,
>     column1 text,
> column2 text,
> value blob,
> PRIMARY KEY ((key1, key2), column1, column2)
> )
>
> Cheers,
> Jan
>



-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Inconsistent query results and node state

2016-03-31 Thread Tyler Hobbs
On Thu, Mar 31, 2016 at 11:53 AM, Jason Kania <jason.ka...@ymail.com> wrote:

>
> To me it just seems like the timestamp column value is sometimes not being
> set somewhere in the pipeline and the result is the epoch 0 value.
>

I agree, especially since you can't directly query this row and that
timestamp doesn't fit in the normal ordering.


>
> Thoughts on how to proceed?
>

Please open a ticket at https://issues.apache.org/jira/browse/CASSANDRA and
include your schema and queries.  If possible, it would also be extremely
helpful if you can upload the sstables for that table.


-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Inconsistent query results and node state

2016-03-30 Thread Tyler Hobbs
>
> org.apache.cassandra.service.DigestMismatchException: Mismatch for key
> DecoratedKey(-4908797801227889951, 4a41534b414e)
> (6a6c8ab013d7757e702af50cbdae045c vs 2ece61a01b2a640ac10509f4c49ae6fb)


That key matches the row you mentioned, so it seems like all of the
replicas should have converged on the same value for that row.  Do you
consistently get the *1969-12-31 19:00* timestamp back now?  If not, try
selecting both "time" and "writetime(time)" from that row and see what
write timestamps each of the values have.

The ArrayIndexOutOfBoundsException in response to nodetool compact looks
like a bug.  What version of Cassandra are you running?

On Wed, Mar 30, 2016 at 9:59 AM, Kai Wang <dep...@gmail.com> wrote:

> Do you have NTP setup on all nodes?
>
> On Tue, Mar 29, 2016 at 11:48 PM, Jason Kania <jason.ka...@ymail.com>
> wrote:
>
>> We have encountered a query inconsistency problem wherein the following
>> query returns different results sporadically with invalid values for a
>> timestamp field looking like the field is uninitialized (a zero timestamp)
>> in the query results.
>>
>> Attempts to repair and compact have not changed the results.
>>
>> select "subscriberId","sensorUnitId","sensorId","time" from
>> "sensorReadingIndex" where "subscriberId"='JASKAN' AND "sensorUnitId"=0 AND
>> "sensorId"=0 ORDER BY "time" LIMIT 10;
>>
>> Invalid Query Results
>> subscriberId | sensorUnitId | sensorId | time
>> JASKAN       | 0            | 0        | 2015-05-24 2:09
>> JASKAN       | 0            | 0        | *1969-12-31 19:00*
>> JASKAN       | 0            | 0        | 2016-01-21 2:10
>> JASKAN       | 0            | 0        | 2016-01-21 2:10
>> JASKAN       | 0            | 0        | 2016-01-21 2:10
>> JASKAN       | 0            | 0        | 2016-01-21 2:11
>> JASKAN       | 0            | 0        | 2016-01-21 2:22
>> JASKAN       | 0            | 0        | 2016-01-21 2:22
>> JASKAN       | 0            | 0        | 2016-01-21 2:22
>> JASKAN       | 0            | 0        | 2016-01-21 2:22
>>
>> Valid Query Results
>> subscriberId | sensorUnitId | sensorId | time
>> JASKAN       | 0            | 0        | 2015-05-24 2:09
>> JASKAN       | 0            | 0        | 2015-05-24 2:09
>> JASKAN       | 0            | 0        | 2015-05-24 2:10
>> JASKAN       | 0            | 0        | 2015-05-24 2:10
>> JASKAN       | 0            | 0        | 2015-05-24 2:10
>> JASKAN       | 0            | 0        | 2015-05-24 2:10
>> JASKAN       | 0            | 0        | 2015-05-24 2:11
>> JASKAN       | 0            | 0        | 2015-05-24 2:13
>> JASKAN       | 0            | 0        | 2015-05-24 2:13
>> JASKAN       | 0            | 0        | 2015-05-24 2:14
>>
>> We have confirmed that the 1969-12-31 timestamp is not within the data
>> based on running and number of queries so it looks like the invalid
>> timestamp value is generated by the query. The query below returns no row.
>>
>> select * from "sensorReadingIndex" where "subscriberId"='JASKAN' AND
>> "sensorUnitId"=0 AND "sensorId"=0 AND time='1969-12-31 19:00:00-0500';
>>
>> No logs are coming out but the following was observed intermittently in
>> the tracing output, but not correlated to the invalid query results:
>>
>>  Digest mismatch: org.apache.cassandra.service.DigestMismatchException:
>> Mismatch for key DecoratedKey(-7563144029910940626,
>> 00064a41534b414e040400)
>> (be22d379c18f75c2f51dd6942d2f9356 vs da4e95d571b41303b908e0c5c3fff7ba)
>> [ReadRepairStage:3179] | 2016-03-29 23:12:35.025000 | 192.168.10.10 |
>>
>> An error from the debug log that might be related is:
>>
>> org.apache.cassandra.service.DigestMismatchException: Mismatch for key
>> DecoratedKey(-4908797801227889951, 4a41534b414e)
>> (6a6c8ab013d7757e702af50cbdae045c vs 2ece61a01b2a640ac10509f4c49ae6fb)
>> at
>> org.apache.cassandra.service.DigestResolver.resolve(DigestResolver.java:85)
>> ~[apache-cassandra-3.0.3.jar:3.0.3]
>> at
>> org.apache.cassandra.service.ReadCallback$AsyncRepairRunner.run(ReadCallback.java:225)
>> ~[apache-cassandra-3.0.3.jar:3.0.3]
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> [na:1.8.0_74]
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> [na:1.8.0_74]
>> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_74]
>>
>> The tracing files are attached and seem to show that in the failed case,
>> content is skipped because of tombstones if we understand it correctly.
>> This could be an inconsistency problem on 192.168.10.9 Unfortunately,
>> attempts to compact on 192.168.10.9 only give the following error without
>> any stack trace detail and are not fixed with repair.
>>
>> root@cutthroat:/usr/local/bin/analyzer/bin# nodetool compact
>> error: null
>> -- StackTrace --
>> java.lang.ArrayIndexOutOfBoundsException
>>
>> Any suggestions on how to fix or what to search for would be much
>> appreciated.
>>
>> Thanks,
>>
>> Jason
>>
>>
>>
>>
>


-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Drop and Add column with different datatype in Cassandra

2016-03-29 Thread Tyler Hobbs
On Tue, Mar 29, 2016 at 10:31 AM, Bhupendra Baraiya <
bhupendra.bara...@continuum.net> wrote:

> Does it mean Cassandra does not allow adding of the same column in the
> Table even though it does not exists in the Table
>

As the error message says, you can't re-add a *collection* column with the
same name.  Other types of columns are fine.
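A short sketch of the restriction (hypothetical table t):

ALTER TABLE t DROP old_count;
ALTER TABLE t ADD old_count text;    -- OK: non-collection column

ALTER TABLE t DROP tags;             -- tags was a set<text>
ALTER TABLE t ADD tags set<text>;    -- rejected: collection name reuse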


-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Python to type field

2016-03-19 Thread Tyler Hobbs
This should be useful:
http://datastax.github.io/python-driver/user_defined_types.html

On Wed, Mar 16, 2016 at 1:18 PM, Rakesh Kumar <rakeshkumar46...@gmail.com>
wrote:

> Hi
>
> I have a type defined as follows
>
> CREATE TYPE etag (
> ttype int,
> tvalue text
> );
>
> And this is used in a col of a table as follows
>
>  evetag list<frozen<etag>>
>
> I have the following value in a file
> [{ttype: 3 , tvalue: '90A1'}]
>
> This gets inserted via COPY command with no issues.
>
> However when I try to insert the same via a python program which I am
> writing. where I prepare and then bind, I get this error while executing
>
> TypeError: Received an argument of invalid type for column "evetag".
> Expected: <class 'cassandra.cqltypes.ListType(... VarcharType))'>, Got:
> <type 'str'>; (Received a string for a type that expects a sequence)
>
> I tried casting the variable in python to list, tuple, but same error.
>
>
>


-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Understanding SELECT * paging/ordering

2016-03-18 Thread Tyler Hobbs
On Fri, Mar 18, 2016 at 4:58 PM, Dan Checkoway <dchecko...@gmail.com> wrote:

> Say I have a table with 50M rows in a keyspace with RF=3 in a cluster of
> 15 nodes (single local data center).  When I do "SELECT * FROM table" and
> page through those results (with a fetch size of say 1000), I'd like to
> understand better how that paging works.
>
> Specifically, what determines the order in which which rows are returned?
>

Results are returned in token order (murmur3 hash of the partition key),
and within a single partition, rows are ordered by the clustering key.


>   And what's happening under the hood...i.e. is the coordinator fetching
> pages of 1000 from each node, passing some sort of paging state to each
> node, and the coordinator merges the per-node sorted result sets?
>

The coordinator sequentially[1] queries each token range until it has
enough rows to meet the page size.  When the next page is fetched, it
resumes this process, but starts at the last-used token (which is in the
paging state that the driver passes to the coordinator) rather than the
start of the ring.


> I'm also curious how consistency level comes into play.  i.e. if I use ONE
> vs. QUORUM vs. ALL, how that impacts where the results come from and how
> they're ordered, merged, and who knows what else I don't know...  :-)
>

The only difference between ONE and QUORUM is that the coordinator will
query multiple replicas for each token range and perform the standard
conflict resolution.

[1] In reality, based on estimates of how many token ranges it will need to
query in order to meet the page size, it will query multiple token ranges
in parallel.  See CASSANDRA-1337 for details.

-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Automatically connect to any up node via cqlsh

2016-03-09 Thread Tyler Hobbs
On Wed, Mar 9, 2016 at 8:09 AM, Rakesh Kumar <dcrunch...@aim.com> wrote:

>
> Is it possible to set up cassandra/cqlsh so that if any node is down,
> cqlsh will automatically try to connect to the other surviving nodes,
> instead of erroring out. I know it is possible to supply ip_address and
> port of the UP node as arguments to cqlsh, but I am looking at automatic
> detection.
>

No, right now cqlsh is designed to connect to only a single node.
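A small shell wrapper can approximate it in the meantime (a sketch; the
addresses and port are placeholders):

#!/bin/sh
# Try each node in turn; cqlsh exits non-zero if it cannot connect.
for host in 10.0.0.1 10.0.0.2 10.0.0.3; do
    cqlsh "$host" 9042 && break
done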


-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Isolation for atomic batch on the same partition key

2016-03-01 Thread Tyler Hobbs
On Mon, Feb 22, 2016 at 3:58 PM, Yawei Li <yawei...@gmail.com> wrote:

>
> 1. If  an atomic batch (logged batch) contains a bunch of row mutations
> and all of them have the same partition key, can I assume all those changes
> have the same isolation as the row-level isolation? According to the post
> here http://www.mail-archive.com/user%40cassandra.apache.org/msg42434.html,
> it seems that we can get strong isolation.
> e.g.
> *BEGIN BATCH*
> *  UPDATE a IF condition_1;*
> *  INSERT b;*
> *  INSERT c;*
> *APPLY BATCH*
>
> So at any replica, we expect isolation for the three changes on *a*, *b*,
> *c*  (*a* , *b*, *c* have the same partition key *k1*) -- i.e. either
> none or all of them are visible. Can someone help confirm?
>

That is correct.


>
> 2. Say in the above batch, we include two extra row mutations d and e for
> another partition key *k2*.  Will the changes on (*a*, *b*, *c*)  and (*d*
> , *e*) still atomic respectively in terms of isolation? I understand
> there is no isolation between (*a*, *b*, *c*) and (*d*, *e*).  I.e. is
> there a per-parition-key isolation guaranteed?
>

You can't use LWT conditions (i.e. "IF condition_1") in batches that span
multiple partitions keys.  If you did not include the condition, then you
would get per-partition isolation, as you describe.
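To make both cases concrete, a sketch against a hypothetical table
t (k text, c text, v int, PRIMARY KEY (k, c)):

BEGIN BATCH
  INSERT INTO t (k, c, v) VALUES ('k1', 'a', 1);
  INSERT INTO t (k, c, v) VALUES ('k1', 'b', 2);
  INSERT INTO t (k, c, v) VALUES ('k1', 'c', 3);
  INSERT INTO t (k, c, v) VALUES ('k2', 'd', 4);
  INSERT INTO t (k, c, v) VALUES ('k2', 'e', 5);
APPLY BATCH;

This batch is accepted because it has no conditions: (a, b, c) are applied
in isolation within k1 and (d, e) within k2, with no isolation across the
two partitions. Adding IF NOT EXISTS (or any other condition) to any of
these statements would make Cassandra reject the batch, since it spans two
partition keys.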


>
>
> 3. I assume CL SERIAL or LOCAL_SERIAL on reads will try applying the above
> logged batch if it is committed but not applied. Right?
>

Correct.

-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: IF NOT EXISTS with multiple static columns confusion

2016-03-01 Thread Tyler Hobbs
What version of Cassandra are you using?  I just tested this out against
trunk and got reasonable behavior:


cqlsh:ks1> CREATE TABLE test (k int, s1 int static, s2 int static, c int, v
int, PRIMARY KEY (k, c));
cqlsh:ks1> INSERT INTO test (k, c, v) VALUES (0, 0, 0);
cqlsh:ks1> UPDATE test SET s1 = 0 WHERE k = 0 IF s1 = null;

 [applied]
---
  True

cqlsh:ks1> TRUNCATE test;
cqlsh:ks1> INSERT INTO test (k, c, v) VALUES (0, 0, 0);
cqlsh:ks1> INSERT INTO test (k, s1) VALUES (0, 0) IF NOT EXISTS;

 [applied]
---
  True



On Tue, Feb 23, 2016 at 6:15 PM, Nimi Wariboko Jr <n...@channelmeter.com>
wrote:

> I have a table with 2 static columns, and I write to either one of them,
> if I then write to the other one using IF NOT EXISTS, it fails even though
> it has never been written to before. Is it the case that all static
> columns share the same "written to" marker?
>
> Given a table like so:
>
> CREATE TABLE test (
>   id timeuuid,
>   foo int static,
>   bar int static,
>   baz int,
>   baq int
>   PRIMARY KEY (id, baz)
> )
>
> I'm seeing some confusing behavior see the statements below -
>
> """
> INSERT INTO cmpayments.report_payments (id, foo) VALUES (NOW(), 1) IF NOT
> EXISTS; // succeeds
> TRUNCATE test;
> INSERT INTO cmpayments.report_payments (id, baq) VALUES
> (99c3-b01a-11e5-b170-0242ac110002, 1);
> UPDATE cmpayments.report_payments SET foo = 1 WHERE
> id=99c3-b01a-11e5-b170-0242ac110002 IF foo=null; // fails, even though
> foo=null
> TRUNCATE test;
> INSERT INTO cmpayments.report_payments (id, bar) VALUES
> (99c3-b01a-11e5-b170-0242ac110002, 1); // succeeds
> INSERT INTO cmpayments.report_payments (id, foo) VALUES (NOW(), 1) IF NOT
> EXISTS; // fails, even though foo=null, and has never been written to
> """
>
> Nimi
>



-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: copy and rename sstable files as keyspace migration approach

2016-02-23 Thread Tyler Hobbs
On Tue, Feb 23, 2016 at 12:36 PM, Robert Coli <rc...@eventbrite.com> wrote:

> [1] In some very new versions of Cassandra, this may not be safe to do
> with certain meta information files which are sadly no longer immutable.


I presume you're referring to the index summary (i.e. Summary.db files).
These just contain a sampling of the (immutable) Index.db files, and are
safe to hardlink in the way that you've described.  The sampling level of
the summary (which is what can change over time) is serialized at the start
of the Summary.db file.

If you're truly paranoid, you can skip the Summary.db files and they'll be
rebuilt on startup.

-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: High Bloom filter false ratio

2016-02-18 Thread Tyler Hobbs
You can try slightly lowering the bloom_filter_fp_chance on your table.
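
For example (a sketch; substitute your keyspace and table for ks.t):

ALTER TABLE ks.t WITH bloom_filter_fp_chance = 0.001;

Note that bloom filters are built when sstables are written, so the new
setting only takes effect for an existing sstable once it's rewritten by
compaction (or by nodetool upgradesstables -a), and a lower fp chance costs
more memory.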

Otherwise, it's possible that you're repeatedly querying one or two
partitions that always trigger a bloom filter false positive.  You could
try manually tracing a few queries on this table (for non-existent
partitions) to see if the bloom filter rejects them.

Depending on your Cassandra version, your false positive ratio could be
inaccurate: https://issues.apache.org/jira/browse/CASSANDRA-8525

There are also a couple of recent improvements to bloom filters:
* https://issues.apache.org/jira/browse/CASSANDRA-8413
* https://issues.apache.org/jira/browse/CASSANDRA-9167


On Thu, Feb 18, 2016 at 1:35 AM, Anishek Agarwal <anis...@gmail.com> wrote:

> Hello,
>
> We have a table with composite partition key with humungous cardinality,
> its a combination of (long,long). On the table we have
> bloom_filter_fp_chance=0.01.
>
> On doing "nodetool cfstats" on the 5 nodes we have in the cluster we are
> seeing  "Bloom filter false ratio:" in the range of 0.7 -0.9.
>
> I thought over time the bloom filter would adjust to the key space
> cardinality, we have been running the cluster for a long time now but have
> added significant traffic from Jan this year, which would not lead to
> writes in the db but would lead to high reads to see if there are any values.
>
> Are there any settings that can be changed to allow better ratio.
>
> Thanks
> Anishek
>



-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: „Using Timestamp“ Feature

2016-02-18 Thread Tyler Hobbs
2016-02-18 2:00 GMT-06:00 Matthias Niehoff <matthias.nieh...@codecentric.de>:

>
> * is the 'using timestamp' feature (and providing statement timestamps)
> sufficiently robust and mature to build an application on?
>

Yes.  It's been there since the start of CQL3.


> * In a BatchedStatement, can different statements have different
> (explicitly provided) timestamps, or is the BatchedStatement's timestamp
> used for them all? Is this specified / stable behaviour?
>

Yes, you can separate timestamps per statement.  And, in fact, if you
potentially mix inserts and deletes on the same rows, you *should* use
explicit timestamps with different values.  See the timestamp notes here:
http://cassandra.apache.org/doc/cql3/CQL.html#batchStmt
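
For example (a sketch, assuming a hypothetical table ks.t), giving the
delete a lower timestamp than the insert guarantees the insert wins even
though both are in the same batch:

BEGIN BATCH
  DELETE FROM ks.t USING TIMESTAMP 1000 WHERE k = 'k1';
  INSERT INTO ks.t (k, v) VALUES ('k1', 'new') USING TIMESTAMP 1001;
APPLY BATCH;

Statements without an explicit timestamp all share the batch's timestamp,
which is why mixing an unadorned delete and insert on the same row is
dangerous: at equal timestamps, the delete wins.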


> * cqlsh reports a syntax error when I use 'using timestamp' with an update
> statement (works with 'insert'). Is there a good reason for this, or is it
> a bug?
>

The "USING TIMESTAMP" goes in a different place in update statements.  It
should be something like:

UPDATE mytable USING TIMESTAMP ? SET col = ? WHERE key = ?


-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Duplicated key with an IN statement

2016-02-04 Thread Tyler Hobbs
On Thu, Feb 4, 2016 at 9:57 AM, Jack Krupansky <jack.krupan...@gmail.com>
wrote:

> there's a bug in CHANGES.TXT for this issue. It says: "Duplicate rows
> returned when in clause has repeated values (CASSANDRA-6707)", but the
> issue number is really 6706.
>

Thanks, I've fixed this.


-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Cqlsh hangs & closes automatically

2016-02-02 Thread Tyler Hobbs
The default page size in cqlsh is 100, so perhaps something is going on
there?  Try running cqlsh with --debug to see if there are any errors.

On Tue, Feb 2, 2016 at 11:21 AM, Anuj Wadehra <anujw_2...@yahoo.co.in>
wrote:

> My cqlsh prompt hangs and closes if I try to fetch just 100 rows using
> select * query. Cassandra-cli does the job. Any solution?
>
>
>
> Thanks
> Anuj
>



-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Cassandra 3.1.1 with respect to HeapSpace

2016-01-14 Thread Tyler Hobbs
> CommitLogReplayException:
> Unexpected error deserializing mutation; saved to
> /tmp/mutation7465380878750576105dat.  This may be caused by replaying a
> mutation against a table with the same name but incompatible schema.
> Exception follows: org.apache.cassandra.serializers.MarshalException: Not
> enough bytes to read a map
> at
> org.apache.cassandra.db.commitlog.CommitLogReplayer.handleReplayError(CommitLogReplayer.java:633)
> [apache-cassandra-3.1.1.jar:3.1.1]
> at
> org.apache.cassandra.db.commitlog.CommitLogReplayer.replayMutation(CommitLogReplayer.java:556)
> [apache-cassandra-3.1.1.jar:3.1.1]
> at
> org.apache.cassandra.db.commitlog.CommitLogReplayer.replaySyncSection(CommitLogReplayer.java:509)
> [apache-cassandra-3.1.1.jar:3.1.1]
> at
> org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:404)
> [apache-cassandra-3.1.1.jar:3.1.1]
> at
> org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:151)
> [apache-cassandra-3.1.1.jar:3.1.1]
> at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:189)
> [apache-cassandra-3.1.1.jar:3.1.1]
> at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:169)
> [apache-cassandra-3.1.1.jar:3.1.1]
> at
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:283)
> [apache-cassandra-3.1.1.jar:3.1.1]
> at
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:549)
> [apache-cassandra-3.1.1.jar:3.1.1]
> at
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:677)
> [apache-cassandra-3.1.1.jar:3.1.1]
>
> I can no longer start my nodes.
>
> How can I restart my cluster?
> Is this problem known?
> Is there a better Cassandra 3 version which would behave better with
> respect to this problem?
> Would there be a better memory configuration to select for my nodes?
> Currently I use MAX_HEAP_SIZE="6G" HEAP_NEWSIZE="496M" for a 16G RAM node.
>
>
> Thank you very much for your advice.
>
> Kind regards
>
> Jean
>



-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Sorting & pagination in apache cassandra 2.1

2016-01-07 Thread Tyler Hobbs
On Thu, Jan 7, 2016 at 6:45 AM, anuja jain <anujaja...@gmail.com> wrote:

> My question is, what is the alternative if we need to order by col3 or
> col4 in my above example without including col2 in order by clause.
>

The server-side alternative is to create a second table (or a materialized
view, if you're using 3.0+) that uses a different clustering order.
Cassandra purposefully only supports simple and efficient queries that can
be handled quickly (with a few exceptions), and arbitrary ordering is not
part of that, especially if you consider complications like paging.
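
For example, on 3.0+ a materialized view can expose the same data with a
different clustering order (a sketch; the table and column names are
hypothetical, and every primary key column of the view needs an IS NOT NULL
restriction):

CREATE MATERIALIZED VIEW ks.t_by_col3 AS
    SELECT * FROM ks.t
    WHERE pk IS NOT NULL AND col2 IS NOT NULL AND col3 IS NOT NULL
    PRIMARY KEY (pk, col3, col2)
    WITH CLUSTERING ORDER BY (col3 DESC);

On earlier versions, the same effect requires maintaining a second table
with that primary key yourself.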


-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Cassandra 3.1 - Aggregation query failure

2015-12-29 Thread Tyler Hobbs
>
>
> 1. Is it possible to "tune" the page size or is it hard-coded internally ?
>

If a page size is set for the request at the driver level, that page size
will be used internally.  Otherwise, it defaults to something reasonable
(probably ~5k rows).


> 2. Is read-repair performed on EACH page or is it done on the whole
> requested rows once they are fetched ?
>

It's performed on each page as it's read.  Do note that read repair doesn't
happen for multi-partition range reads, regardless of paging or aggregation.


>
> Question 2. is relevant in some particular scenarios when the user is
> using CL QUORUM (or more) and some replicas are out-of-sync. Even in the
> case of aggregation over a single partition, if this partition is wide and
> spans many fetch pages, the time the coordinator performs all the
> read-repair and reconcile over QUORUM replicas, the query may timeout very
> quickly.
>

Yes, that's possible.  Timeouts for these queries should be adjusted
accordingly.  It's worth noting that the read_request_timeout_in_ms setting
applies per-page, so coordinator-level timeouts shouldn't be severely
affected by this.

-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Cassandra 3.1 - Aggregation query failure

2015-12-18 Thread Tyler Hobbs
On Fri, Dec 18, 2015 at 9:17 AM, DuyHai Doan <doanduy...@gmail.com> wrote:

> Cassandra will perform a full table scan and fetch all the data in memory
> to apply the aggregate function.


Just to clarify for others on the list: when executing aggregation
functions, Cassandra *will* use paging internally, so at most one page
worth of data will be held in memory at a time.  However, if your
aggregation function retains a large amount of data, this may contribute to
heap pressure.


-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: [RELEASE] Apache Cassandra 3.1 released

2015-12-11 Thread Tyler Hobbs
On Fri, Dec 11, 2015 at 1:59 AM, Janne Jalkanen <janne.jalka...@ecyrd.com>
wrote:

>
> So there is no reason why you would ever want to run 3.1 then?
>

Probably not.


>  Why was it released?
>

For consistency.  It's the first release in the new tick-tock release
scheme.  Skipping that would have been a bit strange (although I'll agree
it's also strange to have 3.0.1 == 3.1).


>  What is the lifecycle of 3.0.x? Will it become obsolete once 3.3 comes
> out?
>

3.0.x will continue until 4.0.


>
>
>- If you want access to the new features introduced in even release
>versions of 3.x (3.2, 3.4, 3.6), you'll want to run the latest odd version
>(3.3, 3.5, 3.7, etc) after the release containing the feature you want
>access to (so, if the feature's introduced in 3.4 and we haven't dropped
>3.5 yet, obviously you'd need to run 3.4).
>
>
> Are there going to be minor releases of the even releases, i.e. 3.2.1?
>

Not unless we discover critical bugs in 3.2, such as security
vulnerabilities or corruption issues.


>  Or will they all be delegated to 3.3.x -series?  Or will there be a
> series of identical releases like 3.1 and 3.0.1 with 3.2.1 and 3.3?
>

There's not going to be a 3.3.x series, there will be one 3.3 release
(unless there is a critical bug, as mentioned above).

There are two separate release lines going on:

3.0.1 -> 3.0.2 -> 3.0.3 -> 3.0.4 -> ... (every release is a bugfix)

3.1 -> 3.2 -> 3.3 -> 3.4 -> ... (odd numbers are bugfix releases, even
numbers may contain new features)


>
> This is only going to be the case during the transition phase from old
> release cycles to tick-tock. We're targeting changes to CI and quality
> focus going forward to greatly increase the stability of the odd releases
> of major branches (3.1, 3.3, etc) so, for the 4.X releases, our
> recommendation would be to run the highest # odd release for greatest
> stability.
>
>
> So here you tell to run 3.1, but above you tell to run 3.0.1?  Why is
> there a different release scheme specifically for 3.0.x instead of putting
> those fixes to 3.1?
>

We don't know how well the tick-tock release scheme will stabilize yet.  As
a safety net, we're doing our traditional release scheme for 3.0.x.


-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: [RELEASE] Apache Cassandra 3.1 released

2015-12-09 Thread Tyler Hobbs
This explains the new release plans in detail:
http://www.planetcassandra.org/blog/cassandra-2-2-3-0-and-beyond/

3.0.1 and 3.1 are a special case, because they happen to be identical.
However, 3.0.2 will not be the same as 3.2.  The 3.0.2 will only contain
bugfixes, while 3.2 will introduce new features.  There will not be a 3.1.1
or 3.2.1 unless a very critical bug is discovered in 3.1 or 3.2.

If you "just want to run the most stable 3.0", stick with 3.0.x for now
(which is 3.0.1).  If you want to use bleeding-edge features, try out 3.2
when it's released (but be warned that it may not be as stable).

On Wed, Dec 9, 2015 at 8:27 AM, Hannu Kröger <hkro...@gmail.com> wrote:

> Hi,
>
> I feel the same as well. Would you skip 3.2 when you release another round
> of bug fixes after one round of bug fixes? Or would 3.2 be released after
> 3.3.? :P
>
> BR,
> Hannu
>
> On 09 Dec 2015, at 16:05, Kai Wang <dep...@gmail.com> wrote:
>
> Janne,
>
> You are not alone. I am also confused by that "Under normal conditions
> ..." statement. I can really use some examples such as:
> 3.0.0 = ?
> 3.0.1 = ?
> 3.1.0 = ?
> 3.1.1 = ? (this should not happen under normal conditions because the fix
> should be in 3.3.0 - the next bug fix release?)
>
> On Wed, Dec 9, 2015 at 3:05 AM, Janne Jalkanen <janne.jalka...@ecyrd.com>
> wrote:
>
>>
>> I’m sorry, I don’t understand the new release scheme at all. Both of
>> these are bug fixes on 3.0? What’s the actual difference?
>>
>> If I just want to run the most stable 3.0, should I run 3.0.1 or 3.1?
>> Will 3.0 gain new features which will not go into 3.1, because that’s a bug
>> fix release on 3.0? So 3.0.x will contain more features than 3.1, as
>> even-numbered releases will be getting new features? Or is 3.0.1 and 3.1
>> essentially the same thing? Then what’s the role of 3.1? Will there be more
>> than one 3.1? 3.1.1? Or is it 3.3? What’s the content of that? 3.something
>> + patches = 3.what?
>>
>> What does this statement in the referred blog post mean? "Under normal
>> conditions, we will NOT release 3.x.y stability releases for x > 0.” Why
>> are the normal conditions being violated already by releasing 3.1 (since 1
>> > 0)?
>>
>> /Janne, who is completely confused by all this, and suspects he’s the
>> target of some hideous joke.
>>
>> On 8 Dec 2015, at 22:26, Jake Luciani <j...@apache.org> wrote:
>>
>>
>> The Cassandra team is pleased to announce the release of Apache Cassandra
>> version 3.1. This is the first release from our new Tick-Tock release
>> process[4].
>> It contains only bugfixes on the 3.0 release.
>>
>> Apache Cassandra is a fully distributed database. It is the right choice
>> when you need scalability and high availability without compromising
>> performance.
>>
>>  http://cassandra.apache.org/
>>
>> Downloads of source and binary distributions are listed in our download
>> section:
>>
>>  http://cassandra.apache.org/download/
>>
>> This version is a bug fix release[1] on the 3.x series. As always, please
>> pay
>> attention to the release notes[2] and Let us know[3] if you were to
>> encounter
>> any problem.
>>
>> Enjoy!
>>
>> [1]: http://goo.gl/rQJ9yd (CHANGES.txt)
>> [2]: http://goo.gl/WBrlCs (NEWS.txt)
>> [3]: https://issues.apache.org/jira/browse/CASSANDRA
>> [4]: http://www.planetcassandra.org/blog/cassandra-2-2-3-0-and-beyond/
>>
>>
>>
>
>


-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Cassandra 3.0.0 connection problem

2015-11-19 Thread Tyler Hobbs
On Thu, Nov 19, 2015 at 1:13 AM, Enrico Sola <sola.enrico...@gmail.com>
wrote:

> Hi, I'm new to Cassandra and I've recently upgraded to 3.0.0 on Ubuntu
> Linux 14.04 LTS, through apt-get upgrade not manual installation, after the
> update all was fine so I could access to my keyspaces using cqlsh but I
> can't access to Cassandra using DataStax PHP Driver because I get this
> error: "No hosts available for the control connection”.
> The connection parameters are the same of 2.2.3 version (and was working
> fine).
> I don't know if this is a bug or a problem with the PHP driver, but my
> systems use Cassandra and are now offline. Is this a known issue with a
> solution?
>

I don't think the PHP driver supports Cassandra 3.0 yet.  There were some
changes to the system schema tables that are probably preventing it from
connecting successfully.


> I tried also to downgrade to 2.2.3 version but after that Cassandra didn't
> start due to keyspace loading problem, I'm just looking for a quick
> solution, so it doesn't matter if I have to downgrade to 2.2.3; how can I do
> the downgrade without losing my data?
>

Downgrading major versions isn't supported, which is why we recommend that
you take a snapshot before upgrading.  Your only real option for
downgrading without data loss is to dump your data (using cqlsh's COPY TO
or something similar) and then re-load it on 2.2 (using cqlsh's COPY FROM
or something similar).
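
For example, from cqlsh (a sketch; repeat for each table, and recreate the
schema on 2.2 before loading):

COPY ks.t TO '/tmp/t.csv';
COPY ks.t FROM '/tmp/t.csv';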


-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Cqlsh copy to and copy from

2015-11-19 Thread Tyler Hobbs
If the fields are null, COPY TO should just be generating "{field1: null,
field2: null}".

Would you mind opening a ticket here with steps to reproduce:
https://issues.apache.org/jira/browse/CASSANDRA

On Thu, Nov 19, 2015 at 1:05 AM, Vova Shelgunov <vvs...@gmail.com> wrote:

> Hi all,
>
> I am having trouble with the copy functionality in cassandra 3.0.
>
> When I am trying to copy my table to file, some of UDTs have the following
> representation:
>
> {field1: , field2: }
>
> They have no values, and when I tried to restore this table, these rows
> were not imported.
>
> Do you plan to fix that, e.g. fill with default values or exclude them?
>
> Thanks.
>



-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Timeout with static column

2015-11-12 Thread Tyler Hobbs
 Processing response from /
> 192.168.169.20 [SharedPool-Worker-2] | 2015-11-11 19:38:40.754000 |
> 192.168.169.10 | 330177
>
>   Request complete | 2015-11-11 19:38:40.813963 | 192.168.169.10 |
>389963
>
> This specific key has about 1900 records of around 50/100 bytes each which
> makes it quite large (compared to others), and the `used` static column is
> True.
>
> I know this is a C* anti-pattern, but regularly, smaller (older)
> `sequence_nr` are deleted.
> I think this isn't a problem since most of the read requests are bounded
> by sequence_nr (and are pretty fast), so there are certainly many
> tombstones (even though the trace above doesn't tell that).
>
> What's strange is that it seems the query scans the whole set of records,
> even though it should return only the static column (which by definition
> has only one value independently of the number of records), so it should be
> pretty fast, isn't it?
>
> Note that using `SELECT DISTINCT` doesn't seem to change anything
> regarding speed (I understand that it is the recommended way of doing this
> kind of queries).
>
> Anyone can explain me how this problem can be solved, or what could be its
> root cause?
>
> Thanks for any answers,
> --
> Brice Figureau
>



-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Multi-column slice restrictions not respected by the returned result

2015-11-11 Thread Tyler Hobbs
Correct, it's a full lexicographic tuple comparison: (a, b) > (x, y) holds
if a > x, or if a = x and b > y.

On Wed, Nov 11, 2015 at 1:43 PM, Yuri Shkuro <y...@uber.com> wrote:

> Thanks, Tyler.
>
> I also realized that I misunderstood multi-column restriction. Evidently,
> (a, b) > (x, y) does not imply component-wise restriction (a>x && b>y) in
> CQL, it only implies full tuple comparison. That explains why my condition
> (a, b) > (2, 10) was matching row (2, 11).
>
> On Wed, Nov 11, 2015 at 2:31 PM, Tyler Hobbs <ty...@datastax.com> wrote:
>
>> This is a known problem with multi-column slices and mixed ASC/DESC
>> clustering orders.  See
>> https://issues.apache.org/jira/browse/CASSANDRA-7281 for details.
>>
>> On Tue, Nov 10, 2015 at 11:02 PM, Yuri Shkuro <y...@uber.com> wrote:
>>
>>> According to this blog:
>>> http://www.datastax.com/dev/blog/a-deep-look-to-the-cql-where-clause
>>>
>>> I should be able to do multi-column restrictions on clustering columns,
>>> as in the blog example: WHERE (server, time) >= (‘196.8.0.0’, 12:00) AND
>>> (server, time) <= (‘196.8.255.255’, 14:00)
>>>
>>> However, I am getting data returned from such query that does not match
>>> the restrictions. Tried on Cassandra 2.1.7 and 2.2.3. Here's an example:
>>>
>>> CREATE TABLE IF NOT EXISTS dur (
>>> s  text,
>>> nd bigint,
>>> ts bigint,
>>> tid bigint,
>>> PRIMARY KEY (s, nd, ts)
>>> ) WITH CLUSTERING ORDER BY (nd ASC, ts DESC);
>>>
>>> insert INTO dur (s, nd, ts, tid) values ('x', 1, 10, 99);
>>> insert INTO dur (s, nd, ts, tid) values ('x', 2, 11, 98) ;
>>> insert INTO dur (s, nd, ts, tid) values ('x', 3, 10, 97) ;
>>> insert INTO dur (s, nd, ts, tid) values ('x', 1, 11, 96) ;
>>> insert INTO dur (s, nd, ts, tid) values ('x', 1, 12, 95) ;
>>> insert INTO dur (s, nd, ts, tid) values ('x', 2, 10, 94) ;
>>> insert INTO dur (s, nd, ts, tid) values ('x', 2, 12, 93) ;
>>> insert INTO dur (s, nd, ts, tid) values ('x', 3, 11, 92) ;
>>> insert INTO dur (s, nd, ts, tid) values ('x', 3, 12, 91) ;
>>>
>>> select * from dur where s='x' and (nd,ts) > (2, 11);
>>>
>>>  s | nd | ts | tid
>>> ---+----+----+-----
>>>  x |  2 | 10 |  94
>>>  x |  3 | 12 |  91
>>>  x |  3 | 11 |  92
>>>  x |  3 | 10 |  97
>>> (4 rows)
>>>
>>> The first row in the result does not satisfy the restriction (nd,ts) >
>>> (2, 11). Am I doing something incorrectly?
>>>
>>> Thanks,
>>> --Yuri
>>>
>>
>>
>>
>> --
>> Tyler Hobbs
>> DataStax <http://datastax.com/>
>>
>
>


-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Multi-column slice restrictions not respected by the returned result

2015-11-11 Thread Tyler Hobbs
This is a known problem with multi-column slices and mixed ASC/DESC
clustering orders.  See https://issues.apache.org/jira/browse/CASSANDRA-7281
for details.

On Tue, Nov 10, 2015 at 11:02 PM, Yuri Shkuro <y...@uber.com> wrote:

> According to this blog:
> http://www.datastax.com/dev/blog/a-deep-look-to-the-cql-where-clause
>
> I should be able to do multi-column restrictions on clustering columns, as
> in the blog example: WHERE (server, time) >= (‘196.8.0.0’, 12:00) AND
> (server, time) <= (‘196.8.255.255’, 14:00)
>
> However, I am getting data returned from such query that does not match
> the restrictions. Tried on Cassandra 2.1.7 and 2.2.3. Here's an example:
>
> CREATE TABLE IF NOT EXISTS dur (
> s  text,
> nd bigint,
> ts bigint,
> tid bigint,
> PRIMARY KEY (s, nd, ts)
> ) WITH CLUSTERING ORDER BY (nd ASC, ts DESC);
>
> insert INTO dur (s, nd, ts, tid) values ('x', 1, 10, 99);
> insert INTO dur (s, nd, ts, tid) values ('x', 2, 11, 98) ;
> insert INTO dur (s, nd, ts, tid) values ('x', 3, 10, 97) ;
> insert INTO dur (s, nd, ts, tid) values ('x', 1, 11, 96) ;
> insert INTO dur (s, nd, ts, tid) values ('x', 1, 12, 95) ;
> insert INTO dur (s, nd, ts, tid) values ('x', 2, 10, 94) ;
> insert INTO dur (s, nd, ts, tid) values ('x', 2, 12, 93) ;
> insert INTO dur (s, nd, ts, tid) values ('x', 3, 11, 92) ;
> insert INTO dur (s, nd, ts, tid) values ('x', 3, 12, 91) ;
>
> select * from dur where s='x' and (nd,ts) > (2, 11);
>
>  s | nd | ts | tid
> ---+----+----+-----
>  x |  2 | 10 |  94
>  x |  3 | 12 |  91
>  x |  3 | 11 |  92
>  x |  3 | 10 |  97
> (4 rows)
>
> The first row in the result does not satisfy the restriction (nd,ts) >
> (2, 11). Am I doing something incorrectly?
>
> Thanks,
> --Yuri
>



-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: why cassanra max is 20000/s on a node ?

2015-11-05 Thread Tyler Hobbs
>
> the program use datastax driver 2.1.8 and use 5 thread to insert data to
> cassandra on the same machine


The client with five threads is probably your bottleneck.  Try running the
cassandra stress tool for comparison.  You should see at least double the
throughput.
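
For example, with the stress tool that ships in the Cassandra distribution
(the thread count here is just an illustrative starting point):

cassandra-stress write n=1000000 -rate threads=50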

On Thu, Nov 5, 2015 at 9:56 AM, Eric Stevens <migh...@gmail.com> wrote:

> > 512G memory , 128core cpu
>
> This seems dramatically oversized for a Cassandra node.  You'd do *much* 
> better
> to have a much larger cluster of much smaller nodes.
>
>
> On Thu, Nov 5, 2015 at 8:25 AM Jack Krupansky <jack.krupan...@gmail.com>
> wrote:
>
>> I don't know what current numbers are, but last year the idea of getting
>> 1 million writes per second on a 96 node cluster was considered a
>> reasonable achievement. That would be roughly 10,000 writes per second per
>> node and you are getting twice that.
>>
>> See:
>> http://www.datastax.com/1-million-writes
>>
>> Or this Google test which hit 1 million writes per second with 330 nodes,
>> which would be roughly 3,000 writes per second per node:
>>
>> http://googlecloudplatform.blogspot.com/2014/03/cassandra-hits-one-million-writes-per-second-on-google-compute-engine.html
>>
>> So, is your question why your throughput is so good or are you
>> disappointed that it wasn't better?
>>
>> Cassandra is designed for clusters with lots of nodes, so if you want to
>> get an accurate measure of per-node performance you need to test with a
>> reasonable number of nodes and then divide aggregate performance by the
>> number of nodes, not test a single node alone. In short, testing a single
>> node in isolation is not a recommended approach to testing Cassandra
>> performance.
>>
>>
>> -- Jack Krupansky
>>
>> On Thu, Nov 5, 2015 at 9:05 AM, 郝加来 <ha...@neusoft.com> wrote:
>>
>>> hi
>>> everyone
>>> i set up cassandra 2.2.3 on a node. The machine's environment is
>>> openjdk-1.8.0, 512G memory, 128-core cpu, 3T ssd.
>>> The token num is 256 on the node. The program uses datastax driver 2.1.8
>>> and uses 5 threads to insert data to cassandra on the same machine; the
>>> data's capacity is 6G and 1157000 lines.
>>>
>>> why is the throughput 20000/s on the node?
>>>
>>>
>>> # Per-thread stack size.
>>>
>>> JVM_OPTS="$JVM_OPTS -Xss512k"
>>>
>>>
>>>
>>> # Larger interned string table, for gossip's benefit (CASSANDRA-6410)
>>>
>>> JVM_OPTS="$JVM_OPTS -XX:StringTableSize=103"
>>>
>>>
>>>
>>> # GC tuning options
>>>
>>> JVM_OPTS="$JVM_OPTS -XX:+CMSIncrementalMode"
>>>
>>> JVM_OPTS="$JVM_OPTS -XX:+DisableExplicitGC"
>>>
>>> JVM_OPTS="$JVM_OPTS -XX:+CMSConcurrentMTEnabled"
>>>
>>> JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
>>>
>>> JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
>>>
>>> JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
>>>
>>> JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=4"
>>>
>>> JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=2"
>>>
>>> JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
>>>
>>> JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
>>>
>>> JVM_OPTS="$JVM_OPTS -XX:+UseTLAB"
>>>
>>> JVM_OPTS="$JVM_OPTS
>>> -XX:CompileCommandFile=$CASSANDRA_CONF/hotspot_compiler"
>>>
>>> JVM_OPTS="$JVM_OPTS -XX:CMSWaitDuration=6"
>>>
>>>
>>>
>>> memtable_heap_space_in_mb: 1024
>>>
>>> memtable_offheap_space_in_mb: 10240
>>>
>>> memtable_cleanup_threshold: 0.55
>>>
>>> memtable_allocation_type: heap_buffers
>>>
>>>
>>>
>>>
>>> That is all.
>>> Thanks
>>> --
>>>
>>> *郝加来*
>>>
>>> East China Finance Division
>>>
>>> Neusoft Corporation
>>> Neusoft Software Park, No. 1000 Ziyue Road, Minhang District, Shanghai
>>> Postcode:200241
>>> Tel:(86 21) 33578591
>>> Fax:(86 21) *23025565-111*
>>> Mobile:13764970711
>>> Email:ha...@neusoft.com
>>> Http://www.neusoft.com <http://www.neusoft.com/>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>


-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Error code=1000

2015-11-03 Thread Tyler Hobbs
When you say "I am using cassandra standalone", do you mean that you're
running a single-node cluster?  If that's the case, then I'm guessing your
problem is that the replication factor for the keyspace is set to 2 or 3
(instead of 1).

On Sat, Oct 31, 2015 at 3:00 PM, Ricardo Sancho <sancho.rica...@gmail.com>
wrote:

> One or more of your nodes, depending on your replication factor, is not
> answering in time. Either they are down or have too much load that they are
> not able to answer every request before the timeout expires.
> On 31 Oct 2015 20:35, "Eduardo Alfaia" <eduardocalf...@gmail.com> wrote:
>
>> Hi guys,
>>
>> Could you help me with this error?
>>
>> cassandra.Unavailable: code=1000 [Unavailable exception] message="Cannot
>> achieve consistency level LOCAL_QUORUM" info={'required_replicas': 2,
>> 'alive_replicas': 1, 'consistency': 'LOCAL_QUORUM’}
>>
>> I am using cassandra standalone
>>
>> Thanks
>>
>>


-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Error Code

2015-10-29 Thread Tyler Hobbs
That means the driver could not decode a Result message from Cassandra.
Can you post the query that's failing along with your schema for that table
to the Python driver mailing list?  Here's a link:
https://groups.google.com/a/lists.datastax.com/forum/#!forum/python-driver-user

On Thu, Oct 29, 2015 at 9:43 AM, Eduardo Alfaia <eduardocalf...@gmail.com>
wrote:

> I am using a python driver from DataStax. Cassandra driver 2.7.2
>
> On 29 Oct 2015, at 15:26, Chris Lohfink <clohfin...@gmail.com> wrote:
>
> It means a response (opcode 8) message couldn't be decoded. What driver
> are you using? What version? What version of C*?
>
> Chris
>
> On Thu, Oct 29, 2015 at 9:19 AM, Eduardo Alfaia <eduardocalf...@gmail.com>
> wrote:
>
>> yes, but what does it mean?
>>
>> On 29 Oct 2015, at 15:18, Kai Wang <dep...@gmail.com> wrote:
>>
>>
>> https://github.com/datastax/python-driver/blob/75ddc514617304797626cc69957eb6008695be1e/cassandra/connection.py#L573
>>
>> Is your error message complete?
>>
>> On Thu, Oct 29, 2015 at 9:45 AM, Eduardo Alfaia <eduardocalf...@gmail.com
>> > wrote:
>>
>>> Hi Guys,
>>>
>>> Does anyone know what error code in cassandra is?
>>>
>>> Error decoding response from Cassandra. opcode: 0008;
>>>
>>> Thanks
>>>
>>
>>
>>
>
>


-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Is there any configuration so that local program on C* node can connect using localhost and remote program using IP/name?

2015-10-20 Thread Tyler Hobbs
On Mon, Oct 19, 2015 at 7:35 PM, Ravi <ravi.ga...@gmail.com> wrote:

>
> I am using apache-cassandra-2.2.0.


You should upgrade to 2.2.3.  There were some bugs that you probably want
to avoid in 2.2.0.


>
> Is there any configuration so that local program on C* node can connect
> using localhost as connection url and remote program's using IP/name in
> connection url?


Set rpc_address to 0.0.0.0 to bind all interfaces.
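
In cassandra.yaml that looks like the sketch below; note that when
rpc_address is 0.0.0.0, broadcast_rpc_address must be set to a single
address that remote clients can route to (the IP here is hypothetical):

rpc_address: 0.0.0.0
broadcast_rpc_address: 192.168.1.10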


-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Cassandra query degradation with high frequency updated tables.

2015-10-09 Thread Tyler Hobbs
> org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2187)
> ~[apache-cassandra-2.2.2.jar:2.2.2]
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> ~[na:1.8.0_60]
> at
> org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164)
> ~[apache-cassandra-2.2.2.jar:2.2.2]
> at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105)
> [apache-cassandra-2.2.2.jar:2.2.2]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
> Caused by: org.apache.cassandra.io.sstable.CorruptSSTableException:
> java.io.IOException: Seek position 182054 is not within mmap segment (seg
> offs: 0, length: 182054)
> at
> org.apache.cassandra.io.sstable.format.big.BigTableReader.getPosition(BigTableReader.java:250)
> ~[apache-cassandra-2.2.2.jar:2.2.2]
> at
> org.apache.cassandra.io.sstable.format.SSTableReader.getPosition(SSTableReader.java:1558)
> ~[apache-cassandra-2.2.2.jar:2.2.2]
> at
> org.apache.cassandra.io.sstable.format.big.SSTableSliceIterator.<init>(SSTableSliceIterator.java:42)
> ~[apache-cassandra-2.2.2.jar:2.2.2]
> at
> org.apache.cassandra.io.sstable.format.big.BigTableReader.iterator(BigTableReader.java:75)
> ~[apache-cassandra-2.2.2.jar:2.2.2]
> at
> org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:246)
> ~[apache-cassandra-2.2.2.jar:2.2.2]
> at
> org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:62)
> ~[apache-cassandra-2.2.2.jar:2.2.2]
> at
> org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:270)
> ~[apache-cassandra-2.2.2.jar:2.2.2]
> at
> org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:64)
> ~[apache-cassandra-2.2.2.jar:2.2.2]
> at
> org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:2004)
> ~[apache-cassandra-2.2.2.jar:2.2.2]
> at
> org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1808)
> ~[apache-cassandra-2.2.2.jar:2.2.2]
> at org.apache.cassandra.db.Keyspace.getRow(Keyspace.java:360)
> ~[apache-cassandra-2.2.2.jar:2.2.2]
> at
> org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:85)
> ~[apache-cassandra-2.2.2.jar:2.2.2]
> at
> org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1537)
> ~[apache-cassandra-2.2.2.jar:2.2.2]
> at
> org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2183)
> ~[apache-cassandra-2.2.2.jar:2.2.2]
> ... 4 common frames omitted
> Caused by: java.io.IOException: Seek position 182054 is not within mmap
> segment (seg offs: 0, length: 182054)
> at
> org.apache.cassandra.io.util.ByteBufferDataInput.seek(ByteBufferDataInput.java:47)
> ~[apache-cassandra-2.2.2.jar:2.2.2]
> at
> org.apache.cassandra.io.util.AbstractDataInput.skipBytes(AbstractDataInput.java:33)
> ~[apache-cassandra-2.2.2.jar:2.2.2]
> at
> org.apache.cassandra.io.util.FileUtils.skipBytesFully(FileUtils.java:405)
> ~[apache-cassandra-2.2.2.jar:2.2.2]
> at
> org.apache.cassandra.db.RowIndexEntry$Serializer.skipPromotedIndex(RowIndexEntry.java:164)
> ~[apache-cassandra-2.2.2.jar:2.2.2]
> at
> org.apache.cassandra.db.RowIndexEntry$Serializer.skip(RowIndexEntry.java:155)
> ~[apache-cassandra-2.2.2.jar:2.2.2]
> at
> org.apache.cassandra.io.sstable.format.big.BigTableReader.getPosition(BigTableReader.java:244)
> ~[apache-cassandra-2.2.2.jar:2.2.2]
>
>
>
>
> On Oct 9, 2015, at 9:26 AM, Carlos Alonso <i...@mrcalonso.com> wrote:
>
> Yeah, I was about to suggest the compaction strategy too. Leveled
> compaction sounds like a better fit when records are being updated
>
> Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>
>
> On 8 October 2015 at 22:35, Tyler Hobbs <ty...@datastax.com> wrote:
>
>> Upgrade to 2.2.2.  Your sstables are probably not compacting due to
>> CASSANDRA-10270 <https://issues.apache.org/jira/browse/CASSANDRA-10270>,
>> which was fixed in 2.2.2.
>>
>> Additionally, you may want to look into using leveled compaction (
>> http://www.datastax.com/dev/blog/when-to-use-leveled-compaction).
>>
>> On Thu, Oct 8, 2015 at 4:27 PM, Nazario Parsacala <dodongj...@gmail.com>
>> wrote:
>>
>>>
>>> Hi,
>>>
>>> so we are developing a system that computes profile of things that it
>>> observes. The observation comes in form of events. Each thing that it
>>> observe has an id and each thing has a set of subthings in it which has
>>> measurement of some kind. Roughly there are about 500 subthings within each

Re: Cassandra query degradation with high frequency updated tables.

2015-10-09 Thread Tyler Hobbs
> org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1537)
> ~[apache-cassandra-2.2.2.jar:2.2.2]
> at
> org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2183)
> ~[apache-cassandra-2.2.2.jar:2.2.2]
> ... 4 common frames omitted
> Caused by: java.io.IOException: Seek position 182054 is not within mmap
> segment (seg offs: 0, length: 182054)
> at
> org.apache.cassandra.io.util.ByteBufferDataInput.seek(ByteBufferDataInput.java:47)
> ~[apache-cassandra-2.2.2.jar:2.2.2]
> at
> org.apache.cassandra.io.util.AbstractDataInput.skipBytes(AbstractDataInput.java:33)
> ~[apache-cassandra-2.2.2.jar:2.2.2]
> at
> org.apache.cassandra.io.util.FileUtils.skipBytesFully(FileUtils.java:405)
> ~[apache-cassandra-2.2.2.jar:2.2.2]
> at
> org.apache.cassandra.db.RowIndexEntry$Serializer.skipPromotedIndex(RowIndexEntry.java:164)
> ~[apache-cassandra-2.2.2.jar:2.2.2]
> at
> org.apache.cassandra.db.RowIndexEntry$Serializer.skip(RowIndexEntry.java:155)
> ~[apache-cassandra-2.2.2.jar:2.2.2]
> at
> org.apache.cassandra.io.sstable.format.big.BigTableReader.getPosition(BigTableReader.java:244)
> ~[apache-cassandra-2.2.2.jar:2.2.2]
>
>
>
>
> On Oct 9, 2015, at 9:26 AM, Carlos Alonso <i...@mrcalonso.com> wrote:
>
> Yeah, I was about to suggest the compaction strategy too. Leveled
> compaction sounds like a better fit when records are being updated
>
> Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>
>
> On 8 October 2015 at 22:35, Tyler Hobbs <ty...@datastax.com> wrote:
>
>> Upgrade to 2.2.2.  Your sstables are probably not compacting due to
>> CASSANDRA-10270 <https://issues.apache.org/jira/browse/CASSANDRA-10270>,
>> which was fixed in 2.2.2.
>>
>> Additionally, you may want to look into using leveled compaction (
>> http://www.datastax.com/dev/blog/when-to-use-leveled-compaction).
>>
>> On Thu, Oct 8, 2015 at 4:27 PM, Nazario Parsacala <dodongj...@gmail.com>
>> wrote:
>>
>>>
>>> Hi,
>>>
>>> so we are developing a system that computes profile of things that it
>>> observes. The observation comes in form of events. Each thing that it
>>> observe has an id and each thing has a set of subthings in it which has
>>> measurement of some kind. Roughly there are about 500 subthings within each
>>> thing. We receive events containing measurements of these 500 subthings
>>> every 10 seconds or so.
>>>
>>> So as we receive events, we  read the old profile value, calculate the
>>> new profile based on the new value and save it back. We use the following
>>> schema to hold the profile.
>>>
>>> CREATE TABLE myprofile (
>>> id text,
>>> month text,
>>> day text,
>>> hour text,
>>> subthings text,
>>> lastvalue double,
>>> count int,
>>> stddev double,
>>>  PRIMARY KEY ((id, month, day, hour), subthings)
>>> ) WITH CLUSTERING ORDER BY (subthings ASC);
>>>
>>>
>>> This profile will then be use for certain analytics that can use in the
>>> context of the ‘thing’ or in the context of specific thing and subthing.
>>>
>>> A profile can be defined as monthly, daily, hourly. So in case of
>>> monthly the month will be set to the current month (i.e. ‘Oct’) and the day
>>> and hour will be set to empty ‘’ string.
>>>
>>>
>>> The problem that we have observed is that over time (actually in just a
>>> matter of hours) we will see a huge degradation of query response  for the
>>> monthly profile. At the start it will be responding in 10-100 ms and after
>>> a couple of hours it will go to 2000-3000 ms . If you leave it for a couple
>>> of days you will start experiencing readtimeouts . The query is basically
>>> just :
>>>
>>> select * from myprofile where id=‘1’ and month=‘Oct’ and day=‘’ and
>>> hour=‘'
>>>
>>> This will have only about 500 rows or so.
>>>
>>>
>>> I believe that this is caused by the fact there are multiple updates done
>>> to this specific partition. So what do we think can be done to resolve this
>>> ?
>>>
>>> BTW, I am using Cassandra 2.2.1 . And since this is a test , this is
>>> just running on a single node.
>>>
>>>
>>>
>>>
>>>
>>
>>
>> --
>> Tyler Hobbs
>> DataStax <http://datastax.com/>
>>
>
>
>


-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Cassandra query degradation with high frequency updated tables.

2015-10-08 Thread Tyler Hobbs
Upgrade to 2.2.2.  Your sstables are probably not compacting due to
CASSANDRA-10270 <https://issues.apache.org/jira/browse/CASSANDRA-10270>,
which was fixed in 2.2.2.

Additionally, you may want to look into using leveled compaction (
http://www.datastax.com/dev/blog/when-to-use-leveled-compaction).
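
Switching strategies is a one-line schema change (a sketch; qualify with
your keyspace if needed):

ALTER TABLE myprofile WITH compaction = {'class': 'LeveledCompactionStrategy'};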

On Thu, Oct 8, 2015 at 4:27 PM, Nazario Parsacala <dodongj...@gmail.com>
wrote:

>
> Hi,
>
> so we are developing a system that computes profile of things that it
> observes. The observation comes in form of events. Each thing that it
> observe has an id and each thing has a set of subthings in it which has
> measurement of some kind. Roughly there are about 500 subthings within each
> thing. We receive events containing measurements of these 500 subthings
> every 10 seconds or so.
>
> So as we receive events, we  read the old profile value, calculate the new
> profile based on the new value and save it back. We use the following
> schema to hold the profile.
>
> CREATE TABLE myprofile (
> id text,
> month text,
> day text,
> hour text,
> subthings text,
> lastvalue double,
> count int,
> stddev double,
>  PRIMARY KEY ((id, month, day, hour), subthings)
> ) WITH CLUSTERING ORDER BY (subthings ASC);
>
>
> This profile will then be use for certain analytics that can use in the
> context of the ‘thing’ or in the context of specific thing and subthing.
>
> A profile can be defined as monthly, daily, hourly. So in case of monthly
> the month will be set to the current month (i.e. ‘Oct’) and the day and
> hour will be set to empty ‘’ string.
>
>
> The problem that we have observed is that over time (actually in just a
> matter of hours) we will see a huge degradation of query response  for the
> monthly profile. At the start it will be responding in 10-100 ms and after
> a couple of hours it will go to 2000-3000 ms . If you leave it for a couple
> of days you will start experiencing readtimeouts . The query is basically
> just :
>
> select * from myprofile where id=‘1’ and month=‘Oct’ and day=‘’ and hour=‘'
>
> This will have only about 500 rows or so.
>
>
> I believe that this is caused by the fact there are multiple updates done
> to this specific partition. So what do we think can be done to resolve this
> ?
>
> BTW, I am using Cassandra 2.2.1 . And since this is a test , this is just
> running on a single node.
>
>
>
>
>


-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: CQL error when adding multiple conditional update statements in the same batch

2015-10-08 Thread Tyler Hobbs
I assume you're running Cassandra 2.0?

In 2.1.1 the check for "incompatible" conditions was removed (see this
comment
<https://issues.apache.org/jira/browse/CASSANDRA-6839?focusedCommentId=14097793&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14097793>
for details).  I wouldn't be surprised if that check didn't work properly
for batch statements in 2.0.

On Thu, Oct 8, 2015 at 3:22 PM, sai krishnam raju potturi <
pskraj...@gmail.com> wrote:

> could you also provide the columnfamily schema.
>
> On Thu, Oct 8, 2015 at 4:13 PM, Peddi, Praveen <pe...@amazon.com> wrote:
>
>> Hi,
>>
>> I am trying to understand this error message that CQL is throwing when I
>> try to update 2 different rows with different values on same conditional
>> columns. Doesn't CQL support that? I am wondering why CQL has this
>> restriction (since condition applies to each row independently, why does
>> CQL even care if the values of the condition is same or different).
>>
>> BEGIN BATCH
>> UPDATE activities SET state='CLAIMED',version=11 WHERE key='Key1' IF 
>> version=10;
>> UPDATE activities SET state='ALLOCATED',version=2 WHERE key='Key2' IF 
>> version=1;
>> APPLY BATCH;
>>
>> gives the following error
>>
>> Bad Request: Duplicate and incompatible conditions for column version
>>
>> Is there anyway to update more than 1 row with different conditional
>> value for each row (other than executing these statements individually)?
>> -Praveen
>>
>>
>


-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: How are writes handled while adding nodes to cluster?

2015-10-06 Thread Tyler Hobbs
When a node is joining, writes are sent to both the current replicas *and*
the joining replica.  However, the joining replica does not count towards
the consistency level.  So, for example, if you write at
ConsistencyLevel.TWO, and only one existing replica and the joining replica
respond, the write will be considered a failure.

On Tue, Oct 6, 2015 at 4:43 AM, Erik Forsberg <forsb...@opera.com> wrote:

> Hi!
>
> How are writes handled while I'm adding a node to a cluster, i.e. while
> the new node is in JOINING state?
>
> Are they queued up as hinted handoffs, or are they being written to the
> joining node?
>
> In the former case I guess I have to make sure my max_hint_window_in_ms
> is long enough for the node to become NORMAL or hints will get dropped
> and I must do repair. Am I right?
>
> Thanks,
> \EF
>



-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Repair corrupt SSTable from power outage?

2015-10-02 Thread Tyler Hobbs
>> ... 14 common frames omitted
>>
>>
>> I found some people recommending scrubbing the sstable so I attempted
>> that and got the following error:
>>
>> bin/sstablescrub system sstable_activity -v
>>
>>
>> ERROR 17:26:03 Exiting forcefully due to file system exception on
>> startup, disk failure policy "stop"
>> org.apache.cassandra.io.sstable.CorruptSSTableException:
>> java.io.EOFException
>> at
>> org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:131)
>> ~[apache-cassandra-2.1.9.jar:2.1.9]
>> at
>> org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:85)
>> ~[apache-cassandra-2.1.9.jar:2.1.9]
>> at
>> org.apache.cassandra.io.util.CompressedSegmentedFile$Builder.metadata(CompressedSegmentedFile.java:79)
>> ~[apache-cassandra-2.1.9.jar:2.1.9]
>> at
>> org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:72)
>> ~[apache-cassandra-2.1.9.jar:2.1.9]
>> at
>> org.apache.cassandra.io.util.SegmentedFile$Builder.complete(SegmentedFile.java:168)
>> ~[apache-cassandra-2.1.9.jar:2.1.9]
>> at
>> org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:752)
>> ~[apache-cassandra-2.1.9.jar:2.1.9]
>> at
>> org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:703)
>> ~[apache-cassandra-2.1.9.jar:2.1.9]
>> at
>> org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:491)
>> ~[apache-cassandra-2.1.9.jar:2.1.9]
>> at
>> org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:387)
>> ~[apache-cassandra-2.1.9.jar:2.1.9]
>> at
>> org.apache.cassandra.io.sstable.SSTableReader$4.run(SSTableReader.java:534)
>> ~[apache-cassandra-2.1.9.jar:2.1.9]
>> at
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>> [na:1.8.0_60]
>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> [na:1.8.0_60]
>>     at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> [na:1.8.0_60]
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> [na:1.8.0_60]
>> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
>> Caused by: java.io.EOFException: null
>> at
>> java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340)
>> ~[na:1.8.0_60]
>> at java.io.DataInputStream.readUTF(DataInputStream.java:589)
>> ~[na:1.8.0_60]
>> at java.io.DataInputStream.readUTF(DataInputStream.java:564)
>> ~[na:1.8.0_60]
>> at
>> org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:106)
>> ~[apache-cassandra-2.1.9.jar:2.1.9]
>> ... 14 common frames omitted
>>
>>
>> Is there a fix for this?
>>
>
>


-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: JSON Order By

2015-10-02 Thread Tyler Hobbs
On Thu, Oct 1, 2015 at 9:11 AM, Ashish Soni <asoni.le...@gmail.com> wrote:

> I have the below structure stored in cassandra, and I would like to get the
> internal array sorted by a property when I select it. Please let me know
> if there is a way to do that.
>
> I need to sort the rules Array by property ruleOrder when I select
>

Unfortunately, that's not possible. Cassandra can only order result rows by
the clustering columns.  The new JSON functionality doesn't change this, it
just adds a new input/output format.  You'll need to sort the results
client-side.


-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Secondary index is causing high CPU load

2015-09-29 Thread Tyler Hobbs
See https://issues.apache.org/jira/browse/CASSANDRA-10414 for an overview
of why vnodes are currently less efficient for secondary index queries.

On Tue, Sep 29, 2015 at 12:45 PM, Robert Coli <rc...@eventbrite.com> wrote:

> On Tue, Sep 15, 2015 at 7:44 AM, Tom van den Berge <
> tom.vandenbe...@gmail.com> wrote:
>
>> Read queries on a secondary index are somehow causing an excessively high
>> CPU load on all nodes in my DC.
>>
> ...
>
>> What really surprised me is that executing a single query on this
>> secondary index makes the "Local read count" in the cfstats for the index
>> go up with almost 20! When doing the same query on one of my "good"
>> nodes, it only increases with a small number, as I would expect.
>>
>> Could it be that the use of vnodes is causing these problems?
>>
>
> I am not too surprised to hear of this performance degradation.
>
> Yes, it is relatively likely to be the use of vnodes which is causing this
> problem. You could verify by having one of your nodes use 64 vnodes instead
> of the default 256... you will get less even distribution with current
> vnode random allocation, but you will pay less of a penalty for having
> multiple ranges...
>
> =Rob
>
>
>



-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: who does generate timestamp during the write?

2015-09-08 Thread Tyler Hobbs
On Sat, Sep 5, 2015 at 8:32 AM, ibrahim El-sanosi <ibrahimsaba...@gmail.com>
wrote:

> So in this scenario, the latest data written to the replicas is [K1,
> V2], which should be the correct one, but it reads [K1,V1] because of a
> skewed clock.
>
> Can such scenario occur?
>

Yes, it most certainly can.  There are a couple of pieces of advice for
this.  First, run NTP on all of your servers.  Second, if clock drift of a
second or so would cause problems for your data model (like your example),
change your data model.  Usually this means creating separate rows for each
version of the value (by adding a timeuuid to the primary key, for example),
but in some cases lightweight transactions may also be suitable.
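
A sketch of the versioned-row approach (names are hypothetical):

CREATE TABLE ks.values_by_version (
    k text,
    version timeuuid,
    v text,
    PRIMARY KEY (k, version)
) WITH CLUSTERING ORDER BY (version DESC);

INSERT INTO ks.values_by_version (k, version, v) VALUES ('K1', now(), 'V2');

-- the most recent version is simply the first row:
SELECT v FROM ks.values_by_version WHERE k = 'K1' LIMIT 1;

Because timeuuids are unique even when generated at the same instant,
concurrent writers create distinct rows instead of silently overwriting
each other.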


-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Trace evidence for LOCAL_QUORUM ending up in remote DC

2015-09-08 Thread Tyler Hobbs
See https://issues.apache.org/jira/browse/CASSANDRA-9753

On Tue, Sep 8, 2015 at 10:22 AM, Tom van den Berge <
tom.vandenbe...@gmail.com> wrote:

> I've been bugging you a few times, but now I've got trace data for a query
> with LOCAL_QUORUM that is being sent to a remove data center.
>
> The setup is as follows:
> NetworkTopologyStrategy: {"DC1":"1","DC2":"2"}
> Both DC1 and DC2 have 2 nodes.
> In DC2, one node is currently being rebuilt, and therefore does not
> contain all data (yet).
>
> The client app connects to a node in DC1, and sends a SELECT query with CL
> LOCAL_QUORUM, which in this case means (1/2)+1 = 1.
> If all is ok, the query always produces a result, because the requested
> rows are guaranteed to be available in DC1.
>
> However, the query sometimes produces no result. I've been able to record
> the traces of these queries, and it turns out that the coordinator node in
> DC1 sometimes sends the query to DC2, to the node that is being rebuilt,
> and does not have the requested rows. I've included an example trace below.
>
> The coordinator node is 10.55.156.67, which is in DC1. The 10.88.4.194 node
> is in DC2.
> I've verified that the  CL=LOCAL_QUORUM by printing it when the query is
> sent (I'm using the datastax java driver).
>
>  activity
>  | source   | source_elapsed | thread
>
> ---+--++-
>Message received from /10.55.156.67
> |  10.88.4.194 | 48 | MessagingService-Incoming-/10.55.156.67
>  Executing single-partition query on aggregate
> |  10.88.4.194 |286 | SharedPool-Worker-2
>   Acquiring sstable references
> |  10.88.4.194 |306 | SharedPool-Worker-2
>Merging memtable tombstones
> |  10.88.4.194 |321 | SharedPool-Worker-2
> Partition index lookup allows skipping sstable 107
> |  10.88.4.194 |458 | SharedPool-Worker-2
> Bloom filter allows skipping sstable 1
> |  10.88.4.194 |489 | SharedPool-Worker-2
>  Skipped 0/2 non-slice-intersecting sstables, included 0 due to tombstones
> |  10.88.4.194 |496 | SharedPool-Worker-2
> Merging data from memtables and 0 sstables
> |  10.88.4.194 |500 | SharedPool-Worker-2
>  Read 0 live and 0 tombstone cells
> |  10.88.4.194 |513 | SharedPool-Worker-2
>Enqueuing response to /10.55.156.67
> |  10.88.4.194 |613 | SharedPool-Worker-2
>   Sending message to /10.55.156.67
> |  10.88.4.194 |672 | MessagingService-Outgoing-/10.55.156.67
> Parsing SELECT * FROM Aggregate WHERE type=? AND typeId=?;
> | 10.55.156.67 | 10 | SharedPool-Worker-4
>Sending message to /10.88.4.194
> | 10.55.156.67 |   4335 |  MessagingService-Outgoing-/10.88.4.194
> Message received from /10.88.4.194
> | 10.55.156.67 |   6328 |  MessagingService-Incoming-/10.88.4.194
>Seeking to partition beginning in data file
> | 10.55.156.67 |  10417 | SharedPool-Worker-3
>  Key cache hit for sstable 389
> | 10.55.156.67 |  10586 | SharedPool-Worker-3
>
> My question is: how is it possible that the query is sent to a node in
> DC2?
> Since DC1 has 2 nodes and RF 1, the query should always be sent to the
> other node in DC1 if the coordinator does not have a replica, right?
>
> Thanks,
> Tom
>
>
>
>
>


-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: who does generate timestamp during the write?

2015-09-04 Thread Tyler Hobbs
Timestamps can come from three different places, in order of precedence
from highest to lowest:
* The CQL query itself, through the "USING TIMESTAMP" clause (sketch below)
* The driver (or maybe application) at the protocol level when using the v3
native protocol or higher (which is available in Cassandra 2.1+).  This is
what I recommend using in most cases, because the driver can safely retry
idempotent writes.
* The coordinator node
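
For example, the highest-precedence option looks like this (a sketch; ks.t
is hypothetical, and by convention the value is microseconds since the Unix
epoch):

INSERT INTO ks.t (k, v) VALUES ('k1', 'v1') USING TIMESTAMP 1441900000000000;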

On Fri, Sep 4, 2015 at 1:06 PM, Andrey Ilinykh <ailin...@gmail.com> wrote:

> I meant thrift based api. If we are talking about CQL then timestamps are
> generated by node you are connected to. This is a "client".
>
> On Fri, Sep 4, 2015 at 10:49 AM, ibrahim El-sanosi <
> ibrahimsaba...@gmail.com> wrote:
>
>> Hi Andrey,
>>
>> I just came across this articale "
>>
>> "Each cell in a CQL table has a corresponding timestamp
>> which is taken from the clock on *the Cassandra node* *that orchestrates the
>> write.* When you are reading from a Cassandra cluster the node that
>> coordinates the read will compare the timestamps of the values it fetches.
>> Last write(=highest timestamp) wins and will be returned to the client."
>>
>> What do you think?
>>
>> "
>>
>> On Fri, Sep 4, 2015 at 6:41 PM, Andrey Ilinykh <ailin...@gmail.com>
>> wrote:
>>
>>> Coordinator doesn't generate timestamp, it is generated by client.
>>>
>>> On Fri, Sep 4, 2015 at 10:37 AM, ibrahim El-sanosi <
>>> ibrahimsaba...@gmail.com> wrote:
>>>
>>>> Ok, why coordinator does generate timesamp, as the write is a part of
>>>> Cassandra process after client submit the request to Cassandra?
>>>>
>>>> On Fri, Sep 4, 2015 at 6:29 PM, Andrey Ilinykh <ailin...@gmail.com>
>>>> wrote:
>>>>
>>>>> Your application.
>>>>>
>>>>> On Fri, Sep 4, 2015 at 10:26 AM, ibrahim El-sanosi <
>>>>> ibrahimsaba...@gmail.com> wrote:
>>>>>
>>>>>> Dear folks,
>>>>>>
>>>>>> When we hear about the notion of Last-Write-Wins in Cassandra
>>>>>> according to timestamp, *who does generate this timestamp during the
>>>>>> write, coordinator or each individual replica in which the write is going
>>>>>> to be stored?*
>>>>>>
>>>>>>
>>>>>> *Regards,*
>>>>>>
>>>>>>
>>>>>>
>>>>>> *Ibrahim*
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>


-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: Order By limitation or bug?

2015-09-04 Thread Tyler Hobbs
This query would be reasonable to support, so I've opened
https://issues.apache.org/jira/browse/CASSANDRA-10271 to fix that.

On Thu, Sep 3, 2015 at 7:48 PM, Alec Collier <alec.coll...@macquarie.com>
wrote:

> You should be able to execute the following
>
>
>
> SELECT data FROM import_file WHERE roll = 1 AND type = 'foo' ORDER BY
> type, id DESC;
>
>
>
> Essentially the order by clause has to specify the clustering columns in
> order in full. It doesn’t by default know that you have already essentially
> filtered by type.
>
>
>
> *Alec Collier* | Workplace Service Design
>
> Corporate Operations Group - Technology | Macquarie Group Limited £
>
>
>
> *From:* Robert Wille [mailto:rwi...@fold3.com]
> *Sent:* Friday, 4 September 2015 7:17 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Order By limitation or bug?
>
>
>
> If you only specify the partition key, and none of the clustering columns,
> you can order by in either direction:
>
>
>
> SELECT data FROM import_file WHERE roll = 1 order by type;
>
> SELECT data FROM import_file WHERE roll = 1 order by type DESC;
>
>
>
> These are both valid. Seems like specifying the prefix of the clustering
> columns is just a specialization of an already-supported pattern.
>
>
>
> Robert
>
>
>
> On Sep 3, 2015, at 2:46 PM, DuyHai Doan <doanduy...@gmail.com> wrote:
>
>
>
> Limitation, not bug. The reason ?
>
>
>
> On disk, data are sorted by type first, and FOR EACH type value, the data
> are sorted by id.
>
>
>
> So to do an order by Id, C* will need to perform an in-memory re-ordering,
> not sure how bad it is for performance. In any case currently it's not
> possible, maybe you should create a JIRA to ask for lifting the limitation.
>
>
>
> On Thu, Sep 3, 2015 at 10:27 PM, Robert Wille <rwi...@fold3.com> wrote:
>
> Given this table:
>
>
>
> CREATE TABLE import_file (
>
>   roll int,
>
>   type text,
>
>   id timeuuid,
>
>   data text,
>
>   PRIMARY KEY ((roll), type, id)
>
> )
>
>
>
> This should be possible:
>
>
>
> SELECT data FROM import_file WHERE roll = 1 AND type = 'foo' ORDER BY id
> DESC;
>
>
>
> but it results in the following error:
>
>
>
> Bad Request: Order by currently only support the ordering of columns
> following their declared order in the PRIMARY KEY
>
>
>
> I am ordering in the declared order in the primary key. I don’t see why
> this shouldn’t be able to be supported. Is this a known limitation or a bug?
>
>
>
> In this example, I can get the results I want by omitting the ORDER BY
> clause and adding WITH CLUSTERING ORDER BY (id DESC) to the schema.
> However, now I can only get descending order. I have to choose either
> ascending or descending order. I cannot get both.
>
>
>
> Robert
>
>
>
>
>
>
>
> This email, including any attachments, is confidential. If you are not the
> intended recipient, you must not disclose, distribute or use the
> information in this email in any way. If you received this email in error,
> please notify the sender immediately by return email and delete the
> message. Unless expressly stated otherwise, the information in this email
> should not be regarded as an offer to sell or as a solicitation of an offer
> to buy any financial product or service, an official confirmation of any
> transaction, or as an official statement of the entity sending this
> message. Neither Macquarie Group Limited, nor any of its subsidiaries,
> guarantee the integrity of any emails or attached files and are not
> responsible for any changes made to them by any other person.
>



-- 
Tyler Hobbs
DataStax <http://datastax.com/>


Re: TTLs on tables with *only* primary keys?

2015-08-05 Thread Tyler Hobbs
You can set the TTL on a row when you create it using an INSERT statement.
For example:

INSERT INTO mytable (partitionkey, clusteringkey) VALUES (0, 0) USING TTL
100;

However, Cassandra doesn't support the ttl() function on primary key
columns yet.  The ticket to support this is
https://issues.apache.org/jira/browse/CASSANDRA-9312.
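
Until then, one workaround is to mirror the row's TTL on a regular column.
A sketch under that assumption (the dummy column and connection details are
mine, not part of the original schema):

from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1']).connect('ks')  # assumed setup

# ttl() can't target primary key columns, so add a regular column whose TTL
# tracks the row's and query that instead.
session.execute("ALTER TABLE foo ADD dummy int")
session.execute(
    "INSERT INTO foo (sequence, signature, dummy) VALUES (%s, %s, 0) "
    "USING TTL 100", (1, 'sig'))
row = session.execute(
    "SELECT ttl(dummy) FROM foo WHERE sequence = %s AND signature = %s",
    (1, 'sig')).one()
print(row)  # remaining TTL in seconds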

On Tue, Aug 4, 2015 at 9:22 PM, Kevin Burton bur...@spinn3r.com wrote:

 I have a table which just has primary keys.

 basically:

 create table foo (

 sequence bigint,
 signature text,
 primary key( sequence, signature )
 )

 I need these to eventually get GCd however it doesn’t seem to work.

 If I then run:

 select ttl(sequence) from foo;

 I get:

 Cannot use selection function ttl on PRIMARY KEY part sequence

 …

 I get the same thing if I do it on the second column .. (signature).

 And the value doesn’t seem to be TTLd.

 What’s the best way to proceed here?


 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts




-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: Thrift to cql : mixed static and dynamic columns with secondary index

2015-07-16 Thread Tyler Hobbs
This schema is something that we're providing a better CQL conversion for
in 3.0.  The one column you defined will become a static column, meaning
there is only one copy of it per partition.  The schema will look something
like this:

CREATE TABLE ref_file (
key text,
folder text static,
column1 text,
value text,
PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE;

The column1 column will hold your dynamic field names, and the value
column will hold your dynamic field values.

Unfortunately, we probably won't support indexing the static column in
3.0.0, but we should be able to support that pretty soon afterwards.  The
ticket for that is https://issues.apache.org/jira/browse/CASSANDRA-8103.

If you don't want to wait for 3.x, migrating to a table like this is
probably your best option:

CREATE TABLE ref_file (
key text PRIMARY KEY,
folder text,
 attributes map<text, text>
)

In this case, the attributes map would hold your dynamic fields.
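
For example, a sketch of how reads and writes would look against that table
(connection details assumed; the values mirror the example quoted below):

from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1']).connect('ks')  # assumed setup

# Dynamic Thrift columns become entries in the 'attributes' map.
session.execute(
    "INSERT INTO ref_file (key, folder, attributes) VALUES (%s, %s, %s)",
    ('id1', 'folder1', {'name': 'file1', 'size': '1234', 'COM_1': ''}))

# A later dynamic column is a single map-entry update:
session.execute(
    "UPDATE ref_file SET attributes['COM_2'] = '' WHERE key = %s", ('id1',))

# The secondary index on the declared column works as before:
session.execute("CREATE INDEX ON ref_file (folder)")
rows = session.execute(
    "SELECT key, attributes FROM ref_file WHERE folder = %s", ('folder1',))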

On Thu, Jul 16, 2015 at 4:22 AM, Clement Honore honor...@gmail.com wrote:

 Hi,

 I'm trying to migrate from Cassandra 1.1 and Hector to a more up-to-date
 stack like Cassandra 1.2+ and CQL3.

 I have read http://www.datastax.com/dev/blog/thrift-to-cql3
 https://webmail.one.grp/owa/redir.aspx?C=d70889e7914440b0ad13875bf00770a8URL=http%3a%2f%2fwww.datastax.com%2fdev%2fblog%2fthrift-to-cql3
  but
 my use case adds a complexity which seems not documented : I have a mixed
 column family with a secondary index.

 The column family has one explicitly declared column, which is indexed
 natively.
 In this column family, I'm also adding columns dynamically : some with
 predictive names, some with dynamic names.

 If I try to query this table in cql, I can access only the declared column
 (as stated in the documentation above).

 If I change the declaration by removing the explicitly declared column (as
 explained in the documentation above), I loose the secondary index on it.

 If I explicitly declare all the columns with an already known name
 (assuming I accept that I will get plenty of columns with a null value for
 the lines which don't have those attributes), I still can't manage columns
 with a dynamic name.
 And I can't declare a collection as my  comparator is UTF8Type.

 Should I migrate in a new table if I want to keep all the functionalities?
 This is really a solution I want to avoid.

 Here is an example representing my actual schema :

 I have a column family REF_File referencing my files.
 A file always has a folder. The folder is indexed to easily find my
 files.
 A file may have some attributes like name, size, mime .
 A file may have some comments referenced by a column COM_X where X is
 the comment ID.

 Column family creation :

 Create column family REF_File with comparator=UTF8Type and
 default_validation_class=UTF8Type and key_validation_class=UTF8Type and
 column_metadata=[{column_name: folder, validation_class: UTF8Type,
 index_type: KEYS}];

 set REF_File['id1']['folder']=folder1;
 set REF_File['id1']['name']=file1;
 set REF_File['id1']['size']=1234;
 set REF_File['id1']['COM_1']='';
 set REF_File['id1']['COM_2']='';
 set REF_File['id2']['folder']=folder1;
 set REF_File['id2']['name']=file2;
 set REF_File['id2']['mime']='image/jpeg';
 set REF_File['id2']['COM_1']='';

 Requesting :

 [default@DUNE_metadonnees] list REF_File;
 Using default limit of 100
 Using default cell limit of 100
 ---
 RowKey: id1
 => (name=COM_1, value=, timestamp=1437034903045000)
 => (name=COM_2, value=, timestamp=1437034911121000)
 => (name=folder, value=folder1, timestamp=1437034833452000)
 => (name=name, value=file1, timestamp=1437034851993000)
 => (name=size, value=1234, timestamp=1437034871356000)
 ---
 RowKey: id2
 => (name=COM_1, value=, timestamp=1437035169011000)
 => (name=folder, value=folder1, timestamp=143703506208)
 => (name=mime, value=image/jpeg, timestamp=1437035145227000)
 => (name=name, value=file2, timestamp=1437035073596000)

 Thanks for your help !




-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: Read Consistency

2015-06-30 Thread Tyler Hobbs
, as it expects to receive data from 2 nodes with
 RF=3


 Scenario 2: Read query is fired and all 3 replicas have different data
 with different timestamps.

 Read query will return the data with most recent timestamp and trigger a
 read repair in the backend .

 On Tue, Jun 23, 2015 at 10:57 AM, Anuj Wadehra anujw_2...@yahoo.co.in
 wrote:

 Hi,

 Need to validate my understanding..

 RF=3 , Read CL = Quorum

 What would be returned to the client in following scenarios:

 Scenario 1: Read query is fired for a key, data is found on one node and
 not found on other two nodes who are responsible for the token
 corresponding to key.

 Options: no data is returned OR data from the only node having data is
 returned?

 Scenario 2: Read query is fired and all 3 replicas have different data
 with different timestamps.

 Options: data with latest timestamp is returned OR something else???

 Thanks
 Anuj

 Sent from Yahoo Mail on Android
 https://overview.mail.yahoo.com/mobile/?.src=Android




 --
 Arun





 --
 Arun
 Senior Hadoop/Cassandra Engineer
 Cloudwick


 2014 Data Impact Award Winner (Cloudera)

 http://www.cloudera.com/content/cloudera/en/campaign/data-impact-awards.html





 --
 Arun
 Senior Hadoop/Cassandra Engineer
 Cloudwick


 2014 Data Impact Award Winner (Cloudera)

 http://www.cloudera.com/content/cloudera/en/campaign/data-impact-awards.html






-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: Read Consistency

2015-06-30 Thread Tyler Hobbs
On Tue, Jun 30, 2015 at 12:27 PM, Anuj Wadehra anujw_2...@yahoo.co.in
wrote:

 Agree Tyler. I think its our application problem. If client returns failed
 write in spite of retries, application must have a rollback mechanism to
 make sure old state is restored. Failed write may be because of the fact
 that CL was not met even though one node successfully wrote.Cassandra wont
 do cleanup or rollback on one node so you need to do it yourself to make
 sure that integrity of data is maintained in case strong consistency is a
 requirement. Right?


Correct, if you get a WriteTimeout error, you don't know if any replicas
have written the data or not.  It's even possible that all replicas wrote
the data but didn't respond to the coordinator in time.  I suspect most
users handle this situation by retrying the write with the same timestamp
(which makes the operation idempotent).

It's worth noting that if you get an Unavailable response, you are
guaranteed that the data has not been written to any replicas, because the
coordinator already knew that the replicas were down when it got the
response.
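
A sketch of that retry pattern (the table and connection details are
illustrative, not from this thread):

import time

from cassandra import WriteTimeout
from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1']).connect('ks')  # assumed setup

# Fix the write timestamp once so every retry is idempotent: replays carry
# the same timestamp and reconcile to a single logical write.
ts = int(time.time() * 1e6)
for attempt in range(3):
    try:
        session.execute(
            "INSERT INTO users (id, name) VALUES (%s, %s) "
            "USING TIMESTAMP {}".format(ts), (42, 'alice'))
        break
    except WriteTimeout:
        continue  # safe to replay: same cells, same timestamp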


-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: Inconsistent behavior during read

2015-06-25 Thread Tyler Hobbs
On Thu, Jun 25, 2015 at 1:00 PM, Robert Coli rc...@eventbrite.com wrote:

 [1] or read repair set to 100% combined with a full scan of all data...
 which no one does...


And this is only true if full scan means reading every partition
individually.  Reads of partition ranges (or a range slice, in old Thrift
terms) don't do read repair.


-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: MarshalException after upgrading to 2.1.6

2015-06-11 Thread Tyler Hobbs
(UUIDType.java:184)
 ... 12 more
 Caused by: java.text.ParseException: Unable to parse the date: currencyCode
 at
 org.apache.commons.lang3.time.DateUtils.parseDateWithLeniency(DateUtils.java:336)
 at
 org.apache.commons.lang3.time.DateUtils.parseDateStrictly(DateUtils.java:286)
 at
 org.apache.cassandra.serializers.TimestampSerializer.dateStringToTimestamp(TimestampSerializer.java:107)
 ... 13 more
 Exception encountered during startup: unable to make version 1 UUID from
 'currencyCode'







-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: Cassandra 2.2, 3.0, and beyond

2015-06-10 Thread Tyler Hobbs
On Wed, Jun 10, 2015 at 1:43 PM, sean_r_dur...@homedepot.com wrote:

 With 3.0, what happens to existing Thrift-based tables (with dynamic
 column names, etc.)?


Just like in Cassandra 2.x, they will show up as COMPACT STORAGE tables in
a format that CQL can work with.  We're making a few adjustments to how the
schema is presented in CQL, mostly to better deal with a mixture of defined
and undefined column names (mixed static and dynamic).  That mostly
involves treating defined columns as static.

However, the storage format for COMPACT STORAGE tables will not be
(significantly) different from normal tables any more.  You can read a few
details about the new storage format here:
https://github.com/pcmanus/cassandra/blob/8099_engine_refactor/guide_8099.md#storage-format-on-disk-and-on-wire


-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: TTL and gc_grace_period

2015-06-05 Thread Tyler Hobbs
On Fri, Jun 5, 2015 at 11:02 AM, Kévin LOVATO klov...@alprema.com wrote:

 Great, so is there any reason I wouldn't want to set gc_grace_seconds to 0
 on an insert once/ttl only column family, since it feels like the best
 thing to do?


Nope, setting gc_grace_seconds to 0 is just fine in your case.
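
In CQL that's a one-line change; a sketch (the table name is hypothetical):

from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1']).connect('ks')  # assumed setup

# Insert-once, TTL-only table, so expired tombstones can be purged at the
# first compaction after expiration.
session.execute("ALTER TABLE metrics WITH gc_grace_seconds = 0")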


-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: TTL and gc_grace_period

2015-06-05 Thread Tyler Hobbs
On Fri, Jun 5, 2015 at 10:30 AM, Kévin LOVATO klov...@alprema.com wrote:


 I have a column family with data (metrics) that is never overwritten and
 only deleted using TTLs, and I am wondering if it would be reasonable to
 have a very low gc_grace_period (even 0) on that CF. I would like to do
 that mainly to save space and also to prevent tombstone scanning.


Yes, you can safely lower gc_grace_seconds.  You would only _not_ want to
lower gc_grace_seconds if you did deletes or overwrote cells with a lower
TTL.



 From what I understand of what I could read online, when an expired TTLed
 column is compacted, it is replaced by a tombstone, so setting
 gc_grace_period to 0 would prevent that tombstone from lingering. Although
 this would allow the appearance of ghost/zombie columns.

 The question I'm trying to answer here is the following: Would those ghost
 columns be able to appear, and if so, would it be a problem, since they
 would themselves be marked as expired?


You don't need to worry about expired data being revived because every node
that has a copy of that data will have the same TTL.


-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: Coordination of expired TTLs compared to tombstones

2015-05-29 Thread Tyler Hobbs
On Fri, May 29, 2015 at 1:31 PM, Robert Wille rwi...@fold3.com wrote:


 I was wondering how that compares to cells with expired TTLs. Does the
 node get to skip sending data back to the coordinator for an expired TTL?


No, it has to send expired cells.



 Suppose you wrote a cell with no TTL, and then updated it with a TTL.
 Suppose that node 1 got both writes, but node 2 only got the first one. If
 you asked for the cell after it expired, and node 1 did not send anything
 to the coordinator, it seems to me that that could violate consistency
 levels. Also, read repair could never fix node 2. So, how does that work?


That's precisely why they have to be sent to the coordinator.



 On a related note, do cells with expired TTLs have to wait
 gc_grace_seconds before they can be compacted out?


Yes.


 It seems to me that if they could get compacted out immediately after
 expiration, you could get zombie data, just like you can with tombstones.
 For example, write a cell with no TTL to all replicas, shut down one
 replica, update the cell with a TTL, compact after the TTL has expired,
 then bring the other node back up. Voila, the formerly down node has a
 value that will replicate to the other nodes.


Correct, that's why they can't be purged immediately.


-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: A few stupid questions...

2015-05-26 Thread Tyler Hobbs
On Tue, May 26, 2015 at 2:00 PM, Eax Melanhovich m...@eax.me wrote:


 First. Lets say I have a table (field1, field2, field3, field4), where
 (field1, field2) is a primary key and field1 is partition key. There is
 a secondary index for field3 column. Do I right understand that in this
 case query like:

 select ... from my_table where field1 = 123 and field3 > '...';

 ... would be quite efficient, i.e. request would be send only to one
 node, not the whole cluster?


You are correct that it would only query one node (or one set of replicas,
if RF > 1 and CL > 1) due to the partition key being restricted.  However,
using '>' for the operator on the indexed column forces Cassandra to scan
the partition instead of using the index, because secondary indexes only
support '=' operations.  If you care about performance, you're probably
better off creating a dedicated table to serve this type of query, as
described here:
http://www.datastax.com/dev/blog/basic-rules-of-cassandra-data-modeling
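
A sketch of such a dedicated table for this exact query (the column types
are guesses, since the thread doesn't give them):

from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1']).connect('ks')  # assumed setup

# Making field3 a clustering column turns the range predicate into an
# efficient slice within the partition, with no secondary index involved.
session.execute("""
    CREATE TABLE my_table_by_field3 (
        field1 int,
        field3 text,
        field2 int,
        field4 text,
        PRIMARY KEY (field1, field3, field2)
    )
""")
rows = session.execute(
    "SELECT * FROM my_table_by_field3 WHERE field1 = %s AND field3 > %s",
    (123, 'abc'))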



 Second. Lets say there is some data that almost never changes but is
 read all the time. E.g. information about smiles in social network. Or
 current sessions. In this case would Cassandra cache hot data in
 memtable? Or such data should be stored somewhere else, i.e. Redis or
 Couchbase?


Memtables are only used for buffering writes, not for caching read data.
Cassandra does have several layers of caching though.  Frequently read data
will end up in the key cache and the OS page cache, making reads quite
efficient.  Optionally, you can also enable the row cache.  Since you're
almost never modifying the data, the row cache is actually a decent fit,
although I recommend testing it heavily with your use case for stability.
The best way to find out if your performance is good enough is to benchmark
it with your own use case.
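
If you do try the row cache, a sketch of enabling it (Cassandra 2.1+
syntax; the table name is hypothetical, and the cache also needs memory via
row_cache_size_in_mb in cassandra.yaml):

from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1']).connect('ks')  # assumed setup

# Cache whole partitions of a read-heavy, rarely-written table.
session.execute(
    "ALTER TABLE smiles WITH caching = "
    "{'keys': 'ALL', 'rows_per_partition': 'ALL'}")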


-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: Clarification of property: storage_port

2015-05-15 Thread Tyler Hobbs
On Fri, May 15, 2015 at 4:17 AM, Magnus Vojbacke 
magnus.vojba...@digitalroute.com wrote:


 Function: What protocols and functions is storage_port used for? Am I
 right to believe that it is used for Gossip?


It's used for all internode communication (gossip, requests, etc).



 And more importantly: It seems to me that storage_port MUST be configured
 to be the same port for _all_ nodes in a cluster, is this correct?


That's correct.

-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: Leap sec

2015-05-15 Thread Tyler Hobbs
This post has some good advice for preparing for the leap second:
http://www.datastax.com/dev/blog/preparing-for-the-leap-second

On Fri, May 15, 2015 at 12:25 PM, cass savy casss...@gmail.com wrote:

 Just curious to know on how you are preparing Prod C* clusters for leap
 sec.

 What are the workaorund other than upgrading kernel to 3.4+?
 Are you upgrading clusters to Java 7 or higher on client and C* servers?





-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: Caching the PreparedStatement (Java driver)

2015-05-15 Thread Tyler Hobbs
On Fri, May 15, 2015 at 12:02 PM, Ajay ajay.ga...@gmail.com wrote:


 But I am also not sure of what happens when a cached prepared statement is
 executed after cassandra nodes restart. Does the server prepared statements
 cache is persisted or in memory?.


For now, it's just in memory, so they are lost when the node is restarted.


 If it is in memory, how do we handle stale prepared statement in the cache?


If a prepared statement ID is used that Cassandra doesn't recognize (e.g.
after a node restart), it responds with a specific error to the driver.
When the driver sees this error, it automatically re-prepares the statement
against that node using the statement info from its own cache.  After the
statement has been re-prepared, it attempts to execute the query again.
This all happens transparently, so your application will not even be aware
of it (aside from an increase in latency).

There are plans to persist prepared statements in a system table:
https://issues.apache.org/jira/browse/CASSANDRA-8831
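
On the application side, a per-query-string cache is all that's needed,
since the driver re-prepares stale statements on its own. A sketch (the
names are mine):

from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1']).connect('ks')  # assumed setup
_prepared = {}

def get_prepared(session, query):
    # Prepare each distinct query string once; reuse the statement after.
    ps = _prepared.get(query)
    if ps is None:
        ps = session.prepare(query)  # network round trip only on first use
        _prepared[query] = ps
    return ps

ps = get_prepared(session, "SELECT * FROM users WHERE id = ?")
session.execute(ps, (42,))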


-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: CQL 3.x Update ...USING TIMESTAMP...

2015-04-21 Thread Tyler Hobbs
On Mon, Apr 20, 2015 at 4:02 PM, Sachin Nikam skni...@gmail.com wrote:

 #1. We have 2 data centers located close by with plans to expand to more
 data centers which are even further away geographically.
 #2. How will this impact light weight transactions when there is high
 level of network contention for cross data center traffic.


If you are only expecting updates to a given document from one DC, then you
could use LOCAL_SERIAL for the LWT operations.  If you can't do that, then
LWT are probably not a great option for you.
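
A sketch of an LWT pinned to the local DC (the table and condition are
illustrative only):

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

session = Cluster(['127.0.0.1']).connect('ks')  # assumed setup

# LOCAL_SERIAL keeps the Paxos rounds inside one data center.
stmt = SimpleStatement(
    "UPDATE documents SET body = %s WHERE id = %s IF version = %s",
    serial_consistency_level=ConsistencyLevel.LOCAL_SERIAL)
result = session.execute(stmt, ('new body', 42, 3))
print(result.one())  # the row includes the [applied] flag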


 #3. Do you know of any real examples where companies have used light
 weight transactions in a multi-data center traffic.


I don't know who's doing that off the top of my head, but I imagine they're
using LOCAL_SERIAL.


-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: High latencies for simple queries

2015-03-31 Thread Tyler Hobbs
To clarify, that's in Cassandra 2.1+.  In 2.0 and earlier, we used
http://code.google.com/a/apache-extras.org/p/cassandra-dbapi2/ for cqlsh.

On Tue, Mar 31, 2015 at 10:40 AM, Tyler Hobbs ty...@datastax.com wrote:

 The python driver that we bundle with Cassandra for cqlsh is the normal
 python driver (https://github.com/datastax/python-driver), although
 sometimes it's patched for bugfixes or is not an official release.

 On Sat, Mar 28, 2015 at 5:36 PM, Ben Bromhead b...@instaclustr.com wrote:

 cqlsh runs on the internal cassandra python drivers: cassandra-pylib and
 cqlshlib.

 I would not recommend using them at all (nothing wrong with them, they
 are just not built with external users in mind).

 I have never used python-driver in anger so I can't comment on whether it
 is genuinely slower than the internal C* python driver, but this might be a
 question for python-driver folk.

 On 28 March 2015 at 00:34, Artur Siekielski a...@vhex.net wrote:

 On 03/28/2015 12:13 AM, Ben Bromhead wrote:

 One other thing to keep in mind / check is that doing these tests
 locally the cassandra driver will connect using the network stack,
 whereas postgres supports local connections over a unix domain socket
 (this is also enabled by default).

 Unix domain sockets are significantly faster than tcp as you don't have
 a network stack to traverse. I think any driver using libpq will attempt
 to use the domain socket when connecting locally.


  Good catch. I ensured that psycopg2 connects through a TCP socket and
 the numbers increased by about 20%, but it still is an order of magnitude
 faster than Cassandra.


 But I'm going to hazard a guess something else is going on with the
 Cassandra connection as I'm able to get 0.5ms queries locally and that's
 even with trace turned on.


 Using python-driver?




 --

 Ben Bromhead

 Instaclustr | www.instaclustr.com | @instaclustr
 http://twitter.com/instaclustr | (650) 284 9692




 --
 Tyler Hobbs
 DataStax http://datastax.com/




-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: High latencies for simple queries

2015-03-31 Thread Tyler Hobbs
The python driver that we bundle with Cassandra for cqlsh is the normal
python driver (https://github.com/datastax/python-driver), although
sometimes it's patched for bugfixes or is not an official release.

On Sat, Mar 28, 2015 at 5:36 PM, Ben Bromhead b...@instaclustr.com wrote:

 cqlsh runs on the internal cassandra python drivers: cassandra-pylib and
 cqlshlib.

 I would not recommend using them at all (nothing wrong with them, they are
 just not built with external users in mind).

 I have never used python-driver in anger so I can't comment on whether it
 is genuinely slower than the internal C* python driver, but this might be a
 question for python-driver folk.

 On 28 March 2015 at 00:34, Artur Siekielski a...@vhex.net wrote:

 On 03/28/2015 12:13 AM, Ben Bromhead wrote:

 One other thing to keep in mind / check is that doing these tests
 locally the cassandra driver will connect using the network stack,
 whereas postgres supports local connections over a unix domain socket
 (this is also enabled by default).

 Unix domain sockets are significantly faster than tcp as you don't have
 a network stack to traverse. I think any driver using libpq will attempt
 to use the domain socket when connecting locally.


 Good catch. I ensured that psycopg2 connects through a TCP socket and the
 numbers increased by about 20%, but it still is an order of magnitude
 faster than Cassandra.


 But I'm going to hazard a guess something else is going on with the
 Cassandra connection as I'm able to get 0.5ms queries locally and that's
 even with trace turned on.


 Using python-driver?




 --

 Ben Bromhead

 Instaclustr | www.instaclustr.com | @instaclustr
 http://twitter.com/instaclustr | (650) 284 9692




-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: High latencies for simple queries

2015-03-27 Thread Tyler Hobbs
Just to check, are you concerned about minimizing that latency or
maximizing throughput?

I'll assume that latency is what you're actually concerned about.  A fair amount
of that latency is probably happening in the python driver.  Although it
can easily execute ~8k operations per second (using cpython), in some
scenarios it can be difficult to guarantee sub-ms latency for an individual
query due to how some of the internals work.  In particular, it uses
python's Conditions for cross-thread signalling (from the event loop thread
to the application thread).  Unfortunately, python's Condition
implementation includes a loop with a minimum sleep of 1ms if the Condition
isn't already set when you start the wait() call.  This is why, with a
single application thread, you will typically see a minimum of 1ms latency.

Another source of similar latencies for the python driver is the Asyncore
event loop, which is used when libev isn't available.  I would make sure
that you can use the LibevConnection class with the driver to avoid this.
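
A sketch of selecting the libev event loop explicitly (this assumes libev
and the driver's libev extension are installed):

from cassandra.cluster import Cluster
from cassandra.io.libevreactor import LibevConnection

# Avoid the slower asyncore fallback by naming the connection class.
cluster = Cluster(['127.0.0.1'], connection_class=LibevConnection)
session = cluster.connect()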

On Fri, Mar 27, 2015 at 6:24 AM, Artur Siekielski a...@vhex.net wrote:

 I'm running Cassandra locally and I see that the execution time for the
 simplest queries is 1-2 milliseconds. By a simple query I mean either
 INSERT or SELECT from a small table with short keys.

 While this number is not high, it's about 10-20 times slower than
 Postgresql (even if INSERTs are wrapped in transactions). I know that the
 nature of Cassandra compared to Postgresql is different, but for some
 scenarios this difference can matter.

 The question is: is it normal for Cassandra to have a minimum latency of 1
 millisecond?

 I'm using Cassandra 2.1.2, python-driver.





-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: High latencies for simple queries

2015-03-27 Thread Tyler Hobbs
Since you're executing queries sequentially, you may want to look into
using callback chaining to avoid the cross-thread signaling that results in
the 1ms latencies.  Basically, just use session.execute_async() and attach
a callback to the returned future that will execute your next query.  The
callback is executed on the event loop thread.  The main downsides to this
are that you need to be careful to avoid blocking the event loop thread
(including executing session.execute() or prepare()) and you need to ensure
that all exceptions raised in the callback are handled by your application
code.
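
A sketch of that pattern (the queries and names are placeholders):

from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1']).connect('ks')  # assumed setup

def on_first(rows):
    # Runs on the event loop thread: chain the next query, never block here.
    f2 = session.execute_async("SELECT v FROM t2 WHERE k = %s", (rows[0].k,))
    f2.add_callbacks(on_second, on_error)

def on_second(rows):
    print(list(rows))

def on_error(exc):
    print("query failed:", exc)  # unhandled callback exceptions are lost

f1 = session.execute_async("SELECT k FROM t1 LIMIT 1")
f1.add_callbacks(on_first, on_error)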

On Fri, Mar 27, 2015 at 3:11 PM, Artur Siekielski a...@vhex.net wrote:

 I think that in your example Postgres spends most time on waiting for
 fsync() to complete. On Linux, for a battery-backed raid controller, it's
 safe to mount ext4 filesystem with barrier=0 option which improves
 fsync() performance a lot. I have partitions mounted with this option and I
 did a test from Python, using psycopg2 driver, and I got the following
 latencies, in milliseconds:
 - INSERT without COMMIT: 0.04
 - INSERT with COMMIT: 0.12
 - SELECT: 0.05
 I'm also repeating benchmark runs multiple times (I'm using Python's
 timeit module).


 On 03/27/2015 07:58 PM, Ben Bromhead wrote:

 Latency can be so variable even when testing things locally. I quickly
 fired up postgres and did the following with psql:

 ben=# CREATE TABLE foo(i int, j text, PRIMARY KEY(i));
 CREATE TABLE
 ben=# \timing
 Timing is on.
 ben=# INSERT INTO foo VALUES(2, 'yay');
 INSERT 0 1
 Time: 1.162 ms
 ben=# INSERT INTO foo VALUES(3, 'yay');
 INSERT 0 1
 Time: 1.108 ms

 I then fired up a local copy of Cassandra (2.0.12)

 cqlsh> CREATE KEYSPACE foo WITH replication = { 'class' :
 'SimpleStrategy', 'replication_factor' : 1 };
 cqlsh> USE foo;
 cqlsh:foo> CREATE TABLE foo(i int PRIMARY KEY, j text);
 cqlsh:foo> TRACING ON;
 Now tracing requests.
 cqlsh:foo> INSERT INTO foo (i, j) VALUES (1, 'yay');





-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: Not seeing keyspace in nodetool compactionhistory

2015-03-25 Thread Tyler Hobbs
What version of Cassandra are you using?  Since it sounds like you aren't
doing any reads, it could be
https://issues.apache.org/jira/browse/CASSANDRA-8635.

On Wed, Mar 18, 2015 at 9:37 AM, Ali Akhtar ali.rac...@gmail.com wrote:

 When I run nodetool compactionhistory , I'm only seeing the system
 keyspace, and OpsCenter keyspace in the compactions. I only see one mention
 of my own keyspace, but its only for the smallest table within that
 keyspace (containing only about 1k rows). My two other tables, containing
 1.1m and 100k rows respectively, weren't to be seen.

 Any reason why that is?

 I did fill up the data in those two tables within the span of about 4
 hours (I ran a script to migrate existing data from legacy rdbms dbs).
 Could that have something to do with it?

 I'm using SizeTieredCompactionStrategy for all tables.




-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: Not seeing keyspace in nodetool compactionhistory

2015-03-25 Thread Tyler Hobbs
How many sstables (*-Data.db files) do each of your two tables have?

On Wed, Mar 25, 2015 at 2:54 PM, Ali Akhtar ali.rac...@gmail.com wrote:

 I also just inserted, didn't do any updates.

 On Thu, Mar 26, 2015 at 12:54 AM, Ali Akhtar ali.rac...@gmail.com wrote:

 I'm on 2.0.12

 I'm not sure if that's issue, since the size isn't growing. The size is
 about what i'd expect.

 On Thu, Mar 26, 2015 at 12:44 AM, Tyler Hobbs ty...@datastax.com wrote:

 What version of Cassandra are you using?  Since it sounds like you
 aren't doing any reads, it could be
 https://issues.apache.org/jira/browse/CASSANDRA-8635.

 On Wed, Mar 18, 2015 at 9:37 AM, Ali Akhtar ali.rac...@gmail.com
 wrote:

 When I run nodetool compactionhistory , I'm only seeing the system
 keyspace, and OpsCenter keyspace in the compactions. I only see one mention
 of my own keyspace, but its only for the smallest table within that
 keyspace (containing only about 1k rows). My two other tables, containing
 1.1m and 100k rows respectively, weren't to be seen.

 Any reason why that is?

 I did fill up the data in those two tables within the span of about 4
 hours (I ran a script to migrate existing data from legacy rdbms dbs).
 Could that have something to do with it?

 I'm using SizeTieredCompactionStrategy for all tables.




 --
 Tyler Hobbs
 DataStax http://datastax.com/






-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: error in bulk loading

2015-03-24 Thread Tyler Hobbs
On Tue, Mar 24, 2015 at 5:30 AM, Rahul Bhardwaj 
rahul.bhard...@indiamart.com wrote:

 I need to import a csv file to a table using copy command, but file
 contains carriage returns which causing me problem in doing so, Is there
 any way in cassandra to solve this


You can surround the field with double-quotes to handle this (or change the
quote character with the QUOTE option for COPY).
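
If you're producing the file yourself, Python's csv module applies that
quoting automatically; a sketch (the file name and values are made up):

import csv

# Fields containing carriage returns get wrapped in double-quotes, which
# cqlsh's COPY ... FROM understands by default.
with open('data.csv', 'w', newline='') as f:
    writer = csv.writer(f, quoting=csv.QUOTE_MINIMAL)
    writer.writerow(['key1', 'line one\r\nline two'])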

-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: CQL 3.x Update ...USING TIMESTAMP...

2015-03-24 Thread Tyler Hobbs
 clustering key to
 resolve that with LIMIT 1; also this is for DSE Solr, which wouldn't be
 able to query a by max b.foo anyway.  So when we write to *b*, we
 also write to *a* with something like

 UPDATE a USING TIMESTAMP ${b.a_timestamp.toMicros + b.foo} SET
 max_b_foo = ${b.foo} WHERE id = ${b.a_id}

 Assuming that we don't run afoul of related antipatterns such as
 repeatedly overwriting the same value indefinitely, this strikes me as
 sound if unorthodox practice, as long as conflict resolution in Cassandra
 isn't broken in some subtle way.  We also designed this to be safe from
 getting write timestamps greatly out of sync with clock time so that
 non-timestamped operations (especially delete) if done accidentally will
 still have a reasonable chance of having the expected results.

 So while it may not be the intended use case for write timestamps, and
 there are definitely gotchas if you are not careful or misunderstand the
 consequences, as far as I can see the logic behind it is sound but does
 rely on correct conflict resolution in Cassandra.  I'm curious if I'm
 missing or misunderstanding something important.

 On Wed, Mar 11, 2015 at 4:11 PM, Tyler Hobbs ty...@datastax.com
 wrote:

 Don't use the version as your timestamp.  It's possible, but you'll
 end up with problems when attempting to overwrite or delete entries.

 Instead, make the version part of the primary key:

 CREATE TABLE document_store (document_id bigint, version int,
 document text, PRIMARY KEY (document_id, version)) WITH CLUSTERING ORDER
 BY (version desc)

 That way you don't have to worry about overwriting higher versions
 with a lower one, and to read the latest version, you only have to do:

 SELECT * FROM document_store WHERE document_id = ? LIMIT 1;

 Another option is to use lightweight transactions (i.e. UPDATE ...
  SET document = ?, version = ? WHERE document_id = ? IF version < ?), but
 that's going to make writes much more expensive.

 On Wed, Mar 11, 2015 at 12:45 AM, Sachin Nikam skni...@gmail.com
 wrote:

 I am planning to use the Update...USING TIMESTAMP... statement to
 make sure that I do not overwrite fresh data with stale data while 
 having
 to avoid doing at least LOCAL_QUORUM writes.

 Here is my table structure.

 Table=DocumentStore
 DocumentID (primaryKey, bigint)
 Document(text)
 Version(int)

 If the service receives 2 write requests with Version=1 and
 Version=2, regardless of the order of arrival, the business requirement 
 is
 that we end up with Version=2 in the database.

 Can I use the following CQL Statement?

 Update DocumentStore using versionValue
 SET  Document=documentValue,
 Version=versionValue
 where DocumentID=documentIDValue;

 Has anybody used something like this? If so was the behavior as
 expected?

 Regards
 Sachin




 --
 Tyler Hobbs
 DataStax http://datastax.com/








-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: CQL 3.x Update ...USING TIMESTAMP...

2015-03-11 Thread Tyler Hobbs
Don't use the version as your timestamp.  It's possible, but you'll end up
with problems when attempting to overwrite or delete entries.

Instead, make the version part of the primary key:

CREATE TABLE document_store (document_id bigint, version int, document
text, PRIMARY KEY (document_id, version)) WITH CLUSTERING ORDER BY (version
desc)

That way you don't have to worry about overwriting higher versions with a
lower one, and to read the latest version, you only have to do:

SELECT * FROM document_store WHERE document_id = ? LIMIT 1;

Another option is to use lightweight transactions (i.e. UPDATE ... SET
document = ?, version = ? WHERE document_id = ? IF version < ?), but
that's going to make writes much more expensive.
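
A sketch of writes and reads against that design (values illustrative;
connection details assumed):

from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1']).connect('ks')  # assumed setup

# Every write lands under its own version; the descending clustering order
# makes "latest" a LIMIT 1 read, and an old version can never clobber a new
# one because they occupy different rows.
session.execute(
    "INSERT INTO document_store (document_id, version, document) "
    "VALUES (%s, %s, %s)", (12345, 2, 'payload v2'))
latest = session.execute(
    "SELECT * FROM document_store WHERE document_id = %s LIMIT 1",
    (12345,)).one()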

On Wed, Mar 11, 2015 at 12:45 AM, Sachin Nikam skni...@gmail.com wrote:

 I am planning to use the Update...USING TIMESTAMP... statement to make
 sure that I do not overwrite fresh data with stale data while having to
 avoid doing at least LOCAL_QUORUM writes.

 Here is my table structure.

 Table=DocumentStore
 DocumentID (primaryKey, bigint)
 Document(text)
 Version(int)

 If the service receives 2 write requests with Version=1 and Version=2,
 regardless of the order of arrival, the business requirement is that we end
 up with Version=2 in the database.

 Can I use the following CQL Statement?

 Update DocumentStore using versionValue
 SET  Document=documentValue,
 Version=versionValue
 where DocumentID=documentIDValue;

 Has anybody used something like this? If so was the behavior as expected?

 Regards
 Sachin




-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: sstables remain after compaction

2015-03-03 Thread Tyler Hobbs
On Tue, Mar 3, 2015 at 3:44 AM, Jason Wee peich...@gmail.com wrote:

 we are in the midst of upgrading... 1.0.8 -> 1.0.12 then to 1.1.0.. then
 to the latest of 1.1.. then to 1.2


I'm not aware of any good reason to put 1.1.0 in the middle there.  I would
go straight from 1.0.12 to the latest 1.1.x.


-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: Documentation of batch statements

2015-03-03 Thread Tyler Hobbs
On Tue, Mar 3, 2015 at 2:39 PM, Jonathan Haddad j...@jonhaddad.com wrote:

 Actually, that's not true either.  It's technically possible for a batch
 to be partially applied in the current implementation, even with logged
 batches.  atomic is used incorrectly here, imo, since more than 2 states
can be visible, unapplied & applied.


That's a matter of isolation, not atomicity.  Although, with a long enough
gap between partial and full application, the distinction becomes somewhat
pedantic, I suppose.


-- 
Tyler Hobbs
DataStax http://datastax.com/

