Re: Frozen types in Cassandra
On Sun, Mar 5, 2017 at 11:53 PM, anuja jain <anujaja...@gmail.com> wrote: > Is there a difference between creating a column of type > frozen<list<double>> and frozen<list_double>, where list_double is a UDT of > type frozen<list<double>>? > Yes, there is a difference in serialization format: the first will be serialized directly as a list, the second will be serialized as a single-field UDT containing a list. Additionally, the second form supports altering the type by adding fields to the UDT. This can't be done with the first form. If you don't need this capability, I recommend going with the simpler option of frozen<list<double>>. > Also how to create a solr index on such columns? > I have no idea, sorry. -- Tyler Hobbs DataStax <http://datastax.com/>
Re: Frozen type support
There are no plans to remove support for frozen types. I don't expect that would ever happen. On Tue, Jan 24, 2017 at 9:38 AM, Ahmed Eljami <ahmed.elj...@gmail.com> wrote: > Hi, > > I would like to know if the Frozen type will no longer be supported in > the future versions of Cassandra ? > > > Thx. > Ahmed > > > -- Tyler Hobbs DataStax <http://datastax.com/>
Re: Import failure for use python cassandra-driver
This was fixed by the 3.7.1 release of the python driver: https://groups.google.com/a/lists.datastax.com/forum/#!topic/python-driver-user/1UbvYc_h9KQ On Wed, Oct 26, 2016 at 4:35 AM, Stefano Ortolani <ostef...@gmail.com> wrote: > Did you try the workaround they posted (aka, downgrading Cython)? > > Cheers, > Stefano > > On Wed, Oct 26, 2016 at 10:01 AM, Zao Liu <zao...@gmail.com> wrote: > > Same happens on my Ubuntu boxes. > > > > File > > "/home/jasonl/.pex/install/cassandra_driver-3.7.0-cp27-none-linux_x86_64.whl.ebfb31ab99650d53ad134e0b312c7494296cdd2b/cassandra_driver-3.7.0-cp27-none-linux_x86_64.whl/cassandra/cqlengine/connection.py", > > line 20, in > > > > from cassandra.cluster import Cluster, _NOT_SET, NoHostAvailable, > > UserTypeDoesNotExist > > > > ImportError: > > /home/jasonl/.pex/install/cassandra_driver-3.7.0-cp27-none-linux_x86_64.whl.ebfb31ab99650d53ad134e0b312c7494296cdd2b/cassandra_driver-3.7.0-cp27-none-linux_x86_64.whl/cassandra/cluster.so: > > undefined symbol: PyException_Check > > > > > > And someone asked the same question on Stack Overflow: > > > > http://stackoverflow.com/questions/40251893/datastax-python-cassandra-driver-build-fails-on-ubuntu# > > > > > > > > On Wed, Oct 26, 2016 at 1:49 AM, Zao Liu <zao...@gmail.com> wrote: > >> > >> Hi, > >> > >> Suddenly I started to get the following errors when using the python cassandra > >> driver 3.7.0 on my MacBook Pro running OS X El Capitan. Tried to reinstall > >> the package and all the dependencies, unfortunately no luck. I was able to > >> run it a few days earlier. Really can't recall what change could have caused > >> this. 
> >> > >> File > >> "/Library/Python/2.7/site-packages/cassandra/cqlengine/connection.py", line 20, in > >> from cassandra.cluster import Cluster, _NOT_SET, NoHostAvailable, > >> UserTypeDoesNotExist > >> ImportError: > >> dlopen(/Library/Python/2.7/site-packages/cassandra/cluster.so, 2): Symbol > >> not found: _PyException_Check > >> Referenced from: /Library/Python/2.7/site-packages/cassandra/cluster.so > >> Expected in: flat namespace > >> in /Library/Python/2.7/site-packages/cassandra/cluster.so > >> > >> Thanks, > >> Jason > >> > >> > > > -- Tyler Hobbs DataStax <http://datastax.com/>
Re: Cannot set TTL in COPY command
On Wed, Oct 26, 2016 at 10:07 AM, techpyaasa . <techpya...@gmail.com> wrote: > Can some one please tell me how to set TTL using COPY command? It looks like you're using Cassandra 2.0. I don't think COPY supported the TTL option before 2.1. -- Tyler Hobbs DataStax <http://datastax.com/>
Re: CASSANDRA-5376: CQL IN clause on last key not working when schema includes set,list or map
That ticket was just to improve the error message. From the comments on the ticket: "Unfortunately, handling collections is slightly harder than what CASSANDRA-5230 <https://issues.apache.org/jira/browse/CASSANDRA-5230> aimed for, because we can't do a name query. So this will have to wait for CASSANDRA-4762 <https://issues.apache.org/jira/browse/CASSANDRA-4762>. In the meantime, we should obviously not throw an assertion error so attaching a patch to improve validation." However, it seems like this would be possible to support in Cassandra 3.x. We probably just need to remove the check and verify that it actually works. Can you open a new JIRA ticket for this? On Thu, Sep 15, 2016 at 12:49 PM, Samba <saas...@gmail.com> wrote: > any update on this issue? > > the quoted JIRA issue (CASSANDRA-5376) is resolved as fixed in 1.2.4 but > it is still not possible (even in 3.7) to use IN operator in queries that > fetch collection columns. > > is the fix only to report better error message that this is not possible > or was it fixed then but the issue resurfaced in regression? > > could you please confirm one way or the other? > > Thanks and Regards, > Samba > > > On Tue, Sep 6, 2016 at 6:34 PM, Samba <saas...@gmail.com> wrote: > >> Hi, >> >> "CASSANDRA-5376: CQL IN clause on last key not working when schema >> includes set,list or map" >> >> is marked resolved in 1.2.4 but i still see the issue (not an Assertion >> Error, but an query validation message) >> >> was the issue resolved only to report proper error message or was it >> fixed to support retrieving collections when query contains IN clause of >> partition/cluster (last) columns? >> >> If it was fixed properly to support retrieving collections with IN >> clause, then is it a bug in 3.7 release that i get the same message? >> >> Could you please explain, if it not fixed as intended, if there are plans >> to support this in future? 
>> >> Thanks & Regards, >> Samba >> > > -- Tyler Hobbs DataStax <http://datastax.com/>
Re: race condition for quorum consistency
On Wed, Sep 14, 2016 at 3:49 PM, Nicolas Douillet <nicolas.douil...@gmail.com> wrote: > - during read requests, cassandra will ask one node for the data and ask the > others involved in the CL for a digest, and if the digests do not all match, > will ask them for the entire data, handle the merge and finally will ask > those nodes to do a background repair. Your write may have succeeded during this > time. This is very good info, but as a minor correction, the repair here will happen in the foreground before the response is returned to the client. So, at least from a single client's perspective, you get monotonic reads. -- Tyler Hobbs DataStax <http://datastax.com/>
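The digest-read flow described above can be sketched in a few lines. This is an illustrative model only (whole-row last-write-wins, a stand-in digest); the real implementation reconciles per column and uses its own digest format:

```python
import hashlib

def digest(row):
    # Stand-in for Cassandra's digest of a replica's result.
    return hashlib.md5(repr(sorted(row.items())).encode()).hexdigest()

def quorum_read(replicas):
    """Sketch of a CL.QUORUM read: full data from one replica, digests from
    the rest. On a mismatch, fetch full data everywhere, merge newest-wins,
    and repair stale replicas in the foreground before answering."""
    data_node, *digest_nodes = replicas
    full = data_node["row"]
    if any(digest(n["row"]) != digest(full) for n in digest_nodes):
        rows = [n["row"] for n in replicas]
        merged = max(rows, key=lambda r: r["ts"])  # newest row wins here
        for n in replicas:  # foreground repair before returning to client
            n["row"] = dict(merged)
        return merged
    return full

replicas = [
    {"row": {"v": "old", "ts": 1}},
    {"row": {"v": "new", "ts": 2}},  # this replica saw a later write
]
result = quorum_read(replicas)
print(result["v"])               # the newer value is returned
print(replicas[0]["row"]["v"])   # the stale replica was repaired first
```

Because the repair happens before the response is sent, a second read through the same coordinator cannot observe the older value, which is the monotonic-reads property mentioned above.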
Re: ServerError: An unexpected error occurred server side; in cassandra java driver
There should be a corresponding error and stacktrace in your cassandra logs on 10.0.230.25. Please find that and post it, if you can. On Thu, Sep 1, 2016 at 7:23 AM, Siddharth Verma <verma.siddha...@snapdeal.com> wrote: > Debugged the issue a little. > AbstractFuture.get() throws java.util.concurrent.ExecutionException > in Uninterruptibles.getUninterruptibly; interrupted gets set to true, > which does Thread.interrupt(), > thus in DefaultResultSetFuture > (ResultSet)Uninterruptibles.getUninterruptibly(this) > throws the exception. > > If someone who might have faced a similar issue could provide his/her > views. > > Thanks > Siddharth > -- Tyler Hobbs DataStax <http://datastax.com/>
Re: How to create a TupleType/TupleValue in a UDF
On Thu, Aug 18, 2016 at 12:57 PM, Drew Kutcharian <d...@venarc.com> wrote: > I’m running 3.0.8, so it probably wasn’t fixed? ;) > Hmm, would you mind opening a new JIRA ticket about that and linking it to CASSANDRA-11033? > > The CodecNotFoundException is very random, when I get it, if I re-run the > same exact query then it works! I’ll see if I can reproduce it more > consistently. > Thanks. If you can reproduce, please go ahead and open a ticket for that as well. > BTW, is there a way to get the CodecRegistry and the ProtocolVersion from > the UDF environment so I don’t have to create them? > At least in 3.0.8, I don't think so. It's worth pointing out https://issues.apache.org/jira/browse/CASSANDRA-10818, which makes it much easier to create tuples and UDTs in 3.6+. Check out the bottom of the UDF section of the docs for some examples and details: http://cassandra.apache.org/doc/latest/cql/functions.html#user-defined-functions -- Tyler Hobbs DataStax <http://datastax.com/>
Re: How to create a TupleType/TupleValue in a UDF
The logback-related error is due to https://issues.apache.org/jira/browse/CASSANDRA-11033, which is fixed in 3.0.4 and 3.4. I'm not sure about the CodecNotFoundException, can you reproduce that one reliably? On Thu, Aug 18, 2016 at 10:52 AM, Drew Kutcharian <d...@venarc.com> wrote: > Hi All, > > I have a UDF/UDA that returns a map of date -> TupleValue. > > CREATE OR REPLACE FUNCTION min_max_by_timestamps_udf(state map<date, > frozen<tuple<timestamp, timestamp>>>, flake blob) > RETURNS NULL ON NULL INPUT > RETURNS map<date, frozen<tuple<timestamp, timestamp>>> > LANGUAGE java > > CREATE OR REPLACE AGGREGATE min_max_by_timestamps(blob) > SFUNC min_max_by_timestamps_udf > STYPE map<date, frozen<tuple<timestamp, timestamp>>> > INITCOND {}; > > I’ve been using the following syntax to build the TupleType/TupleValue in > my UDF: > > TupleType tupleType = TupleType.of(com.datastax. > driver.core.ProtocolVersion.NEWEST_SUPPORTED, CodecRegistry.DEFAULT_INSTANCE, > DataType.timestamp(), DataType.timestamp()); > tupleType.newValue(new java.util.Date(timestamp), new > java.util.Date(timestamp))); > > But “randomly" I get errors like the following: > FunctionFailure: code=1400 [User Defined Function failure] > message="execution of ’testdb.min_max_by_timestamps_udf[map<date, > frozen<tuple<timestamp, timestamp>>>, blob]' failed: > java.security.AccessControlException: > access denied ("java.io.FilePermission" "/etc/cassandra/logback.xml" > "read”)" > > Or CodecNotFoundException for Cassandra not being able to find a codec for > "map<date, frozen<tuple<timestamp, timestamp>>>”. > > Is this a bug or I’m doing something wrong? > > > Thanks, > > Drew > -- Tyler Hobbs DataStax <http://datastax.com/>
Re: migrating from 2.1.2 to 3.0.8 log errors
That just means that a client/driver disconnected. Those log messages are supposed to be suppressed, but perhaps that stopped working in 3.x due to another change. On Wed, Aug 10, 2016 at 10:33 AM, Adil <adil.cha...@gmail.com> wrote: > Hi guys, > We have migrated our cluster (5 nodes in DC1 and 5 nodes in DC2) from > cassandra 2.1.2 to 3.0.8, all seems fine, executing nodetool status shows > all nodes UN, but in each node's log there is this log error continuously: > java.io.IOException: Error while read(...): Connection reset by peer > at io.netty.channel.epoll.Native.readAddress(Native Method) > ~[netty-all-4.0.23.Final.jar:4.0.23.Final] > at io.netty.channel.epoll.EpollSocketChannel$EpollSocketUnsafe.doReadBytes(EpollSocketChannel.java:675) > ~[netty-all-4.0.23.Final.jar:4.0.23.Final] > at io.netty.channel.epoll.EpollSocketChannel$EpollSocketUnsafe.epollInReady(EpollSocketChannel.java:714) > ~[netty-all-4.0.23.Final.jar:4.0.23.Final] > at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:326) > ~[netty-all-4.0.23.Final.jar:4.0.23.Final] > at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:264) > ~[netty-all-4.0.23.Final.jar:4.0.23.Final] > at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116) > ~[netty-all-4.0.23.Final.jar:4.0.23.Final] > at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137) > ~[netty-all-4.0.23.Final.jar:4.0.23.Final] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101] > > we have installed java-8_101 > > any idea what would be the problem? > > thanks > > Adil -- Tyler Hobbs DataStax <http://datastax.com/>
Re: (C)* stable version after 3.5
On Wed, Jul 13, 2016 at 11:32 AM, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote: > Why do you think that skipping 2.2 is not recommended when NEWS.txt > suggests otherwise? Can you elaborate? We test upgrading from 2.1 -> 3.x and upgrading from 2.2 -> 3.x equivalently. There should not be a difference in terms of how well the upgrade is supported. -- Tyler Hobbs DataStax <http://datastax.com/>
Re: C* 2.2.7 ?
2.2.7 just got tentatively tagged yesterday. So, there should be a vote on releasing it shortly. On Wed, Jun 29, 2016 at 8:24 AM, Dominik Keil <dominik.k...@movilizer.com> wrote: > +1 > > There are some bugs fixed that we might be (or surely are) affected by, and the > change log has become quite large already. Mind voting on 2.2.7 soon? > > > On 21.06.2016 at 15:31, horschi wrote: > > Hi, > > are there any plans to release 2.2.7 any time soon? > > kind regards, > Christian > > > -- > *Dominik Keil* > Movilizer GmbH > Konrad-Zuse-Ring 30 > 68163 Mannheim > Germany -- Tyler Hobbs DataStax <http://datastax.com/>
Re: Read operation can read uncommitted data?
Reads at CL.SERIAL will complete any in-progress paxos writes, so the behavior you're seeing is expected. On Mon, Jun 27, 2016 at 1:55 AM, Yuji Ito <y...@imagine-orb.com> wrote: > Hi, > > I'm testing Cassandra CAS operation. > > Can a read operation read uncommitted data which is being updated by CAS > in the following case? > > I use Cassandra 2.2.6. > There are 3 nodes (A, B and C) in a cluster. > Replication factor of keyspace is 3. > CAS operation on node A starts to update row X (updating the value in row > from 0 to 1). > > 1. prepare/promise phase succeeds on node A > 2. node C is down > 3. read/results phase in node A sends read requests to node B and C and > waits for read responses from them. > 4. (unrelated) read operation (CL: SERIAL) reads the same row X and gets > the value "1" in the row!! > 5. read/results phase fails by ReadTimeoutException caused by failure of > node C > > Thanks, > Yuji Ito > -- Tyler Hobbs DataStax <http://datastax.com/>
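The "SERIAL reads complete in-progress Paxos writes" behavior can be modeled with a toy state machine. This is a simplification for illustration (real Paxos tracks ballots and quorums), not Cassandra's actual code:

```python
def serial_read(paxos_state, committed):
    """Sketch: a read at CL.SERIAL first finishes any in-progress Paxos
    round. If a proposal was accepted but the commit phase stalled (e.g. a
    replica went down mid-round), the read commits it before returning, so
    the reader can legitimately observe the 'uncommitted' value."""
    if paxos_state.get("accepted") is not None:
        committed["value"] = paxos_state.pop("accepted")  # finish the round
    return committed["value"]

committed = {"value": 0}
paxos = {"accepted": 1}  # CAS accepted 0 -> 1, but commit never finished
print(serial_read(paxos, committed))  # 1: the read completed the write
```

This is why, in the scenario above, step 4 seeing the value "1" is expected: the SERIAL read itself pushed the accepted proposal through to commit.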
Re: Adding column to materialized view
This is expected. It's something we plan to support, but it hasn't been done yet: https://issues.apache.org/jira/browse/CASSANDRA-9736 On Mon, Jun 27, 2016 at 4:25 PM, Jason J. W. Williams < jasonjwwilli...@gmail.com> wrote: > Hey Guys, > > Running Cassandra 3.0.5. Needed to add a column to a materialized view, > but ALTER MATERIALIZED VIEW doesn't seem to allow that. So we ended up > dropping the view and recreating it. Is that expected or did I miss > something in the docs? > > -J > -- Tyler Hobbs DataStax <http://datastax.com/>
Re: Token Ring Question
On Fri, Jun 24, 2016 at 2:31 PM, Anubhav Kale <anubhav.k...@microsoft.com> wrote: > So, can someone educate me on how token aware policies in drivers really > work ? It appears that it’s quite possible that the data may live on nodes > that don’t own the tokens for it. By “own” I mean the ownership as defined > in system.local / peers and is fed back to drivers. > The tokens in system.local/peers are accurate. Combined with the replication settings for a keyspace, drivers can accurately determine which nodes are replicas for a given partition. Even if the driver's calculation is incorrect for some reason, token-aware routing is just an optimization. Nothing will break if a query is sent to a node that's not a replica. -- Tyler Hobbs DataStax <http://datastax.com/>
Re: Installing Cassandra from Tarball
On Mon, Jun 13, 2016 at 11:49 AM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote: > > WARN 15:41:58 Cassandra server running in degraded mode. Is swap >> disabled? : true, Address space adequate? : true, nofile limit adequate? >> : false, nproc limit adequate? : false >> > You need to disable swap in order to avoid this message; using swap space > can have serious performance implications. Make sure you disable the fstab > entry for the swap partition as well. > It looks like swap is actually disabled, but the nofile and nproc limits are too low. -- Tyler Hobbs DataStax <http://datastax.com/>
Re: select query on entire primary key returning more than one row in result
Is 'id' your partition key? I'm not familiar with the stratio indexes, but it looks like the primary key columns are both indexed. Perhaps this is related? On Tue, Jun 14, 2016 at 1:25 AM, Atul Saroha <atul.sar...@snapdeal.com> wrote: > After further debug, this issue is found in in-memory memtable as doing > nodetool flush + compact resolve the issue. And there is no batch write > used for this table which is showing issue. > Table properties: > > WITH CLUSTERING ORDER BY (f_name ASC) >> AND bloom_filter_fp_chance = 0.01 >> AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'} >> AND comment = '' >> AND compaction = {'class': >> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', >> 'max_threshold': '32', 'min_threshold': '4'} >> AND compression = {'chunk_length_in_kb': '64', 'class': >> 'org.apache.cassandra.io.compress.LZ4Compressor'} >> AND crc_check_chance = 1.0 >> AND dclocal_read_repair_chance = 0.1 >> AND default_time_to_live = 0 >> AND gc_grace_seconds = 864000 >> AND max_index_interval = 2048 >> AND memtable_flush_period_in_ms = 0 >> AND min_index_interval = 128 >> AND read_repair_chance = 0.0 >> AND speculative_retry = '99PERCENTILE'; >> CREATE CUSTOM INDEX nbf_index ON nbf () USING >> 'com.stratio.cassandra.lucene.Index' WITH OPTIONS = {'refresh_seconds': >> '1', 'schema': '{ >> fields : { >> id : {type : "bigint"}, >> f_d_name : { >> type : "string", >> indexed: true, >> sorted : false, >> validated : true, >> case_sensitive : false >> } >> } >> }'}; >> > > > > - > Atul Saroha > *Lead Software Engineer* > *M*: +91 8447784271 *T*: +91 124-415-6069 *EXT*: 12369 > Plot # 362, ASF Centre - Tower A, Udyog Vihar, > Phase -4, Sector 18, Gurgaon, Haryana 122016, INDIA > > On Mon, Jun 13, 2016 at 11:11 PM, Siddharth Verma < > verma.siddha...@snapdeal.com> wrote: > >> No, all rows were not the same. >> Querying only on the partition key gives 20 rows. 
>> In the erroneous result, while querying on partition key and clustering >> key, we got 16 of those 20 rows. >> >> And for "*tombstone_threshold"* there isn't any entry at column family >> level. >> >> Thanks, >> Siddharth Verma >> >> >> > -- Tyler Hobbs DataStax <http://datastax.com/>
Re: Tick Tock version numbers
On Mon, Jun 13, 2016 at 11:59 AM, Francisco Reyes <li...@natserv.net> wrote: > > > Can I upgrade them to 3.6 from 3.2? Or is it advisable to upgrade to each > intermediary version? > You can (and should) upgrade directly to 3.6 or 3.7. The 3.7 release is just 3.6 + bugfixes. > > Based on what I have gather seems like it is matter of: > bring node down > install new version > bring up > run nodetool upgradesstables -a > For upgrades within the 3.x line, you don't need to run upgradesstables. Other than that, this is correct. -- Tyler Hobbs DataStax <http://datastax.com/>
Re: Lightweight Transactions during datacenter outage
You can set the serial_consistency_level to LOCAL_SERIAL to tolerate a DC failure: http://datastax.github.io/python-driver/api/cassandra/query.html#cassandra.query.Statement.serial_consistency_level. It defaults to SERIAL, which ignores DCs. On Tue, Jun 7, 2016 at 12:26 PM, Jeronimo de A. Barros < jeronimo.bar...@gmail.com> wrote: > Hi, > > I have a cluster spreaded among 2 datacenters (DC1 and DC2), two server on > each DC and I have a keyspace with NetworkTopologyStrategy (DC1:2 and > DC2:2) with the following table: > > CREATE TABLE test ( > k1 int, > k2 timeuuid, > PRIMARY KEY ((k1), k2) > ) WITH CLUSTERING ORDER BY (k2 DESC) > > During a datacenter outage, as soon as a datacenter goes offline, I get > this error during a lightweight transaction: > > cqlsh:devtest> insert into test (k1,k2) values(1,now()) if not exists; > Request did not complete within rpc_timeout. > > > And a short time after the on-line DC verify the second DC is off-line: > > cqlsh:devtest> insert into test (k1,k2) values(1,now()) if not exists; > Unable to complete request: one or more nodes were unavailable. > > > So, my question is: Is there any way to keep lightweight transactions > working during a datacenter outage using the C* Python driver 2.7.2 ? > > I was thinking about catch the exception and do a simple insert (without > "IF") when the error occur, but having the lightweight transactions working > even during a DC outage/split would be nice. > > Thanks in advance for any help/hints. > > Best regards, Jero > -- Tyler Hobbs DataStax <http://datastax.com/>
Re: Token Ring Question
There really is only one token ring, but conceptually it's easiest to think of it like multiple rings, as OpsCenter shows it. The only difference is that every token has to be unique across the whole cluster. Now, if the token for a particular write falls in the “primary range” of a > node living in DC2, does the code check for such conditions and instead put > it on some node in DC1 ? > Yes. It will continue searching around the token ring until it hits a token that belongs to a node in the correct datacenter. What is the true meaning of “primary” token range in such scenarios ? > There's not really any such thing as a "primary token range", it's just a convenient idea for some tools. In reality, it's just the replica that owns the first (clockwise) token. I'm not sure what you're really asking, though -- what are you concerned about? On Wed, Jun 1, 2016 at 2:40 PM, Anubhav Kale <anubhav.k...@microsoft.com> wrote: > Hello, > > > > I recently learnt that regardless of number of Data Centers, there is > really only one token ring across all nodes. (I was under the impression > that there is one per DC like how Datastax Ops Center would show it). > > > > Suppose we have 4 v-nodes, and 2 DCs (2 nodes in each DC) and a key space > is set to replicate in only one DC – say DC1. > > > > Now, if the token for a particular write falls in the “primary range” of a > node living in DC2, does the code check for such conditions and instead put > it on some node in DC1 ? What is the true meaning of “primary” token range > in such scenarios ? > > > > Is this how things works roughly speaking or am I missing something ? > > > > Thanks ! > -- Tyler Hobbs DataStax <http://datastax.com/>
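The clockwise walk described above can be sketched as follows. This is an illustrative simplification of per-DC replica placement (real NetworkTopologyStrategy also spreads replicas across racks); the ring and node names are hypothetical:

```python
from bisect import bisect_right

def replicas_for(token, ring, target_dc, rf):
    """Starting from the first token >= the partition's token, walk the
    (single, cluster-wide) ring clockwise and take distinct nodes until
    `rf` replicas in `target_dc` are found. Nodes in other DCs are
    simply skipped. `ring` is a sorted list of (token, node, dc)."""
    tokens = [t for t, _, _ in ring]
    start = bisect_right(tokens, token)
    chosen = []
    for i in range(len(ring)):
        _, node, dc = ring[(start + i) % len(ring)]
        if dc == target_dc and node not in chosen:
            chosen.append(node)
            if len(chosen) == rf:
                break
    return chosen

# Hypothetical 4-node cluster, 2 DCs, one token per node.
ring = [(0, "n1", "DC1"), (25, "n2", "DC2"), (50, "n3", "DC1"), (75, "n4", "DC2")]

# Token 10 falls in n2's "primary range", but a keyspace replicating
# only to DC1 skips n2 and places the replica on n3.
print(replicas_for(10, ring, "DC1", rf=1))  # ['n3']
```

This also shows why "primary range" is only a convenient idea: for a DC1-only keyspace, n2's ownership of the range (0, 25] has no effect on where data actually lives.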
Re: Blob or columns
On Fri, Jun 3, 2016 at 10:43 AM, Abhinav Solan <abhinav.so...@gmail.com> wrote: > Should we store these inconsequential data as blob or JSON in one column > or create separate columns for them, which one should be the preferred way > here ? A blob will be more compact and require less server and driver resources for serialization and deserialization. Since you don't need to update anything in the blob individually, I recommend going with that. -- Tyler Hobbs DataStax <http://datastax.com/>
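The blob recommendation above can be made concrete with plain JSON serialization. The attribute names here are made up for illustration:

```python
import json

# Two ways to store a handful of read-mostly attributes: one blob column
# holding serialized JSON, or one column per attribute.
attrs = {"vendor": "acme", "fw": "1.0.3", "slot": 4}

# Blob option: the whole map ships and is stored as a single cell.
blob = json.dumps(attrs, separators=(",", ":")).encode()

# Column-per-attribute option would instead produce len(attrs) cells,
# each with its own per-cell overhead (name, timestamp, TTL) on the
# server side -- which is what makes the blob more compact.
print(len(blob), "bytes in one cell vs", len(attrs), "separate cells")

# The trade-off: reading or updating one attribute means round-tripping
# and re-serializing the whole blob.
decoded = json.loads(blob)
print(decoded["fw"])
```

If any of the attributes ever needed independent updates or server-side filtering, separate columns (or a map) would be the better fit; for opaque, read-together data the blob wins.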
Re: Get clustering column in Custom cassandra trigger
Try: unfilteredRowIterator.next().clustering().toString(update.metadata()) To get the raw values, you can use: unfilteredRowIterator.next().clustering().getRawValues() On Thu, May 26, 2016 at 7:25 AM, Siddharth Verma < verma.siddha...@snapdeal.com> wrote: > Hi Sam, > Sorry, I couldn't understand. > > I am already using > UnfilteredRowIterator unfilteredRowIterator > =partition.unfilteredIterator(); > > while(unfilteredRowIterator.hasNext()){ > next.append(unfilteredRowIterator.next().toString()+"\001"); > } > > Is there another way to access it? > > -- Tyler Hobbs DataStax <http://datastax.com/>
Re: Setting bloom_filter_fp_chance < 0.01
On Thu, May 26, 2016 at 4:36 AM, Adarsh Kumar <adarsh0...@gmail.com> wrote: > > 1). Is there any other way to configure the number of buckets along with > bloom_filter_fp_chance, to avoid this exception? > No, it's hard coded, although we could theoretically hard code it to support a higher number of buckets. > 2). If this validation is hard coded then why is it even allowed to set > such a value of bloom_filter_fp_chance that can prevent SSTable generation? > You're right, we should be validating this upfront when the probability is set. Can you open a ticket here for that? https://issues.apache.org/jira/browse/CASSANDRA -- Tyler Hobbs DataStax <http://datastax.com/>
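The relationship between the false-positive chance and the number of buckets follows standard Bloom-filter sizing math. The cap below is a hypothetical stand-in for the hard-coded limit discussed above, just to show why a very low fp chance becomes unsatisfiable:

```python
import math

def buckets_per_element(fp_chance):
    """Standard Bloom-filter sizing: bits (buckets) per element needed to
    reach a target false-positive chance, m/n = -ln(p) / (ln 2)^2."""
    return math.ceil(-math.log(fp_chance) / (math.log(2) ** 2))

# Illustrative cap: if the implementation only supports ~20 buckets per
# element, fp chances much below 1e-4 cannot be honored.
MAX_BUCKETS = 20
for p in (0.01, 0.001, 0.0001, 0.00001):
    b = buckets_per_element(p)
    print(p, b, "ok" if b <= MAX_BUCKETS else "exceeds cap")
```

Each 10x reduction in fp chance costs roughly 5 more bits per key, so the memory cost of chasing very small fp chances grows quickly even before any hard-coded limit is hit.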
Re: Internal Handling of Map Updates
If you replace an entire collection, whether it's a map, set, or list, a range tombstone will be inserted followed by the new collection. If you only update a single element, no tombstones are generated. On Wed, May 25, 2016 at 9:48 AM, Matthias Niehoff <matthias.nieh...@codecentric.de> wrote: > Hi, > > we have a table with a map field. We do not delete anything in this table, > but do updates on the values including the map field (most of the time a > new value for an existing key, rarely adding new keys). We now encounter a > huge amount of tombstones for this table. > > We used sstable2json to take a look into the sstables: > > > {"key": "Betty_StoreCatalogLines:7", > > "cells": [["276-1-6MPQ0RI-276110031802001001:","",1463820040628001], > >["276-1-6MPQ0RI-276110031802001001:last_modified","2016-05-21 > 08:40Z",1463820040628001], > > ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463040069753999,"t",1463040069], > > ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463120708590002,"t",1463120708], > > ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463145700735007,"t",1463145700], > > ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463157430862000,"t",1463157430], > > ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463164595291002,"t",1463164595], > > . . . 
> > > ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463820040628000,"t",1463820040], > > > ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:62657474795f73746f72655f636174616c6f675f6c696e6573","0154d265c6b0",1463820040628001], > > > [„276-1-6MPQ0RI-276110031802001001:payload“,"{\"payload\":{\"Article > Id\":\"276110031802001001\",\"Row Id\":\"1-6MPQ0RI\",\"Article > #\":\"31802001001\",\"Quote Item Id\":\"1-6MPWPVC\",\"Country > Code\":\"276\"}}",1463820040628001] > > > > Looking at the SStables it seem like every update of a value in a Map > breaks down to a delete and insert in the corresponding SSTable (see all > the thumbstone flags „t“ in the extract of sstable2json above). > > We are using Cassandra 2.2.5. > > Can you confirm this behavior? > > Thanks! > -- > Matthias Niehoff | IT-Consultant | Agile Software Factory | Consulting > codecentric AG | Zeppelinstr 2 | 76185 Karlsruhe | Deutschland > tel: +49 (0) 721.9595-681 | fax: +49 (0) 721.9595-666 | mobil: +49 (0) > 172.1702676 > www.codecentric.de | blog.codecentric.de | www.meettheexperts.de | > www.more4fi.de > > Sitz der Gesellschaft: Solingen | HRB 25917| Amtsgericht Wuppertal > Vorstand: Michael Hochgürtel . Mirko Novakovic . Rainer Vehns > Aufsichtsrat: Patric Fedlmeier (Vorsitzender) . Klaus Jäger . Jürgen Schütz > > Diese E-Mail einschließlich evtl. beigefügter Dateien enthält vertrauliche > und/oder rechtlich geschützte Informationen. Wenn Sie nicht der richtige > Adressat sind oder diese E-Mail irrtümlich erhalten haben, informieren Sie > bitte sofort den Absender und löschen Sie diese E-Mail und evtl. > beigefügter Dateien umgehend. Das unerlaubte Kopieren, Nutzen oder Öffnen > evtl. beigefügter Dateien sowie die unbefugte Weitergabe dieser E-Mail ist > nicht gestattet > -- Tyler Hobbs DataStax <http://datastax.com/>
Re: Low cardinality secondary index behaviour
On Tue, May 10, 2016 at 6:41 AM, Atul Saroha <atul.sar...@snapdeal.com> wrote: > I have concern over using secondary index on field with low cardinality. > Lets say I have few billion rows and each row can be classified in 1000 > category. Lets say we have 50 node cluster. > > Now we want to fetch data for a single category using secondary index over > a category. And query is paginated too with fetch size property say 5000. > > Since query on secondary index works as scatter and gatherer approach by > coordinator node. Would it lead to out of memory on coordinator or timeout > errors too much. > Paging will prevent the coordinator from using excessive memory. With the type of data that you described, timeouts shouldn't be huge problem because it will only take a few token ranges (assuming you're using vnodes) to get enough matching rows to hit the page size. > > How does pagination (token level data fetch) behave in scatter and > gatherer approach? > Secondary index queries fetch token ranges in sequential order [1], starting with the minimum token. When you fetch a new page, it resumes from the last token (and primary key) that it returned in the previous page. [1] As an optimization, multiple token ranges will be fetched in parallel based on estimates of how many token ranges it will take to fill the page. > > Secondly, What If we create an inverted table with partition key as > category. Then this will led to lots of data on single node. Then it might > led to hot shard issue and performance issue of data fetching from single > node as a single partition has millions of rows. > > How should we tackle such low cardinality index in Cassandra? The data distribution that you described sounds like a reasonable fit for secondary indexes. However, I would also take into account how frequently you run this query and how fast you need it to be. 
Even ignoring the scatter-gather aspects of a secondary index query, they are still expensive because they fetch many non-contiguous rows from an SSTable. If you need to run this query very frequently, that may add too much load to your cluster, and some sort of inverted table approach may be more appropriate. -- Tyler Hobbs DataStax <http://datastax.com/>
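The sequential token-range scan and resume-from-last-key paging described above can be simulated with plain tuples. This is a sketch of the paging behavior only (the parallel-prefetch optimization is omitted); the data is made up:

```python
def index_page(rows_by_token_range, page_size, resume_after=None):
    """Sketch of secondary-index paging: token ranges are scanned in
    sequential order starting from the minimum token; a page stops once
    `page_size` matches are found, and the next page resumes from the
    last (token, key) returned. `rows_by_token_range` is an ordered
    list of lists of (token, key) index matches."""
    page = []
    for rng in rows_by_token_range:
        for row in rng:
            if resume_after is not None and row <= resume_after:
                continue  # already returned in an earlier page
            page.append(row)
            if len(page) == page_size:
                return page
    return page

ranges = [[(1, "a"), (2, "b")], [(5, "c"), (7, "d")], [(9, "e")]]
first = index_page(ranges, page_size=3)
second = index_page(ranges, page_size=3, resume_after=first[-1])
print(first)   # [(1, 'a'), (2, 'b'), (5, 'c')]
print(second)  # [(7, 'd'), (9, 'e')]
```

With a low-cardinality index like the 1000-category example, the first few token ranges usually contain enough matches to fill a page, which is why timeouts are rarely the limiting factor.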
Re: Cassandra 3.0.6 Release?
On Mon, May 9, 2016 at 2:48 PM, Drew Kutcharian <d...@venarc.com> wrote: > > > What’s the 3.0.6 release date? Seems like the code has been frozen for a > few days now. I ask because I want to install Cassandra on Ubuntu 16.04 and > CASSANDRA-10853 is blocking it. > We've been holding it up to sync it with the 3.6 release. There were a couple of bugs in the first 3.6-tentative tag that forced us to re-roll and restart test runs. The release vote for 3.0.6 and 3.6 should start within the next couple of days, and takes 72 hours to complete. -- Tyler Hobbs DataStax <http://datastax.com/>
Re: Discrepancy while paging through table, and static column updated inbetween
This sounds similar to https://issues.apache.org/jira/browse/CASSANDRA-10010, but that only affected 2.x. Can you open a Jira ticket with your table schema, the problematic query, and the details you posted here? On Tue, Apr 19, 2016 at 10:25 AM, Siddharth Verma < verma.siddha...@snapdeal.com> wrote: > Hi, > > We are using cassandra(dsc3.0.3) on production. > > For some purpose, we were doing a full table scan (setPagingState and > getPagingState used on ResultSet in java program), and there has been some > discrepancy when we ran the same job multiple times. > Each time some new data was added to the output, and some was left out. > > Side Note 1 : > Table structure > col1, col2, col3, col4, col5, col6 > Primary key(col1, col2) > col5 is static column > col6 static column. Used to explicitly store updated time when col5 changed > > > Sample Data > 1,A,AA,AAA,STATIC,T1 > 1,B,BB,BBB,STATIC,T1 > 1,C,CC,CCC,STATIC,T1 > 1,D,DD,DDD,STATIC,T1 > > For some key, sometime col6 was updated while the job was running, so some > values were not printed for that partition key. > > Side Note 2 : > we did -> select col6, writetime(col6) from ... where col1=... and col2=... > For the data that was missed out to make sure that particular entry wasn't > added later. > > > Side Note 3: > The above scenario that some col6 was updated while job was running, > therefore some entry for that partition key was ignored, is an assumption > from our end. > We can't understand why some entries were not printed in the table scan. > > -- Tyler Hobbs DataStax <http://datastax.com/>
Re: Proper use of COUNT
On Tue, Apr 19, 2016 at 9:51 AM, Jack Krupansky <jack.krupan...@gmail.com> wrote: > > 1. Another clarification: All of the aggregate functions, AVG, SUM, MIN, > MAX are in exactly the same boat as COUNT, right? > Yes. > > 2. Is the paging for COUNT, et al, done within the coordinator node? > Yes. > > 3. Does dedupe on the coordinator node consume memory proportional to the > number of rows on all nodes? I mean, you can't dedupe using only partition > keys of the coordinator node, right? What I'm wondering is if the usability > of COUNT (et al) is memory limited as well as time. > Deduping (i.e. normal conflict resolution) happens per-page, so in the worst case the memory requirements for the coordinator are RF * page size. -- Tyler Hobbs DataStax <http://datastax.com/>
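[Editor's note] A back-of-the-envelope illustration of the worst-case bound stated above (the RF and page-size numbers are invented for the example):

```python
def worst_case_rows_in_memory(replication_factor, page_size):
    # Conflict resolution is per-page: the coordinator holds at most one
    # page worth of rows from each of the RF replicas it queried.
    return replication_factor * page_size

# e.g. RF=3 with a fetch size of 5000 rows per page
rows_held = worst_case_rows_in_memory(3, 5000)
```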
Re: Compaction Error When upgrading from 2.1.9 to 3.0.2
On Thu, Apr 14, 2016 at 2:08 PM, Anthony Verslues < anthony.versl...@mezocliq.com> wrote: > It was an older upgrade plan so I went ahead and tried to upgrade to 3.0.5 > and I ran into the same error. > Okay, good to know. Please include that info in the ticket when you open it. > > > Do you know what would cause this error? Is it something to do with > tombstoned or deleted rows? > > > I'm not sure, I haven't looked into it too deeply yet. From the stacktrace it looks related to reading the static columns of a row. -- Tyler Hobbs DataStax <http://datastax.com/>
Re: Leak Detected while bootstrap
This looks like it might be https://issues.apache.org/jira/browse/CASSANDRA-11374. Can you comment on that ticket and share your logs leading up to the error? On Wed, Apr 13, 2016 at 3:37 PM, Anubhav Kale <anubhav.k...@microsoft.com> wrote: > Hello, > > > > Since we upgraded to Cassandra 2.1.12, we are noticing that * below* > happens when we are trying to bootstrap nodes, and the process just gets > stuck. Restarting the process / VM does not help. Our nodes are around ~300 > GB and run on local SSDs and we haven’t seen this problem on older versions > (specifically 2.1.9). > > > > Is this a known issue / any workarounds ? > > > > *ERROR [Reference-Reaper:1] 2016-04-13 20:33:53,394 Ref.java:179 - LEAK > DETECTED: a reference > (org.apache.cassandra.utils.concurrent.Ref$State@15e611a3) to class > org.apache.cassandra.utils.concurrent.WrappedSharedCloseable$1@203187780:[[OffHeapBitSet]] > was not released before the reference was garbage collected* > > > > Thanks ! > -- Tyler Hobbs DataStax <http://datastax.com/>
Re: Compaction Error When upgrading from 2.1.9 to 3.0.2
Can you open a ticket here with your schema and the stacktrace? https://issues.apache.org/jira/browse/CASSANDRA I'm also curious why you're not upgrading to 3.0.5 instead of 3.0.2. On Wed, Apr 13, 2016 at 4:37 PM, Anthony Verslues < anthony.versl...@mezocliq.com> wrote: > I got this compaction error when running ‘nodetool upgradesstables -a’ > while upgrading from 2.1.9 to 3.0.2. According to the documentation this > upgrade should work. > > > > Would upgrading to another intermediate version help? > > > > > > This is the line number: > https://github.com/apache/cassandra/blob/cassandra-3.0.2/src/java/org/apache/cassandra/db/LegacyLayout.java#L1124 > > > > > > error: null > > -- StackTrace -- > > java.lang.AssertionError > > at > org.apache.cassandra.db.LegacyLayout$CellGrouper.addCell(LegacyLayout.java:1124) > > at > org.apache.cassandra.db.LegacyLayout$CellGrouper.addAtom(LegacyLayout.java:1099) > > at > org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer$UnfilteredIterator.readRow(UnfilteredDeserializer.java:444) > > at > org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer$UnfilteredIterator.hasNext(UnfilteredDeserializer.java:423) > > at > org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer.hasNext(UnfilteredDeserializer.java:289) > > at > org.apache.cassandra.io.sstable.SSTableSimpleIterator$OldFormatIterator.readStaticRow(SSTableSimpleIterator.java:134) > > at > org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:57) > > at > org.apache.cassandra.io.sstable.format.big.BigTableScanner$KeyScanningIterator$1.initializeIterator(BigTableScanner.java:329) > > at > org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.maybeInit(LazilyInitializedUnfilteredRowIterator.java:48) > > at > org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.isReverseOrder(LazilyInitializedUnfilteredRowIterator.java:65) > > at > 
org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$1.reduce(UnfilteredPartitionIterators.java:109) > > at > org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$1.reduce(UnfilteredPartitionIterators.java:100) > > at > org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:206) > > at > org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:159) > > at > org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) > > at > org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$2.hasNext(UnfilteredPartitionIterators.java:150) > > at > org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:72) > > at > org.apache.cassandra.db.compaction.CompactionIterator.hasNext(CompactionIterator.java:226) > > at > org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:177) > > at > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) > > at > org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:78) > > at > org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60) > > at > org.apache.cassandra.db.compaction.CompactionManager$8.runMayThrow(CompactionManager.java:572) > > at > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) > > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > > at java.lang.Thread.run(Thread.java:745) > -- Tyler Hobbs DataStax <http://datastax.com/>
Re: Adding Options to Create Statements...
I'm not sure which driver you're referring to, but if it's the java driver, it has its own mailing list that may be more helpful: https://groups.google.com/a/lists.datastax.com/forum/#!forum/java-driver-user On Thu, Mar 31, 2016 at 4:40 PM, James Carman <ja...@carmanconsulting.com> wrote: > No thoughts? Would an upgrade of the driver "fix" this? > > On Wed, Mar 30, 2016 at 10:42 AM James Carman <ja...@carmanconsulting.com> > wrote: > >> I am trying to perform the following operation: >> >> public Create createCreate() { >> Create create = >> SchemaBuilder.createTable("foo").addPartitionColumn("bar", >> varchar()).addClusteringColumn("baz", varchar()); >> if (descending) { >> create.withOptions().clusteringOrder("baz", Direction.DESC); >> } >> return create; >> } >> >> I don't want to have to return the Create.Options object from this method >> (as I may need to add other columns). Is there a way to have the options >> "decorate" the Create directly without having to return the Create.Options? >> >> -- Tyler Hobbs DataStax <http://datastax.com/>
Re: Thrift composite partition key to cql migration
Also, can you paste the results of the relevant portions of "SELECT * FROM system.schema_columns" and "SELECT * FROM system.schema_columnfamilies"? On Thu, Mar 31, 2016 at 2:35 PM, Tyler Hobbs <ty...@datastax.com> wrote: > In the Thrift schema, is the key_validation_class actually set to > CompositeType(UTF8Type, UTF8Type), or is it just BytesType? What Cassandra > version? > > On Wed, Mar 30, 2016 at 4:44 PM, Jan Kesten <j.kes...@enercast.de> wrote: > >> Hi, >> >> while migrating the remainder of thrift operations in my application I >> came across a point where I can't find a good hint. >> >> In our old code we used a composite with two strings as row / partition >> key and a similar composite as column key like this: >> >> public Composite rowKey() { >> final Composite composite = new Composite(); >> composite.addComponent(key1, StringSerializer.get()); >> composite.addComponent(key2, StringSerializer.get()); >> return composite; >> } >> >> public Composite columnKey() { >> final Composite composite = new Composite(); >> composite.addComponent(key3, StringSerializer.get()); >> composite.addComponent(key4, StringSerializer.get()); >> return composite; >> } >> >> In CQL this column family looks like this: >> >> CREATE TABLE foo.bar ( >> key blob, >> column1 text, >> column2 text, >> value blob, >> PRIMARY KEY (key, column1, column2) >> ) >> >> As for the columns, key3 and key4 became column1 and column2 - but the old >> rowkey is presented as a blob (I can put it into a hex editor and see that >> the key1 and key2 values are in there). >> >> Any pointers to handle this, or is this a known issue? I am now using the >> DataStax Java driver for CQL; the old connector used Thrift. Is there any way >> to get key1 and key2 back apart from completely rewriting the table? 
This is >> what I had expected it to be: >> >> CREATE TABLE foo.bar ( >> key1 text, >> key2 text, >> column1 text, >> column2 text, >> value blob, >> PRIMARY KEY ((key1, key2), column1, column2) >> ) >> >> Cheers, >> Jan >> > > > > -- > Tyler Hobbs > DataStax <http://datastax.com/> > -- Tyler Hobbs DataStax <http://datastax.com/>
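[Editor's note] Short of rewriting the table, the old CompositeType row key can be unpacked client-side. Each component is encoded as a 2-byte big-endian length, the component bytes, and a single end-of-component byte (0x00). A sketch in Python (the sample key values "foo"/"bar" are invented):

```python
import struct

def decode_composite(blob):
    """Decode a Thrift CompositeType-encoded partition key into its parts."""
    components, offset = [], 0
    while offset < len(blob):
        (length,) = struct.unpack_from(">H", blob, offset)  # 2-byte length
        offset += 2
        components.append(blob[offset:offset + length])
        offset += length + 1  # skip the end-of-component byte
    return components

# key1="foo", key2="bar" packed as CompositeType(UTF8Type, UTF8Type)
blob = b"\x00\x03foo\x00" + b"\x00\x03bar\x00"
key1, key2 = (c.decode("utf-8") for c in decode_composite(blob))
```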
Re: Thrift composite partition key to cql migration
In the Thrift schema, is the key_validation_class actually set to CompositeType(UTF8Type, UTF8Type), or is it just BytesType? What Cassandra version? On Wed, Mar 30, 2016 at 4:44 PM, Jan Kesten <j.kes...@enercast.de> wrote: > Hi, > > while migrating the remainder of thrift operations in my application I came > across a point where I can't find a good hint. > > In our old code we used a composite with two strings as row / partition > key and a similar composite as column key like this: > > public Composite rowKey() { > final Composite composite = new Composite(); > composite.addComponent(key1, StringSerializer.get()); > composite.addComponent(key2, StringSerializer.get()); > return composite; > } > > public Composite columnKey() { > final Composite composite = new Composite(); > composite.addComponent(key3, StringSerializer.get()); > composite.addComponent(key4, StringSerializer.get()); > return composite; > } > > In CQL this column family looks like this: > > CREATE TABLE foo.bar ( > key blob, > column1 text, > column2 text, > value blob, > PRIMARY KEY (key, column1, column2) > ) > > As for the columns, key3 and key4 became column1 and column2 - but the old > rowkey is presented as a blob (I can put it into a hex editor and see that > the key1 and key2 values are in there). > > Any pointers to handle this, or is this a known issue? I am now using the > DataStax Java driver for CQL; the old connector used Thrift. Is there any way > to get key1 and key2 back apart from completely rewriting the table? This is > what I had expected it to be: > > CREATE TABLE foo.bar ( > key1 text, > key2 text, > column1 text, > column2 text, > value blob, > PRIMARY KEY ((key1, key2), column1, column2) > ) > > Cheers, > Jan > -- Tyler Hobbs DataStax <http://datastax.com/>
Re: Inconsistent query results and node state
On Thu, Mar 31, 2016 at 11:53 AM, Jason Kania <jason.ka...@ymail.com> wrote: > > To me it just seems like the timestamp column value is sometimes not being > set somewhere in the pipeline and the result is the epoch 0 value. > I agree, especially since you can't directly query this row and that timestamp doesn't fit in the normal ordering. > > Thoughts on how to proceed? > Please open a ticket at https://issues.apache.org/jira/browse/CASSANDRA and include your schema and queries. If possible, it would also be extremely helpful if you can upload the sstables for that table. -- Tyler Hobbs DataStax <http://datastax.com/>
Re: Inconsistent query results and node state
> > org.apache.cassandra.service.DigestMismatchException: Mismatch for key > DecoratedKey(-4908797801227889951, 4a41534b414e) > (6a6c8ab013d7757e702af50cbdae045c vs 2ece61a01b2a640ac10509f4c49ae6fb) That key matches the row you mentioned, so it seems like all of the replicas should have converged on the same value for that row. Do you consistently get the *1969-12-31 19:00* timestamp back now? If not, try selecting both "time" and "writetime(time)" from that row and see what write timestamps each of the values have. The ArrayIndexOutOfBoundsException in response to nodetool compact looks like a bug. What version of Cassandra are you running? On Wed, Mar 30, 2016 at 9:59 AM, Kai Wang <dep...@gmail.com> wrote: > Do you have NTP setup on all nodes? > > On Tue, Mar 29, 2016 at 11:48 PM, Jason Kania <jason.ka...@ymail.com> > wrote: > >> We have encountered a query inconsistency problem wherein the following >> query returns different results sporadically, with invalid values for a >> timestamp field looking like the field is uninitialized (a zero timestamp) >> in the query results. >> >> Attempts to repair and compact have not changed the results. 
>> >> select "subscriberId","sensorUnitId","sensorId","time" from >> "sensorReadingIndex" where "subscriberId"='JASKAN' AND "sensorUnitId"=0 AND >> "sensorId"=0 ORDER BY "time" LIMIT 10; >>
>> Invalid Query Results
>> subscriberId | sensorUnitId | sensorId | time
>> JASKAN       | 0            | 0        | 2015-05-24 2:09
>> JASKAN       | 0            | 0        | *1969-12-31 19:00*
>> JASKAN       | 0            | 0        | 2016-01-21 2:10
>> JASKAN       | 0            | 0        | 2016-01-21 2:10
>> JASKAN       | 0            | 0        | 2016-01-21 2:10
>> JASKAN       | 0            | 0        | 2016-01-21 2:11
>> JASKAN       | 0            | 0        | 2016-01-21 2:22
>> JASKAN       | 0            | 0        | 2016-01-21 2:22
>> JASKAN       | 0            | 0        | 2016-01-21 2:22
>> JASKAN       | 0            | 0        | 2016-01-21 2:22
>>
>> Valid Query Results
>> subscriberId | sensorUnitId | sensorId | time
>> JASKAN       | 0            | 0        | 2015-05-24 2:09
>> JASKAN       | 0            | 0        | 2015-05-24 2:09
>> JASKAN       | 0            | 0        | 2015-05-24 2:10
>> JASKAN       | 0            | 0        | 2015-05-24 2:10
>> JASKAN       | 0            | 0        | 2015-05-24 2:10
>> JASKAN       | 0            | 0        | 2015-05-24 2:10
>> JASKAN       | 0            | 0        | 2015-05-24 2:11
>> JASKAN       | 0            | 0        | 2015-05-24 2:13
>> JASKAN       | 0            | 0        | 2015-05-24 2:13
>> JASKAN       | 0            | 0        | 2015-05-24 2:14
>>
>> We have confirmed that the 1969-12-31 timestamp is not within the data >> based on running a number of queries, so it looks like the invalid >> timestamp value is generated by the query. The query below returns no row. 
>> >> select * from "sensorReadingIndex" where "subscriberId"='JASKAN' AND >> "sensorUnitId"=0 AND "sensorId"=0 AND time='1969-12-31 19:00:00-0500'; >> >> No logs are coming out but the following was observed intermittently in >> the tracing output, but not correlated to the invalid query results: >> >> Digest mismatch: org.apache.cassandra.service.DigestMismatchException: >> Mismatch for key DecoratedKey(-7563144029910940626, >> 00064a41534b414e040400) >> (be22d379c18f75c2f51dd6942d2f9356 vs da4e95d571b41303b908e0c5c3fff7ba) >> [ReadRepairStage:3179] | 2016-03-29 23:12:35.025000 | 192.168.10.10 | >> >> An error from the debug log that might be related is: >> >> org.apache.cassandra.service.DigestMismatchException: Mismatch for key >> DecoratedKey(-4908797801227889951, 4a41534b414e) >> (6a6c8ab013d7757e702af50cbdae045c vs 2ece61a01b2a640ac10509f4c49ae6fb) >> at >> org.apache.cassandra.service.DigestResolver.resolve(DigestResolver.java:85) >> ~[apache-cassandra-3.0.3.jar:3.0.3] >> at >> org.apache.cassandra.service.ReadCallback$AsyncRepairRunner.run(ReadCallback.java:225) >> ~[apache-cassandra-3.0.3.jar:3.0.3] >> at >> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) >> [na:1.8.0_74] >> at >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) >> [na:1.8.0_74] >> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_74] >> >> The tracing files are attached and seem to show that in the failed case, >> content is skipped because of tombstones if we understand it correctly. >> This could be an inconsistency problem on 192.168.10.9 Unfortunately, >> attempts to compact on 192.168.10.9 only give the following error without >> any stack trace detail and are not fixed with repair. >> >> root@cutthroat:/usr/local/bin/analyzer/bin# nodetool compact >> error: null >> -- StackTrace -- >> java.lang.ArrayIndexOutOfBoundsException >> >> Any suggestions on how to fix or what to search for would be much >> appreciated. 
>> >> Thanks, >> >> Jason >> >> >> >> > -- Tyler Hobbs DataStax <http://datastax.com/>
Re: Drop and Add column with different datatype in Cassandra
On Tue, Mar 29, 2016 at 10:31 AM, Bhupendra Baraiya < bhupendra.bara...@continuum.net> wrote: > Does it mean Cassandra does not allow adding of the same column in the > Table even though it does not exists in the Table > As the error message says, you can't re-add a *collection* column with the same name. Other types of columns are fine. -- Tyler Hobbs DataStax <http://datastax.com/>
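[Editor's note] A hedged CQL sketch of that distinction (table and column names are invented; the exact error text varies by Cassandra version):

```cql
CREATE TABLE t (k int PRIMARY KEY, tags list<text>, note text);

ALTER TABLE t DROP tags;
ALTER TABLE t ADD tags list<text>;     -- rejected: a dropped collection column cannot be re-added
ALTER TABLE t ADD tags_v2 list<text>;  -- fine: a new name works

ALTER TABLE t DROP note;
ALTER TABLE t ADD note text;           -- fine: non-collection columns can be re-added
```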
Re: Python to type field
This should be useful: http://datastax.github.io/python-driver/user_defined_types.html On Wed, Mar 16, 2016 at 1:18 PM, Rakesh Kumar <rakeshkumar46...@gmail.com> wrote: > Hi > > I have a type defined as follows > > CREATE TYPE etag ( > ttype int, > tvalue text > ); > > And this is used in a col of a table as follows > > evetag list<frozen<etag>> > > > I have the following value in a file > [{ttype: 3 , tvalue: '90A1'}] > > This gets inserted via the COPY command with no issues. > > However, when I try to insert the same via a python program which I am > writing, where I prepare and then bind, I get this error while executing: > > TypeError: Received an argument of invalid type for column "evetag". > Expected: <class '... VarcharType))'>, Got: <class 'str'>; (Received a string for a type that > expects a sequence) > > I tried casting the variable in python to a list and a tuple, but got the same error. > > > -- Tyler Hobbs DataStax <http://datastax.com/>
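[Editor's note] The error says the bound value for evetag must be a sequence of UDT values, not the string read from the file. One way to build such a value is with a namedtuple whose fields match the UDT (a sketch; the linked page also covers registering a class with cluster.register_user_type):

```python
from collections import namedtuple

# A client-side stand-in for the etag UDT: field names must match the type.
Etag = namedtuple("Etag", ["ttype", "tvalue"])

# The evetag column expects a sequence of UDT values, so bind a list of
# Etag instances rather than the raw string read from the file.
evetag = [Etag(ttype=3, tvalue="90A1")]
# session.execute(prepared, [evetag])   # against a live cluster (not shown)
```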
Re: Understanding SELECT * paging/ordering
On Fri, Mar 18, 2016 at 4:58 PM, Dan Checkoway <dchecko...@gmail.com> wrote: > Say I have a table with 50M rows in a keyspace with RF=3 in a cluster of > 15 nodes (single local data center). When I do "SELECT * FROM table" and > page through those results (with a fetch size of say 1000), I'd like to > understand better how that paging works. > > Specifically, what determines the order in which which rows are returned? > Results are returned in token order (murmur3 hash of the partition key), and within a single partition, rows are ordered by the clustering key. > And what's happening under the hood...i.e. is the coordinator fetching > pages of 1000 from each node, passing some sort of paging state to each > node, and the coordinator merges the per-node sorted result sets? > The coordinator sequentially[1] queries each token range until it has enough rows to meet the page size. When the next page is fetched, it resumes this process, but starts at the last-used token (which is in the paging state that the driver passes to the coordinator) rather than the start of the ring. > I'm also curious how consistency level comes into play. i.e. if I use ONE > vs. QUORUM vs. ALL, how that impacts where the results come from and how > they're ordered, merged, and who knows what else I don't know... :-) > The only difference between ONE and QUORUM is that the coordinator will query multiple replicas for each token range and perform the standard conflict resolution. [1] In reality, based on estimates of how many token ranges it will need to query in order to meet the page size, it will query multiple token ranges in parallel. See CASSANDRA-1337 for details. -- Tyler Hobbs DataStax <http://datastax.com/>
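[Editor's note] The "standard conflict resolution" mentioned for QUORUM reads is last-write-wins per value. A toy sketch (the replica contents and timestamps are invented):

```python
def reconcile(replica_responses):
    """Per key, keep the value with the highest write timestamp (last-write-wins)."""
    merged = {}
    for response in replica_responses:
        for key, (value, write_ts) in response.items():
            if key not in merged or write_ts > merged[key][1]:
                merged[key] = (value, write_ts)
    return merged

replica_a = {"row1": ("old", 100), "row2": ("x", 300)}
replica_b = {"row1": ("new", 200)}
merged = reconcile([replica_a, replica_b])
```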
Re: Automatically connect to any up node via cqlsh
On Wed, Mar 9, 2016 at 8:09 AM, Rakesh Kumar <dcrunch...@aim.com> wrote: > > Is it possible to set up cassandra/cqlsh so that if any node is down, > cqlsh will automatically try to connect to the other surviving nodes, > instead of erroring out. I know it is possible to supply ip_address and > port of the UP node as arguments to cqlsh, but I am looking at automatic > detection. > No, right now cqlsh is designed to connect to only a single node. -- Tyler Hobbs DataStax <http://datastax.com/>
Re: Isolation for atomic batch on the same partition key
On Mon, Feb 22, 2016 at 3:58 PM, Yawei Li <yawei...@gmail.com> wrote: > > 1. If an atomic batch (logged batch) contains a bunch of row mutations > and all of them have the same partition key, can I assume all those changes > have the same isolation as the row-level isolation? According to the post > here http://www.mail-archive.com/user%40cassandra.apache.org/msg42434.html, > it seems that we can get strong isolation. > e.g. > *BEGIN BATCH* > * UPDATE a IF condition_1;* > * INSERT b;* > * INSERT c;* > *APPLY BATCH* > > So at any replica, we expect isolation for the three changes on *a*, *b*, > *c* (*a* , *b*, *c* have the same partition key *k1*) -- i.e. either > none or all of them are visible. Can someone help confirm? > That is correct. > > 2. Say in the above batch, we include two extra row mutations d and e for > another partition key *k2*. Will the changes on (*a*, *b*, *c*) and (*d* > , *e*) still be atomic respectively in terms of isolation? I understand > there is no isolation between (*a*, *b*, *c*) and (*d*, *e*). I.e. is > there a per-partition-key isolation guaranteed? > You can't use LWT conditions (i.e. "IF condition_1") in batches that span multiple partition keys. If you did not include the condition, then you would get per-partition isolation, as you describe. > > > 3. I assume CL SERIAL or LOCAL_SERIAL on reads will try applying the above > logged batch if it is committed but not applied. Right? > Correct. -- Tyler Hobbs DataStax <http://datastax.com/>
Re: IF NOT EXISTS with multiple static columns confusion
What version of Cassandra are you using? I just tested this out against trunk and got reasonable behavior: cqlsh:ks1> CREATE TABLE test (k int, s1 int static, s2 int static, c int, v int, PRIMARY KEY (k, c)); cqlsh:ks1> INSERT INTO test (k, c, v) VALUES (0, 0, 0); cqlsh:ks1> UPDATE test SET s1 = 0 WHERE k = 0 IF s1 = null; [applied] --- True cqlsh:ks1> TRUNCATE test; cqlsh:ks1> INSERT INTO test (k, c, v) VALUES (0, 0, 0); cqlsh:ks1> INSERT INTO test (k, s1) VALUES (0, 0) IF NOT EXISTS; [applied] --- True On Tue, Feb 23, 2016 at 6:15 PM, Nimi Wariboko Jr <n...@channelmeter.com> wrote: > I have a table with 2 static columns, and I write to either one of them; > if I then write to the other one using IF NOT EXISTS, it fails even though > it has never been written to before. Is it the case that all static > columns share the same "written to" marker? > > Given a table like so: > > CREATE TABLE test ( > id timeuuid, > foo int static, > bar int static, > baz int, > baq int, > PRIMARY KEY (id, baz) > ) > > I'm seeing some confusing behavior; see the statements below - > > """ > INSERT INTO cmpayments.report_payments (id, foo) VALUES (NOW(), 1) IF NOT > EXISTS; // succeeds > TRUNCATE test; > INSERT INTO cmpayments.report_payments (id, baq) VALUES > (99c3-b01a-11e5-b170-0242ac110002, 1); > UPDATE cmpayments.report_payments SET foo = 1 WHERE > id=99c3-b01a-11e5-b170-0242ac110002 IF foo=null; // fails, even though > foo=null > TRUNCATE test; > INSERT INTO cmpayments.report_payments (id, bar) VALUES > (99c3-b01a-11e5-b170-0242ac110002, 1); // succeeds > INSERT INTO cmpayments.report_payments (id, foo) VALUES (NOW(), 1) IF NOT > EXISTS; // fails, even though foo=null, and has never been written to > """ > > Nimi > -- Tyler Hobbs DataStax <http://datastax.com/>
Re: copy and rename sstable files as keyspace migration approach
On Tue, Feb 23, 2016 at 12:36 PM, Robert Coli <rc...@eventbrite.com> wrote: > [1] In some very new versions of Cassandra, this may not be safe to do > with certain meta information files which are sadly no longer immutable. I presume you're referring to the index summary (i.e. Summary.db files). These just contain a sampling of the (immutable) Index.db files, and are safe to hardlink in the way that you've described. The sampling level of the summary (which is what can change over time) is serialized at the start of the Summary.db file. If you're truly paranoid, you can skip the Summary.db files and they'll be rebuilt on startup. -- Tyler Hobbs DataStax <http://datastax.com/>
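[Editor's note] A sketch of the paranoid variant: hardlink every sstable component except the Summary.db files, which Cassandra rebuilds on startup. The file names below are invented stand-ins for real sstable components.

```python
import os
import tempfile

def hardlink_sstables(src_dir, dst_dir):
    """Hardlink immutable sstable components; skip the mutable index summaries."""
    os.makedirs(dst_dir, exist_ok=True)
    for name in os.listdir(src_dir):
        if name.endswith("Summary.db"):
            continue  # safe to omit: rebuilt automatically on startup
        os.link(os.path.join(src_dir, name), os.path.join(dst_dir, name))

# Demo with throwaway files standing in for a real sstable directory
src = tempfile.mkdtemp()
dst = os.path.join(tempfile.mkdtemp(), "mykeyspace")
for component in ("ma-1-big-Data.db", "ma-1-big-Index.db", "ma-1-big-Summary.db"):
    open(os.path.join(src, component), "w").close()
hardlink_sstables(src, dst)
linked = sorted(os.listdir(dst))
```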
Re: High Bloom filter false ratio
You can try slightly lowering the bloom_filter_fp_chance on your table. Otherwise, it's possible that you're repeatedly querying one or two partitions that always trigger a bloom filter false positive. You could try manually tracing a few queries on this table (for non-existent partitions) to see if the bloom filter rejects them. Depending on your Cassandra version, your false positive ratio could be inaccurate: https://issues.apache.org/jira/browse/CASSANDRA-8525 There are also a couple of recent improvements to bloom filters: * https://issues.apache.org/jira/browse/CASSANDRA-8413 * https://issues.apache.org/jira/browse/CASSANDRA-9167 On Thu, Feb 18, 2016 at 1:35 AM, Anishek Agarwal <anis...@gmail.com> wrote: > Hello, > > We have a table with a composite partition key with humongous cardinality; > it's a combination of (long,long). On the table we have > bloom_filter_fp_chance=0.01. > > On doing "nodetool cfstats" on the 5 nodes we have in the cluster, we are > seeing "Bloom filter false ratio:" in the range of 0.7-0.9. > > I thought over time the bloom filter would adjust to the key space > cardinality. We have been running the cluster for a long time now, but have > added significant traffic from Jan this year, which would not lead to > writes in the db but would lead to high reads to see if there are any values. > > Are there any settings that can be changed to allow a better ratio? > > Thanks > Anishek > -- Tyler Hobbs DataStax <http://datastax.com/>
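[Editor's note] For reference, the space cost of lowering bloom_filter_fp_chance follows the standard bloom-filter sizing formula, bits per key ≈ -ln(p) / ln(2)². A quick sketch:

```python
import math

def bloom_bits_per_key(fp_chance):
    """Approximate bits per key for an optimally-sized bloom filter."""
    return -math.log(fp_chance) / (math.log(2) ** 2)

cost_at_1pct = bloom_bits_per_key(0.01)    # roughly 9.6 bits per key
cost_at_01pct = bloom_bits_per_key(0.001)  # roughly 14.4 bits per key
```

So dropping the target false-positive chance by 10x costs roughly 5 extra bits per key of off-heap memory.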
Re: „Using Timestamp“ Feature
2016-02-18 2:00 GMT-06:00 Matthias Niehoff <matthias.nieh...@codecentric.de> : > > * is the 'using timestamp' feature (and providing statement timestamps) > sufficiently robust and mature to build an application on? > Yes. It's been there since the start of CQL3. > * In a BatchedStatement, can different statements have different > (explicitly provided) timestamps, or is the BatchedStatement's timestamp > used for them all? Is this specified / stable behaviour? > Yes, you can separate timestamps per statement. And, in fact, if you potentially mix inserts and deletes on the same rows, you *should *use explicit timestamps with different values. See the timestamp notes here: http://cassandra.apache.org/doc/cql3/CQL.html#batchStmt > * cqhsh reports a syntax error when I use 'using timestamp' with an update > statement (works with 'insert'). Is there a good reason for this, or is it > a bug? > The "USING TIMESTAMP" goes in a different place in update statements. It should be something like: UPDATE mytable USING TIMESTAMP ? SET col = ? WHERE key = ? -- Tyler Hobbs DataStax <http://datastax.com/>
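[Editor's note] A hedged CQL illustration of per-statement timestamps in a batch (table, values, and timestamps invented). Here the delete carries a lower timestamp than the insert, so the insert wins regardless of execution details:

```cql
BEGIN BATCH
    DELETE FROM mytable USING TIMESTAMP 1000 WHERE key = 'k';
    INSERT INTO mytable (key, col) VALUES ('k', 'v') USING TIMESTAMP 1001;
APPLY BATCH;
```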
Re: Duplicated key with an IN statement
On Thu, Feb 4, 2016 at 9:57 AM, Jack Krupansky <jack.krupan...@gmail.com> wrote: > there's a bug in CHANGES.TXT for this issue. It says: "Duplicate rows > returned when in clause has repeated values (CASSANDRA-6707)", but the > issue number is really 6706. > Thanks, I've fixed this. -- Tyler Hobbs DataStax <http://datastax.com/>
Re: Cqlsh hangs & closes automatically
The default page size in cqlsh is 100, so perhaps something is going on there? Try running cqlsh with --debug to see if there are any errors. On Tue, Feb 2, 2016 at 11:21 AM, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote: > My cqlsh prompt hangs and closes if I try to fetch just 100 rows using > select * query. Cassandra-cli does the job. Any solution? > > > > Thanks > Anuj > -- Tyler Hobbs DataStax <http://datastax.com/>
Re: Cassandra 3.1.1 with respect to HeapSpace
CommitLogReplayException: > Unexpected error deserializing mutation; saved to > /tmp/mutation7465380878750576105.dat. This may be caused by replaying a > mutation against a table with the same name but incompatible schema. > Exception follows: org.apache.cassandra.serializers.MarshalException: Not > enough bytes to read a map > at > org.apache.cassandra.db.commitlog.CommitLogReplayer.handleReplayError(CommitLogReplayer.java:633) > [apache-cassandra-3.1.1.jar:3.1.1] > at > org.apache.cassandra.db.commitlog.CommitLogReplayer.replayMutation(CommitLogReplayer.java:556) > [apache-cassandra-3.1.1.jar:3.1.1] > at > org.apache.cassandra.db.commitlog.CommitLogReplayer.replaySyncSection(CommitLogReplayer.java:509) > [apache-cassandra-3.1.1.jar:3.1.1] > at > org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:404) > [apache-cassandra-3.1.1.jar:3.1.1] > at > org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:151) > [apache-cassandra-3.1.1.jar:3.1.1] > at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:189) > [apache-cassandra-3.1.1.jar:3.1.1] > at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:169) > [apache-cassandra-3.1.1.jar:3.1.1] > at > org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:283) > [apache-cassandra-3.1.1.jar:3.1.1] > at > org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:549) > [apache-cassandra-3.1.1.jar:3.1.1] > at > org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:677) > [apache-cassandra-3.1.1.jar:3.1.1] > > I can no longer start my nodes. > > How can I restart my cluster? > Is this problem known? > Is there a better Cassandra 3 version which would behave better with > respect to this problem? > Would there be a better memory configuration to select for my nodes? > Currently I use MAX_HEAP_SIZE="6G" HEAP_NEWSIZE=“496M” for a 16G RAM node. > > > Thank you very much for your advice. 
> > Kind regards > > Jean > -- Tyler Hobbs DataStax <http://datastax.com/>
Re: Sorting & pagination in apache cassandra 2.1
On Thu, Jan 7, 2016 at 6:45 AM, anuja jain <anujaja...@gmail.com> wrote: > My question is, what is the alternative if we need to order by col3 or > col4 in my above example without including col2 in order by clause. > The server-side alternative is to create a second table (or a materialized view, if you're using 3.0+) that uses a different clustering order. Cassandra purposefully only supports simple and efficient queries that can be handled quickly (with a few exceptions), and arbitrary ordering is not part of that, especially if you consider complications like paging. -- Tyler Hobbs DataStax <http://datastax.com/>
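[Editor's note] A hedged CQL sketch of both server-side options, using a hypothetical table t with the question's column naming (the column types are invented):

```cql
CREATE TABLE t (col1 int, col2 int, col3 int, col4 int, PRIMARY KEY (col1, col2));

-- Option 1: a second, denormalized table the application writes to
CREATE TABLE t_by_col3 (
    col1 int,
    col3 int,
    col2 int,
    col4 int,
    PRIMARY KEY (col1, col3, col2)
) WITH CLUSTERING ORDER BY (col3 DESC, col2 ASC);

-- Option 2 (3.0+): a materialized view maintained by Cassandra itself
CREATE MATERIALIZED VIEW t_by_col3_mv AS
    SELECT * FROM t
    WHERE col1 IS NOT NULL AND col2 IS NOT NULL AND col3 IS NOT NULL
    PRIMARY KEY (col1, col3, col2)
    WITH CLUSTERING ORDER BY (col3 DESC, col2 ASC);
```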
Re: Cassandra 3.1 - Aggregation query failure
> > > 1. Is it possible to "tune" the page size or is it hard-coded internally ? > If a page size is set for the request at the driver level, that page size will be used internally. Otherwise, it defaults to something reasonable (probably ~5k rows). > 2. Is read-repair performed on EACH page or is it done on the whole > requested rows once they are fetched ? > It's performed on each page as it's read. Do note that read repair doesn't happen for multi-partition range reads, regardless of paging or aggregation. > > Question 2. is relevant in some particular scenarios when the user is > using CL QUORUM (or more) and some replicas are out-of-sync. Even in the > case of aggregation over a single partition, if this partition is wide and > spans many fetch pages, the time the coordinator performs all the > read-repair and reconcile over QUORUM replicas, the query may timeout very > quickly. > Yes, that's possible. Timeouts for these queries should be adjusted accordingly. It's worth noting that the read_request_timeout_in_ms setting applies per-page, so coordinator-level timeouts shouldn't be severely affected by this. -- Tyler Hobbs DataStax <http://datastax.com/>
Re: Cassandra 3.1 - Aggregation query failure
On Fri, Dec 18, 2015 at 9:17 AM, DuyHai Doan <doanduy...@gmail.com> wrote: > Cassandra will perform a full table scan and fetch all the data in memory > to apply the aggregate function. Just to clarify for others on the list: when executing aggregation functions, Cassandra *will* use paging internally, so at most one page worth of data will be held in memory at a time. However, if your aggregation function retains a large amount of data, this may contribute to heap pressure. -- Tyler Hobbs DataStax <http://datastax.com/>
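The internal paging behaviour described above can be illustrated with a small Python sketch (the page source is simulated with a list; real paging happens server-side): the aggregate is folded one page at a time, so only a single page of rows is resident in memory rather than the whole table.

```python
def fetch_pages(rows, page_size):
    """Yield successive pages, mimicking Cassandra's internal paging."""
    for start in range(0, len(rows), page_size):
        yield rows[start:start + page_size]

def paged_sum(rows, page_size=3):
    # Fold the aggregate page by page: at most one page is held at a time.
    total = 0
    for page in fetch_pages(rows, page_size):
        total += sum(page)
    return total

print(paged_sum([1, 2, 3, 4, 5, 6, 7]))  # 28, same as sum() over all rows
```

The caveat in the reply applies here too: the pages stream through, but whatever the aggregation function itself retains (`total` here is tiny; a collection-building UDA is not) still accumulates on the heap.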
Re: [RELEASE] Apache Cassandra 3.1 released
On Fri, Dec 11, 2015 at 1:59 AM, Janne Jalkanen <janne.jalka...@ecyrd.com> wrote: > > So there is no reason why you would ever want to run 3.1 then? > Probably not. > Why was it released? > For consistency. It's the first release in the new tick-tock release scheme. Skipping that would have been a bit strange (although I'll agree it's also strange to have 3.0.1 == 3.1). > What is the lifecycle of 3.0.x? Will it become obsolete once 3.3 comes > out? > 3.0.x will continue until 4.0. > > >- If you want access to the new features introduced in even release >versions of 3.x (3.2, 3.4, 3.6), you'll want to run the latest odd version >(3.3, 3.5, 3.7, etc) after the release containing the feature you want >access to (so, if the feature's introduced in 3.4 and we haven't dropped >3.5 yet, obviously you'd need to run 3.4). > > > Are there going to be minor releases of the even releases, i.e. 3.2.1? > Not unless we discover critical bugs in 3.2, such as security vulnerabilities or corruption issues. > Or will they all be delegated to 3.3.x -series? Or will there be a > series of identical releases like 3.1 and 3.0.1 with 3.2.1 and 3.3? > There's not going to be a 3.3.x series, there will be one 3.3 release (unless there is a critical bug, as mentioned above). There are two separate release lines going on: 3.0.1 -> 3.0.2 -> 3.0.3 -> 3.0.4 -> ... (every release is a bugfix) 3.1 -> 3.2 -> 3.3 -> 3.4 -> ... (odd numbers are bugfix releases, even numbers may contain new features) > > This is only going to be the case during the transition phase from old > release cycles to tick-tock. We're targeting changes to CI and quality > focus going forward to greatly increase the stability of the odd releases > of major branches (3.1, 3.3, etc) so, for the 4.X releases, our > recommendation would be to run the highest # odd release for greatest > stability. > > > So here you tell to run 3.1, but above you tell to run 3.0.1? 
Why is > there a different release scheme specifically for 3.0.x instead of putting > those fixes to 3.1? > We don't know how well the tick-tock release scheme will stabilize yet. As a safety net, we're doing our traditional release scheme for 3.0.x. -- Tyler Hobbs DataStax <http://datastax.com/>
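The two release lines described in this thread can be summarized in a small (hypothetical) helper: 3.0.x is the traditional bugfix-only stability line, while in the tick-tock 3.x line even minors may add features and odd minors are bugfix-only.

```python
def release_kind(version):
    """Classify a 3.x Cassandra version under the scheme described above."""
    major, minor, *_ = (int(p) for p in version.split("."))
    if major != 3:
        raise ValueError("this sketch only covers the 3.x scheme")
    if minor == 0:
        return "bugfix (3.0.x stability line)"
    return "feature (tick-tock)" if minor % 2 == 0 else "bugfix (tick-tock)"

print(release_kind("3.0.2"))  # bugfix (3.0.x stability line)
print(release_kind("3.2"))    # feature (tick-tock)
print(release_kind("3.3"))    # bugfix (tick-tock)
```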
Re: [RELEASE] Apache Cassandra 3.1 released
This explains the new release plans in detail: http://www.planetcassandra.org/blog/cassandra-2-2-3-0-and-beyond/ 3.0.1 and 3.1 are a special case, because they happen to be identical. However, 3.0.2 will not be the same as 3.2. The 3.0.2 will only contain bugfixes, while 3.2 will introduce new features. There will not be a 3.1.1 or 3.2.1 unless a very critical bug is discovered in 3.1 or 3.2. If you "just want to run the most stable 3.0", stick with 3.0.x for now (which is 3.0.1). If you want to use bleeding-edge features, try out 3.2 when it's released (but be warned that it may not be as stable). On Wed, Dec 9, 2015 at 8:27 AM, Hannu Kröger <hkro...@gmail.com> wrote: > Hi, > > I feel the same as well. Would you skip 3.2 when you release another round > of bug fixes after one round of bug fixes? Or would 3.2 be released after > 3.3.? :P > > BR, > Hannu > > On 09 Dec 2015, at 16:05, Kai Wang <dep...@gmail.com> wrote: > > Janne, > > You are not alone. I am also confused by that "Under normal conditions > ..." statement. I can really use some examples such as: > 3.0.0 = ? > 3.0.1 = ? > 3.1.0 = ? > 3.1.1 = ? (this should not happen under normal conditions because the fix > should be in 3.3.0 - the next bug fix release?) > > On Wed, Dec 9, 2015 at 3:05 AM, Janne Jalkanen <janne.jalka...@ecyrd.com> > wrote: > >> >> I’m sorry, I don’t understand the new release scheme at all. Both of >> these are bug fixes on 3.0? What’s the actual difference? >> >> If I just want to run the most stable 3.0, should I run 3.0.1 or 3.1? >> Will 3.0 gain new features which will not go into 3.1, because that’s a bug >> fix release on 3.0? So 3.0.x will contain more features than 3.1, as >> even-numbered releases will be getting new features? Or is 3.0.1 and 3.1 >> essentially the same thing? Then what’s the role of 3.1? Will there be more >> than one 3.1? 3.1.1? Or is it 3.3? What’s the content of that? 3.something >> + patches = 3.what? 
>> >> What does this statement in the referred blog post mean? "Under normal >> conditions, we will NOT release 3.x.y stability releases for x > 0.” Why >> are the normal conditions being violated already by releasing 3.1 (since 1 >> > 0)? >> >> /Janne, who is completely confused by all this, and suspects he’s the >> target of some hideous joke. >> >> On 8 Dec 2015, at 22:26, Jake Luciani <j...@apache.org> wrote: >> >> >> The Cassandra team is pleased to announce the release of Apache Cassandra >> version 3.1. This is the first release from our new Tick-Tock release >> process[4]. >> It contains only bugfixes on the 3.0 release. >> >> Apache Cassandra is a fully distributed database. It is the right choice >> when you need scalability and high availability without compromising >> performance. >> >> http://cassandra.apache.org/ >> >> Downloads of source and binary distributions are listed in our download >> section: >> >> http://cassandra.apache.org/download/ >> >> This version is a bug fix release[1] on the 3.x series. As always, please >> pay >> attention to the release notes[2] and Let us know[3] if you were to >> encounter >> any problem. >> >> Enjoy! >> >> [1]: http://goo.gl/rQJ9yd (CHANGES.txt) >> [2]: http://goo.gl/WBrlCs (NEWS.txt) >> [3]: https://issues.apache.org/jira/browse/CASSANDRA >> [4]: http://www.planetcassandra.org/blog/cassandra-2-2-3-0-and-beyond/ >> >> >> > > -- Tyler Hobbs DataStax <http://datastax.com/>
Re: Cassandra 3.0.0 connection problem
On Thu, Nov 19, 2015 at 1:13 AM, Enrico Sola <sola.enrico...@gmail.com> wrote: > Hi, I'm new to Cassandra and I've recently upgraded to 3.0.0 on Ubuntu > Linux 14.04 LTS, through apt-get upgrade not manual installation, after the > update all was fine so I could access to my keyspaces using cqlsh but I > can't access to Cassandra using DataStax PHP Driver because I get this > error: "No hosts available for the control connection”. > The connection parameters are the same of 2.2.3 version (and was working > fine). > I don't know if is this a bug or a problem of the PHP driver but my > systems use Cassandra and are now offline, so it's a known issue with a > solution? > I don't think the PHP driver supports Cassandra 3.0 yet. There were some changes to the system schema tables that are probably preventing it from connecting successfully. > I tried also to downgrade to 2.2.3 version but after that Cassandra didn't > start due to keyspace loading problem, I'm just looking for a quick > solution so doesn't matter if I have to downgrade to 2.2.3, so how can I do > the downgrade without lose my datas? > Downgrading major versions isn't supported, which is why we recommend that you take a snapshot before upgrading. Your only real option for downgrading without data loss is to dump your data (using cqlsh's COPY TO or something similar) and then re-load it on 2.2 (using cqlsh's COPY FROM or something similar). -- Tyler Hobbs DataStax <http://datastax.com/>
Re: Cqlsh copy to and copy from
If the fields are null, COPY TO should just be generating "{field1: null, field2: null}". Would you mind opening a ticket here with steps to reproduce: https://issues.apache.org/jira/browse/CASSANDRA On Thu, Nov 19, 2015 at 1:05 AM, Vova Shelgunov <vvs...@gmail.com> wrote: > Hi all, > > I have a trouble with copy functionality in cassandra 3.0. > > When I am trying to copy my table to file, some of UDTs have the following > representation: > > {field1: , field2: } > > They have no values, and when I tried to restore this table, this rows was > not imported. > > Do you plan to fix that, e.g. fill with default values or exclude them? > > Thanks. > -- Tyler Hobbs DataStax <http://datastax.com/>
Re: Timeout with static column
Processing response from / > 192.168.169.20 [SharedPool-Worker-2] | 2015-11-11 19:38:40.754000 | > 192.168.169.10 | 330177 > > Request complete | 2015-11-11 19:38:40.813963 | 192.168.169.10 | >389963 > > This specific key has about 1900 records of around 50/100 bytes each which > makes it quite large (compared to others), and the `used` static column is > True. > > I know this is a C* anti-pattern, but regularly, smaller (older) > `sequence_nr` are deleted. > I think this isn't a problem since most of the read requests are bounded > by sequence_nr (and are pretty fast), so there are certainly many > tombstones (even though the trace above doesn't tell that). > > What's strange is that it seems the query scans the whole set of records, > even though it should return only the static column (whose by definition > has only one value indepedently of the number of records), so it should be > pretty fast, isn't it? > > Note that using `SELECT DISTINCT` doesn't seem to change anything > regarding speed (I understand that it is the recommended way of doing this > kind of queries). > > Anyone can explain me how this problem can be solved, or what could be its > root cause? > > Thanks for any answers, > -- > Brice Figureau > -- Tyler Hobbs DataStax <http://datastax.com/>
Re: Multi-column slice restrictions not respected by the returned result
Correct, it's a full tuple comparison. On Wed, Nov 11, 2015 at 1:43 PM, Yuri Shkuro <y...@uber.com> wrote: > Thanks, Tyler. > > I also realized that I misunderstood multi-column restriction. Evidently, > (a, b) > (x, y) does not imply component-wise restriction (a>x && b>y) in > CQL, it only implies full tuple comparison. That explains why my condition > (a, b) > (2, 10) was matching row (2, 11). > > On Wed, Nov 11, 2015 at 2:31 PM, Tyler Hobbs <ty...@datastax.com> wrote: > >> This is a known problem with multi-column slices and mixed ASC/DESC >> clustering orders. See >> https://issues.apache.org/jira/browse/CASSANDRA-7281 for details. >> >> On Tue, Nov 10, 2015 at 11:02 PM, Yuri Shkuro <y...@uber.com> wrote: >> >>> According to this blog: >>> http://www.datastax.com/dev/blog/a-deep-look-to-the-cql-where-clause >>> >>> I should be able to do multi-column restrictions on clustering columns, >>> as in the blog example: WHERE (server, time) >= (‘196.8.0.0’, 12:00) AND >>> (server, time) <= (‘196.8.255.255’, 14:00) >>> >>> However, I am getting data returned from such query that does not match >>> the restrictions. Tried on Cassandra 2.17 and 2.2.3. 
Here's an example: >>> >>> CREATE TABLE IF NOT EXISTS dur ( >>> s text, >>> nd bigint, >>> ts bigint, >>> tid bigint, >>> PRIMARY KEY (s, nd, ts) >>> ) WITH CLUSTERING ORDER BY (nd ASC, ts DESC); >>> >>> insert INTO dur (s, nd, ts, tid) values ('x', 1, 10, 99); >>> insert INTO dur (s, nd, ts, tid) values ('x', 2, 11, 98) ; >>> insert INTO dur (s, nd, ts, tid) values ('x', 3, 10, 97) ; >>> insert INTO dur (s, nd, ts, tid) values ('x', 1, 11, 96) ; >>> insert INTO dur (s, nd, ts, tid) values ('x', 1, 12, 95) ; >>> insert INTO dur (s, nd, ts, tid) values ('x', 2, 10, 94) ; >>> insert INTO dur (s, nd, ts, tid) values ('x', 2, 12, 93) ; >>> insert INTO dur (s, nd, ts, tid) values ('x', 3, 11, 92) ; >>> insert INTO dur (s, nd, ts, tid) values ('x', 3, 12, 91) ; >>> >>> select * from dur where s='x' and (nd,ts) > (2, 11); >>> >>> s | nd | ts | tid >>> ---+----+----+----- >>> x | 2 | 10 | 94 >>> x | 3 | 12 | 91 >>> x | 3 | 11 | 92 >>> x | 3 | 10 | 97 >>> (4 rows) >>> >>> The first row in the result does not satisfy the restriction (nd,ts) > >>> (2, 11). Am I doing something incorrectly? >>> >>> Thanks, >>> --Yuri >>> >> >> >> >> -- >> Tyler Hobbs >> DataStax <http://datastax.com/> >> > > -- Tyler Hobbs DataStax <http://datastax.com/>
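The "full tuple comparison" in the reply above is exactly how Python compares tuples (lexicographically, component by component), which makes the intended CQL semantics easy to check against the thread's data:

```python
# (nd, ts) pairs from the example table in the thread above.
rows = [(1, 10), (1, 11), (1, 12), (2, 10), (2, 11), (2, 12),
        (3, 10), (3, 11), (3, 12)]

# The restriction (nd, ts) > (2, 11) is one lexicographic comparison,
# not the component-wise condition (nd > 2 AND ts > 11).
matching = [r for r in rows if r > (2, 11)]
print(matching)  # [(2, 12), (3, 10), (3, 11), (3, 12)]

# (2, 10) is excluded: nd ties at 2, so ts decides and 10 < 11.
# (3, 10) is included even though 10 < 11, because nd=3 > 2 decides first.
# So the row (x, 2, 10, 94) returned by Cassandra in the thread really is
# the CASSANDRA-7281 bug, not a misunderstanding of the semantics.
```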
Re: Multi-column slice restrictions not respected by the returned result
This is a known problem with multi-column slices and mixed ASC/DESC clustering orders. See https://issues.apache.org/jira/browse/CASSANDRA-7281 for details. On Tue, Nov 10, 2015 at 11:02 PM, Yuri Shkuro <y...@uber.com> wrote: > According to this blog: > http://www.datastax.com/dev/blog/a-deep-look-to-the-cql-where-clause > > I should be able to do multi-column restrictions on clustering columns, as > in the blog example: WHERE (server, time) >= (‘196.8.0.0’, 12:00) AND > (server, time) <= (‘196.8.255.255’, 14:00) > > However, I am getting data returned from such query that does not match > the restrictions. Tried on Cassandra 2.17 and 2.2.3. Here's an example: > > CREATE TABLE IF NOT EXISTS dur ( > s text, > nd bigint, > ts bigint, > tid bigint, > PRIMARY KEY (s, nd, ts) > ) WITH CLUSTERING ORDER BY (nd ASC, ts DESC); > > insert INTO dur (s, nd, ts, tid) values ('x', 1, 10, 99); > insert INTO dur (s, nd, ts, tid) values ('x', 2, 11, 98) ; > insert INTO dur (s, nd, ts, tid) values ('x', 3, 10, 97) ; > insert INTO dur (s, nd, ts, tid) values ('x', 1, 11, 96) ; > insert INTO dur (s, nd, ts, tid) values ('x', 1, 12, 95) ; > insert INTO dur (s, nd, ts, tid) values ('x', 2, 10, 94) ; > insert INTO dur (s, nd, ts, tid) values ('x', 2, 12, 93) ; > insert INTO dur (s, nd, ts, tid) values ('x', 3, 11, 92) ; > insert INTO dur (s, nd, ts, tid) values ('x', 3, 12, 91) ; > > select * from dur where s='x' and (nd,ts) > (2, 11); > > s | nd | ts | tid > ---+----+----+----- > x | 2 | 10 | 94 > x | 3 | 12 | 91 > x | 3 | 11 | 92 > x | 3 | 10 | 97 > (4 rows) > > The first row in the result does not satisfy the restriction (nd,ts) > > (2, 11). Am I doing something incorrectly? > > Thanks, > --Yuri > -- Tyler Hobbs DataStax <http://datastax.com/>
Re: why cassanra max is 20000/s on a node ?
> > the program use datastax driver 2.1.8 and use 5 thread to insert data to > cassandra on the same machine The client with five threads is probably your bottleneck. Try running the cassandra stress tool for comparison. You should see at least double the throughput. On Thu, Nov 5, 2015 at 9:56 AM, Eric Stevens <migh...@gmail.com> wrote: > > 512G memory , 128core cpu > > This seems dramatically oversized for a Cassandra node. You'd do *much* > better > to have a much larger cluster of much smaller nodes. > > > On Thu, Nov 5, 2015 at 8:25 AM Jack Krupansky <jack.krupan...@gmail.com> > wrote: > >> I don't know what current numbers are, but last year the idea of getting >> 1 million writes per second on a 96 node cluster was considered a >> reasonable achievement. That would be roughly 10,000 writes per second per >> node and you are getting twice that. >> >> See: >> http://www.datastax.com/1-million-writes >> >> Or this Google test which hit 1 million writes per second with 330 nodes, >> which would be roughly 3,000 writes per second per node: >> >> http://googlecloudplatform.blogspot.com/2014/03/cassandra-hits-one-million-writes-per-second-on-google-compute-engine.html >> >> So, is your question why your throughput is so good or are you >> disappointed that it wasn't better? >> >> Cassandra is designed for clusters with lots of nodes, so if you want to >> get an accurate measure of per-node performance you need to test with a >> reasonable number of nodes and then divide aggregate performance by the >> number of nodes, not test a single node alone. In short, testing a single >> node in isolation is not a recommended approach to testing Cassandra >> performance. >> >> >> -- Jack Krupansky >> >> On Thu, Nov 5, 2015 at 9:05 AM, 郝加来 <ha...@neusoft.com> wrote: >> >>> hi >>> veryone >>> i setup cassandra 2.2.3 on a node , the machine 's environment is >>> openjdk-1.8.0 , 512G memory , 128core cpu , 3T ssd . 
>>> the token num is 256 on the node , the program uses DataStax driver 2.1.8 >>> with 5 threads to insert data into Cassandra on the same machine , and the data >>> is 6G in size, 1157000 lines . >>> >>> why is the throughput 20000/s on the node ? >>> >>> >>> # Per-thread stack size. >>> >>> JVM_OPTS="$JVM_OPTS -Xss512k" >>> >>> >>> >>> # Larger interned string table, for gossip's benefit (CASSANDRA-6410) >>> >>> JVM_OPTS="$JVM_OPTS -XX:StringTableSize=103" >>> >>> >>> >>> # GC tuning options >>> >>> JVM_OPTS="$JVM_OPTS -XX:+CMSIncrementalMode" >>> >>> JVM_OPTS="$JVM_OPTS -XX:+DisableExplicitGC" >>> >>> JVM_OPTS="$JVM_OPTS -XX:+CMSConcurrentMTEnabled" >>> >>> JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC" >>> >>> JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC" >>> >>> JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled" >>> >>> JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=4" >>> >>> JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=2" >>> >>> JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75" >>> >>> JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly" >>> >>> JVM_OPTS="$JVM_OPTS -XX:+UseTLAB" >>> >>> JVM_OPTS="$JVM_OPTS >>> -XX:CompileCommandFile=$CASSANDRA_CONF/hotspot_compiler" >>> >>> JVM_OPTS="$JVM_OPTS -XX:CMSWaitDuration=6" >>> >>> >>> >>> memtable_heap_space_in_mb: 1024 >>> >>> memtable_offheap_space_in_mb: 10240 >>> >>> memtable_cleanup_threshold: 0.55 >>> >>> memtable_allocation_type: heap_buffers >>> >>> >>> That is all, >>> thank you. >>> -- >>> >>> *郝加来* >>> >>> Financial East China Business Unit >>> >>> Neusoft Corporation >>> Neusoft Software Park, No. 1000 Ziyue Road, Minhang District, Shanghai >>> Postcode:200241 >>> Tel:(86 21) 33578591 >>> Fax:(86 21) *23025565-111* >>> Mobile:13764970711 >>> Email:ha...@neusoft.com >>> Http://www.neusoft.com <http://www.neusoft.com/> >>> >> >> -- Tyler Hobbs DataStax <http://datastax.com/>
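A back-of-the-envelope check of why a five-thread synchronous client caps throughput, as suggested above: each thread can complete at most 1/latency requests per second, so the client's ceiling is roughly threads / latency. The per-insert latency below is an assumption for illustration, not a measurement from this thread.

```python
def max_sync_throughput(threads, latency_s):
    """Upper bound on ops/s for a synchronous client with N threads."""
    return threads / latency_s

# Assuming ~0.25 ms per synchronous insert, 5 threads top out around 20000/s
# at the client, regardless of how fast the node is:
print(round(max_sync_throughput(5, 0.00025)))

# Doubling client concurrency doubles the ceiling without touching the node:
print(round(max_sync_throughput(10, 0.00025)))
```

This is why the reply recommends cassandra-stress (which uses far higher concurrency, or asynchronous requests) as a fairer measure of what the node itself can sustain.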
Re: Error code=1000
When you say "I am using cassandra standalone", do you mean that you're running a single-node cluster? If that's the case, then I'm guessing your problem is that the replication factor for the keyspace is set to 2 or 3 (instead of 1). On Sat, Oct 31, 2015 at 3:00 PM, Ricardo Sancho <sancho.rica...@gmail.com> wrote: > One or more of your nodes, depending on your replication factor, is not > answering in time. Either they are down or have too much load that they are > not able to answer every request before the timeout expires. > On 31 Oct 2015 20:35, "Eduardo Alfaia" <eduardocalf...@gmail.com> wrote: > >> Hi guys, >> >> Could you help me with this error? >> >> cassandra.Unavailable: code=1000 [Unavailable exception] message="Cannot >> achieve consistency level LOCAL_QUORUM" info={'required_replicas': 2, >> 'alive_replicas': 1, 'consistency': 'LOCAL_QUORUM’} >> >> I am using cassandra standalone >> >> Thanks >> >> -- Tyler Hobbs DataStax <http://datastax.com/>
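The arithmetic behind that guess is simple: a quorum needs floor(RF / 2) + 1 live replicas, which a single node cannot supply once RF >= 2 — exactly the `required_replicas: 2, alive_replicas: 1` in the error above. A minimal sketch:

```python
def quorum(rf):
    """Replicas required to satisfy (LOCAL_)QUORUM for replication factor rf."""
    return rf // 2 + 1

def can_achieve_quorum(rf, alive_replicas):
    return alive_replicas >= quorum(rf)

print(quorum(2), quorum(3))        # both require 2 live replicas
print(can_achieve_quorum(2, 1))    # False -- matches the Unavailable error
print(can_achieve_quorum(1, 1))    # True  -- hence the advice to set RF to 1
```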
Re: Error Code
That means the driver could not decode a Result message from Cassandra. Can you post the query that's failing along with your schema for that table to the Python driver mailing list? Here's a link: https://groups.google.com/a/lists.datastax.com/forum/#!forum/python-driver-user On Thu, Oct 29, 2015 at 9:43 AM, Eduardo Alfaia <eduardocalf...@gmail.com> wrote: > I am using a python driver from DataStax. Cassandra driver 2.7.2 > > On 29 Oct 2015, at 15:26, Chris Lohfink <clohfin...@gmail.com> wrote: > > It means a response (opcode 8) message couldn't be decoded. What driver > are you using? What version? What version of C*? > > Chris > > On Thu, Oct 29, 2015 at 9:19 AM, Eduardo Alfaia <eduardocalf...@gmail.com> > wrote: > >> yes, but what does it mean? >> >> On 29 Oct 2015, at 15:18, Kai Wang <dep...@gmail.com> wrote: >> >> >> https://github.com/datastax/python-driver/blob/75ddc514617304797626cc69957eb6008695be1e/cassandra/connection.py#L573 >> >> Is your error message complete? >> >> On Thu, Oct 29, 2015 at 9:45 AM, Eduardo Alfaia <eduardocalf...@gmail.com >> > wrote: >> >>> Hi Guys, >>> >>> Does anyone know what error code in cassandra is? >>> >>> Error decoding response from Cassandra. opcode: 0008; >>> >>> Thanks >>> >> >> >> > > -- Tyler Hobbs DataStax <http://datastax.com/>
Re: Is there any configuration so that local program on C* node can connect using localhost and remote program using IP/name?
On Mon, Oct 19, 2015 at 7:35 PM, Ravi <ravi.ga...@gmail.com> wrote: > > I am using apache-cassandra-2.2.0. You should upgrade to 2.2.3. There were some bugs that you probably want to avoid in 2.2.0. > > Is there any configuration so that local program on C* node can connect > using localhost as connection url and remote program's using IP/name in > connection url? Set rpc_address to 0.0.0.0 to bind all interfaces. -- Tyler Hobbs DataStax <http://datastax.com/>
Re: Cassandra query degradation with high frequency updated tables.
y$DroppableRunnable.run(StorageProxy.java:2187) > ~[apache-cassandra-2.2.2.jar:2.2.2] > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[na:1.8.0_60] > at > org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) > ~[apache-cassandra-2.2.2.jar:2.2.2] > at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) > [apache-cassandra-2.2.2.jar:2.2.2] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60] > Caused by: org.apache.cassandra.io.sstable.CorruptSSTableException: > java.io.IOException: Seek position 182054 is not within mmap segment (seg > offs: 0, length: 182054) > at > org.apache.cassandra.io.sstable.format.big.BigTableReader.getPosition(BigTableReader.java:250) > ~[apache-cassandra-2.2.2.jar:2.2.2] > at > org.apache.cassandra.io.sstable.format.SSTableReader.getPosition(SSTableReader.java:1558) > ~[apache-cassandra-2.2.2.jar:2.2.2] > at > org.apache.cassandra.io.sstable.format.big.SSTableSliceIterator.(SSTableSliceIterator.java:42) > ~[apache-cassandra-2.2.2.jar:2.2.2] > at > org.apache.cassandra.io.sstable.format.big.BigTableReader.iterator(BigTableReader.java:75) > ~[apache-cassandra-2.2.2.jar:2.2.2] > at > org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:246) > ~[apache-cassandra-2.2.2.jar:2.2.2] > at > org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:62) > ~[apache-cassandra-2.2.2.jar:2.2.2] > at > org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:270) > ~[apache-cassandra-2.2.2.jar:2.2.2] > at > org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:64) > ~[apache-cassandra-2.2.2.jar:2.2.2] > at > org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:2004) > ~[apache-cassandra-2.2.2.jar:2.2.2] > at > 
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1808) > ~[apache-cassandra-2.2.2.jar:2.2.2] > at org.apache.cassandra.db.Keyspace.getRow(Keyspace.java:360) > ~[apache-cassandra-2.2.2.jar:2.2.2] > at > org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:85) > ~[apache-cassandra-2.2.2.jar:2.2.2] > at > org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1537) > ~[apache-cassandra-2.2.2.jar:2.2.2] > at > org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2183) > ~[apache-cassandra-2.2.2.jar:2.2.2] > ... 4 common frames omitted > Caused by: java.io.IOException: Seek position 182054 is not within mmap > segment (seg offs: 0, length: 182054) > at > org.apache.cassandra.io.util.ByteBufferDataInput.seek(ByteBufferDataInput.java:47) > ~[apache-cassandra-2.2.2.jar:2.2.2] > at > org.apache.cassandra.io.util.AbstractDataInput.skipBytes(AbstractDataInput.java:33) > ~[apache-cassandra-2.2.2.jar:2.2.2] > at > org.apache.cassandra.io.util.FileUtils.skipBytesFully(FileUtils.java:405) > ~[apache-cassandra-2.2.2.jar:2.2.2] > at > org.apache.cassandra.db.RowIndexEntry$Serializer.skipPromotedIndex(RowIndexEntry.java:164) > ~[apache-cassandra-2.2.2.jar:2.2.2] > at > org.apache.cassandra.db.RowIndexEntry$Serializer.skip(RowIndexEntry.java:155) > ~[apache-cassandra-2.2.2.jar:2.2.2] > at > org.apache.cassandra.io.sstable.format.big.BigTableReader.getPosition(BigTableReader.java:244) > ~[apache-cassandra-2.2.2.jar:2.2.2] > > > > > On Oct 9, 2015, at 9:26 AM, Carlos Alonso <i...@mrcalonso.com> wrote: > > Yeah, I was about to suggest the compaction strategy too. Leveled > compaction sounds like a better fit when records are being updated > > Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso> > > On 8 October 2015 at 22:35, Tyler Hobbs <ty...@datastax.com> wrote: > >> Upgrade to 2.2.2. 
Your sstables are probably not compacting due to >> CASSANDRA-10270 <https://issues.apache.org/jira/browse/CASSANDRA-10270>, >> which was fixed in 2.2.2. >> >> Additionally, you may want to look into using leveled compaction ( >> http://www.datastax.com/dev/blog/when-to-use-leveled-compaction). >> >> On Thu, Oct 8, 2015 at 4:27 PM, Nazario Parsacala <dodongj...@gmail.com> >> wrote: >> >>> >>> Hi, >>> >>> so we are developing a system that computes profile of things that it >>> observes. The observation comes in form of events. Each thing that it >>> observe has an id and each thing has a set of subthings in it which has >>> measurement of some kind. Roughly there are about 500 subthings within each
Re: Cassandra query degradation with high frequency updated tables.
ocalReadRunnable.runMayThrow(StorageProxy.java:1537) > ~[apache-cassandra-2.2.2.jar:2.2.2] > at > org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2183) > ~[apache-cassandra-2.2.2.jar:2.2.2] > ... 4 common frames omitted > Caused by: java.io.IOException: Seek position 182054 is not within mmap > segment (seg offs: 0, length: 182054) > at > org.apache.cassandra.io.util.ByteBufferDataInput.seek(ByteBufferDataInput.java:47) > ~[apache-cassandra-2.2.2.jar:2.2.2] > at > org.apache.cassandra.io.util.AbstractDataInput.skipBytes(AbstractDataInput.java:33) > ~[apache-cassandra-2.2.2.jar:2.2.2] > at > org.apache.cassandra.io.util.FileUtils.skipBytesFully(FileUtils.java:405) > ~[apache-cassandra-2.2.2.jar:2.2.2] > at > org.apache.cassandra.db.RowIndexEntry$Serializer.skipPromotedIndex(RowIndexEntry.java:164) > ~[apache-cassandra-2.2.2.jar:2.2.2] > at > org.apache.cassandra.db.RowIndexEntry$Serializer.skip(RowIndexEntry.java:155) > ~[apache-cassandra-2.2.2.jar:2.2.2] > at > org.apache.cassandra.io.sstable.format.big.BigTableReader.getPosition(BigTableReader.java:244) > ~[apache-cassandra-2.2.2.jar:2.2.2] > > > > > On Oct 9, 2015, at 9:26 AM, Carlos Alonso <i...@mrcalonso.com> wrote: > > Yeah, I was about to suggest the compaction strategy too. Leveled > compaction sounds like a better fit when records are being updated > > Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso> > > On 8 October 2015 at 22:35, Tyler Hobbs <ty...@datastax.com> wrote: > >> Upgrade to 2.2.2. Your sstables are probably not compacting due to >> CASSANDRA-10270 <https://issues.apache.org/jira/browse/CASSANDRA-10270>, >> which was fixed in 2.2.2. >> >> Additionally, you may want to look into using leveled compaction ( >> http://www.datastax.com/dev/blog/when-to-use-leveled-compaction). 
>> >> On Thu, Oct 8, 2015 at 4:27 PM, Nazario Parsacala <dodongj...@gmail.com> >> wrote: >> >>> >>> Hi, >>> >>> so we are developing a system that computes profile of things that it >>> observes. The observation comes in form of events. Each thing that it >>> observe has an id and each thing has a set of subthings in it which has >>> measurement of some kind. Roughly there are about 500 subthings within each >>> thing. We receive events containing measurements of these 500 subthings >>> every 10 seconds or so. >>> >>> So as we receive events, we read the old profile value, calculate the >>> new profile based on the new value and save it back. We use the following >>> schema to hold the profile. >>> >>> CREATE TABLE myprofile ( >>> id text, >>> month text, >>> day text, >>> hour text, >>> subthings text, >>> lastvalue double, >>> count int, >>> stddev double, >>> PRIMARY KEY ((id, month, day, hour), subthings) >>> ) WITH CLUSTERING ORDER BY (subthings ASC) ); >>> >>> >>> This profile will then be use for certain analytics that can use in the >>> context of the ‘thing’ or in the context of specific thing and subthing. >>> >>> A profile can be defined as monthly, daily, hourly. So in case of >>> monthly the month will be set to the current month (i.e. ‘Oct’) and the day >>> and hour will be set to empty ‘’ string. >>> >>> >>> The problem that we have observed is that over time (actually in just a >>> matter of hours) we will see a huge degradation of query response for the >>> monthly profile. At the start it will be respinding in 10-100 ms and after >>> a couple of hours it will go to 2000-3000 ms . If you leave it for a couple >>> of days you will start experiencing readtimeouts . The query is basically >>> just : >>> >>> select * from myprofile where id=‘1’ and month=‘Oct’ and day=‘’ and >>> hour=‘' >>> >>> This will have only about 500 rows or so. >>> >>> >>> I believe that this is cause by the fact there are multiple updates done >>> to this specific partition. 
So what do we think can be done to resolve this >>> ? >>> >>> BTW, I am using Cassandra 2.2.1 . And since this is a test , this is >>> just running on a single node. >>> >>> >>> >>> >>> >> >> >> -- >> Tyler Hobbs >> DataStax <http://datastax.com/> >> > > > -- Tyler Hobbs DataStax <http://datastax.com/>
Re: Cassandra query degradation with high frequency updated tables.
Upgrade to 2.2.2. Your sstables are probably not compacting due to CASSANDRA-10270 <https://issues.apache.org/jira/browse/CASSANDRA-10270>, which was fixed in 2.2.2. Additionally, you may want to look into using leveled compaction (http://www.datastax.com/dev/blog/when-to-use-leveled-compaction).

On Thu, Oct 8, 2015 at 4:27 PM, Nazario Parsacala <dodongj...@gmail.com> wrote:
>
> Hi,
>
> So we are developing a system that computes a profile of the things it observes. The observations come in the form of events. Each thing it observes has an id, and each thing contains a set of subthings, each with a measurement of some kind. Roughly, there are about 500 subthings within each thing. We receive events containing measurements of these 500 subthings every 10 seconds or so.
>
> As we receive events, we read the old profile value, calculate the new profile based on the new value, and save it back. We use the following schema to hold the profile:
>
> CREATE TABLE myprofile (
>     id text,
>     month text,
>     day text,
>     hour text,
>     subthings text,
>     lastvalue double,
>     count int,
>     stddev double,
>     PRIMARY KEY ((id, month, day, hour), subthings)
> ) WITH CLUSTERING ORDER BY (subthings ASC);
>
> This profile is then used for analytics, either in the context of the 'thing' or in the context of a specific thing and subthing.
>
> A profile can be defined as monthly, daily, or hourly. In the monthly case, the month is set to the current month (i.e. 'Oct') and the day and hour are set to the empty string ''.
>
> The problem we have observed is that over time (actually in just a matter of hours) we see a huge degradation in query response for the monthly profile. At the start it responds in 10-100 ms, and after a couple of hours it goes to 2000-3000 ms. If you leave it for a couple of days, you start experiencing read timeouts. The query is basically just:
>
> select * from myprofile where id='1' and month='Oct' and day='' and hour=''
>
> This will have only about 500 rows or so.
>
> I believe this is caused by the multiple updates made to this specific partition. So what do we think can be done to resolve this?
>
> BTW, I am using Cassandra 2.2.1. And since this is a test, it is just running on a single node.

--
Tyler Hobbs
DataStax <http://datastax.com/>
Re: CQL error when adding multiple conditional update statements in the same batch
I assume you're running Cassandra 2.0? In 2.1.1, the check for "incompatible" conditions was removed (see this comment <https://issues.apache.org/jira/browse/CASSANDRA-6839?focusedCommentId=14097793=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14097793> for details). I wouldn't be surprised if that check didn't work properly for batch statements in 2.0.

On Thu, Oct 8, 2015 at 3:22 PM, sai krishnam raju potturi <pskraj...@gmail.com> wrote:
> could you also provide the columnfamily schema.
>
> On Thu, Oct 8, 2015 at 4:13 PM, Peddi, Praveen <pe...@amazon.com> wrote:
>> Hi,
>>
>> I am trying to understand this error message that CQL is throwing when I try to update 2 different rows with different values on the same conditional columns. Doesn't CQL support that? I am wondering why CQL has this restriction (since the condition applies to each row independently, why does CQL even care whether the values of the condition are the same or different).
>>
>> BEGIN BATCH
>> UPDATE activities SET state='CLAIMED',version=11 WHERE key='Key1' IF version=10;
>> UPDATE activities SET state='ALLOCATED',version=2 WHERE key='Key2' IF version=1;
>> APPLY BATCH;
>>
>> gives the following error:
>>
>> Bad Request: Duplicate and incompatible conditions for column version
>>
>> Is there any way to update more than 1 row with a different conditional value for each row (other than executing these statements individually)?
>> -Praveen

--
Tyler Hobbs
DataStax <http://datastax.com/>
Re: How are writes handled while adding nodes to cluster?
When a node is joining, writes are sent to both the current replicas *and* the joining replica. However, the joining replica does not count towards the consistency level. So, for example, if you write at ConsistencyLevel.TWO, and only one existing replica and the joining replica respond, the write will be considered a failure. On Tue, Oct 6, 2015 at 4:43 AM, Erik Forsberg <forsb...@opera.com> wrote: > Hi! > > How are writes handled while I'm adding a node to a cluster, i.e. while > the new node is in JOINING state? > > Are they queued up as hinted handoffs, or are they being written to the > joining node? > > In the former case I guess I have to make sure my max_hint_window_in_ms > is long enough for the node to become NORMAL or hints will get dropped > and I must do repair. Am I right? > > Thanks, > \EF > -- Tyler Hobbs DataStax <http://datastax.com/>
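Tyler's rule can be sketched in a few lines of Python. This is a simplified model (the function and replica names are illustrative, not Cassandra internals): pending (joining) replicas receive the write but are excluded when counting acknowledgements against the consistency level.

```python
def write_succeeded(acks, required, pending):
    """Decide whether a write met its consistency level.

    acks: set of replicas that acknowledged the write.
    required: number of acks the consistency level demands (TWO -> 2).
    pending: joining replicas; they receive the write but never count
    toward the consistency level.
    """
    counted = {replica for replica in acks if replica not in pending}
    return len(counted) >= required

# CL.TWO, with one existing replica and the joining replica responding:
print(write_succeeded({"10.0.0.1", "10.0.0.9"}, 2, {"10.0.0.9"}))  # False
```

With a second existing replica responding, the same call returns True: the joining node's ack is simply ignored for the purpose of the consistency check.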
Re: Repair corrupt SSTable from power outage?
>> ... 14 common frames omitted
>>
>> I found some people recommending scrubbing the sstable so I attempted
>> that and got the following error:
>>
>> bin/sstablescrub system sstable_activity -v
>>
>> ERROR 17:26:03 Exiting forcefully due to file system exception on
>> startup, disk failure policy "stop"
>> org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.EOFException
>> at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:131) ~[apache-cassandra-2.1.9.jar:2.1.9]
>> at org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:85) ~[apache-cassandra-2.1.9.jar:2.1.9]
>> at org.apache.cassandra.io.util.CompressedSegmentedFile$Builder.metadata(CompressedSegmentedFile.java:79) ~[apache-cassandra-2.1.9.jar:2.1.9]
>> at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:72) ~[apache-cassandra-2.1.9.jar:2.1.9]
>> at org.apache.cassandra.io.util.SegmentedFile$Builder.complete(SegmentedFile.java:168) ~[apache-cassandra-2.1.9.jar:2.1.9]
>> at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:752) ~[apache-cassandra-2.1.9.jar:2.1.9]
>> at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:703) ~[apache-cassandra-2.1.9.jar:2.1.9]
>> at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:491) ~[apache-cassandra-2.1.9.jar:2.1.9]
>> at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:387) ~[apache-cassandra-2.1.9.jar:2.1.9]
>> at org.apache.cassandra.io.sstable.SSTableReader$4.run(SSTableReader.java:534) ~[apache-cassandra-2.1.9.jar:2.1.9]
>> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_60]
>> at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_60]
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_60]
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_60]
>> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
>> Caused by: java.io.EOFException: null
>> at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340) ~[na:1.8.0_60]
>> at java.io.DataInputStream.readUTF(DataInputStream.java:589) ~[na:1.8.0_60]
>> at java.io.DataInputStream.readUTF(DataInputStream.java:564) ~[na:1.8.0_60]
>> at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:106) ~[apache-cassandra-2.1.9.jar:2.1.9]
>> ... 14 common frames omitted
>>
>> Is there a fix for this?

--
Tyler Hobbs
DataStax <http://datastax.com/>
Re: JSON Order By
On Thu, Oct 1, 2015 at 9:11 AM, Ashish Soni <asoni.le...@gmail.com> wrote: > I have a below structure stored in cassandra and i would like to get the > internal array sorted by a property when i select it , Please let me know > if there is way to do that . > > I need to sort the rules Array by property ruleOrder when i select > Unfortunately, that's not possible. Cassandra can only order result rows by the clustering columns. The new JSON functionality doesn't change this, it just adds a new input/output format. You'll need to sort the results client-side. -- Tyler Hobbs DataStax <http://datastax.com/>
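Sorting the nested array client-side, as Tyler suggests, is a one-liner once the row's JSON is parsed. A minimal sketch (the `rules`/`ruleOrder` field names come from the thread; the rest of the structure is assumed):

```python
import json

# One row's JSON payload, roughly as it might come back from a
# SELECT JSON query (structure assumed for illustration):
row = ('{"id": 1, "rules": ['
       '{"ruleOrder": 3, "name": "c"}, '
       '{"ruleOrder": 1, "name": "a"}, '
       '{"ruleOrder": 2, "name": "b"}]}')

doc = json.loads(row)
# Cassandra cannot order by a property inside the document,
# so sort the array after fetching it:
doc["rules"].sort(key=lambda rule: rule["ruleOrder"])
print([r["name"] for r in doc["rules"]])  # ['a', 'b', 'c']
```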
Re: Secondary index is causing high CPU load
See https://issues.apache.org/jira/browse/CASSANDRA-10414 for an overview of why vnodes are currently less efficient for secondary index queries. On Tue, Sep 29, 2015 at 12:45 PM, Robert Coli <rc...@eventbrite.com> wrote: > On Tue, Sep 15, 2015 at 7:44 AM, Tom van den Berge < > tom.vandenbe...@gmail.com> wrote: > >> Read queries on a secondary index are somehow causing an excessively high >> CPU load on all nodes in my DC. >> > ... > >> What really surprised me is that executing a single query on this >> secondary index makes the "Local read count" in the cfstats for the index >> go up with almost 20! When doing the same query on one of my "good" >> nodes, it only increases with a small number, as I would expect. >> >> Could it be that the use of vnodes is causing these problems? >> > > I am not too surprised to hear of this performance degradation. > > Yes, it is relatively likely to be the use of vnodes which is causing this > problem. You could verify by having one of your nodes use 64 vnodes instead > of the default 256... you will get less even distribution with current > vnode random allocation, but you will pay less of a penalty for having > multiple ranges... > > =Rob > > > -- Tyler Hobbs DataStax <http://datastax.com/>
Re: who does generate timestamp during the write?
On Sat, Sep 5, 2015 at 8:32 AM, ibrahim El-sanosi <ibrahimsaba...@gmail.com> wrote:
> So in this scenario, the latest data written to the replicas is [K1, V2], which should be the correct one, but it reads [K1, V1] because of a drifted clock.
>
> Can such a scenario occur?

Yes, it most certainly can. There are a couple of pieces of advice for this. First, run NTP on all of your servers. Second, if clock drift of a second or so would cause problems for your data model (like your example), change your data model. Usually this means creating separate rows for each version of the value (by adding a timeuuid to the primary key, for example), but in some cases lightweight transactions may also be suitable.

--
Tyler Hobbs
DataStax <http://datastax.com/>
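The scenario in the question can be reproduced in a toy model of last-write-wins reconciliation (a sketch for illustration, not Cassandra's actual code):

```python
def last_write_wins(cells):
    # Replicas are reconciled by keeping the cell with the highest
    # write timestamp; timestamp ties are not modeled here.
    return max(cells, key=lambda cell: cell[0])[1]

# V2 was written *after* V1 in wall-clock terms, but by a node whose
# clock runs behind, so it carries the older timestamp:
print(last_write_wins([(1000, "V1"), (998, "V2")]))  # V1 -- the newer value loses
```

This is exactly why NTP, or a data model that gives each version its own row, matters when correctness depends on write ordering.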
Re: Trace evidence for LOCAL_QUORUM ending up in remote DC
See https://issues.apache.org/jira/browse/CASSANDRA-9753

On Tue, Sep 8, 2015 at 10:22 AM, Tom van den Berge <tom.vandenbe...@gmail.com> wrote:
> I've been bugging you a few times, but now I've got trace data for a query with LOCAL_QUORUM that is being sent to a remote data center.
>
> The setup is as follows:
> NetworkTopologyStrategy: {"DC1":"1","DC2":"2"}
> Both DC1 and DC2 have 2 nodes.
> In DC2, one node is currently being rebuilt, and therefore does not contain all data (yet).
>
> The client app connects to a node in DC1, and sends a SELECT query with CL LOCAL_QUORUM, which in this case means (1/2)+1 = 1.
> If all is ok, the query always produces a result, because the requested rows are guaranteed to be available in DC1.
>
> However, the query sometimes produces no result. I've been able to record the traces of these queries, and it turns out that the coordinator node in DC1 sometimes sends the query to DC2, to the node that is being rebuilt, and does not have the requested rows. I've included an example trace below.
>
> The coordinator node is 10.55.156.67, which is in DC1. The 10.88.4.194 node is in DC2.
> I've verified that the CL=LOCAL_QUORUM by printing it when the query is sent (I'm using the datastax java driver).
> activity | source | source_elapsed | thread
> ---------+--------+----------------+-------
> Message received from /10.55.156.67 | 10.88.4.194 | 48 | MessagingService-Incoming-/10.55.156.67
> Executing single-partition query on aggregate | 10.88.4.194 | 286 | SharedPool-Worker-2
> Acquiring sstable references | 10.88.4.194 | 306 | SharedPool-Worker-2
> Merging memtable tombstones | 10.88.4.194 | 321 | SharedPool-Worker-2
> Partition index lookup allows skipping sstable 107 | 10.88.4.194 | 458 | SharedPool-Worker-2
> Bloom filter allows skipping sstable 1 | 10.88.4.194 | 489 | SharedPool-Worker-2
> Skipped 0/2 non-slice-intersecting sstables, included 0 due to tombstones | 10.88.4.194 | 496 | SharedPool-Worker-2
> Merging data from memtables and 0 sstables | 10.88.4.194 | 500 | SharedPool-Worker-2
> Read 0 live and 0 tombstone cells | 10.88.4.194 | 513 | SharedPool-Worker-2
> Enqueuing response to /10.55.156.67 | 10.88.4.194 | 613 | SharedPool-Worker-2
> Sending message to /10.55.156.67 | 10.88.4.194 | 672 | MessagingService-Outgoing-/10.55.156.67
> Parsing SELECT * FROM Aggregate WHERE type=? AND typeId=?; | 10.55.156.67 | 10 | SharedPool-Worker-4
> Sending message to /10.88.4.194 | 10.55.156.67 | 4335 | MessagingService-Outgoing-/10.88.4.194
> Message received from /10.88.4.194 | 10.55.156.67 | 6328 | MessagingService-Incoming-/10.88.4.194
> Seeking to partition beginning in data file | 10.55.156.67 | 10417 | SharedPool-Worker-3
> Key cache hit for sstable 389 | 10.55.156.67 | 10586 | SharedPool-Worker-3
>
> My question is: how is it possible that the query is sent to a node in DC2?
> Since DC1 has 2 nodes and RF 1, the query should always be sent to the other node in DC1 if the coordinator does not have a replica, right?
>
> Thanks,
> Tom

--
Tyler Hobbs
DataStax <http://datastax.com/>
Re: who does generate timestamp during the write?
Timestamps can come from three different places, in order of precedence from highest to lowest: * The CQL query itself through the "USING TIMESTAMP" clause * The driver (or maybe application) at the protocol level when using the v3 native protocol or higher (which is available in Cassandra 2.1+). This is what I recommend using in most cases, because the driver can safely retry idempotent writes. * The coordinator node On Fri, Sep 4, 2015 at 1:06 PM, Andrey Ilinykh <ailin...@gmail.com> wrote: > I meant thrift based api. If we are talking about CQL then timestamps are > generated by node you are connected to. This is a "client". > > On Fri, Sep 4, 2015 at 10:49 AM, ibrahim El-sanosi < > ibrahimsaba...@gmail.com> wrote: > >> Hi Andrey, >> >> I just came across this articale " >> >> "Each cell in a CQL table has a corresponding timestamp >> which is taken from the clock on *the Cassandra node* *that orchestrates the >> write.* When you are reading from a Cassandra cluster the node that >> coordinates the read will compare the timestamps of the values it fetches. >> Last write(=highest timestamp) wins and will be returned to the client." >> >> What do you think? >> >> " >> >> On Fri, Sep 4, 2015 at 6:41 PM, Andrey Ilinykh <ailin...@gmail.com> >> wrote: >> >>> Coordinator doesn't generate timestamp, it is generated by client. >>> >>> On Fri, Sep 4, 2015 at 10:37 AM, ibrahim El-sanosi < >>> ibrahimsaba...@gmail.com> wrote: >>> >>>> Ok, why coordinator does generate timesamp, as the write is a part of >>>> Cassandra process after client submit the request to Cassandra? >>>> >>>> On Fri, Sep 4, 2015 at 6:29 PM, Andrey Ilinykh <ailin...@gmail.com> >>>> wrote: >>>> >>>>> Your application. 
>>>>> >>>>> On Fri, Sep 4, 2015 at 10:26 AM, ibrahim El-sanosi < >>>>> ibrahimsaba...@gmail.com> wrote: >>>>> >>>>>> Dear folks, >>>>>> >>>>>> When we hear about the notion of Last-Write-Wins in Cassandra >>>>>> according to timestamp, *who does generate this timestamp during the >>>>>> write, coordinator or each individual replica in which the write is going >>>>>> to be stored?* >>>>>> >>>>>> >>>>>> *Regards,* >>>>>> >>>>>> >>>>>> >>>>>> *Ibrahim* >>>>>> >>>>> >>>>> >>>> >>> >> > -- Tyler Hobbs DataStax <http://datastax.com/>
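The precedence Tyler lists can be summarized in a short Python sketch (a hypothetical helper, not driver code; timestamps are microseconds since the epoch, as Cassandra uses):

```python
import time

def effective_timestamp(using_timestamp=None, client_timestamp=None):
    """Pick the write timestamp, highest precedence first: a USING
    TIMESTAMP clause in the CQL query, then a driver/protocol-level
    client timestamp, and finally the coordinator's own clock."""
    if using_timestamp is not None:
        return using_timestamp
    if client_timestamp is not None:
        return client_timestamp
    return int(time.time() * 1_000_000)  # coordinator fallback

print(effective_timestamp(using_timestamp=42, client_timestamp=7))  # 42
print(effective_timestamp(client_timestamp=7))                      # 7
```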
Re: Order By limitation or bug?
This query would be reasonable to support, so I've opened https://issues.apache.org/jira/browse/CASSANDRA-10271 to fix that. On Thu, Sep 3, 2015 at 7:48 PM, Alec Collier <alec.coll...@macquarie.com> wrote: > You should be able to execute the following > > > > SELECT data FROM import_file WHERE roll = 1 AND type = 'foo' ORDER BY > type, id DESC; > > > > Essentially the order by clause has to specify the clustering columns in > order in full. It doesn’t by default know that you have already essentially > filtered by type. > > > > *Alec Collier* | Workplace Service Design > > Corporate Operations Group - Technology | Macquarie Group Limited £ > > > > *From:* Robert Wille [mailto:rwi...@fold3.com] > *Sent:* Friday, 4 September 2015 7:17 AM > *To:* user@cassandra.apache.org > *Subject:* Re: Order By limitation or bug? > > > > If you only specify the partition key, and none of the clustering columns, > you can order by in either direction: > > > > SELECT data FROM import_file WHERE roll = 1 order by type; > > SELECT data FROM import_file WHERE roll = 1 order by type DESC; > > > > These are both valid. Seems like specifying the prefix of the clustering > columns is just a specialization of an already-supported pattern. > > > > Robert > > > > On Sep 3, 2015, at 2:46 PM, DuyHai Doan <doanduy...@gmail.com> wrote: > > > > Limitation, not bug. The reason ? > > > > On disk, data are sorted by type first, and FOR EACH type value, the data > are sorted by id. > > > > So to do an order by Id, C* will need to perform an in-memory re-ordering, > not sure how bad it is for performance. In any case currently it's not > possible, maybe you should create a JIRA to ask for lifting the limitation. 
> > > > On Thu, Sep 3, 2015 at 10:27 PM, Robert Wille <rwi...@fold3.com> wrote: > > Given this table: > > > > CREATE TABLE import_file ( > > roll int, > > type text, > > id timeuuid, > > data text, > > PRIMARY KEY ((roll), type, id) > > ) > > > > This should be possible: > > > > SELECT data FROM import_file WHERE roll = 1 AND type = 'foo' ORDER BY id > DESC; > > > > but it results in the following error: > > > > Bad Request: Order by currently only support the ordering of columns > following their declared order in the PRIMARY KEY > > > > I am ordering in the declared order in the primary key. I don’t see why > this shouldn’t be able to be supported. Is this a known limitation or a bug? > > > > In this example, I can get the results I want by omitting the ORDER BY > clause and adding WITH CLUSTERING ORDER BY (id DESC) to the schema. > However, now I can only get descending order. I have to choose either > ascending or descending order. I cannot get both. > > > > Robert > > > > > > > > This email, including any attachments, is confidential. If you are not the > intended recipient, you must not disclose, distribute or use the > information in this email in any way. If you received this email in error, > please notify the sender immediately by return email and delete the > message. Unless expressly stated otherwise, the information in this email > should not be regarded as an offer to sell or as a solicitation of an offer > to buy any financial product or service, an official confirmation of any > transaction, or as an official statement of the entity sending this > message. Neither Macquarie Group Limited, nor any of its subsidiaries, > guarantee the integrity of any emails or attached files and are not > responsible for any changes made to them by any other person. > -- Tyler Hobbs DataStax <http://datastax.com/>
Re: TTLs on tables with *only* primary keys?
You can set the TTL on a row when you create it using an INSERT statement. For example: INSERT INTO mytable (partitionkey, clusteringkey) VALUES (0, 0) USING TTL 100; However, Cassandra doesn't support the ttl() function on primary key columns yet. The ticket to support this is https://issues.apache.org/jira/browse/CASSANDRA-9312. On Tue, Aug 4, 2015 at 9:22 PM, Kevin Burton bur...@spinn3r.com wrote: I have a table which just has primary keys. basically: create table foo ( sequence bigint, signature text, primary key( sequence, signature ) ) I need these to eventually get GCd however it doesn’t seem to work. If I then run: select ttl(sequence) from foo; I get: Cannot use selection function ttl on PRIMARY KEY part sequence … I get the same thing if I do it on the second column .. (signature). And the value doesn’t seem to be TTLd. What’s the best way to proceed here? -- Founder/CEO Spinn3r.com Location: *San Francisco, CA* blog: http://burtonator.wordpress.com … or check out my Google+ profile https://plus.google.com/102718274791889610666/posts -- Tyler Hobbs DataStax http://datastax.com/
Re: Thrift to cql : mixed static and dynamic columns with secondary index
This schema is something that we're providing a better CQL conversion for in 3.0. The one column you defined will become a static column, meaning there is only one copy of it per partition. The schema will look something like this:

CREATE TABLE ref_file (
    key text,
    folder text static,
    column1 text,
    value text,
    PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE;

The column1 column will hold your dynamic field names, and the value column will hold your dynamic field values. Unfortunately, we probably won't support indexing the static column in 3.0.0, but we should be able to support that pretty soon afterwards. The ticket for that is https://issues.apache.org/jira/browse/CASSANDRA-8103.

If you don't want to wait for 3.x, migrating to a table like this is probably your best option:

CREATE TABLE ref_file (
    key text PRIMARY KEY,
    folder text,
    attributes map<text, text>
)

In this case, the attributes map would hold your dynamic fields.

On Thu, Jul 16, 2015 at 4:22 AM, Clement Honore honor...@gmail.com wrote:

Hi,

I'm trying to migrate from Cassandra 1.1 and Hector to a more up-to-date stack like Cassandra 1.2+ and CQL3. I have read http://www.datastax.com/dev/blog/thrift-to-cql3 but my use case adds a complexity which does not seem to be documented: I have a mixed column family with a secondary index. The column family has one explicitly declared column, which is indexed natively. In this column family, I'm also adding columns dynamically: some with predictable names, some with dynamic names.

If I try to query this table in CQL, I can access only the declared column (as stated in the documentation above). If I change the declaration by removing the explicitly declared column (as explained in the documentation above), I lose the secondary index on it.
If I explicitly declare all the columns with an already known name (assuming I accept that I will get plenty of columns with a null value for the lines which don't have those attributes), I still can't manage columns with a dynamic name. And I can't declare a collection as my comparator is UTF8Type. Should I migrate in a new table if I want to keep all the functionalities? This is really a solution I want to avoid. Here is an example representing my actual schema : I have a column family REF_File referencing my files. A file always has a folder. The folder is indexed to easily find my files. A file may have some attributes like name, size, mime . A file may have some comments referenced by a column COM_X where X is the comment ID. Column family creation : Create column family REF_File with comparator=UTF8Type and default_validation_class=UTF8Type and key_validation_class=UTF8Type and column_metadata=[{column_name: folder, validation_class: UTF8Type, index_type: KEYS}]; set REF_File['id1']['folder']=folder1; set REF_File['id1']['name']=file1; set REF_File['id1']['size']=1234; set REF_File['id1']['COM_1']=''; set REF_File['id1']['COM_2']=''; set REF_File['id2']['folder']=folder1; set REF_File['id2']['name']=file2; set REF_File['id2']['mime']='image/jpeg'; set REF_File['id2']['COM_1']=''; Requesting : [default@DUNE_metadonnees] list REF_File; Using default limit of 100 Using default cell limit of 100 --- RowKey: id1 = (name=COM_1, value=, timestamp=1437034903045000) = (name=COM_2, value=, timestamp=1437034911121000) = (name=folder, value=folder1, timestamp=1437034833452000) = (name=name, value=file1, timestamp=1437034851993000) = (name=size, value=1234, timestamp=1437034871356000) --- RowKey: id2 = (name=COM_1, value=, timestamp=1437035169011000) = (name=folder, value=folder1, timestamp=143703506208) = (name=mime, value=image/jpeg, timestamp=1437035145227000) = (name=name, value=file2, timestamp=1437035073596000) Thanks for your help ! 
-- Tyler Hobbs DataStax http://datastax.com/
Re: Read Consistency
, as it expects to receive data from 2 nodes with RF=3

Scenario 2: Read query is fired and all 3 replicas have different data with different timestamps. The read query will return the data with the most recent timestamp and trigger a read repair in the background.

On Tue, Jun 23, 2015 at 10:57 AM, Anuj Wadehra anujw_2...@yahoo.co.in wrote:

Hi,

Need to validate my understanding..

RF=3, Read CL = Quorum

What would be returned to the client in the following scenarios:

Scenario 1: Read query is fired for a key; data is found on one node and not found on the other two nodes responsible for the token corresponding to the key.
Options: no data is returned OR data from the only node having data is returned?

Scenario 2: Read query is fired and all 3 replicas have different data with different timestamps.
Options: data with the latest timestamp is returned OR something else???

Thanks
Anuj

Sent from Yahoo Mail on Android https://overview.mail.yahoo.com/mobile/?.src=Android

--
Arun
Senior Hadoop/Cassandra Engineer
Cloudwick
2014 Data Impact Award Winner (Cloudera)
http://www.cloudera.com/content/cloudera/en/campaign/data-impact-awards.html

--
Tyler Hobbs
DataStax <http://datastax.com/>
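The reconciliation described for Scenario 2 can be modeled in a few lines (an illustrative sketch, not Cassandra internals; tombstones and digest reads are ignored):

```python
def quorum_read(responses):
    """responses: one (timestamp, value) tuple per replica queried,
    or None when that replica has no data for the key.
    Returns the reconciled value plus the indexes of replicas whose
    copy is stale and would receive a read repair."""
    present = [r for r in responses if r is not None]
    winner = max(present, key=lambda r: r[0])
    stale = [i for i, r in enumerate(responses) if r != winner]
    return winner[1], stale

# Scenario 1: only one of the queried replicas has the row --
# its data is returned, and the other replica gets repaired.
print(quorum_read([(5, "v1"), None]))         # ('v1', [1])
# Scenario 2: replicas disagree -- the latest timestamp wins.
print(quorum_read([(1, "old"), (9, "new")]))  # ('new', [0])
```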
Re: Read Consistency
On Tue, Jun 30, 2015 at 12:27 PM, Anuj Wadehra anujw_2...@yahoo.co.in wrote:

Agree Tyler. I think it's our application problem. If the client returns a failed write in spite of retries, the application must have a rollback mechanism to make sure the old state is restored. A failed write may be because the CL was not met even though one node successfully wrote. Cassandra won't do cleanup or rollback on that one node, so you need to do it yourself to make sure the integrity of the data is maintained when strong consistency is a requirement. Right?

Correct, if you get a WriteTimeout error, you don't know if any replicas have written the data or not. It's even possible that all replicas wrote the data but didn't respond to the coordinator in time. I suspect most users handle this situation by retrying the write with the same timestamp (which makes the operation idempotent).

It's worth noting that if you get an Unavailable response, you are guaranteed that the data has not been written to any replicas, because the coordinator already knew that the replicas were down when it got the response.

--
Tyler Hobbs
DataStax <http://datastax.com/>
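The retry-with-the-same-timestamp trick Tyler mentions looks like this in outline (a sketch with a hypothetical `send` callback, not driver code):

```python
import time

def idempotent_write(send, key, value, attempts=3):
    # Fix the timestamp before the first attempt: replicas that already
    # applied an earlier attempt see each retry as the exact same write,
    # so retrying after a WriteTimeout cannot resurrect older data.
    ts = int(time.time() * 1_000_000)
    for _ in range(attempts):
        try:
            send(key, value, ts)
            return ts
        except TimeoutError:
            continue  # retry with the *same* timestamp
    raise TimeoutError("write failed after %d attempts" % attempts)

seen = []
def flaky(key, value, ts):
    seen.append(ts)
    if len(seen) == 1:
        raise TimeoutError  # first attempt times out

ts = idempotent_write(flaky, "k1", "v1")
print(seen[0] == seen[1] == ts)  # True
```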
Re: Inconsistent behavior during read
On Thu, Jun 25, 2015 at 1:00 PM, Robert Coli rc...@eventbrite.com wrote: [1] or read repair set to 100% combined with a full scan of all data... which no one does... And this is only true if full scan means reading every partition individually. Reads of partition ranges (or a range slice, in old Thrift terms) don't do read repair. -- Tyler Hobbs DataStax http://datastax.com/
Re: MarshalException after upgrading to 2.1.6
(UUIDType.java:184)
... 12 more
Caused by: java.text.ParseException: Unable to parse the date: currencyCode
at org.apache.commons.lang3.time.DateUtils.parseDateWithLeniency(DateUtils.java:336)
at org.apache.commons.lang3.time.DateUtils.parseDateStrictly(DateUtils.java:286)
at org.apache.cassandra.serializers.TimestampSerializer.dateStringToTimestamp(TimestampSerializer.java:107)
... 13 more
Exception encountered during startup: unable to make version 1 UUID from 'currencyCode'

--
Tyler Hobbs
DataStax <http://datastax.com/>
Re: Cassandra 2.2, 3.0, and beyond
On Wed, Jun 10, 2015 at 1:43 PM, sean_r_dur...@homedepot.com wrote: With 3.0, what happens to existing Thrift-based tables (with dynamic column names, etc.)? Just like in Cassandra 2.x, they will show up as COMPACT STORAGE tables in a format that CQL can work with. We're making a few adjustments to how the schema is presented in CQL, mostly to better deal with a mixture of defined and undefined column names (mixed static and dynamic). That mostly involves treating defined columns as static. However, the storage format for COMPACT STORAGE tables will not be (significantly) different from normal tables any more. You can read a few details about the new storage format here: https://github.com/pcmanus/cassandra/blob/8099_engine_refactor/guide_8099.md#storage-format-on-disk-and-on-wire -- Tyler Hobbs DataStax http://datastax.com/
Re: TTL and gc_grace_period
On Fri, Jun 5, 2015 at 11:02 AM, Kévin LOVATO klov...@alprema.com wrote: Great, so is there any reason I wouldn't want to set gc_grace_seconds to 0 on an insert once/ttl only column family, since it feels like the best thing to do? Nope, setting gc_grace_seconds to 0 is just fine in your case. -- Tyler Hobbs DataStax http://datastax.com/
Re: TTL and gc_grace_period
On Fri, Jun 5, 2015 at 10:30 AM, Kévin LOVATO klov...@alprema.com wrote: I have a column family with data (metrics) that is never overwritten and only deleted using TTLs, and I am wondering if it would be reasonable to have a very low gc_grace_period (even 0) on that CF. I would like to do that mainly to save space and also to prevent tombstone scanning. Yes, you can safely lower gc_grace_seconds. You would only _not_ want to lower gc_grace_seconds if you did deletes or overwrote cells with a lower TTL. From what I understand of what I could read online, when an expired TTLed column is compacted, it is replaced by a tombstone, so having gc_grace_period would prevent that. Although this would allow the appearance of ghost/zombie columns. The question I'm trying to answer here is the following: Would those ghost columns be able to appear, and if so, would it be a problem, since they would themselves be marked as expired? You don't need to worry about expired data being revived because every node that has a copy of that data will have the same TTL. -- Tyler Hobbs DataStax http://datastax.com/
Re: Coordination of expired TTLs compared to tombstones
On Fri, May 29, 2015 at 1:31 PM, Robert Wille rwi...@fold3.com wrote: I was wondering how that compares to cells with expired TTLs. Does the node get to skip sending data back to the coordinator for an expired TTL? No, it has to send expired cells. Suppose you wrote a cell with no TTL, and then updated it with a TTL. Suppose that node 1 got both writes, but node 2 only got the first one. If you asked for the cell after it expired, and node 1 did not send anything to the coordinator, it seems to me that that could violate consistency levels. Also, read repair could never fix node 2. So, how does that work? That's precisely why they have to be sent to the coordinator. On a related note, do cells with expired TTLs have to wait gc_grace_seconds before they can be compacted out? Yes. It seems to me that if they could get compacted out immediately after expiration, you could get zombie data, just like you can with tombstones. For example, write a cell with no TTL to all replicas, shut down one replica, update the cell with a TTL, compact after the TTL has expired, then bring the other node back up. Voila, the formerly down node has a value that will replicate to the other nodes. Correct, that's why they can't be purged immediately. -- Tyler Hobbs DataStax http://datastax.com/
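Robert's zombie-data example can be made concrete with a toy reconciliation model (illustrative only; real sstable purging and tombstone handling are far more involved):

```python
def reconcile(cells):
    # Keep the cell with the highest timestamp; expired cells still
    # participate in reconciliation until gc_grace_seconds has passed.
    live = [c for c in cells if c is not None]
    return max(live, key=lambda c: c["ts"]) if live else None

stale = {"ts": 1, "value": "v-old", "expired": False}    # replica that missed the update
expired = {"ts": 2, "value": "v-new", "expired": True}   # TTL'd update, now expired

# Retained through gc_grace: the expired cell wins reconciliation and
# keeps shadowing the stale copy on the replica that was down.
print(reconcile([expired, stale])["expired"])  # True

# Purged too early: the stale copy would "win" and come back to life.
print(reconcile([None, stale])["value"])  # v-old
```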
Re: A few stupid questions...
On Tue, May 26, 2015 at 2:00 PM, Eax Melanhovich m...@eax.me wrote:

First. Let's say I have a table (field1, field2, field3, field4), where (field1, field2) is the primary key and field1 is the partition key. There is a secondary index on the field3 column. Do I understand correctly that in this case a query like:

select ... from my_table where field1 = 123 and field3 > '...';

... would be quite efficient, i.e. the request would be sent only to one node, not the whole cluster?

You are correct that it would only query one node (or one set of replicas, if RF > 1 and CL > 1) due to the partition key being restricted. However, using '>' for the operator on the indexed column forces Cassandra to scan the partition instead of using the index, because secondary indexes only support '=' operations. If you care about performance, you're probably better off creating a dedicated table to serve this type of query, as described here: http://www.datastax.com/dev/blog/basic-rules-of-cassandra-data-modeling

Second. Let's say there is some data that almost never changes but is read all the time. E.g. information about smiles in a social network. Or current sessions. In this case, would Cassandra cache hot data in a memtable? Or should such data be stored somewhere else, i.e. Redis or Couchbase?

Memtables are only used for buffering writes, not for caching read data. Cassandra does have several layers of caching, though. Frequently read data will end up in the key cache and the OS page cache, making reads quite efficient. Optionally, you can also enable the row cache. Since you're almost never modifying the data, the row cache is actually a decent fit, although I recommend testing it heavily with your use case for stability. The best way to find out if your performance is good enough is to benchmark it with your own use case.

--
Tyler Hobbs
DataStax <http://datastax.com/>
Re: Clarification of property: storage_port
On Fri, May 15, 2015 at 4:17 AM, Magnus Vojbacke magnus.vojba...@digitalroute.com wrote: Function: What protocols and functions is storage_port used for? Am I right to believe that it is used for Gossip? It's used for all internode communication (gossip, requests, etc). And more importantly: It seems to me that storage_port MUST be configured to be the same port for _all_ nodes in a cluster, is this correct? That's correct. -- Tyler Hobbs DataStax http://datastax.com/
Re: Leap sec
This post has some good advice for preparing for the leap second: http://www.datastax.com/dev/blog/preparing-for-the-leap-second

On Fri, May 15, 2015 at 12:25 PM, cass savy <casss...@gmail.com> wrote:

> Just curious how you are preparing prod C* clusters for the leap second. What are the workarounds other than upgrading the kernel to 3.4+? Are you upgrading clusters to Java 7 or higher on client and C* servers?

-- Tyler Hobbs DataStax <http://datastax.com/>
Re: Caching the PreparedStatement (Java driver)
On Fri, May 15, 2015 at 12:02 PM, Ajay <ajay.ga...@gmail.com> wrote:

> But I am also not sure of what happens when a cached prepared statement is executed after a Cassandra node restarts. Is the server's prepared statement cache persisted, or in memory?

For now, it's just in memory, so they are lost when the node is restarted.

> If it is in memory, how do we handle stale prepared statements in the cache?

If a prepared statement ID is used that Cassandra doesn't recognize (e.g. after a node restart), it responds with a specific error to the driver. When the driver sees this error, it automatically re-prepares the statement against that node using the statement info from its own cache. After the statement has been re-prepared, it attempts to execute the query again. This all happens transparently, so your application will not even be aware of it (aside from an increase in latency).

There are plans to persist prepared statements in a system table: https://issues.apache.org/jira/browse/CASSANDRA-8831

-- Tyler Hobbs DataStax <http://datastax.com/>
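The transparent re-prepare flow described above can be sketched with a toy in-memory model. FakeNode and FakeDriver are illustrative stand-ins, not the real driver API: the node loses its prepared-statement cache on restart, and the "driver" keeps the query text so it can re-prepare and retry.

```python
# Toy sketch of the re-prepare-on-unknown-id flow.

class Unprepared(Exception):
    """Stands in for the UNPREPARED error response from a node."""

class FakeNode:
    def __init__(self):
        self.prepared = {}            # statement id -> query text
    def prepare(self, query):
        stmt_id = hash(query)         # illustrative; real ids are digests of the query
        self.prepared[stmt_id] = query
        return stmt_id
    def execute(self, stmt_id):
        if stmt_id not in self.prepared:
            raise Unprepared()
        return "rows for: " + self.prepared[stmt_id]
    def restart(self):
        self.prepared.clear()         # in-memory cache is gone after a restart

class FakeDriver:
    def __init__(self, node):
        self.node = node
        self.queries = {}             # the driver's own cache: id -> query text
    def prepare(self, query):
        stmt_id = self.node.prepare(query)
        self.queries[stmt_id] = query
        return stmt_id
    def execute(self, stmt_id):
        try:
            return self.node.execute(stmt_id)
        except Unprepared:
            # Re-prepare from our own cache, then retry -- the application
            # never sees the error, only the extra round-trip latency.
            self.node.prepare(self.queries[stmt_id])
            return self.node.execute(stmt_id)

node = FakeNode()
driver = FakeDriver(node)
sid = driver.prepare("SELECT * FROM t WHERE k = ?")
node.restart()                        # node forgets all prepared statements
result = driver.execute(sid)          # still succeeds via re-prepare + retry
```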
Re: CQL 3.x Update ...USING TIMESTAMP...
On Mon, Apr 20, 2015 at 4:02 PM, Sachin Nikam <skni...@gmail.com> wrote:

> #1. We have 2 data centers located close by, with plans to expand to more data centers which are even further away geographically.
> #2. How will this impact lightweight transactions when there is a high level of network contention for cross-data-center traffic?

If you are only expecting updates to a given document from one DC, then you could use LOCAL_SERIAL for the LWT operations. If you can't do that, then LWT are probably not a great option for you.

> #3. Do you know of any real examples where companies have used lightweight transactions in a multi-data-center setup?

I don't know who's doing that off the top of my head, but I imagine they're using LOCAL_SERIAL.

-- Tyler Hobbs DataStax <http://datastax.com/>
Re: High latencies for simple queries
To clarify, that's in Cassandra 2.1+. In 2.0 and earlier, we used http://code.google.com/a/apache-extras.org/p/cassandra-dbapi2/ for cqlsh.

On Tue, Mar 31, 2015 at 10:40 AM, Tyler Hobbs <ty...@datastax.com> wrote:

> The python driver that we bundle with Cassandra for cqlsh is the normal python driver (https://github.com/datastax/python-driver), although sometimes it's patched for bugfixes or is not an official release.

On Sat, Mar 28, 2015 at 5:36 PM, Ben Bromhead <b...@instaclustr.com> wrote:

> cqlsh runs on the internal cassandra python drivers: cassandra-pylib and cqlshlib. I would not recommend using them at all (nothing wrong with them, they are just not built with external users in mind). I have never used python-driver in anger, so I can't comment on whether it is genuinely slower than the internal C* python driver, but this might be a question for the python-driver folk.
>
> On 28 March 2015 at 00:34, Artur Siekielski <a...@vhex.net> wrote:
>
>> On 03/28/2015 12:13 AM, Ben Bromhead wrote:
>>> One other thing to keep in mind / check is that doing these tests locally, the cassandra driver will connect using the network stack, whereas postgres supports local connections over a unix domain socket (this is also enabled by default). Unix domain sockets are significantly faster than tcp as you don't have a network stack to traverse. I think any driver using libpq will attempt to use the domain socket when connecting locally.
>>
>> Good catch. I ensured that psycopg2 connects through a TCP socket and the numbers increased by about 20%, but it still is an order of magnitude faster than Cassandra.
>
> But I'm going to hazard a guess something else is going on with the Cassandra connection, as I'm able to get 0.5ms queries locally, and that's even with trace turned on.

Using python-driver?

-- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | (650) 284 9692

-- Tyler Hobbs DataStax <http://datastax.com/>
Re: High latencies for simple queries
Just to check, are you concerned about minimizing latency or maximizing throughput? I'll assume that latency is what you're actually concerned about.

A fair amount of that latency is probably happening in the python driver. Although it can easily execute ~8k operations per second (using cpython), in some scenarios it can be difficult to guarantee sub-ms latency for an individual query due to how some of the internals work. In particular, it uses python's Conditions for cross-thread signalling (from the event loop thread to the application thread). Unfortunately, python's Condition implementation includes a loop with a minimum sleep of 1ms if the Condition isn't already set when you start the wait() call. This is why, with a single application thread, you will typically see a minimum of 1ms latency.

Another source of similar latencies for the python driver is the asyncore event loop, which is used when libev isn't available. I would make sure that you can use the LibevConnection class with the driver to avoid this.

On Fri, Mar 27, 2015 at 6:24 AM, Artur Siekielski <a...@vhex.net> wrote:

> I'm running Cassandra locally and I see that the execution time for the simplest queries is 1-2 milliseconds. By a simple query I mean either an INSERT or a SELECT from a small table with short keys. While this number is not high, it's about 10-20 times slower than PostgreSQL (even if INSERTs are wrapped in transactions). I know that the nature of Cassandra compared to PostgreSQL is different, but for some scenarios this difference can matter. The question is: is it normal for Cassandra to have a minimum latency of 1 millisecond? I'm using Cassandra 2.1.2, python-driver.

-- Tyler Hobbs DataStax <http://datastax.com/>
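The ~1ms floor comes from the polling loop that old CPython versions used inside Condition.wait(timeout). A sketch of that loop is below; the constants mirror the old stdlib code, but polling_wait itself is an illustrative stand-in, not the actual implementation. Even if the condition is satisfied immediately after wait() begins, the first sleep is min(0.0005 * 2, ...) = 1ms.

```python
import time

def polling_wait(is_set, timeout):
    # 'is_set' is a callable standing in for "the waiter lock was released".
    delay = 0.0005                     # initial delay from the old stdlib code
    endtime = time.monotonic() + timeout
    while not is_set():
        remaining = endtime - time.monotonic()
        if remaining <= 0:
            return False               # timed out
        # Exponential backoff: 1ms, 2ms, 4ms, ... capped at 50ms.
        delay = min(delay * 2, remaining, 0.05)
        time.sleep(delay)
    return True

calls = {"n": 0}
def becomes_true():
    calls["n"] += 1
    return calls["n"] > 1              # flips to true right after the first check

start = time.monotonic()
ok = polling_wait(becomes_true, timeout=1.0)
elapsed = time.monotonic() - start     # at least ~1ms despite the immediate flip
```

This is why a single application thread blocking on each query can't get below ~1ms per query, regardless of how fast the server responds.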
Re: High latencies for simple queries
Since you're executing queries sequentially, you may want to look into using callback chaining to avoid the cross-thread signaling that results in the 1ms latencies. Basically, just use session.execute_async() and attach a callback to the returned future that will execute your next query. The callback is executed on the event loop thread. The main downsides to this are that you need to be careful to avoid blocking the event loop thread (including executing session.execute() or prepare()) and you need to ensure that all exceptions raised in the callback are handled by your application code.

On Fri, Mar 27, 2015 at 3:11 PM, Artur Siekielski <a...@vhex.net> wrote:

> I think that in your example Postgres spends most time on waiting for fsync() to complete. On Linux, for a battery-backed raid controller, it's safe to mount an ext4 filesystem with the barrier=0 option, which improves fsync() performance a lot. I have partitions mounted with this option and I did a test from Python, using the psycopg2 driver, and I got the following latencies, in milliseconds:
>
> - INSERT without COMMIT: 0.04
> - INSERT with COMMIT: 0.12
> - SELECT: 0.05
>
> I'm also repeating benchmark runs multiple times (I'm using Python's timeit module).
>
> On 03/27/2015 07:58 PM, Ben Bromhead wrote:
>> Latency can be so variable even when testing things locally. I quickly fired up postgres and did the following with psql:
>>
>> ben=# CREATE TABLE foo(i int, j text, PRIMARY KEY(i));
>> CREATE TABLE
>> ben=# \timing
>> Timing is on.
>> ben=# INSERT INTO foo VALUES(2, 'yay');
>> INSERT 0 1
>> Time: 1.162 ms
>> ben=# INSERT INTO foo VALUES(3, 'yay');
>> INSERT 0 1
>> Time: 1.108 ms
>>
>> I then fired up a local copy of Cassandra (2.0.12) and cqlsh:
>>
>> cqlsh> CREATE KEYSPACE foo WITH replication = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
>> cqlsh> USE foo;
>> cqlsh:foo> CREATE TABLE foo(i int PRIMARY KEY, j text);
>> cqlsh:foo> TRACING ON;
>> Now tracing requests.
>> cqlsh:foo> INSERT INTO foo (i, j) VALUES (1, 'yay');

-- Tyler Hobbs DataStax <http://datastax.com/>
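The callback-chaining pattern described above can be sketched with stand-in objects. FakeFuture and FakeSession are illustrative: they mimic the execute_async()/add_callbacks() surface of the driver, but invoke callbacks synchronously, whereas the real driver runs them on the event loop thread and returns live ResponseFutures.

```python
# Toy sketch of chaining queries from inside callbacks, avoiding the
# per-query cross-thread Condition wait.

class FakeFuture:
    def __init__(self, result):
        self._result = result
    def add_callbacks(self, callback, errback):
        # The real driver invokes these on the event loop thread.
        try:
            callback(self._result)
        except Exception as exc:
            errback(exc)

class FakeSession:
    def execute_async(self, query):
        return FakeFuture("rows for: " + query)

results = []
session = FakeSession()
queries = iter(["SELECT 1", "SELECT 2", "SELECT 3"])

def on_success(rows):
    results.append(rows)
    nxt = next(queries, None)
    if nxt is not None:
        # Chain the next query from inside the callback -- no blocking
        # wait in the application thread between queries.
        session.execute_async(nxt).add_callbacks(on_success, on_error)

def on_error(exc):
    raise exc  # real code must handle every exception raised in callbacks

session.execute_async(next(queries)).add_callbacks(on_success, on_error)
```

Note the two caveats from the reply above carry over directly: never call blocking methods (session.execute(), prepare()) from on_success, and make sure on_error actually handles failures rather than re-raising as it does in this sketch.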
Re: Not seeing keyspace in nodetool compactionhistory
What version of Cassandra are you using? Since it sounds like you aren't doing any reads, it could be https://issues.apache.org/jira/browse/CASSANDRA-8635.

On Wed, Mar 18, 2015 at 9:37 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:

> When I run nodetool compactionhistory, I'm only seeing the system keyspace and the OpsCenter keyspace in the compactions. I only see one mention of my own keyspace, but it's only for the smallest table within that keyspace (containing only about 1k rows). My two other tables, containing 1.1m and 100k rows respectively, were nowhere to be seen. Any reason why that is? I did fill up the data in those two tables within the span of about 4 hours (I ran a script to migrate existing data from legacy rdbms dbs). Could that have something to do with it? I'm using SizeTieredCompactionStrategy for all tables.

-- Tyler Hobbs DataStax <http://datastax.com/>
Re: Not seeing keyspace in nodetool compactionhistory
How many sstables (*-Data.db files) does each of your two tables have?

On Wed, Mar 25, 2015 at 2:54 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:

> I also just inserted, didn't do any updates.
>
> On Thu, Mar 26, 2015 at 12:54 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>> I'm on 2.0.12. I'm not sure if that's the issue, since the size isn't growing. The size is about what I'd expect.
>>
>> On Thu, Mar 26, 2015 at 12:44 AM, Tyler Hobbs <ty...@datastax.com> wrote:
>>> What version of Cassandra are you using? Since it sounds like you aren't doing any reads, it could be https://issues.apache.org/jira/browse/CASSANDRA-8635.

-- Tyler Hobbs DataStax <http://datastax.com/>
Re: error in bulk loading
On Tue, Mar 24, 2015 at 5:30 AM, Rahul Bhardwaj <rahul.bhard...@indiamart.com> wrote:

> I need to import a csv file to a table using the COPY command, but the file contains carriage returns, which are causing me problems. Is there any way in cassandra to solve this?

You can surround the field with double-quotes to handle this (or change the quote character with the QUOTE option for COPY).

-- Tyler Hobbs DataStax <http://datastax.com/>
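For illustration, Python's csv module applies the same quoting rule: a field containing carriage returns or newlines is legal as long as it is wrapped in the quote character, and it round-trips with the embedded line breaks intact.

```python
import csv
import io

# Write a row whose second field contains an embedded CR/LF.
buf = io.StringIO()
writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL)
writer.writerow([1, "line one\r\nline two"])

text = buf.getvalue()
# QUOTE_MINIMAL wraps the field in double-quotes automatically:
# 1,"line one
# line two"

# Reading it back recovers a single field with the newline preserved.
rows = list(csv.reader(io.StringIO(text)))
```

A file pre-processed (or generated) this way imports cleanly with COPY, since the quoted field is treated as one value despite spanning multiple physical lines.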
Re: CQL 3.x Update ...USING TIMESTAMP...
clustering key to resolve that with LIMIT 1; also this is for DSE Solr, which wouldn't be able to query *a* by max b.foo anyway.

So when we write to *b*, we also write to *a* with something like:

UPDATE a USING TIMESTAMP ${b.a_timestamp.toMicros + b.foo} SET max_b_foo = ${b.foo} WHERE id = ${b.a_id}

Assuming that we don't run afoul of related antipatterns, such as repeatedly overwriting the same value indefinitely, this strikes me as sound if unorthodox practice, as long as conflict resolution in Cassandra isn't broken in some subtle way. We also designed this to be safe from getting write timestamps greatly out of sync with clock time, so that non-timestamped operations (especially deletes), if done accidentally, will still have a reasonable chance of having the expected results. So while it may not be the intended use case for write timestamps, and there are definitely gotchas if you are not careful or misunderstand the consequences, as far as I can see the logic behind it is sound, but it does rely on correct conflict resolution in Cassandra. I'm curious if I'm missing or misunderstanding something important.

On Wed, Mar 11, 2015 at 4:11 PM, Tyler Hobbs <ty...@datastax.com> wrote:

> Don't use the version as your timestamp. It's possible, but you'll end up with problems when attempting to overwrite or delete entries. Instead, make the version part of the primary key:
>
> CREATE TABLE document_store (document_id bigint, version int, document text, PRIMARY KEY (document_id, version)) WITH CLUSTERING ORDER BY (version DESC)
>
> That way you don't have to worry about overwriting higher versions with a lower one, and to read the latest version, you only have to do:
>
> SELECT * FROM document_store WHERE document_id = ? LIMIT 1;
>
> Another option is to use lightweight transactions (i.e. UPDATE ... SET document = ?, version = ? WHERE document_id = ? IF version < ?), but that's going to make writes much more expensive.
On Wed, Mar 11, 2015 at 12:45 AM, Sachin Nikam <skni...@gmail.com> wrote:

> I am planning to use the UPDATE ... USING TIMESTAMP ... statement to make sure that I do not overwrite fresh data with stale data, while avoiding having to do at least LOCAL_QUORUM writes. Here is my table structure:
>
> Table=DocumentStore
> DocumentID (primaryKey, bigint)
> Document (text)
> Version (int)
>
> If the service receives 2 write requests with Version=1 and Version=2, regardless of the order of arrival, the business requirement is that we end up with Version=2 in the database. Can I use the following CQL statement?
>
> UPDATE DocumentStore USING TIMESTAMP versionValue SET Document = documentValue, Version = versionValue WHERE DocumentID = documentIDValue;
>
> Has anybody used something like this? If so, was the behavior as expected?
>
> Regards
> Sachin

-- Tyler Hobbs DataStax <http://datastax.com/>
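The scheme discussed in this thread leans on Cassandra's cell reconciliation rule: the write with the higher timestamp wins, and a timestamp tie is broken by taking the greater value, so all replicas converge on the same answer regardless of arrival order. A toy model of that rule (the timestamps below are made up for illustration):

```python
# Toy model of Cassandra cell reconciliation. Each cell is a
# (write_timestamp, value) pair; comparing the tuples directly gives
# "higher timestamp wins, ties broken by greater value".

def reconcile(cell_a, cell_b):
    return max(cell_a, cell_b)

# Two writes to a.max_b_foo with timestamps derived from b.foo, as in
# the UPDATE ... USING TIMESTAMP scheme above:
base = 1700000000000000               # hypothetical microsecond timestamp
w1 = (base + 5, 5)                    # write carrying b.foo = 5
w2 = (base + 9, 9)                    # write carrying b.foo = 9

winner = reconcile(w1, w2)            # same result in either order
```

Because the timestamp is monotone in b.foo, the cell holding the larger foo always wins the reconciliation, which is exactly the "max" semantics the poster is constructing.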
Re: CQL 3.x Update ...USING TIMESTAMP...
Don't use the version as your timestamp. It's possible, but you'll end up with problems when attempting to overwrite or delete entries. Instead, make the version part of the primary key:

CREATE TABLE document_store (document_id bigint, version int, document text, PRIMARY KEY (document_id, version)) WITH CLUSTERING ORDER BY (version DESC)

That way you don't have to worry about overwriting higher versions with a lower one, and to read the latest version, you only have to do:

SELECT * FROM document_store WHERE document_id = ? LIMIT 1;

Another option is to use lightweight transactions (i.e. UPDATE ... SET document = ?, version = ? WHERE document_id = ? IF version < ?), but that's going to make writes much more expensive.

On Wed, Mar 11, 2015 at 12:45 AM, Sachin Nikam <skni...@gmail.com> wrote:

> I am planning to use the UPDATE ... USING TIMESTAMP ... statement to make sure that I do not overwrite fresh data with stale data, while avoiding having to do at least LOCAL_QUORUM writes. Here is my table structure:
>
> Table=DocumentStore
> DocumentID (primaryKey, bigint)
> Document (text)
> Version (int)
>
> If the service receives 2 write requests with Version=1 and Version=2, regardless of the order of arrival, the business requirement is that we end up with Version=2 in the database. Can I use the following CQL statement?
>
> UPDATE DocumentStore USING TIMESTAMP versionValue SET Document = documentValue, Version = versionValue WHERE DocumentID = documentIDValue;
>
> Has anybody used something like this? If so, was the behavior as expected?
>
> Regards
> Sachin

-- Tyler Hobbs DataStax <http://datastax.com/>
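The read path of the versioned-table pattern above can be sketched in plain Python (dicts stand in for the partition and its clustering rows; all names are illustrative): because version is a clustering column in descending order, "latest" is just the first row of the partition.

```python
# Toy model of the versioned document_store table: one dict entry per
# partition (document_id), mapping version -> document.

table = {}

def upsert(document_id, version, document):
    # (document_id, version) is the full primary key, so every version
    # is its own row -- a "lower" write can never clobber a higher one.
    table.setdefault(document_id, {})[version] = document

def latest(document_id):
    # Equivalent of: SELECT * FROM document_store WHERE document_id = ? LIMIT 1
    # with CLUSTERING ORDER BY (version DESC): the max version is first.
    rows = table.get(document_id, {})
    if not rows:
        return None
    v = max(rows)
    return (v, rows[v])

upsert(42, 2, "final")   # arrival order doesn't matter
upsert(42, 1, "draft")
current = latest(42)
```

This is why the pattern sidesteps the stale-overwrite problem entirely: nothing is overwritten, and the ordering requirement is satisfied by the read query instead of by write timestamps.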
Re: sstables remain after compaction
On Tue, Mar 3, 2015 at 3:44 AM, Jason Wee <peich...@gmail.com> wrote:

> we are in the midst of upgrading... 1.0.8 -> 1.0.12, then to 1.1.0, then to the latest of 1.1, then to 1.2

I'm not aware of any good reason to put 1.1.0 in the middle there. I would go straight from 1.0.12 to the latest 1.1.x.

-- Tyler Hobbs DataStax <http://datastax.com/>
Re: Documentation of batch statements
On Tue, Mar 3, 2015 at 2:39 PM, Jonathan Haddad <j...@jonhaddad.com> wrote:

> Actually, that's not true either. It's technically possible for a batch to be partially applied in the current implementation, even with logged batches. "Atomic" is used incorrectly here, imo, since more than 2 states can be visible, not just unapplied and applied.

That's a matter of isolation, not atomicity. Although, with a long enough gap between partial and full application, the distinction becomes somewhat pedantic, I suppose.

-- Tyler Hobbs DataStax <http://datastax.com/>