Re: Less frequent flushing with LCS
Nope, they flush every 5 to 10 minutes.

On Mon, Mar 2, 2015 at 1:13 PM, Daniel Chia danc...@coursera.org wrote:
Do the tables look like they're being flushed every hour? It seems like the setting memtable_flush_after_mins, which I believe defaults to 60, could also affect how often your tables are flushed. Thanks, Daniel

On Mon, Mar 2, 2015 at 11:49 AM, Dan Kinder dkin...@turnitin.com wrote:
I see, thanks for the input. Compression is not enabled at the moment, but I may try increasing that number regardless. Also, I don't think in-memory tables would work since the dataset is actually quite large. The pattern is more like: a given set of rows will receive many overwriting updates and then not be touched for a while.

On Fri, Feb 27, 2015 at 2:27 PM, Robert Coli rc...@eventbrite.com wrote:
On Fri, Feb 27, 2015 at 2:01 PM, Dan Kinder dkin...@turnitin.com wrote:
Theoretically sstable_size_in_mb could be causing it to flush (it's at the default 160MB)... though we are flushing well before we hit 160MB. I have not tried changing this, but we don't necessarily want all the sstables to be large anyway.

I've always wished that the log message told you *why* the SSTable was being flushed, i.e. which of the various bounds prompted the flush. In your case, the size on disk may be under 160MB because compression is enabled. I would start by increasing that size. Datastax DSE has in-memory tables for this use case. =Rob

-- Dan Kinder Senior Software Engineer Turnitin – www.turnitin.com dkin...@turnitin.com
RDD partitions per executor in Cassandra Spark Connector
Hi all, I didn't find the *issues* button on https://github.com/datastax/spark-cassandra-connector/ so posting here. Anyone have an idea why token ranges are grouped into one partition per executor? I expected at least one per core. Any suggestions on how to work around this? Doing a repartition is way too expensive, as I just want more partitions for parallelism, not a reshuffle... Thanks in advance! Frens Jan
Re: Should a node that is bootstrapping be receiving writes in addition to the streams it is receiving?
On Mon, Mar 2, 2015 at 1:58 PM, Paulo Ricardo Motta Gomes paulo.mo...@chaordicsystems.com wrote:
I also checked via JMX and all the write counts are zero. Is the node supposed to receive writes during bootstrap?

As I understand it, yes.

The other funny thing during bootstrap is that nodetool status shows the bootstrapping node as Up/Normal (UN) instead of Up/Joining (UJ). Is this expected, or is it a bug? The bootstrapping node does not even appear in the nodetool status of other nodes.

Perhaps this node is not actually bootstrapping because you have configured it as a seed with no other valid seeds listed, and so it has started as a cluster of one? =Rob
Re: Less frequent flushing with LCS
Do the tables look like they're being flushed every hour? It seems like the setting memtable_flush_after_mins, which I believe defaults to 60, could also affect how often your tables are flushed. Thanks, Daniel

On Mon, Mar 2, 2015 at 11:49 AM, Dan Kinder dkin...@turnitin.com wrote:
I see, thanks for the input. Compression is not enabled at the moment, but I may try increasing that number regardless. Also, I don't think in-memory tables would work since the dataset is actually quite large. The pattern is more like: a given set of rows will receive many overwriting updates and then not be touched for a while.

On Fri, Feb 27, 2015 at 2:27 PM, Robert Coli rc...@eventbrite.com wrote:
On Fri, Feb 27, 2015 at 2:01 PM, Dan Kinder dkin...@turnitin.com wrote:
Theoretically sstable_size_in_mb could be causing it to flush (it's at the default 160MB)... though we are flushing well before we hit 160MB. I have not tried changing this, but we don't necessarily want all the sstables to be large anyway.

I've always wished that the log message told you *why* the SSTable was being flushed, i.e. which of the various bounds prompted the flush. In your case, the size on disk may be under 160MB because compression is enabled. I would start by increasing that size. Datastax DSE has in-memory tables for this use case. =Rob

-- Dan Kinder Senior Software Engineer Turnitin – www.turnitin.com dkin...@turnitin.com
Re: Should a node that is bootstrapping be receiving writes in addition to the streams it is receiving?
I'm also facing a similar issue while bootstrapping a replacement node via the -Dreplace_address flag. The node is streaming data from neighbors, but cfstats shows 0 counts for all metrics of all CFs on the bootstrapping node:

SSTable count: 0
SSTables in each level: [0, 0, 0, 0, 0, 0, 0, 0, 0]
Space used (live), bytes: 0
Space used (total), bytes: 0
SSTable Compression Ratio: 0.0
Number of keys (estimate): 0
Memtable cell count: 0
Memtable data size, bytes: 0
Memtable switch count: 0
Local read count: 0
Local read latency: 0.000 ms
Local write count: 0
Local write latency: 0.000 ms
Pending tasks: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.0
Bloom filter space used, bytes: 0
Compacted partition minimum bytes: 0
Compacted partition maximum bytes: 0
Compacted partition mean bytes: 0
Average live cells per slice (last five minutes): 0.0
Average tombstones per slice (last five minutes): 0.0

I also checked via JMX and all the write counts are zero. Is the node supposed to receive writes during bootstrap? The other funny thing during bootstrap is that nodetool status shows the bootstrapping node as Up/Normal (UN) instead of Up/Joining (UJ). Is this expected, or is it a bug? The bootstrapping node does not even appear in the nodetool status of other nodes.

UN X.Y.Z.244 15.9 GB 1 3.7% 52fb21e-4621-4533-b201-8c1a7adbe818 rack

If I do a nodetool netstats, I see:

Mode: JOINING
Bootstrap 647d4b30-c11e-11e4-9249-173e73521fb44

Cheers, Paulo

On Thu, Oct 16, 2014 at 3:53 PM, Robert Coli rc...@eventbrite.com wrote:
On Wed, Oct 15, 2014 at 10:07 PM, Peter Haggerty peter.hagge...@librato.com wrote:
The node wrote gigs of data to various CFs during the bootstrap, so it was clearly writing in some sense, and it has the expected behavior after the bootstrap. Is cfstats correct when it reports that there were no writes during a bootstrap?
As I understand it: writes (extra writes from the perspective of replication factor; for example, an RF=3 cluster has an effective RF of 4 during bootstrap, though these are not relevant for consistency purposes until the end of bootstrap) occur via the storage protocol during bootstrap, so I would expect to see them reflected in cfstats. I'm relatively confident it is in fact receiving those writes, so your confusion might just be a result of how it's reported? =Rob http://twitter.com/rcolidba

-- *Paulo Motta* Chaordic | *Platform* www.chaordic.com.br +55 48 3232.3200
Re: Reboot: Read After Write Inconsistent Even On A One Node Cluster
On Mon, Mar 2, 2015 at 11:44 AM, Dan Kinder dkin...@turnitin.com wrote:
I had been having the same problem as in this older post: http://mail-archives.apache.org/mod_mbox/cassandra-user/201411.mbox/%3CCAORswtz+W4Eg2CoYdnEcYYxp9dARWsotaCkyvS5M7+Uo6HT1=a...@mail.gmail.com%3E

As I said on that thread: it sounds unreasonable/unexpected to me; if you have a trivial repro case, I would file a JIRA. =Rob
Re: Reboot: Read After Write Inconsistent Even On A One Node Cluster
Yeah, I thought that was suspicious too; it's mysterious and fairly consistent. (By the way, I had error checking but removed it for email brevity; thanks for verifying :) )

On Mon, Mar 2, 2015 at 4:13 PM, Peter Sanford psanf...@retailnext.net wrote:
Hmm. I was able to reproduce the behavior with your Go program on my dev machine (C* 2.0.12). I was hoping it was going to just be an unchecked error from the .Exec() or .Scan(), but that is not the case for me. The fact that the issue seems to happen on loop iterations 10, 100 and 1000 is pretty suspicious. I took a tcpdump to confirm that gocql was in fact sending the "write 100" query and then on the next read Cassandra responded with 99. I'll be interested to see the result of the jira ticket. -psanford

-- Dan Kinder Senior Software Engineer Turnitin – www.turnitin.com dkin...@turnitin.com
Re: Reboot: Read After Write Inconsistent Even On A One Node Cluster
Hmm. I was able to reproduce the behavior with your Go program on my dev machine (C* 2.0.12). I was hoping it was going to just be an unchecked error from the .Exec() or .Scan(), but that is not the case for me. The fact that the issue seems to happen on loop iterations 10, 100 and 1000 is pretty suspicious. I took a tcpdump to confirm that gocql was in fact sending the "write 100" query and then on the next read Cassandra responded with 99. I'll be interested to see the result of the jira ticket. -psanford
Re: Reboot: Read After Write Inconsistent Even On A One Node Cluster
Done: https://issues.apache.org/jira/browse/CASSANDRA-8892

On Mon, Mar 2, 2015 at 3:26 PM, Robert Coli rc...@eventbrite.com wrote:
On Mon, Mar 2, 2015 at 11:44 AM, Dan Kinder dkin...@turnitin.com wrote:
I had been having the same problem as in this older post: http://mail-archives.apache.org/mod_mbox/cassandra-user/201411.mbox/%3CCAORswtz+W4Eg2CoYdnEcYYxp9dARWsotaCkyvS5M7+Uo6HT1=a...@mail.gmail.com%3E

As I said on that thread: it sounds unreasonable/unexpected to me; if you have a trivial repro case, I would file a JIRA. =Rob

-- Dan Kinder Senior Software Engineer Turnitin – www.turnitin.com dkin...@turnitin.com
Re: Node stuck in joining the ring
I encountered a similar situation where streaming could not finish, not only when joining but also when removing a node. My tricky solution is: restart every node in the cluster before starting the new node. In my experience, stuck streaming only shows up on nodes that have been running for many days, although I have no idea about the reason.

2015-03-03 2:42 GMT+08:00 Nate McCall n...@thelastpickle.com:
Can you verify that cassandra-rackdc.properties and cassandra-topology.properties are the same across the cluster?

On Thu, Feb 26, 2015 at 7:52 AM, Batranut Bogdan batra...@yahoo.com wrote:
No errors in the system.log file:
[root@cassa09 cassandra]# grep ERROR system.log
[root@cassa09 cassandra]#
Nothing.

On Thursday, February 26, 2015 1:55 PM, mck m...@apache.org wrote:
Any errors in your log file? We saw something similar when bootstrap crashed while rebuilding secondary indexes. See CASSANDRA-8798. ~mck

-- Nate McCall Austin, TX @zznate Co-Founder Sr. Technical Consultant Apache Cassandra Consulting http://www.thelastpickle.com

-- Thanks, Phil Yang
Re: using or in select query in cassandra
Hi Rahul, No, you can't do this in a single query. You will need to execute two separate queries if the requirements are on different columns. However, if you'd like to select multiple rows with a restriction on the same column, you can do that using the `IN` construct: select * from table where id IN (123,124); See [1] for reference. [1] http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/select_r.html Cheers, Jens

On Mon, Mar 2, 2015 at 7:06 AM, Rahul Srivastava srivastava.robi...@gmail.com wrote:
Hi, I want to make uniqueness for my data so I need to add an OR clause in my WHERE clause. Ex: select * from table where id=123 OR name='abc'. So in the above, I want to get data if my id is 123 or my name is abc. Is there any possibility in Cassandra to achieve this?

-- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook https://www.facebook.com/#!/tink.se Linkedin http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_phototrkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary Twitter https://twitter.com/tink
Re: how to make unique coloumns in cassandra
Use an RDBMS. There is a reason constraints were created, and a reason Cassandra doesn't have them. Sent from my iPhone

On Mar 2, 2015, at 2:23 AM, Rahul Srivastava srivastava.robi...@gmail.com wrote:
but what if i want to fetch the value using one table, then this idea might fail

On Mon, Mar 2, 2015 at 12:46 PM, Ajaya Agrawal ajku@gmail.com wrote:
Make a table for each of the unique keys. E.g. if the primary key for the user table is user_id and you want the phone number column to be unique, then create another table wherein the primary key is (phone_number, user_id). Before inserting into the main table, try to insert into this table first with an IF NOT EXISTS clause. If it succeeds, then go ahead with your insert into the user table. Similarly, while deleting a row from the primary table, delete the corresponding row in all other tables. The order of insertion into the tables matters here; otherwise you would end up inducing race conditions. The catch here is that you should not be updating the unique column, ever. If you do that you would have to use locks, and if there are multiple nodes running your application then you would need a distributed lock. I would suggest not updating the unique columns. Instead, force your users to delete the entry and recreate it. If you can't do that, you need to evaluate your choice of database; perhaps a relational database would be better suited to your requirements. Hope this helps! -Ajaya

On Fri, Feb 27, 2015 at 5:26 PM, ROBIN SRIVASTAVA srivastava.robi...@gmail.com wrote:
I want to make a unique constraint in Cassandra, as I want all the values in my column family to be unique. Ex: name-rahul, phone-123, address-abc. Now I want that no values equal to rahul, 123, or abc get inserted again. Searching on DataStax I found that I can achieve this by doing a query on the partition key with IF NOT EXISTS, but I am not getting a solution for keeping all three values unique. That means if name-jacob, phone-123, address-qwe, this should also not be inserted into my database, as the phone column has the same value as the row with name-rahul.
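Ajaya's ordering can be sketched in plain Java. This is a toy model, not driver code: a ConcurrentHashMap's putIfAbsent stands in for the lookup table's INSERT ... IF NOT EXISTS (both are atomic check-then-set operations), and all class, table, and column names here are made up for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;

// Toy model of the "claim the unique value first" protocol. The map
// phoneUniq plays the role of the (phone_number, user_id) lookup table;
// putIfAbsent plays the role of INSERT ... IF NOT EXISTS.
public class UniquePhoneSketch {
    private final ConcurrentHashMap<String, String> phoneUniq = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, String> users = new ConcurrentHashMap<>();

    /** Returns true if the user was created, false if the phone was already claimed. */
    public boolean createUser(String userId, String phone) {
        // Step 1: claim the unique value first (the LWT insert in real Cassandra).
        if (phoneUniq.putIfAbsent(phone, userId) != null) {
            return false; // phone already taken; abort before touching the main table
        }
        // Step 2: only after the claim succeeds, write the main table.
        users.put(userId, phone);
        return true;
    }

    public static void main(String[] args) {
        UniquePhoneSketch s = new UniquePhoneSketch();
        System.out.println(s.createUser("rahul", "123")); // true
        System.out.println(s.createUser("jacob", "123")); // false: 123 already claimed
    }
}
```

The insertion order is the whole point: reversing the two steps would let two concurrent writers both land in the main table before either claim fails.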
Re: How to extract all the user id from a single table in Cassandra?
Hi Check, Please avoid double posting on mailing lists. It leads to double work (respect people's time!) and makes it hard for people in the future having the same issue as you to follow discussions and answers. That said, if you have a lot of primary keys, select user_id from testkeyspace.user_record; will most definitely time out. Have a look at `SELECT DISTINCT` at [1]. More importantly, for larger datasets you will also need to split the token space into smaller segments and iteratively select your primary keys. See [2]. [1] http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/select_r.html [2] http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/select_r.html?scroll=reference_ds_d35_v2q_xj__paging-through-unordered-results If you are having specific issues with the Java Driver, I suggest you ask on that mailing list (only). Cheers, Jens

On Sun, Mar 1, 2015 at 6:38 PM, Check Peck comptechge...@gmail.com wrote:
Sending again as I didn't get any response on this. Any thoughts?

On Fri, Feb 27, 2015 at 8:24 PM, Check Peck comptechge...@gmail.com wrote:
I have a Cassandra table like this - create table user_record (user_id text, record_name text, record_value blob, primary key (user_id, record_name)); What is the best way to extract all the user_id from this table? As of now I cannot change my data model to do this exercise, so I need to find a way to extract all the user_id from the above table. I am using the Datastax Java driver in my project. Is there any other easy way, apart from code, to extract all the user_id from the above table through some cqlsh utility and dump it into some file?
I am thinking the below code might time out after some time:

public class TestCassandra {
    private Session session = null;
    private Cluster cluster = null;

    private static class ConnectionHolder {
        static final TestCassandra connection = new TestCassandra();
    }

    public static TestCassandra getInstance() {
        return ConnectionHolder.connection;
    }

    private TestCassandra() {
        Builder builder = Cluster.builder();
        builder.addContactPoints("127.0.0.1");
        PoolingOptions opts = new PoolingOptions();
        opts.setCoreConnectionsPerHost(HostDistance.LOCAL,
                opts.getCoreConnectionsPerHost(HostDistance.LOCAL));
        cluster = builder.withRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE)
                .withPoolingOptions(opts)
                .withLoadBalancingPolicy(new TokenAwarePolicy(new DCAwareRoundRobinPolicy("PI")))
                .withReconnectionPolicy(new ConstantReconnectionPolicy(100L))
                .build();
        session = cluster.connect();
    }

    private Set<String> getRandomUsers() {
        Set<String> userList = new HashSet<String>();
        String sql = "select user_id from testkeyspace.user_record;";
        try {
            SimpleStatement query = new SimpleStatement(sql);
            query.setConsistencyLevel(ConsistencyLevel.ONE);
            ResultSet res = session.execute(query);
            Iterator<Row> rows = res.iterator();
            while (rows.hasNext()) {
                Row r = rows.next();
                String user_id = r.getString("user_id");
                userList.add(user_id);
            }
        } catch (Exception e) {
            System.out.println("error=" + e);
        }
        return userList;
    }
}

Adding the java-driver group and Cassandra group as well, to see whether there is any better way to execute this.

-- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se
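The token-range splitting Jens suggests in [2] can be sketched as follows. This is only an illustration of the arithmetic: the Murmur3Partitioner's token space runs from -2^63 to 2^63-1, the generated query strings are not tested driver calls, and the class name is made up.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Sketch only: split the Murmur3 token ring into N contiguous segments and
// build one range-restricted query per segment, so no single query has to
// scan the whole table (and time out).
public class TokenSplitSketch {
    static final BigInteger MIN = BigInteger.valueOf(Long.MIN_VALUE);
    static final BigInteger MAX = BigInteger.valueOf(Long.MAX_VALUE);

    static List<String> rangeQueries(int segments) {
        List<String> queries = new ArrayList<>();
        BigInteger span = MAX.subtract(MIN); // 2^64 - 1 tokens total
        for (int i = 0; i < segments; i++) {
            BigInteger lo = MIN.add(span.multiply(BigInteger.valueOf(i))
                                       .divide(BigInteger.valueOf(segments)));
            BigInteger hi = MIN.add(span.multiply(BigInteger.valueOf(i + 1))
                                       .divide(BigInteger.valueOf(segments)));
            // The first segment's lower bound must be inclusive, or the
            // partition at MIN would be skipped entirely.
            String lowOp = (i == 0) ? ">= " : "> ";
            queries.add("SELECT DISTINCT user_id FROM testkeyspace.user_record"
                    + " WHERE token(user_id) " + lowOp + lo
                    + " AND token(user_id) <= " + hi);
        }
        return queries;
    }

    public static void main(String[] args) {
        for (String q : rangeQueries(4)) System.out.println(q);
    }
}
```

Each query can then be executed in turn (or concurrently, with a concurrency cap) and the user_id results merged.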
Re: Composite Keys in cassandra 1.2
AFAIK it's not possible. The fact that you need to query the data by partial row key indicates your data model isn't right. What are your typical queries on the data?

On Sun, Mar 1, 2015 at 7:24 AM, Yulian Oifa oifa.yul...@gmail.com wrote:
Hello to all. Let's assume a scenario where the key is a compound type with 3 types in it (Long, UTF8, UTF8). Each row stores timeuuids as column names and empty values. Is it possible to retrieve data by a single key part (for example by the long only) using Java thrift? Best regards, Yulian Oifa
Re: Optimal Batch size (Unlogged) for Java driver
I have a column family with 15 columns: a timestamp, a timeuuid, a few text fields, and the rest int fields. If I calculate the size of each column name and value and divide 5KB (the recommended max size for a batch) by that value, I get 12. Is that correct? Am I missing something? Thanks Ajay

On 02-Mar-2015 12:13 pm, Ankush Goyal ank...@gmail.com wrote:
Hi Ajay, I would suggest looking at the approximate size of individual elements in the batch, and based on that computing a max size (chunk size). It's not really a straightforward calculation, so I would further suggest making that chunk size a runtime parameter that you can tweak and play around with until you reach a stable state.

On Sunday, March 1, 2015 at 10:06:55 PM UTC-8, Ajay Garga wrote:
Hi, I am looking at a way to compute the optimal batch size on the client side, similar to the below-mentioned bug on the server side (generic, as we are exposing REST APIs for Cassandra; the column family and the data are different for each request). https://issues.apache.org/jira/browse/CASSANDRA-6487 How do we compute (approximately, using ColumnDefinitions or ColumnMetadata) the size of a row of a column family from the client side using the Cassandra Java driver? Thanks Ajay

To unsubscribe from this group and stop receiving emails from it, send an email to java-driver-user+unsubscr...@lists.datastax.com.
Re: Optimal Batch size (Unlogged) for Java driver
Hi Ankush, We are already using prepared statements, and our case is time-series data as well. Thanks Ajay

On 02-Mar-2015 10:00 pm, Ankush Goyal ank...@gmail.com wrote:
Ajay, First of all, I would recommend using PreparedStatements, so you would only be sending the variable bound arguments over the wire. Second, I think that 5KB limit for the WARN is too restrictive, and you could tune that on the Cassandra server side. I think if all you have is 15 columns (as long as their values are sanitized and do not go over certain limits), it should be fine to send all of them over at the same time. Chunking is necessary when you have time-series type data (for writes) OR you might be reading a lot of data via an IN query.

On Monday, March 2, 2015 at 7:55:18 AM UTC-8, Ajay Garga wrote:
I have a column family with 15 columns: a timestamp, a timeuuid, a few text fields, and the rest int fields. If I calculate the size of each column name and value and divide 5KB (the recommended max size for a batch) by that value, I get 12. Is that correct? Am I missing something? Thanks Ajay

On 02-Mar-2015 12:13 pm, Ankush Goyal ank...@gmail.com wrote:
Hi Ajay, I would suggest looking at the approximate size of individual elements in the batch, and based on that computing a max size (chunk size). It's not really a straightforward calculation, so I would further suggest making that chunk size a runtime parameter that you can tweak and play around with until you reach a stable state.

On Sunday, March 1, 2015 at 10:06:55 PM UTC-8, Ajay Garga wrote:
Hi, I am looking at a way to compute the optimal batch size on the client side, similar to the below-mentioned bug on the server side (generic, as we are exposing REST APIs for Cassandra; the column family and the data are different for each request). https://issues.apache.org/jira/browse/CASSANDRA-6487 How do we compute (approximately, using ColumnDefinitions or ColumnMetadata) the size of a row of a column family from the client side using the Cassandra Java driver? Thanks Ajay

To unsubscribe from this group and stop receiving emails from it, send an email to java-driver-user+unsubscr...@lists.datastax.com.
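Ankush's "estimate element size, then chunk" suggestion could be sketched like this. Every constant here (the per-column overhead, the example value sizes) is an assumption for illustration, not a measured Cassandra figure; only the 5KB budget comes from the thread.

```java
// Rough sketch of client-side batch chunking: estimate one serialized row's
// size from its bound values, then ask how many rows fit under a tunable
// byte budget (5 KB being the server's default batch-size warning level).
public class BatchSizeSketch {
    /** Very rough per-row estimate: value bytes plus an assumed per-column overhead. */
    static int estimateRowBytes(int textBytes, int intColumns, int timestampColumns) {
        int perColumnOverhead = 8; // assumption, not a measured figure
        return textBytes + intColumns * 4 + timestampColumns * 8
                + (intColumns + timestampColumns + 1) * perColumnOverhead;
    }

    /** How many rows fit in one batch under the byte budget (at least 1). */
    static int rowsPerBatch(int rowBytes, int budgetBytes) {
        return Math.max(1, budgetBytes / rowBytes);
    }

    public static void main(String[] args) {
        // Illustrative 15-column row: ~200 bytes of text, 10 ints, 2 timestamps.
        int row = estimateRowBytes(200, 10, 2);
        System.out.println(rowsPerBatch(row, 5 * 1024));
    }
}
```

As Ankush says, the budget (and the overhead constant) should be runtime parameters you tune against observed behavior rather than fixed numbers.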
Re: using or in select query in cassandra
I'd like to add that in() is usually a bad idea. It is convenient, but not really what you want in production. Go with Jens' original suggestion of multiple queries. I recommend reading Ryan Svihla's post on why in() is generally a bad thing: http://lostechies.com/ryansvihla/2014/09/22/cassandra-query-patterns-not-using-the-in-query-for-multiple-partitions/

On Mon, Mar 2, 2015 at 12:36 AM Jens Rantil jens.ran...@tink.se wrote:
Hi Rahul, No, you can't do this in a single query. You will need to execute two separate queries if the requirements are on different columns. However, if you'd like to select multiple rows with a restriction on the same column, you can do that using the `IN` construct: select * from table where id IN (123,124); See [1] for reference. [1] http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/select_r.html Cheers, Jens

On Mon, Mar 2, 2015 at 7:06 AM, Rahul Srivastava srivastava.robi...@gmail.com wrote:
Hi, I want to make uniqueness for my data so I need to add an OR clause in my WHERE clause. Ex: select * from table where id=123 OR name='abc'. So in the above, I want to get data if my id is 123 or my name is abc. Is there any possibility in Cassandra to achieve this?

-- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se
Datastax Agent 5.1+ Configuration
I recently attempted to get our Cassandra instances talking securely to one another, with SSL OpsCenter communication. We are using DSE 4.6 and OpsCenter 5.1. While a lot of the DataStax documentation is fairly good, when it comes to advanced configuration topics or security configuration I find the docs very lacking. I set up a 3-node cluster with SSL encryption between nodes and password authentication turned on. Obviously, you need to set up the user/pass in the agent configuration as well. These used to be thrift_user and thrift_pass (or something along those lines), and the SSL settings were thrift_keystore / thrift_truststore, etc. In OpsCenter 5.1, the system changed from using thrift to the native interface, but there is nothing in the docs about which agent properties you need to set for SSL security and authentication. After my dealings with DataStax Support, I thought I would post this here until they update their documentation.

Agent configuration (address.yaml)

C* connection options

IP addresses: Before 5.1, we were using either thrift_rpc_interface (when storing metrics/settings in the same cluster) or storage_thrift_hosts (separate cluster) to determine what IP to use to connect to C*. In 5.1, both options were replaced with hosts, which accepts an array of strings (including an array with a single string for the same-cluster case) instead of a single string: hosts: [123.234.111.11, 10.1.1.1]

C* port: storage_thrift_port was removed; thrift_port was supplemented by cassandra_port.

C* autodiscovery: autodiscovery_enabled, autodiscovery_interval, and storage_dc were removed. Autodiscovery can't really be disabled for the java-driver, but we never connect to hosts that are not specified in the agent's config.

Misc: thrift_socket_timeout and thrift_conn_timeout were removed.

C*/DSE security

PLAINTEXT AUTH: thrift_user, storage_thrift_user, thrift_pass, and storage_thrift_pass were replaced by cassandra_user and cassandra_pass.

ENCRYPTION: thrift_ssl_truststore and thrift_ssl_truststore_password were replaced by ssl_keystore and ssl_keystore_password, respectively. thrift_ssl_truststore_type and thrift_max_frame_size were removed.

KERBEROS: We completely changed the way we set up kerberos (I thought it was doc'd but apparently it wasn't). We removed everything kerberos-related from the config except for a single option, kerberos_service. When it's set (to the Kerberos service name) we're using kerberos. All the configuration takes place in the kerberos.config file.

opscenterd cluster configs

[cassandra]: send_thrift_rpc was renamed to thrift_rpc.

[agents]: thrift_ssl_truststore and thrift_ssl_truststore_password were renamed to ssl_keystore and ssl_keystore_password, respectively. thrift_ssl_truststore_type was removed.

Hopefully this will be helpful for those running the latest OpsCenter who want a secure setup. Thanks to DataStax for the help in this matter.
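Pulling the renamed agent options together, a minimal post-5.1 address.yaml might look like the sketch below. The IPs, credentials, path, and port value are placeholders; only the option names themselves come from the notes above.

```yaml
# Agent -> C* connection (replaces thrift_rpc_interface / storage_thrift_hosts)
hosts: ["10.1.1.1", "10.1.1.2"]
cassandra_port: 9042               # replaces storage_thrift_port / supplements thrift_port

# Plaintext auth (replaces thrift_user / thrift_pass and the storage_* variants)
cassandra_user: opscenter_agent
cassandra_pass: example-password

# SSL (replaces thrift_ssl_truststore / thrift_ssl_truststore_password)
ssl_keystore: /etc/datastax-agent/keystore.jks
ssl_keystore_password: example-keystore-password

# Kerberos: only the service name goes here; the rest lives in kerberos.config
# kerberos_service: cassandra
```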
Re: using or in select query in cassandra
I would also like to add that if you avoid IN and use async queries instead, it is pretty trivial to use a semaphore or some other limiting mechanism to put a ceiling on the amount of concurrent work you are sending to the cluster. If you use a query with an IN clause with a thousand things, you'll make the cluster look for a thousand records concurrently. If you issue a thousand async queries and use a limiting mechanism, then you can control how much load you are placing on the server. I built a nice wrapper around the Session object, and one of the things built into the wrapper is the ability to limit the number of concurrent async queries. It's a really nice and simple feature to have. Robert

On Mar 2, 2015, at 10:33 AM, Jonathan Haddad j...@jonhaddad.com wrote:
I'd like to add that in() is usually a bad idea. It is convenient, but not really what you want in production. Go with Jens' original suggestion of multiple queries. I recommend reading Ryan Svihla's post on why in() is generally a bad thing: http://lostechies.com/ryansvihla/2014/09/22/cassandra-query-patterns-not-using-the-in-query-for-multiple-partitions/

On Mon, Mar 2, 2015 at 12:36 AM Jens Rantil jens.ran...@tink.se wrote:
Hi Rahul, No, you can't do this in a single query. You will need to execute two separate queries if the requirements are on different columns. However, if you'd like to select multiple rows with a restriction on the same column, you can do that using the `IN` construct: select * from table where id IN (123,124); See [1] for reference. [1] http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/select_r.html Cheers, Jens

On Mon, Mar 2, 2015 at 7:06 AM, Rahul Srivastava srivastava.robi...@gmail.com wrote:
Hi, I want to make uniqueness for my data so I need to add an OR clause in my WHERE clause. Ex: select * from table where id=123 OR name='abc'. So in the above, I want to get data if my id is 123 or my name is abc. Is there any possibility in Cassandra to achieve this?

-- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se
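Robert's limiting wrapper can be sketched roughly as follows. A plain ExecutorService task stands in for the real async session call, since the point is only the ceiling on in-flight work; the class and method names are made up, not his actual wrapper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;

// Sketch of a concurrency-limited async "query" wrapper: a Semaphore caps
// how many tasks are in flight at once, and each task releases its permit
// when it completes. The executor stands in for a real Cassandra session.
public class ThrottledQueries {
    private final Semaphore permits;
    private final ExecutorService pool = Executors.newCachedThreadPool();

    public ThrottledQueries(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    /** Submit one "query"; blocks if maxInFlight tasks are already running. */
    public Future<String> submit(String query) throws InterruptedException {
        permits.acquire(); // wait for a free slot before sending more work
        return pool.submit(() -> {
            try {
                Thread.sleep(10); // simulate server-side latency
                return "result:" + query;
            } finally {
                permits.release(); // free the slot for the next query
            }
        });
    }

    public List<String> runAll(List<String> queries) throws Exception {
        List<Future<String>> futures = new ArrayList<>();
        for (String q : queries) futures.add(submit(q));
        List<String> out = new ArrayList<>();
        for (Future<String> f : futures) out.add(f.get());
        pool.shutdown();
        return out;
    }

    public static void main(String[] args) throws Exception {
        ThrottledQueries tq = new ThrottledQueries(8); // at most 8 in flight
        List<String> qs = new ArrayList<>();
        for (int i = 0; i < 100; i++) qs.add("q" + i);
        System.out.println(tq.runAll(qs).size());
    }
}
```

Issuing 100 queries this way never puts more than 8 on the cluster at once, whereas a single IN with 100 partitions asks the coordinator for all of them concurrently.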
Re: Running Cassandra on mixed OS
I would really not recommend this. There are enough issues that can come up with a distributed database that can make it hard to pinpoint problems. In an ideal world, every machine would be completely identical. Don't set yourself up to fail. Pin the OS and all packages to specific versions.

On Mon, Mar 2, 2015 at 6:44 AM sean_r_dur...@homedepot.com wrote:
Cassandra 1.2.13+/2.0.12. Have any of you run a single Cassandra cluster on a mix of OS (Red Hat 5 and 6, for example), but with the same JVM? Any issues or concerns? If there are problems, how do you handle OS upgrades? Sean R. Durity

-- The information in this Internet Email is confidential and may be legally privileged. It is intended solely for the addressee. Access to this Email by anyone else is unauthorized. If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it, is prohibited and may be unlawful. When addressed to our clients any opinions or advice contained in this Email are subject to the terms and conditions expressed in any applicable governing The Home Depot terms of business or client engagement letter. The Home Depot disclaims all responsibility and liability for the accuracy and content of this attachment and for any damages or losses arising from any inaccuracies, errors, viruses, e.g., worms, trojan horses, etc., or other items of a destructive nature, which may be contained in this attachment and shall not be liable for direct, indirect, consequential or special damages in connection with this e-mail message or its attachment.
RE: Running Cassandra on mixed OS
This is not for the long haul, but in order to accomplish an OS upgrade across the cluster without taking an outage. Sean Durity

From: Jonathan Haddad [mailto:j...@jonhaddad.com]
Sent: Monday, March 02, 2015 1:15 PM
To: user@cassandra.apache.org
Subject: Re: Running Cassandra on mixed OS

I would really not recommend this. There are enough issues that can come up with a distributed database that can make it hard to pinpoint problems. In an ideal world, every machine would be completely identical. Don't set yourself up to fail. Pin the OS and all packages to specific versions.

On Mon, Mar 2, 2015 at 6:44 AM sean_r_dur...@homedepot.com wrote:
Cassandra 1.2.13+/2.0.12. Have any of you run a single Cassandra cluster on a mix of OS (Red Hat 5 and 6, for example), but with the same JVM? Any issues or concerns? If there are problems, how do you handle OS upgrades? Sean R. Durity
Re: Running Cassandra on mixed OS
On Mon, Mar 2, 2015 at 6:43 AM, sean_r_dur...@homedepot.com wrote: Have any of you run a single Cassandra cluster on a mix of OS (Red Hat 5 and 6, for example), but with the same JVM? Any issues or concerns? If there are problems, how do you handle OS upgrades? If you are running the same version of Cassandra in both cases, you are probably fine. As you point out, one must inevitably upgrade one's OS; I recently went from Ubuntu 10.04 to 12.04 (with associated (1.6/1.7) JVMs) without any problems. But you should of course do any such activity in QA and staging and let it burn in for a while before doing so in prod. =Rob
Re: Node stuck in joining the ring
Can you verify that cassandra-rackdc.properties and cassandra-topology.properties are the same on the cluster? On Thu, Feb 26, 2015 at 7:52 AM, Batranut Bogdan batra...@yahoo.com wrote: No errors in the system.log file [root@cassa09 cassandra]# grep ERROR system.log [root@cassa09 cassandra]# Nothing. On Thursday, February 26, 2015 1:55 PM, mck m...@apache.org wrote: Any errors in your log file? We saw something similar when bootstrap crashed when rebuilding secondary indexes. See CASSANDRA-8798 ~mck -- - Nate McCall Austin, TX @zznate Co-Founder Sr. Technical Consultant Apache Cassandra Consulting http://www.thelastpickle.com
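For reference, a minimal sketch of the two snitch files being compared (the dc/rack values here are illustrative, not from the thread):

```
# cassandra-rackdc.properties (read by GossipingPropertyFileSnitch)
dc=DC1
rack=RAC1
```

A quick way to confirm the files match across nodes is to checksum them on each host, e.g. `md5sum /etc/cassandra/cassandra-rackdc.properties`, and compare the sums.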
Re: sstables remain after compaction
On Sat, Feb 28, 2015 at 5:39 PM, Jason Wee peich...@gmail.com wrote: Hi Rob, sorry for the late response, festive season here. cassandra version is 1.0.8 and thank you, I will read on the READ_STAGE threads. 1.0.8 is pretty seriously old in 2015. I would upgrade to at least 1.2.x (via 1.1.x) ASAP. Your cluster will be much happier, in general. =Rob
set selinux context for cassandra to talk to website
Hey all, Ok I have a website being powered by Cassandra 2.1.3. And I notice if SELinux is set to off, the site works beautifully! However as soon as I set SELinux to on, I am seeing the following error: Warning: require_once(/McFrazier/PhpBinaryCql/CqlClient.php): failed to open stream: Permission denied in /var/www/jf-ref/includes/classes/class.CQL.php on line 2 Fatal error: require_once(): Failed opening required '/McFrazier/PhpBinaryCql/CqlClient.php' (include_path='.:/php/includes') in /var/www/jf-ref/includes/classes/class.CQL.php on line 2 I'm just wondering how I can get SELinux to allow my web server to talk to Cassandra? Thanks Tim -- GPG me!! gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
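A sketch of the usual SELinux fixes for this kind of failure, assuming a stock RHEL/CentOS targeted policy; the path comes from the error above, and the boolean names are the standard httpd ones, so verify what your system actually offers with `getsebool -a | grep httpd`:

```shell
# The include files live outside /var/www, so they lack an
# httpd-readable label; give the tree httpd_sys_content_t:
semanage fcontext -a -t httpd_sys_content_t "/McFrazier(/.*)?"
restorecon -Rv /McFrazier

# Separately, let Apache/PHP open outbound sockets (e.g. to
# Cassandra's CQL port), which the policy denies by default:
setsebool -P httpd_can_network_connect 1
```

If it still fails, running `audit2why -a` against /var/log/audit/audit.log will report which rule is denying the access.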
Re: does need to disable 'rpc_keepalive' if 'rpc_max_threads' is get larger?
On Sun, Mar 1, 2015 at 6:40 PM, pprun pprun.dra...@gmail.com wrote: rpc_max_threads is set to 2048 and the 'rpc_server_type' is 'hsha', after 2 days running, observed that there's a high I/O activity and the number of 'RPC thread' grew to '2048' and VisualVM shows most of them as 'waiting'/'sleeping' (color: yellow). I want to know if I set rpc_keepalive to false, disabling it, will this help to shrink the idle RPC threads? I remember Java 8 comes with newWorkStealingPool, where the number of threads may grow and shrink dynamically. What version of Cassandra, and hsha or sync? How many client threads do you have? =Rob
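For context, the relevant cassandra.yaml knobs from this thread (the values shown are the poster's, not recommendations):

```
rpc_server_type: hsha    # half-sync/half-async: one shared Thrift worker pool
rpc_max_threads: 2048    # upper bound on that pool
rpc_keepalive: true      # TCP keepalive on client connections
```

Note that rpc_keepalive controls TCP keepalive on the client sockets, not thread reaping, so setting it to false is unlikely to shrink an idle RPC thread pool by itself.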
Re: Less frequent flushing with LCS
I see, thanks for the input. Compression is not enabled at the moment, but I may try increasing that number regardless. Also I don't think in-memory tables would work since the dataset is actually quite large. The pattern is more like a given set of rows will receive many overwriting updates and then not be touched for a while. On Fri, Feb 27, 2015 at 2:27 PM, Robert Coli rc...@eventbrite.com wrote: On Fri, Feb 27, 2015 at 2:01 PM, Dan Kinder dkin...@turnitin.com wrote: Theoretically sstable_size_in_mb could be causing it to flush (it's at the default 160MB)... though we are flushing well before we hit 160MB. I have not tried changing this but we don't necessarily want all the sstables to be large anyway, I've always wished that the log message told you *why* the SSTable was being flushed, which of the various bounds prompted the flush. In your case, the size on disk may be under 160MB because compression is enabled. I would start by increasing that size. Datastax DSE has in-memory tables for this use case. =Rob -- Dan Kinder Senior Software Engineer Turnitin – www.turnitin.com dkin...@turnitin.com
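Rob's suggestion to increase the LCS target size can be sketched in CQL like this (keyspace and table names are placeholders):

```
ALTER TABLE my_ks.my_table
  WITH compaction = {'class': 'LeveledCompactionStrategy',
                     'sstable_size_in_mb': 256};
```

As the thread notes, this bounds SSTable size within LCS; memtable flush timing is governed separately by the memtable thresholds in cassandra.yaml, so this may not change flush frequency on its own.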
Reboot: Read After Write Inconsistent Even On A One Node Cluster
Hey all, I had been having the same problem as in this older post: http://mail-archives.apache.org/mod_mbox/cassandra-user/201411.mbox/%3CCAORswtz+W4Eg2CoYdnEcYYxp9dARWsotaCkyvS5M7+Uo6HT1=a...@mail.gmail.com%3E To summarize it, on my local box with just one Cassandra node I can update and then select the updated row and get an incorrect response. My understanding is this may have to do with not having fine-grained enough timestamp resolution, but regardless I'm wondering: is this actually a bug or is there any way to mitigate it? It causes sporadic failures in our unit tests, and having to Sleep() between tests isn't ideal. At least confirming it's a bug would be nice though. For those interested, here's a little Go program that can reproduce the issue. When I run it I typically see: Expected 100 but got: 99 Expected 1000 but got: 999

--- main.go: ---

package main

import (
	"fmt"

	"github.com/gocql/gocql"
)

func main() {
	cf := gocql.NewCluster("localhost")
	db, _ := cf.CreateSession()

	// Keyspace ut = "update test"
	err := db.Query(`CREATE KEYSPACE IF NOT EXISTS ut WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1 }`).Exec()
	if err != nil {
		panic(err.Error())
	}
	err = db.Query(`CREATE TABLE IF NOT EXISTS ut.test (key text, val text, PRIMARY KEY(key))`).Exec()
	if err != nil {
		panic(err.Error())
	}
	err = db.Query(`TRUNCATE ut.test`).Exec()
	if err != nil {
		panic(err.Error())
	}
	err = db.Query(`INSERT INTO ut.test (key) VALUES ('foo')`).Exec()
	if err != nil {
		panic(err.Error())
	}

	// Update then immediately read back; mismatches show the
	// read-after-write inconsistency. (The loop's upper bound was
	// lost in the archive formatting; a few thousand iterations
	// match the output shown above.)
	for i := 0; i < 10000; i++ {
		val := fmt.Sprintf("%d", i)
		db.Query(`UPDATE ut.test SET val = ? WHERE key = 'foo'`, val).Exec()
		var result string
		db.Query(`SELECT val FROM ut.test WHERE key = 'foo'`).Scan(&result)
		if result != val {
			fmt.Printf("Expected %v but got: %v\n", val, result)
		}
	}
}
best practices for time-series data with massive amounts of records
Hi all, I am designing an application that will capture time series data where we expect the number of records per user to potentially be extremely high. I am not sure if we will eclipse the max row size of 2B elements, but I assume that we would not want our application to approach that size anyway. If we wanted to put all of the interactions in a single row, then I would make a data model that looks like: CREATE TABLE events ( id text, event_time timestamp, event blob, PRIMARY KEY (id, event_time)) WITH CLUSTERING ORDER BY (event_time DESC); The best practice for breaking up large rows of time series data is, as I understand it, to put part of the time into the partitioning key ( http://planetcassandra.org/getting-started-with-time-series-data-modeling/): CREATE TABLE events ( id text, date text, // Could also use year+month here or year+week or something else event_time timestamp, event blob, PRIMARY KEY ((id, date), event_time)) WITH CLUSTERING ORDER BY (event_time DESC); The downside of this approach is that we can no longer do a simple continuous scan to get all of the events for a given user. Some users may log lots and lots of interactions every day, while others may interact with our application infrequently, so I'd like a quick way to get the most recent interaction for a given user. Has anyone used different approaches for this problem? The only thing I can think of is to use the second table schema described above, but switch to an order-preserving hashing function, and then manually hash the id field. This is essentially what we would do in HBase. Curious if anyone else has any thoughts. Best regards, Clint
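One common way to keep the "most recent interaction" lookup cheap with the bucketed schema above is to read the newest bucket first and only walk backwards on a miss (the id and bucket values here are illustrative):

```
-- Newest event in the current bucket: LIMIT 1 plus the DESC
-- clustering order makes this a single-partition, single-row read.
SELECT event_time, event
  FROM events
 WHERE id = 'user42' AND date = '2015-03-02'
 LIMIT 1;
```

If the bucket comes back empty, the application retries with the previous day's bucket; alternatively, a small side table of (id, last_event_time) maintained on write avoids the backwards walk entirely for rarely active users.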
RE: sstables remain after compaction
In my experience, you do not want to stay on 1.1 very long. 1.0.8 was very stable. 1.1 can get bad in a hurry. 1.2 (with many things moved off-heap) is very much better. Sean Durity – Cassandra Admin, Big Data Team From: Robert Coli [mailto:rc...@eventbrite.com] Sent: Monday, March 02, 2015 2:01 PM To: user@cassandra.apache.org Subject: Re: sstables remain after compaction
Re: What are the factors that affect the release time of each minor version?
Hi Phil, Right now there is no explicit scheme for minor releases scheduling. Eventually we just decide that it’s time for a new release - usually when the CHANGES list feels too long - and start the process. what are the duties to release a version? Need to build and eventually publish all the artifacts - that process is semi-automated. Need to run all the unit/distributed/long/duration tests on the tagged sha. Need to go through the whole voting process. All in all about a week. Coincidentally, we’ve been discussing our release policies recently, and minor version releases have been discussed as well. It is likely that we’ll switch to scheduled minor releases soon (every 2/3/4 weeks or when a major bug gets fixed - whichever comes first). -- AY On February 28, 2015 at 2:49:25 AM, Phil Yang (ud1...@gmail.com) wrote: Hi all As a user of Cassandra, sometimes there are some bugs in my cluster and I hope someone can fix them (Of course, if I can fix them myself I'll try to contribute my code :) ). For each bug, there is a JIRA ticket tracking it and users can know if the bug is fixed. However, there is a lag between the bug being fixed and a new minor version being released. Although we can apply the patch from the ticket to our online version and build a special snapshot to solve the trouble in our clusters, or we can use the latest code directly, I think many users still want to use an official release with higher reliability and indeed, more convenience. In addition, updating more frequently can also reduce the trouble caused by unknown bugs. So someone may often ask: when will the new version with this patch be released? In my mind, not only the number of issues resolved in each version but also the time interval between two versions is not fixed. So may I know what the factors are that affect the release time of each minor version? Furthermore, apart from a vote in the dev@cassandra mailing list that I can see, what are the duties to release a version?
If it is not heavy work, could we make releases more frequently? Or we could make a rule to decide when we need to release a new version. For example: if the latest version was released two weeks ago, or after the latest version we have already resolved 20 issues, we should release a new minor version. -- Thanks, Phil Yang
Re: how to make unique coloumns in cassandra
Please be clear on questions and spend some time on writing questions so that other people know what you are trying to ask. I can't read your mind. :) Back to your question: Assuming that you need to search based on the values of the unique column, then invert the index in the auxiliary table. So instead of a (phone_number, user_id) index you would have a (user_id, phone_number) index. Then do a query on the auxiliary table and then on the user table if you want other columns. You can replicate other columns in the auxiliary table also to avoid multiple queries. Cheers, Ajaya On Mon, Mar 2, 2015 at 12:53 PM, Rahul Srivastava srivastava.robi...@gmail.com wrote: but what if i want to fetch the value using one table then this idea might fail On Mon, Mar 2, 2015 at 12:46 PM, Ajaya Agrawal ajku@gmail.com wrote: Make a table for each of the unique keys. For example, if the primary key for the user table is user_id and you want the phone number column to be unique, then create another table wherein the primary key is (phone_number, user_id). Before inserting to the main table, try to insert to this table first with an IF NOT EXISTS clause. If it succeeds then go ahead with your insert to the user table. Similarly, while deleting a row from the primary table, delete the corresponding row in all other tables. The order of insertion to tables matters here; otherwise you would end up inducing race conditions. The catch here is, you should not be updating the unique column ever. If you do that you would have to use locks, and if there are multiple nodes running your application then you would need a distributed lock. I would suggest not updating the unique columns. Instead, force your users to delete the entry and recreate it. If you can't do that, you need to evaluate your choice of database. Perhaps a relational database would be better suited to your requirements. Hope this helps!
-Ajaya Cheers, Ajaya On Fri, Feb 27, 2015 at 5:26 PM, ROBIN SRIVASTAVA srivastava.robi...@gmail.com wrote: I want to make a unique constraint in Cassandra, as I want all the values in my column family to be unique. For example: name-rahul phone-123 address-abc. Now I want that no values equal to rahul, 123, or abc get inserted again. Searching on DataStax I found that I can achieve this by doing a query on the partition key with IF NOT EXISTS, but I am not getting a solution for keeping all three values unique. This means that if name-jacob phone-123 address-qwe comes in, it should also not be inserted into my database, as its phone column has the same value as the row with name-rahul.
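The claim-table pattern Ajaya describes can be sketched in CQL using lightweight transactions (table and column names here are illustrative):

```
CREATE TABLE users_by_phone (
  phone_number text PRIMARY KEY,
  user_id      text
);

-- Claim the phone number first; a result of [applied] = false
-- means the number is already taken, so abort the insert.
INSERT INTO users_by_phone (phone_number, user_id)
VALUES ('123', 'rahul') IF NOT EXISTS;

-- Only after the claim succeeds, write the main row.
INSERT INTO users (user_id, name, phone_number, address)
VALUES ('rahul', 'rahul', '123', 'abc');
```

One such claim table is needed per column that must stay unique, and deletes must remove the claim rows as well, in the reverse order of insertion.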