Re: paging through an entire table in chunks?
You could use the async feature of the Java driver: http://www.datastax.com/documentation/developer/java-driver/1.0/java-driver/asynchronous_t.html

To manage the complexity of issuing several queries I used RxJava; it improves readability and handles asynchronicity in a very elegant way (much more so than Futures). However, you may need to write some glue code to bridge Rx and the Java driver, but it's worth it.

— Brice

On Sun, Sep 28, 2014 at 12:57 AM, Kevin Burton bur...@spinn3r.com wrote:

Agreed… but I'd like to parallelize it… Eventually I'll just have too much data to do it on one server… plus, I need suspend/resume, and this way if I'm doing like 10MB at a time I'll be able to suspend/resume as well as track progress.

On Sat, Sep 27, 2014 at 2:52 PM, DuyHai Doan doanduy...@gmail.com wrote:

Use the Java driver and its paging feature: http://www.datastax.com/drivers/java/2.1/com/datastax/driver/core/Statement.html#setFetchSize(int)

1) Do your SELECT * FROM without any selection
2) Set fetchSize to a sensible value
3) Execute the query and get an iterator from the ResultSet
4) Iterate

On Sat, Sep 27, 2014 at 11:42 PM, Kevin Burton bur...@spinn3r.com wrote:

I need a way to do a full table scan across all of our data. Can't I just use token() for this? This way I could split up our entire keyspace into say 1024 chunks, and then have one ActiveMQ task work with range 0, then range 1, etc… That way I can easily just map() my whole table, and since it's token() I should (generally) read a contiguous range from a given table.

--
Founder/CEO Spinn3r.com
Location: San Francisco, CA
blog: http://burtonator.wordpress.com
… or check out my Google+ profile: https://plus.google.com/102718274791889610666/posts
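Kevin's token() plan from the quoted message can be sketched with plain arithmetic: the Murmur3Partitioner's token space runs from -2^63 to 2^63 - 1, and splitting it into N contiguous ranges gives the chunks each worker would scan with a query like SELECT … WHERE token(id) > ? AND token(id) <= ?. A rough sketch (the 1024 chunk count is Kevin's; the table and column names in the comment are invented):

```python
# Split the Murmur3 token space into N contiguous (start, end] ranges,
# one per worker task, as described in the thread.
MIN_TOKEN = -(2 ** 63)      # Murmur3Partitioner minimum token
MAX_TOKEN = 2 ** 63 - 1     # Murmur3Partitioner maximum token

def token_ranges(n):
    """Yield n (start, end] token ranges that cover the whole ring."""
    span = (MAX_TOKEN - MIN_TOKEN) // n
    start = MIN_TOKEN
    for i in range(n):
        # Last range absorbs the rounding remainder so the ring is fully covered.
        end = MAX_TOKEN if i == n - 1 else start + span
        yield (start, end)
        start = end

ranges = list(token_ranges(1024))
# Each range would back one query, e.g.:
#   SELECT * FROM ks.tbl WHERE token(id) > %s AND token(id) <= %s
```

Each worker (an ActiveMQ task, in Kevin's setup) would then page through its own range independently, which also gives natural suspend/resume points.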
DSE install interfering with apache Cassandra 2.1.0
Hi All,

Just come across this one; I'm at a bit of a loss on how to fix it. A user here did the following steps on a Mac:

1) Install DataStax Enterprise (DSE) using the dmg file
2) Test that he can connect using the DSE cqlsh window
3) Uninstall DSE (a full uninstall, which stops the services)
4) Download Apache Cassandra 2.1.0, unzip it and change to the new directory
5) Run sudo ./cassandra

Now when he tries to connect using cqlsh from the Apache Cassandra 2.1.0 bin he gets:

Connection error: ('Unable to connect to any servers', {'127.0.0.1': ConnectionShutdown('Connection AsyncoreConnection(4528514448) 127.0.0.1:9160 (closed) is already closed',)})

This is probably related to http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201409.mbox/%3CCALHCZd7RGSahJUbK32WoTr9JRoA+4K=mrfocmxuk0nbzoqq...@mail.gmail.com%3E but I can't see why the uninstall of DSE is leaving the Apache Cassandra release cqlsh unable to attach to the Apache Cassandra runtime.

Ta
Andy

The University of Dundee is a registered Scottish Charity, No: SC015096
Re: DSE install interfering with apache Cassandra 2.1.0
Please run jps to check which Java services are still running and to make sure whether C* is running. Then please check if port 9160 is in use:

netstat -nltp | grep 9160

This will confirm what is happening in your case.

Sent from my iPhone

On 29-Sep-2014, at 7:15 pm, Andrew Cobley a.e.cob...@dundee.ac.uk wrote:
[snip]
Re: DSE install interfering with apache Cassandra 2.1.0
Without Apache Cassandra running, I ran jps -l on this machine; the only result was:

338 sun.tools.jps.Jps

The Mac didn't like that netstat command, so I ran:

netstat -atp tcp | grep 9160

which gave no result. Also for the native port:

netstat -atp tcp | grep 9042

gave no result (command may be wrong). So I ran a port scan using the Network Utility (between 0 and 1). Results as shown:

Port Scan has started…
Port Scanning host: 127.0.0.1
Open TCP Port: 631 ipp
Port Scan has completed…

Hope this helps.

Andy

On 29 Sep 2014, at 15:09, Sumod Pawgi spa...@gmail.com wrote:
[snip]

The University of Dundee is a registered Scottish Charity, No: SC015096
Cassandra throwing java exceptions for nodetool repair on indexed tables
Hi All,

We're running two Cassandra 2.1 clusters (development and production), and whenever I run a nodetool repair on indexed tables I get a Java exception about creating snapshots.

Command line:

[2014-09-29 11:25:24,945] Repair session 73c0d390-47e4-11e4-ba0f-c7788dc924ec for range (-7298689860784559350,-7297558156602685286] failed with error java.io.IOException: Failed during snapshot creation.
[2014-09-29 11:25:24,945] Repair command #5 finished

Cassandra log:

ERROR [Thread-49681] 2014-09-29 11:25:24,945 StorageService.java:2689 - Repair session 73c0d390-47e4-11e4-ba0f-c7788dc924ec for range (-7298689860784559350,-7297558156602685286] failed with error java.io.IOException: Failed during snapshot creation.
java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.io.IOException: Failed during snapshot creation.
    at java.util.concurrent.FutureTask.report(FutureTask.java:122) [na:1.7.0_67]
    at java.util.concurrent.FutureTask.get(FutureTask.java:188) [na:1.7.0_67]
    at org.apache.cassandra.service.StorageService$4.runMayThrow(StorageService.java:2680) ~[apache-cassandra-2.1.0.jar:2.1.0]
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) [apache-cassandra-2.1.0.jar:2.1.0]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_67]
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_67]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_67]
Caused by: java.lang.RuntimeException: java.io.IOException: Failed during snapshot creation.
    at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.jar:na]
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32) [apache-cassandra-2.1.0.jar:2.1.0]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_67]
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_67]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_67]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[na:1.7.0_67]
    ... 1 common frames omitted
Caused by: java.io.IOException: Failed during snapshot creation.
    at org.apache.cassandra.repair.RepairSession.failedSnapshot(RepairSession.java:344) ~[apache-cassandra-2.1.0.jar:2.1.0]
    at org.apache.cassandra.repair.RepairJob$2.onFailure(RepairJob.java:128) ~[apache-cassandra-2.1.0.jar:2.1.0]
    at com.google.common.util.concurrent.Futures$4.run(Futures.java:1172) ~[guava-16.0.jar:na]
    ... 3 common frames omitted

If I drop the index, the repair returns no error:

cqlsh:test> drop INDEX user_pass_idx ;

root@test:~# nodetool repair test user
[2014-09-29 11:27:29,668] Starting repair command #6, repairing 743 ranges for keyspace test (seq=true, full=true)
.
.
[2014-09-29 11:28:38,030] Repair session e6d40e10-47e4-11e4-ba0f-c7788dc924ec for range (-7298689860784559350,-7297558156602685286] finished
[2014-09-29 11:28:38,030] Repair command #6 finished

The test table:

CREATE TABLE test.user (
    login text PRIMARY KEY,
    password text
);
create INDEX user_pass_idx on test.user (password) ;

Am I doing anything wrong? Or is this a bug? I searched but I couldn't find any reference to this error.

Thanks in advance for any help.

Jero
Re: Cassandra throwing java exceptions for nodetool repair on indexed tables
On Mon, Sep 29, 2014 at 8:35 AM, Jeronimo de A. Barros jeronimo.bar...@gmail.com wrote:

We're running two Cassandra 2.1 clusters (development and production), and whenever I run a nodetool repair on indexed tables I get a Java exception about creating snapshots.

Don't run 2.1 in production yet if you don't want to deal with bugs like this in production. https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/

Am I doing anything wrong? Or is this a bug? I searched but I couldn't find any reference to this error.

I would file this as a bug in the Cassandra JIRA, especially as it relates to a just-released version of the software and seems reproducible. If you do file a JIRA, please let the list know what the URL is.

=Rob
Re: Repair taking long time
On Fri, Sep 26, 2014 at 9:52 AM, Gene Robichaux gene.robich...@match.com wrote:

I am fairly new to Cassandra. We have a 9 node cluster, 5 in one DC and 4 in another. Running a repair on a large column family seems to be moving much slower than I expect.

Unfortunately, as others have mentioned, the slowness/broken-ness of repair is a long running (groan!) issue and therefore currently expected. At this time, I do not recommend upgrading to 2.1 in production to attempt to fix it. I am also broadly skeptical that it is as fixed in 2.1 as all that.

One can increase gc_grace_seconds to 34 days [1] and repair once a month, which should help make repair slightly more tractable. For now you should probably evaluate which of your column families you *absolutely must* repair (because you do DELETE-like operations in them, etc.) and only repair those.

As an aside, you just lose with vnodes and clusters of this size. I presume you plan to grow over appx 9 nodes per DC, in which case you probably do want vnodes enabled.

One note: nodetool compactionstats indicates the Validation phase is running and that the total bytes is 4.5T (4505336278756). This is the uncompressed size; I'm betting your actual on-disk size is closer to 2T. Even though 2.0 has improved performance for nodes with lots of data, 2T per node is still relatively fat for a Cassandra node.

=Rob
[1] https://issues.apache.org/jira/browse/CASSANDRA-5850
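For reference, Rob's suggestion of stretching gc_grace_seconds to 34 days would look something like the following (the keyspace and table names are invented; 34 days is 2,937,600 seconds):

```cql
ALTER TABLE my_keyspace.my_cf WITH gc_grace_seconds = 2937600;  -- 34 days
```

The tradeoff is that tombstones live longer on disk, but repair only has to complete once within each 34-day window instead of within the default 10 days.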
Re: simple map / table scans without hadoop?
On Fri, Sep 26, 2014 at 9:08 PM, Kevin Burton bur...@spinn3r.com wrote: I have the requirements to periodically run full tables scans on our data. It’s mostly for repair tasks or making bulk UPDATEs… but I’d prefer to do it in Java because I need something mildly trivial. http://wiki.apache.org/cassandra/FAQ#iter_world ? =Rob
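The FAQ entry linked above describes cursor-style iteration in token order. A hedged sketch of that pattern, assuming a table ks.tbl with partition key k (all names illustrative):

```cql
-- first page
SELECT k, v FROM ks.tbl LIMIT 1000;

-- subsequent pages: resume after the last key seen on the previous page
SELECT k, v FROM ks.tbl WHERE token(k) > token(?) LIMIT 1000;
```

Repeat the second query, binding the last k returned, until a page comes back empty.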
Re: Apache Cassandra 2.1.0 : cassandra-stress performance discrepancy between SSD and SATA drive
I have run a sysbench file IO test on my home PC and office PC. The results are given below. They show my office PC (with an SSD) is about 3 times faster than my home PC (with a SATA hard disk).

Home PC:

gauss:~> sysbench --test=fileio --file-total-size=50G prepare
sysbench 0.5: multi-threaded system evaluation benchmark
128 files, 409600Kb each, 51200Mb total
Creating files for the test...
Extra file open flags: 0
Creating file test_file.0
Creating file test_file.1
Creating file test_file.2
.
Creating file test_file.125
Creating file test_file.126
Creating file test_file.127
53687091200 bytes written in 626.30 seconds (81.75 MB/sec).

matmsh@gauss:~> sysbench --test=fileio --file-total-size=50G --file-test-mode=rndrw --init-rng=on --max-time=300 --max-requests=0 run
sysbench 0.5: multi-threaded system evaluation benchmark
Running the test with following options:
Number of threads: 1
Random number generator seed is 0 and will be ignored
Extra file open flags: 0
128 files, 400Mb each
50Gb total file size
Block size 16Kb
Number of IO requests: 0
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random r/w test
Threads started!

Operations performed: 14521 reads, 9680 writes, 30976 Other = 55177 Total
Read 226.89Mb  Written 151.25Mb  Total transferred 378.14Mb (1.2605Mb/sec)
80.67 Requests/sec executed

General statistics:
    total time:                         300.0030s
    total number of events:             24201
    total time taken by event execution: 186.0749s
    response time:
        min:                   0.00ms
        avg:                   7.69ms
        max:                 132.43ms
        approx. 95 percentile: 19.57ms

Threads fairness:
    events (avg/stddev):         24201.0000/0.00
    execution time (avg/stddev): 186.0749/0.00

===

Office PC:

shing@cauchy:~> sysbench --test=fileio --file-total-size=50G prepare
sysbench 0.5: multi-threaded system evaluation benchmark
128 files, 409600Kb each, 51200Mb total
Creating files for the test...
Extra file open flags: 0
Creating file test_file.0
Creating file test_file.1
Creating file test_file.2
Creating file test_file.3
...
Creating file test_file.122
Creating file test_file.123
Creating file test_file.124
Creating file test_file.125
Creating file test_file.126
Creating file test_file.127
53687091200 bytes written in 175.55 seconds (291.66 MB/sec).

cauchy:~> sysbench --test=fileio --file-total-size=50G --file-test-mode=rndrw --init-rng=on --max-time=300 --max-requests=0 run
sysbench 0.5: multi-threaded system evaluation benchmark
Running the test with following options:
Number of threads: 1
Random number generator seed is 0 and will be ignored
Extra file open flags: 0
128 files, 400Mb each
50Gb total file size
Block size 16Kb
Number of IO requests: 0
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random r/w test
Threads started!

Operations performed: 43020 reads, 28680 writes, 91723 Other = 163423 Total
Read 672.19Mb  Written 448.12Mb  Total transferred 1.0941Gb (3.7344Mb/sec)
239.00 Requests/sec executed

General statistics:
    total time:                         300.0007s
    total number of events:             71700
    total time taken by event execution: 7.5550s
    response time:
        min:                   0.00ms
        avg:                   0.11ms
        max:                  12.89ms
        approx. 95 percentile:  0.22ms

Threads fairness:
    events (avg/stddev):         71700.0000/0.00
    execution time (avg/stddev): 7.5550/0.00

===

Shing

On Saturday, 27 September 2014, 10:24, Shing Hing Man mat...@yahoo.com wrote:

Hi Kevin,

Thanks for the reply! I do not know the exact brand of SSD in my office PC. But the SSD is only 1 year old, and it is far from full. On both my office PC and home PC, I untarred Apache Cassandra 2.1.0, ran cassandra -f with the default config, then ran cassandra-stress. Both PCs have Oracle Java 1.7.0_40. I have noticed there are some parameters for SSD in cassandra.yaml, which I have adjusted, but with no improvement.
It puzzles me that Cassandra on my office PC, with far better hardware, could be 100% slower than on my home PC.

Shing

On Saturday, 27 September 2014, 5:12, Kevin Burton bur...@spinn3r.com wrote:

What SSD was it? There is a lot of variability in SSD performance.

1. Is it a new vs old SSD? Old SSDs can become slower if they're really worn out.
2. Was the office SSD near capacity, holding other data?
Re: Repair taking long time
What is the recommendation on the number-of-tokens value? I am asking because of the issue with sequential repairs running token range after token range.

Rahul Neelakantan

On Sep 29, 2014, at 2:29 PM, Robert Coli rc...@eventbrite.com wrote:
[snip]
Re: Repair taking long time
On Mon, Sep 29, 2014 at 2:29 PM, Robert Coli rc...@eventbrite.com wrote:

As an aside, you just lose with vnodes and clusters of this size. I presume you plan to grow over appx 9 nodes per DC, in which case you probably do want vnodes enabled.

I typically only see discussion of vnodes vs. non-vnodes, but it seems to me that it might be more important to discuss the number of vnodes per node. A small cluster having 256 vnodes/node is unwise given some of the sequential operations that are still done. Even if those operations were done in parallel, a 256x increase in parallelization seems an equally bad choice. I've never seen any discussion of how many vnodes per node might be appropriate based on a planned cluster size -- does such a thing exist?

Ken
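Ken's concern can be made concrete with back-of-the-envelope arithmetic: a sequential repair walks every distinct token range on the ring, and the ring has roughly nodes × num_tokens ranges (this simplification ignores replication overlap). A small illustrative sketch:

```python
# Rough count of distinct token ranges on the ring: each node contributes
# num_tokens tokens, and each token bounds one range. Sequential operations
# such as per-range repair sessions scale with this count.
def ring_range_count(nodes, num_tokens):
    return nodes * num_tokens

with_vnodes = ring_range_count(9, 256)  # 9-node cluster, default 256 vnodes
without_vnodes = ring_range_count(9, 1) # same cluster, one token per node
```

So the default num_tokens of 256 turns 9 repair ranges into 2304 on a cluster this size, which is one way to read "you just lose with vnodes and clusters of this size".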
Re: Cassandra throwing java exceptions for nodetool repair on indexed tables
Hi again,

On Mon, Sep 29, 2014 at 3:16 PM, Robert Coli rc...@eventbrite.com wrote:

Don't run 2.1 in production yet if you don't want to deal with bugs like this in production.

Well, I got the latest stable Cassandra... going back to 2.0 then.

If you do file a JIRA, please let the list know what the URL is.

JIRA filed: https://issues.apache.org/jira/browse/CASSANDRA-8020

Jero
Re: using dynamic cell names in CQL 3
On Thu, Sep 25, 2014 at 6:13 AM, shahab shahab.mok...@gmail.com wrote:

It seems that I was not clear in my question. I would like to store values in the column name; for example, the column name would be the event name (temperature) and the column content would be the respective value (e.g. 40.5). And I need to know how the schema should look in CQL 3.

You cannot have dynamic column names, in the exact storage way you are thinking of them, in CQL3. You can have a simple E-A-V scheme which works more or less the same way. It is less storage efficient, but you get the CQL interface. In the opinion of the developers, this was an acceptable tradeoff. In most cases, it probably is.

In other cases, I would recommend using Thrift and actual dynamic columns... except that I logically presume Thrift will eventually be deprecated. I am unable to recommend the use of a feature which I believe will eventually be removed from the product.

=Rob
Re: A trigger that modifies the current Mutation
On Sat, Sep 27, 2014 at 11:08 PM, Pinak Pani nishant.has.a.quest...@gmail.com wrote: I wanted to create a trigger that alters the current mutation. (ObMetaAside : Dear God... why?) Triggers will probably not survive in their current form. If I was planning to use them for anything, I would comprehensively avail myself of the state of their development... =Rob
Re: using dynamic cell names in CQL 3
Isn’t the correct way to do this in CQL3 to use sets and user-defined types (in C* 2.1)?

create type sensorreading (
    date timestamp,
    name text,
    value int
);

CREATE TABLE sensordata (
    name text,
    data set<frozen<sensorreading>>,
    PRIMARY KEY (name)
);

insert into keyspace2.sensordata (name, data) values ('1234', {{date:'2012-10-2 12:10', name:'temp', value:4}});
update sensordata set data = data + {{date:'2012-10-2 12:10', name:'humidity', value:30}} where name='1234';
update sensordata set data = data + {{date:'2012-10-2 12:12:30', name:'temp', value:5}} where name='1234';
update sensordata set data = data + {{date:'2012-10-2 12:12:30', name:'humidity', value:31}} where name='1234';
select * from sensordata;

Perhaps not what you are after, but it may be a start?

Andy

On 29 Sep 2014, at 20:56, Robert Coli rc...@eventbrite.com wrote:
[snip]

The University of Dundee is a registered Scottish Charity, No: SC015096
Not-Equals (!=) in Where Clause
Looking through the CQL 3.1 grammar for Cassandra 2.1, I noticed that the not-equals operator (!=) is in the grammar definition, but I can't seem to find any legal way to use it. Is != supported as part of the WHERE clause in Cassandra? Or is it in the grammar for some other purpose?
Re: Running out of disk at bootstrap in low-disk situation
On Sat, Sep 20, 2014 at 12:11 AM, Erik Forsberg forsb...@opera.com wrote:

I've added all the 15 nodes, with some time in between - definitely more than the 2-minute rule. But it seems like compaction is not keeping up with the incoming data. Or at least that's my theory.

I personally would not combine vnodes and trying to add more than one node at a time, at this time. I understand that you have a lot of nodes to add, but this is potentially confounding the situation.

I conjecture that you are using leveled compaction. In your version there is a pathological behavior during bootstrap where one ends up doing a lot of compaction. I *think*, but am not sure, that the workaround is to use size-tiered compaction during bootstrap. I *believe* that is what the patch upstream effectively does. Probably unthrottling compaction will help, assuming you are not CPU or I/O bound there.

#cassandra on freenode is probably a slightly better forum for interactive discussion of detailed operational questions about production environments.

=Rob
Re: unreadable partitions
On Sun, Sep 28, 2014 at 3:45 AM, tommaso barbugli tbarbu...@gmail.com wrote:

I see some data stored in Cassandra (2.0.7) not being readable from CQL; this affects entire partitions, and querying those partitions raises a Java exception:

If the SSTable is not corrupt but is not readable via CQL and generates an exception, that sounds like a bug to me. Were I you, I would:

0) look for an existing JIRA
1) file a JIRA on http://issues.apache.org
2) reply to this thread with the URL of that JIRA for future googlers

=Rob
Re: Node Joining, Not Streaming
On Wed, Sep 24, 2014 at 11:01 AM, Gene Robichaux gene.robich...@match.com wrote:

I just added two nodes, one in DC-A and one in DC-B. The node in DC-A started and immediately began streaming files from its peers. The node in DC-B has been in the JOINING state for nearly 24 hours and I have not seen any streams started.

Adding more than one node at a time is not really supported, and you can end up in bad cases. Future versions of Cassandra will Strongly Discourage you from doing this. https://issues.apache.org/jira/browse/CASSANDRA-7069

If I were you, I would:

1) stop the DC-B node's bootstrap by stopping it and wiping its partially bootstrapped state
2) wait for the DC-A node to finish bootstrapping
3) re-bootstrap the DC-B node

=Rob
http://twitter.com/rcolidba
Re: Is there harm from having all the nodes in the seed list?
On Tue, Sep 23, 2014 at 10:31 AM, Donald Smith donald.sm...@audiencescience.com wrote: Is there any harm from having all the nodes listed in the seeds list in cassandra.yaml? Yes, seed nodes cannot bootstrap. https://issues.apache.org/jira/browse/CASSANDRA-5836 See comments there for details on how this actually doesn't make any sense. The correct solution is almost certainly to have a dynamic seed provider, which is why DSE and Priam both do that. But in practice it mostly doesn't matter except in the annoying yet common CASSANDRA-5836 case. =Rob
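For contrast with listing every node, the conventional static setup names only a few nodes per DC as seeds, so the rest can still bootstrap normally. A sketch of the relevant cassandra.yaml stanza (the IPs are invented):

```yaml
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      # two or three nodes per DC, not every node in the cluster
      - seeds: "10.0.0.10,10.0.0.11,10.1.0.10"
```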
Re: Reading SSTables Potential File Descriptor Leak 1.2.18
On Tue, Sep 23, 2014 at 5:47 PM, Tim Heckman t...@pagerduty.com wrote: As best I could tell, the majority of the file descriptors open were for a single SSTable '.db' file. Looking in the error logs I found quite a few exceptions that looked to have been identical: ... Before opening a JIRA ticket I thought I'd reach out to the list to see if anyone has seen any similar behavior as well as do a bit of source-diving to try and verify that the descriptor is actually leaking. I would (search for, and failing to find one..) open a JIRA, and let the list know its URL. =Rob
timeout for port 7000 on stateful firewall? streaming_socket_timeout_in_ms?
We have a stateful firewall (http://en.wikipedia.org/wiki/Stateful_firewall) between data centers for port 7000 (inter-cluster). How long should the idle timeout be for the connections on the firewall? Similarly, what's appropriate for streaming_socket_timeout_in_ms in cassandra.yaml? The default is 0 (no timeout). I presume that streaming_socket_timeout_in_ms refers to streams such as those used for bootstrapping and rebuilding.

Thanks

Donald A. Smith | Senior Software Engineer
P: 425.201.3900 x 3866
C: (206) 819-5965
F: (646) 443-2333
dona...@audiencescience.com
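If the firewall does silently drop idle connections, one mitigation is to set a non-zero streaming_socket_timeout_in_ms so hung streams fail fast and can be retried rather than hanging indefinitely. The value below is purely illustrative, not a recommendation:

```yaml
# cassandra.yaml
streaming_socket_timeout_in_ms: 3600000   # e.g. 1 hour; default 0 means no timeout
```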
Re: Indexes Fragmentation
On Sun, Sep 28, 2014 at 9:49 AM, Arthur Zubarev arthur.zuba...@aol.com wrote:

There are 200+ times more updates and 50x more inserts than analytical loads. In Cassandra, to be able to query (in CQL) on a column I have to have an index; the question is what toll the fragmentation coming from the frequent updates and inserts takes on a CF? Do I also need to manually defrag?

You appear to have just asked whether maintaining indexes which have a high rate of change in a log-structured database with immutable data files is likely to be more performant than maintaining them in a database with modify-in-place semantics.

No.

=Rob
best practice for waiting for schema changes to propagate
Hi all,

I often have problems with code that uses the DataStax Java driver to create or modify a keyspace or table and then soon after reads the metadata for the keyspace to verify that the changes I made are complete. As an example, I may create a table called `myTableName` and then very soon after do something like:

assert(session
    .getCluster()
    .getMetadata()
    .getKeyspace(myKeyspaceName)
    .getTable(myTableName) != null);

I assume this fails sometimes because the default round-robin load balancing policy for the Java driver will send my create-table request to one node and the metadata read to another, and because it takes some time for the table creation to propagate across all of the nodes in my cluster.

What is the best way to deal with this problem? Is there a standard way to wait for schema changes to propagate?

Best regards,
Clint
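One generic workaround is to poll the metadata until the table shows up (or a timeout expires) instead of asserting immediately after the DDL statement. A driver-agnostic sketch of that polling loop; the session/metadata calls in the usage comment mirror the Java-style snippet in the question and are not a real Python driver API:

```python
import time

def wait_until(predicate, timeout=10.0, interval=0.2):
    """Poll predicate() until it returns True or the timeout elapses.

    Returns True as soon as the predicate holds, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return False

# Hypothetical usage against the session from the question (not real API):
# ok = wait_until(lambda: session.getCluster().getMetadata()
#                         .getKeyspace(myKeyspaceName)
#                         .getTable(myTableName) is not None)
```

The interval and timeout are arbitrary; in practice you would tune them to your cluster's typical schema propagation delay and fail loudly when the timeout is hit.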
Re: Casssandra cluster setup.
On Mon, Sep 22, 2014 at 6:32 AM, Muthu Kumar smk.mu...@gmail.com wrote: I am trying to configure a Cassandra cluster with two nodes. I am new to Cassandra. I am using datastax distribution of Cassandra ( windows). I have installed the same in two nodes and configured it works as a separate instance but not as cluster. As a general statement, help with first time installations of Cassandra are probably best handled interactively on #cassandra on freenode. Posting such a debugging issue to a mailing list carries meaningful risk of Warnocking. [1] =Rob [1] http://en.wikipedia.org/wiki/Warnock's_dilemma
Re: Saving file content to ByteBuffer and to column does not retrieve the same size of data
On Mon, Sep 22, 2014 at 3:50 AM, Carlos Scheidecker nando@gmail.com wrote: I can successfully read a file to a ByteBuffer and then write to a Cassandra blob column. However, when I retrieve the value of the column, the size of the ByteBuffer retrieved is bigger than the original ByteBuffer where the file was read from. Writing to the disk, corrupts the image. Probably don't write binary blobs like images into a database, use a distributed filesystem? https://github.com/mogilefs/ But I agree that this behavior sounds like a bug, I would probably file it as a JIRA on http://issues.apache.org and then tell the list the URL of the JIRA you filed. =Rob