Re: paging through an entire table in chunks?

2014-09-29 Thread Brice Dutheil
You may want to use the async feature
http://www.datastax.com/documentation/developer/java-driver/1.0/java-driver/asynchronous_t.html
of the Java driver. To manage the complexity of running several
queries, I used RxJava; it improves readability and handles asynchronicity in a
very elegant way (much more elegantly than raw Futures). You may need to write
some code to bridge Rx and the Java driver, but it's worth it.
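
For example, a minimal sketch of such a bridge, assuming RxJava 1.x and the
driver's ResultSetFuture (which is a Guava ListenableFuture); the helper name
is illustrative:

import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.ResultSetFuture;
import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.Futures;
import rx.Observable;
import rx.subjects.AsyncSubject;

// Adapt the driver's ResultSetFuture to an Observable that emits the single
// ResultSet and completes, or propagates the failure to subscribers.
public static Observable<ResultSet> toObservable(ResultSetFuture future) {
    final AsyncSubject<ResultSet> subject = AsyncSubject.create();
    Futures.addCallback(future, new FutureCallback<ResultSet>() {
        @Override public void onSuccess(ResultSet rs) {
            subject.onNext(rs);      // emit the single result...
            subject.onCompleted();   // ...then complete
        }
        @Override public void onFailure(Throwable t) {
            subject.onError(t);      // surface driver errors through Rx
        }
    });
    return subject;
}

With that in place, many async queries can be composed with flatMap/merge
instead of hand-written Future chaining.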

— Brice

On Sun, Sep 28, 2014 at 12:57 AM, Kevin Burton bur...@spinn3r.com wrote:

Agreed… but I’d like to parallelize it… Eventually I’ll just have too much
 data to do it on one server… plus, I need suspend/resume and this way if
 I’m doing like 10MB at a time I’ll be able to suspend / resume as well as
 track progress.

 On Sat, Sep 27, 2014 at 2:52 PM, DuyHai Doan doanduy...@gmail.com wrote:

 Use the Java driver and its paging feature:
 http://www.datastax.com/drivers/java/2.1/com/datastax/driver/core/Statement.html#setFetchSize(int)

 1) Do your SELECT * FROM without any WHERE clause
 2) Set fetchSize to a sensible value
 3) Execute the query and get an iterator from the ResultSet
 4) Iterate
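
A minimal sketch of those four steps, assuming the DataStax Java driver 2.x
(the contact point and keyspace/table names are illustrative):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;

public class PagedScan {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("my_keyspace");

        Statement stmt = new SimpleStatement("SELECT * FROM my_table"); // 1) no WHERE clause
        stmt.setFetchSize(1000);                                        // 2) a sensible page size

        ResultSet rs = session.execute(stmt);                           // 3) get the ResultSet
        for (Row row : rs) {
            // 4) iterate: the driver fetches further pages transparently
        }
        cluster.close();
    }
}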



 On Sat, Sep 27, 2014 at 11:42 PM, Kevin Burton bur...@spinn3r.com
 wrote:

 I need a way to do a full table scan across all of our data.

 Can’t I just use token() for this?

 This way I could split up our entire keyspace into say 1024 chunks, and
 then have one activemq task work with range 0, then range 1, etc… that way
 I can easily just map() my whole table.

 And since it's token(), I should (generally) read a contiguous range from
 a given table.
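
A minimal sketch of that chunked token() scan, assuming the Murmur3
partitioner (tokens span Long.MIN_VALUE..Long.MAX_VALUE) and the DataStax
Java driver; the keyspace, table, and partition-key names (ks.tbl, id) are
hypothetical:

import java.math.BigInteger;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

static void scanInChunks(Session session, int chunks) {
    BigInteger min = BigInteger.valueOf(Long.MIN_VALUE);
    BigInteger width = BigInteger.valueOf(Long.MAX_VALUE)
            .subtract(min).divide(BigInteger.valueOf(chunks));
    for (int i = 0; i < chunks; i++) {
        // Chunk i covers one contiguous slice of the token ring; recording i
        // gives cheap suspend/resume and progress tracking.
        long start = min.add(width.multiply(BigInteger.valueOf(i))).longValue();
        long end = (i == chunks - 1) ? Long.MAX_VALUE
                : min.add(width.multiply(BigInteger.valueOf(i + 1))).longValue();
        ResultSet rs = session.execute(
                "SELECT * FROM ks.tbl WHERE token(id) > ? AND token(id) <= ?",
                start, end);
        for (Row row : rs) {
            // map() each row here (the row with token == Long.MIN_VALUE is
            // skipped by the strict lower bound; acceptable for a sketch)
        }
    }
}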

 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts
 http://spinn3r.com







DSE install interfering with apache Cassandra 2.1.0

2014-09-29 Thread Andrew Cobley
Hi All,

Just come across this one, I’m at a bit of a loss on how to fix it.

A user here did the following steps:

On a Mac:
Install DataStax Enterprise (DSE) using the dmg file
Test he can connect using the DSE cqlsh window
Uninstall DSE (full uninstall, which stops the services)

Download Apache Cassandra 2.1.0
Unzip
Change to the bin directory and run sudo ./cassandra

Now when he tries to connect using cqlsh from the Apache Cassandra 2.1.0 bin
directory, he gets:

Connection error: ('Unable to connect to any servers', {'127.0.0.1': 
ConnectionShutdown('Connection AsyncoreConnection(4528514448) 127.0.0.1:9160 
(closed) is already closed',)})

This is probably related to
http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201409.mbox/%3CCALHCZd7RGSahJUbK32WoTr9JRoA+4K=mrfocmxuk0nbzoqq...@mail.gmail.com%3E

but I can't see why the uninstall of DSE leaves the Apache Cassandra
release's cqlsh unable to attach to the Apache Cassandra runtime.

Ta
Andy



The University of Dundee is a registered Scottish Charity, No: SC015096


Re: DSE install interfering with apache Cassandra 2.1.0

2014-09-29 Thread Sumod Pawgi
Please run jps to check which Java services are still running and to confirm
whether C* is running. Then check whether port 9160 is in use: netstat -nltp |
grep 9160

This will confirm what is happening in your case.

Sent from my iPhone

 On 29-Sep-2014, at 7:15 pm, Andrew Cobley a.e.cob...@dundee.ac.uk wrote:
 
 Hi All,
 
 Just come across this one, I’m at a bit of a loss on how to fix it.
 
 A user here did the following steps:
 
 On a Mac:
 Install DataStax Enterprise (DSE) using the dmg file
 Test he can connect using the DSE cqlsh window
 Uninstall DSE (full uninstall, which stops the services)
 
 Download Apache Cassandra 2.1.0
 Unzip
 Change to the bin directory and run sudo ./cassandra
 
 Now when he tries to connect using cqlsh from the Apache Cassandra 2.1.0 bin
 directory, he gets:
 
 Connection error: ('Unable to connect to any servers', {'127.0.0.1': 
 ConnectionShutdown('Connection AsyncoreConnection(4528514448) 127.0.0.1:9160 
 (closed) is already closed',)})
 
 This is probably related to 
 http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201409.mbox/%3CCALHCZd7RGSahJUbK32WoTr9JRoA+4K=mrfocmxuk0nbzoqq...@mail.gmail.com%3E
 
 but I can't see why the uninstall of DSE leaves the Apache Cassandra
 release's cqlsh unable to attach to the Apache Cassandra runtime.
 
 Ta
 Andy
 
 
 


Re: DSE install interfering with apache Cassandra 2.1.0

2014-09-29 Thread Andrew Cobley
Without Apache Cassandra running I ran jps -l on this machine; the only
result was

338 sun.tools.jps.Jps

The Mac didn’t like the netstat command so I ran

netstat -atp tcp | grep 9160

no result

Also, for the native port:

netstat -atp tcp | grep 9042

gave no result (command may be wrong)

So I ran a port scan using the Network Utility (between 0 and 1). Results as
shown:


Port Scan has started…

Port Scanning host: 127.0.0.1

 Open TCP Port: 631 (ipp)
Port Scan has completed…


Hope this helps.

Andy


On 29 Sep 2014, at 15:09, Sumod Pawgi spa...@gmail.com wrote:

Please run jps to check which Java services are still running and to confirm
whether C* is running. Then check whether port 9160 is in use: netstat -nltp |
grep 9160

This will confirm what is happening in your case.

Sent from my iPhone

On 29-Sep-2014, at 7:15 pm, Andrew Cobley a.e.cob...@dundee.ac.uk wrote:

Hi All,

Just come across this one, I’m at a bit of a loss on how to fix it.

A user here did the following steps:

On a Mac:
Install DataStax Enterprise (DSE) using the dmg file
Test he can connect using the DSE cqlsh window
Uninstall DSE (full uninstall, which stops the services)

Download Apache Cassandra 2.1.0
Unzip
Change to the bin directory and run sudo ./cassandra

Now when he tries to connect using cqlsh from the Apache Cassandra 2.1.0 bin
directory, he gets:

Connection error: ('Unable to connect to any servers', {'127.0.0.1': 
ConnectionShutdown('Connection AsyncoreConnection(4528514448) 127.0.0.1:9160 
(closed) is already closed',)})

This is probably related to
http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201409.mbox/%3CCALHCZd7RGSahJUbK32WoTr9JRoA+4K=mrfocmxuk0nbzoqq...@mail.gmail.com%3E

but I can't see why the uninstall of DSE leaves the Apache Cassandra
release's cqlsh unable to attach to the Apache Cassandra runtime.

Ta
Andy





The University of Dundee is a registered Scottish Charity, No: SC015096


Cassandra throwing java exceptions for nodetool repair on indexed tables

2014-09-29 Thread Jeronimo de A. Barros
Hi All,

We're running two Cassandra 2.1 clusters (development and production), and
whenever I run a nodetool repair on indexed tables I get a Java exception
about snapshot creation:

Command line:

[2014-09-29 11:25:24,945] Repair session
73c0d390-47e4-11e4-ba0f-c7788dc924ec for range
(-7298689860784559350,-7297558156602685286] failed with error
java.io.IOException: Failed during snapshot creation.
[2014-09-29 11:25:24,945] Repair command #5 finished

Cassandra log:

ERROR [Thread-49681] 2014-09-29 11:25:24,945 StorageService.java:2689 -
Repair session 73c0d390-47e4-11e4-ba0f-c7788dc924ec for range
(-7298689860784559350,-7297558156602685286] failed with error
java.io.IOException: Failed during snapshot creation.
java.util.concurrent.ExecutionException: java.lang.RuntimeException:
java.io.IOException: Failed during snapshot creation.
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
[na:1.7.0_67]
at java.util.concurrent.FutureTask.get(FutureTask.java:188)
[na:1.7.0_67]
at
org.apache.cassandra.service.StorageService$4.runMayThrow(StorageService.java:2680)
~[apache-cassandra-2.1.0.jar:2.1.0]
at
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
[apache-cassandra-2.1.0.jar:2.1.0]
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
[na:1.7.0_67]
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
[na:1.7.0_67]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_67]
Caused by: java.lang.RuntimeException: java.io.IOException: Failed during
snapshot creation.
at com.google.common.base.Throwables.propagate(Throwables.java:160)
~[guava-16.0.jar:na]
at
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
[apache-cassandra-2.1.0.jar:2.1.0]
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
[na:1.7.0_67]
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
[na:1.7.0_67]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
~[na:1.7.0_67]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
~[na:1.7.0_67]
... 1 common frames omitted
Caused by: java.io.IOException: Failed during snapshot creation.
at
org.apache.cassandra.repair.RepairSession.failedSnapshot(RepairSession.java:344)
~[apache-cassandra-2.1.0.jar:2.1.0]
at
org.apache.cassandra.repair.RepairJob$2.onFailure(RepairJob.java:128)
~[apache-cassandra-2.1.0.jar:2.1.0]
at
com.google.common.util.concurrent.Futures$4.run(Futures.java:1172)
~[guava-16.0.jar:na]
... 3 common frames omitted

If I drop the index, the repair returns no error:

cqlsh:test> DROP INDEX user_pass_idx;

root@test:~# nodetool repair test user
[2014-09-29 11:27:29,668] Starting repair command #6, repairing 743 ranges
for keyspace test (seq=true, full=true)
.
.
[2014-09-29 11:28:38,030] Repair session
e6d40e10-47e4-11e4-ba0f-c7788dc924ec for range
(-7298689860784559350,-7297558156602685286] finished
[2014-09-29 11:28:38,030] Repair command #6 finished

The test table:

CREATE TABLE test.user (
    login text PRIMARY KEY,
    password text
);
CREATE INDEX user_pass_idx ON test.user (password);

Am I doing anything wrong? Or is this a bug? I searched but I couldn't
find any reference to this error.

Thanks in advance for any help.

Jero


Re: Cassandra throwing java exceptions for nodetool repair on indexed tables

2014-09-29 Thread Robert Coli
On Mon, Sep 29, 2014 at 8:35 AM, Jeronimo de A. Barros 
jeronimo.bar...@gmail.com wrote:

 We're running two Cassandra 2.1 clusters (development and production), and
 whenever I run a nodetool repair on indexed tables I get a Java exception
 about snapshot creation:


Don't run 2.1 in production yet if you don't want to deal with bugs like
this in production.

https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/

Am I doing anything wrong? Or is this a bug? I searched but I couldn't
 find any reference to this error.


I would file this as a bug in the Cassandra JIRA, especially as it relates
to a just-released version of the software and seems reproducible.

If you do file a JIRA, please let the list know what the URL is.

=Rob


Re: Repair taking long time

2014-09-29 Thread Robert Coli
On Fri, Sep 26, 2014 at 9:52 AM, Gene Robichaux gene.robich...@match.com
wrote:

  I am fairly new to Cassandra. We have a 9 node cluster, 5 in one DC and
 4 in another.



 Running a repair on a large column family seems to be moving much slower
 than I expect.


Unfortunately, as others have mentioned, the slowness/broken-ness of repair
is a long-running (groan!) issue and therefore currently expected.

At this time, I do not recommend upgrading to 2.1 in production to attempt
to fix it. I am also broadly skeptical that it is as fixed in 2.1 as all that.

One can increase gc_grace_seconds to 34 days [1] and repair once a month,
which should help make repair slightly more tractable.
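
For reference, 34 days works out to 34 * 86400 = 2937600 seconds. A one-line
sketch of the per-table change via the Java driver, where "ks.cf" is a
hypothetical keyspace/table name:

// Raise gc_grace_seconds to 34 days (34 * 86400 s) on a hypothetical table.
session.execute("ALTER TABLE ks.cf WITH gc_grace_seconds = 2937600");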

For now you should probably evaluate which of your column families you
*absolutely must* repair (because you do DELETE like operations in them,
etc.) and only repair those.

As an aside, you just lose with vnodes on clusters of this size. I
presume you plan to grow beyond approximately 9 nodes per DC, in which case
you probably do want vnodes enabled.

One note :

  Looking at nodetool compactionstats, it indicates the Validation phase
 is running and that the total bytes are 4.5 TB (4505336278756).


This is the uncompressed size; I'm betting your actual on-disk size is
closer to 2 TB? Even though 2.0 has improved performance for nodes with lots
of data, 2 TB per node is still relatively fat for a Cassandra node.


=Rob
[1] https://issues.apache.org/jira/browse/CASSANDRA-5850


Re: simple map / table scans without hadoop?

2014-09-29 Thread Robert Coli
On Fri, Sep 26, 2014 at 9:08 PM, Kevin Burton bur...@spinn3r.com wrote:

 I have the requirements to periodically run full tables scans on our
 data.  It’s mostly for repair tasks or making bulk UPDATEs… but I’d prefer
 to do it in Java because I need something mildly trivial.


http://wiki.apache.org/cassandra/FAQ#iter_world

?

=Rob


Re: Apache Cassandra 2.1.0 : cassandra-stress performance discrepancy between SSD and SATA drive

2014-09-29 Thread Shing Hing Man
I have run a sysbench file I/O test on my home PC and office PC. The results
are given below; they show my office PC (with an SSD) is about 3 times more
performant than my home PC (with a SATA hard disk).

Home PC :

gauss:~ sysbench --test=fileio --file-total-size=50G prepare
sysbench 0.5:  multi-threaded system evaluation benchmark

128 files, 409600Kb each, 51200Mb total
Creating files for the test...
Extra file open flags: 0
Creating file test_file.0
Creating file test_file.1
Creating file test_file.2
.
Creating file test_file.125
Creating file test_file.126
Creating file test_file.127
53687091200 bytes written in 626.30 seconds (81.75 MB/sec).
matmsh@gauss:~ sysbench --test=fileio --file-total-size=50G 
--file-test-mode=rndrw --init-rng=on --max-time=300 --max-requests=0 run
sysbench 0.5:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1
Random number generator seed is 0 and will be ignored


Extra file open flags: 0
128 files, 400Mb each
50Gb total file size
Block size 16Kb
Number of IO requests: 0
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random r/w test
Threads started!

Operations performed:  14521 reads, 9680 writes, 30976 Other = 55177 Total
Read 226.89Mb  Written 151.25Mb  Total transferred 378.14Mb  (1.2605Mb/sec)
   80.67 Requests/sec executed

General statistics:
total time:  300.0030s
total number of events:  24201
total time taken by event execution: 186.0749s
response time:
 min:  0.00ms
 avg:  7.69ms
 max:132.43ms
 approx.  95 percentile:  19.57ms

Threads fairness:
events (avg/stddev):   24201.0000/0.00
execution time (avg/stddev):   186.0749/0.00

gauss:~ 
===
Office PC :
shing@cauchy:~ sysbench --test=fileio --file-total-size=50G prepare
sysbench 0.5:  multi-threaded system evaluation benchmark

128 files, 409600Kb each, 51200Mb total
Creating files for the test...
Extra file open flags: 0
Creating file test_file.0
Creating file test_file.1
Creating file test_file.2
Creating file test_file.3
...
Creating file test_file.122
Creating file test_file.123
Creating file test_file.124
Creating file test_file.125
Creating file test_file.126
Creating file test_file.127
53687091200 bytes written in 175.55 seconds (291.66 MB/sec).
cauchy:~ sysbench --test=fileio --file-total-size=50G --file-test-mode=rndrw 
--init-rng=on --max-time=300 --max-requests=0 run
sysbench 0.5:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1
Random number generator seed is 0 and will be ignored

Extra file open flags: 0
128 files, 400Mb each
50Gb total file size
Block size 16Kb
Number of IO requests: 0
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random r/w test
Threads started!

Operations performed:  43020 reads, 28680 writes, 91723 Other = 163423 Total
Read 672.19Mb  Written 448.12Mb  Total transferred 1.0941Gb  (3.7344Mb/sec)
  239.00 Requests/sec executed

General statistics:
total time:  300.0007s
total number of events:  71700
total time taken by event execution: 7.5550s
response time:
 min:  0.00ms
 avg:  0.11ms
 max: 12.89ms
 approx.  95 percentile:   0.22ms

Threads fairness:
events (avg/stddev):   71700.0000/0.00
execution time (avg/stddev):   7.5550/0.00
===



Shing



On Saturday, 27 September 2014, 10:24, Shing Hing Man mat...@yahoo.com wrote:
 


Hi Kevin,
   Thanks for the reply !
I do not know the exact brand of the SSD in my office PC, but the SSD is only
1 year old, and it is far from full.

On both the office PC and the home PC, I untarred Apache Cassandra 2.1.0, then
ran cassandra -f with the default config, then ran cassandra-stress.

Both PCs have Oracle Java 1.7.0_40.

I have noticed there are some parameters for SSD in cassandra.yaml, which I 
have adjusted, but with no improvement. 


It puzzles me that Cassandra on my office PC, with far better hardware, could
be 100% slower than on my home PC.



Shing







On Saturday, 27 September 2014, 5:12, Kevin Burton bur...@spinn3r.com wrote:
 


What SSD was it? There is a lot of variability in SSD performance.

1.  Is it a new vs old SSD?  Old SSDs can become slower if they’re really worn 
out

2.  was the office SSD near capacity holding other data?


Re: Repair taking long time

2014-09-29 Thread Rahul Neelakantan
What is the recommended value for num_tokens? I am asking because of the
issue with sequential repairs running one token range after another.

Rahul Neelakantan

 On Sep 29, 2014, at 2:29 PM, Robert Coli rc...@eventbrite.com wrote:
 
 On Fri, Sep 26, 2014 at 9:52 AM, Gene Robichaux gene.robich...@match.com 
 wrote:
 I am fairly new to Cassandra. We have a 9 node cluster, 5 in one DC and 4 in 
 another.
 
  
 
 Running a repair on a large column family seems to be moving much slower 
 than I expect.
 
 
 Unfortunately, as others have mentioned, the slowness/broken-ness of repair 
 is a long-running (groan!) issue and therefore currently expected. 
 
 At this time, I do not recommend upgrading to 2.1 in production to attempt to 
 fix it. I am also broadly skeptical that it is as fixed in 2.1 as all that.
 
 One can increase gc_grace_seconds to 34 days [1] and repair once a month, 
 which should help make repair slightly more tractable.
 
 For now you should probably evaluate which of your column families you 
 *absolutely must* repair (because you do DELETE like operations in them, 
 etc.) and only repair those.
 
 As an aside, you just lose with vnodes on clusters of this size. I presume 
 you plan to grow beyond approximately 9 nodes per DC, in which case you 
 probably do want vnodes enabled.
 
 One note :
  Looking at nodetool compactionstats, it indicates the Validation phase is 
 running and that the total bytes are 4.5 TB (4505336278756).
 
 This is the uncompressed size; I'm betting your actual on-disk size is closer 
 to 2 TB? Even though 2.0 has improved performance for nodes with lots of data, 
 2 TB per node is still relatively fat for a Cassandra node.
 
 
 =Rob
 [1] https://issues.apache.org/jira/browse/CASSANDRA-5850


Re: Repair taking long time

2014-09-29 Thread Ken Hancock
On Mon, Sep 29, 2014 at 2:29 PM, Robert Coli rc...@eventbrite.com wrote:


 As an aside, you just lose with vnodes and clusters of the size. I
 presume you plan to grow over appx 9 nodes per DC, in which case you
 probably do want vnodes enabled.


I typically only see discussion of vnodes vs. non-vnodes, but it seems to
me that it might be more important to discuss the number of vnodes per node.
A small cluster having 256 vnodes/node is unwise given some of the
sequential operations that are still done. Even if those operations were done
in parallel, a 256x increase in parallelization seems an equally bad
choice.

I've never seen any discussion of how many vnodes per node might be an
appropriate answer based on a planned cluster size -- does such a thing exist?

Ken


Re: Cassandra throwing java exceptions for nodetool repair on indexed tables

2014-09-29 Thread Jeronimo de A. Barros
Hi again,

On Mon, Sep 29, 2014 at 3:16 PM, Robert Coli rc...@eventbrite.com wrote:

 Don't run 2.1 in production yet if you don't want to deal with bugs like
 this in production.


Well, I got the latest stable Cassandra... going back to 2.0 then.


 If you do file a JIRA, please let the list know what the URL is.


JIRA filed:  https://issues.apache.org/jira/browse/CASSANDRA-8020

Jero


Re: using dynamic cell names in CQL 3

2014-09-29 Thread Robert Coli
On Thu, Sep 25, 2014 at 6:13 AM, shahab shahab.mok...@gmail.com wrote:

 It seems that I was not clear in my question. I would like to store values
 in the column name; for example, column.name would be event_name
 (temperature) and the column content would be the respective value (e.g.
 40.5). And I need to know what the schema should look like in CQL 3.


You cannot have dynamic column names, in the exact storage way you are
thinking of them, in CQL3.

You can have a simple E-A-V scheme which works more or less the same way.
It is less storage efficient, but you get the CQL interface. In the opinion
of the developers, this was an acceptable tradeoff. In most cases, it
probably is.
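
As a rough illustration of such an E-A-V layout (a sketch, with hypothetical
keyspace/table/column names), the attribute name becomes a clustering column
instead of a dynamic column name:

import com.datastax.driver.core.Session;

// One row per (entity, attribute) pair; CQL stores each attribute under a
// clustering column rather than as a dynamically named column.
static void eavExample(Session session) {
    session.execute("CREATE TABLE IF NOT EXISTS ks.readings ("
            + " entity text,"
            + " attribute text,"   // e.g. 'temperature'
            + " value text,"       // e.g. '40.5'
            + " PRIMARY KEY (entity, attribute))");
    session.execute(
            "INSERT INTO ks.readings (entity, attribute, value) VALUES (?, ?, ?)",
            "sensor-1", "temperature", "40.5");
}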

In other cases, I would recommend using Thrift and actual dynamic
columns... except that I logically presume Thrift will eventually be
deprecated. I am unable to recommend the use of a feature which I believe
will eventually be removed from the product.

=Rob


Re: A trigger that modifies the current Mutation

2014-09-29 Thread Robert Coli
On Sat, Sep 27, 2014 at 11:08 PM, Pinak Pani 
nishant.has.a.quest...@gmail.com wrote:

 I wanted to create a trigger that alters the current mutation.


(ObMetaAside: Dear God... why?)

Triggers will probably not survive in their current form. If I was planning
to use them for anything, I would comprehensively avail myself of the state
of their development...

=Rob


Re: using dynamic cell names in CQL 3

2014-09-29 Thread Andrew Cobley
Isn't the correct way to do this in CQL3 to use sets and user-defined types
(in C* 2.1)?:

create type sensorreading (date timestamp, name text, value int);
CREATE TABLE sensordata (
    name text,
    data set<frozen<sensorreading>>,
    PRIMARY KEY (name)
);

insert into keyspace2.sensordata (name, data) values ('1234', {{date:'2012-10-2 12:10', name:'temp', value:4}});
update sensordata set data = data + {{date:'2012-10-2 12:10', name:'humidity', value:30}} where name='1234';
update sensordata set data = data + {{date:'2012-10-2 12:12:30', name:'temp', value:5}} where name='1234';
update sensordata set data = data + {{date:'2012-10-2 12:12:30', name:'humidity', value:31}} where name='1234';

select * from sensordata;


Perhaps not what you are after, but may be a start ?

Andy



On 29 Sep 2014, at 20:56, Robert Coli rc...@eventbrite.com wrote:

On Thu, Sep 25, 2014 at 6:13 AM, shahab shahab.mok...@gmail.com wrote:
It seems that I was not clear in my question. I would like to store values in
the column name; for example, column.name would be event_name (temperature)
and the column content would be the respective value (e.g. 40.5). And I need
to know what the schema should look like in CQL 3.

You cannot have dynamic column names, in the exact storage way you are thinking 
of them, in CQL3.

You can have a simple E-A-V scheme which works more or less the same way. It is 
less storage efficient, but you get the CQL interface. In the opinion of the 
developers, this was an acceptable tradeoff. In most cases, it probably is.

In other cases, I would recommend using Thrift and actual dynamic columns...
except that I logically presume Thrift will eventually be deprecated. I am
unable to recommend the use of a feature which I believe will eventually be
removed from the product.

=Rob


The University of Dundee is a registered Scottish Charity, No: SC015096


Not-Equals (!=) in Where Clause

2014-09-29 Thread Timmy Turner
Looking through the CQL 3.1 grammar for Cassandra 2.1, I noticed that the
not-equals operator (!=) is in the grammar definition, but I can't seem to
find any legal way to use it.

Is != supported as part of the WHERE clause in Cassandra? Or is it in the
grammar for some other purpose?


Re: Running out of disk at bootstrap in low-disk situation

2014-09-29 Thread Robert Coli
On Sat, Sep 20, 2014 at 12:11 AM, Erik Forsberg forsb...@opera.com wrote:

 I've added all the 15 nodes, with some time inbetween - definitely more
 than the 2-minute rule. But it seems like compaction is not keeping up with
 the incoming data. Or at least that's my theory.


I personally would not combine vnodes and trying to add more than one node
at a time, at this time. I understand that you have a lot of nodes to add,
but this is potentially confounding the situation.

I conjecture that you are using leveled compaction. There is in your version
a pathological behavior during bootstrap where one ends up doing a lot of
compaction. I *think*, but am not sure, that the workaround is to use
size-tiered compaction during bootstrap. I *believe* that is what the patch
upstream effectively does.

Unthrottling compaction will probably help, assuming you are not CPU- or
I/O-bound there.

#cassandra on freenode is probably a slightly better forum for interactive
discussion of detailed operational questions about production environments.

=Rob


Re: unreadable partitions

2014-09-29 Thread Robert Coli
On Sun, Sep 28, 2014 at 3:45 AM, tommaso barbugli tbarbu...@gmail.com
wrote:

 I see some data stored in Cassandra (2.0.7) that is not readable from CQL;
 this affects entire partitions, and querying these partitions raises a Java
 exception:


If the SSTable is not corrupt but is not readable via CQL and generates an
exception, that sounds like a bug to me.

Were I you, I would :

0) look for an existing JIRA
1) file a JIRA on http://issues.apache.org
2) reply to this thread with the URL of that JIRA for future googlers

=Rob


Re: Node Joining, Not Streaming

2014-09-29 Thread Robert Coli
On Wed, Sep 24, 2014 at 11:01 AM, Gene Robichaux gene.robich...@match.com
wrote:

  I just added two nodes, one in DC-A and one in DC-B.



 The node in DC-A started and immediately began to stream files from its
 peers. The node in DC-B has been in the JOINING state for nearly 24 hours
 and I have not seen any streams started.


Adding more than one node at a time is not really supported, and you can
end up in bad cases.

Future versions of Cassandra will Strongly Discourage you from doing this.

https://issues.apache.org/jira/browse/CASSANDRA-7069

If I were you, I would :

1) stop the DC-B node's bootstrap by stopping it and wiping its partially
bootstrapped state
2) wait for DC-A to finish bootstrapping
3) re-bootstrap DC-B node.

=Rob
http://twitter.com/rcolidba


Re: Is there harm from having all the nodes in the seed list?

2014-09-29 Thread Robert Coli
On Tue, Sep 23, 2014 at 10:31 AM, Donald Smith 
donald.sm...@audiencescience.com wrote:

  Is there any harm from having all the nodes listed in the seeds list in
 cassandra.yaml?


Yes, seed nodes cannot bootstrap.

https://issues.apache.org/jira/browse/CASSANDRA-5836

See comments there for details on how this actually doesn't make any sense.

The correct solution is almost certainly to have a dynamic seed provider,
which is why DSE and Priam both do that. But in practice it mostly doesn't
matter except in the annoying yet common CASSANDRA-5836 case.

=Rob


Re: Reading SSTables Potential File Descriptor Leak 1.2.18

2014-09-29 Thread Robert Coli
On Tue, Sep 23, 2014 at 5:47 PM, Tim Heckman t...@pagerduty.com wrote:

 As best I could tell, the majority of the file descriptors open were for a
 single SSTable '.db' file. Looking in the error logs I found quite a few
 exceptions that looked to have been identical:

...

 Before opening a JIRA ticket I thought I'd reach out to the list to see if
 anyone has seen any similar behavior as well as do a bit of source-diving
 to try and verify that the descriptor is actually leaking.


I would (search for, and failing to find one...) open a JIRA, and let the
list know its URL.

=Rob


timeout for port 7000 on stateful firewall? streaming_socket_timeout_in_ms?

2014-09-29 Thread Donald Smith
We have a stateful firewall (http://en.wikipedia.org/wiki/Stateful_firewall)
between data centers for port 7000 (inter-cluster). How long should the idle
timeout be for the connections on the firewall?

Similarly, what's appropriate for streaming_socket_timeout_in_ms in 
cassandra.yaml?  The default is 0 (no timeout).  I presume that 
streaming_socket_timeout_in_ms refers to streams such as for bootstrapping and 
rebuilding.

Thanks

Donald A. Smith | Senior Software Engineer
P: 425.201.3900 x 3866
C: (206) 819-5965
F: (646) 443-2333
dona...@audiencescience.com




Re: Indexes Fragmentation

2014-09-29 Thread Robert Coli
On Sun, Sep 28, 2014 at 9:49 AM, Arthur Zubarev arthur.zuba...@aol.com
wrote:

 There are 200+ times more updates and 50x more inserts than analytical loads.
 In Cassandra, just to be able to query (in CQL) on a column I have to have
 an index; the question is what toll the fragmentation coming from the
 frequent updates and inserts takes on a CF. Do I also need to manually
 defrag?


You appear to have just asked whether maintaining indexes which have a high
rate of change in a log-structured database with immutable data files is
likely to be more performant than maintaining them in a database with
modify-in-place semantics.

No.

=Rob


best practice for waiting for schema changes to propagate

2014-09-29 Thread Clint Kelly
Hi all,

I often have problems with code that I write that uses the DataStax Java
driver to create or modify a keyspace or table and then soon after reads the
metadata for the keyspace to verify that whatever changes I made to the
keyspace or table are complete.

As an example, I may create a table called `myTableName` and then very soon
after do something like:

assert(session
  .getCluster()
  .getMetadata()
  .getKeyspace(myKeyspaceName)
  .getTable(myTableName) != null);

I assume this fails sometimes because the default round-robin load
balancing policy for the Java driver will send my create-table request to
one node and the metadata read to another, and because it takes some time
for the table creation to propagate across all of the nodes in my cluster.

What is the best way to deal with this problem?  Is there a standard way to
wait for schema changes to propagate?
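
One hedged illustration of a workaround (a sketch, not a driver API): poll
the cluster metadata until the table becomes visible or a timeout expires.

import com.datastax.driver.core.KeyspaceMetadata;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.TableMetadata;

static TableMetadata waitForTable(Session session, String keyspace, String table,
                                  long timeoutMs) throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline) {
        KeyspaceMetadata ks = session.getCluster().getMetadata().getKeyspace(keyspace);
        TableMetadata tm = (ks == null) ? null : ks.getTable(table);
        if (tm != null) return tm;   // the schema change is visible to this client
        Thread.sleep(200);           // brief backoff between polls
    }
    return null;                     // caller decides how to handle the timeout
}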

Best regards,
Clint


Re: Cassandra cluster setup.

2014-09-29 Thread Robert Coli
On Mon, Sep 22, 2014 at 6:32 AM, Muthu Kumar smk.mu...@gmail.com wrote:

  I am trying to configure a Cassandra cluster with two nodes. I am new
 to Cassandra.
  I am using the DataStax distribution of Cassandra (Windows). I have
 installed it on two nodes and configured it, but each runs as a separate
 instance, not as a cluster.

As a general statement, help with first-time installations of Cassandra is
probably best handled interactively in #cassandra on freenode.

Posting such a debugging issue to a mailing list carries meaningful risk of
Warnocking. [1]

=Rob
 [1] http://en.wikipedia.org/wiki/Warnock's_dilemma


Re: Saving file content to ByteBuffer and to column does not retrieve the same size of data

2014-09-29 Thread Robert Coli
On Mon, Sep 22, 2014 at 3:50 AM, Carlos Scheidecker nando@gmail.com
wrote:

 I can successfully read a file to a ByteBuffer and then write it to a
 Cassandra blob column. However, when I retrieve the value of the column,
 the size of the ByteBuffer retrieved is bigger than the original ByteBuffer
 where the file was read from. Writing it to disk corrupts the image.


Probably don't write binary blobs like images into a database, use a
distributed filesystem?

https://github.com/mogilefs/

But I agree that this behavior sounds like a bug; I would probably file it
as a JIRA on http://issues.apache.org and then tell the list the URL of the
JIRA you filed.

=Rob
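
For future googlers, one hedged guess worth ruling out before filing (an
assumption, not a confirmed diagnosis): the ByteBuffer the Java driver
returns can be a view into a larger internal buffer, so extracting it with
ByteBuffer.array() copies extra bytes. A sketch of a safe extraction (the
column name is illustrative):

import java.nio.ByteBuffer;
import com.datastax.driver.core.Row;

static byte[] blobToBytes(Row row) {
    ByteBuffer buf = row.getBytes("image");  // blob accessor in the Java driver
    byte[] out = new byte[buf.remaining()];  // exactly the blob's length
    buf.duplicate().get(out);                // copy without disturbing the buffer
    return out;
}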