Anything special about upgrading from 2.0 to 2.1

2015-10-22 Thread Robert Wille
I’m on 2.0.16 and want to upgrade to the latest 2.1.x. I’ve seen some comments 
about issues with counters not migrating properly. I have a lot of counters. 
Any concerns there? Do I need to run nodetool upgradesstables? Any other 
gotchas?

Thanks

Robert



Re: Object Mapping VS Direct Queries

2015-10-22 Thread DuyHai Doan
Cons:

- depending on the object mapper and the features, you may (or may not)
have some slight overhead at runtime
- the CQL query may be "hidden" from the developer, though some object
mappers like Achilles have an option to display DML statements in the logs

Pros:
- make your life easier by removing the burden of manual mapping
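
For illustration, a minimal sketch with the DataStax Java driver's object
mapper (hedged: keyspace, table, and entity names are made up):

import com.datastax.driver.mapping.Mapper;
import com.datastax.driver.mapping.MappingManager;
import com.datastax.driver.mapping.annotations.PartitionKey;
import com.datastax.driver.mapping.annotations.Table;

@Table(keyspace = "ks", name = "users")   // hypothetical keyspace/table
public class User {
    @PartitionKey
    private String id;
    private String name;

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}

// Usage, given an open com.datastax.driver.core.Session (the mapper
// generates the CQL for you):
//   Mapper<User> mapper = new MappingManager(session).mapper(User.class);
//   mapper.save(user);               // instead of a hand-written INSERT
//   User u = mapper.get("some-id");  // instead of a hand-written SELECT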


On Thu, Oct 22, 2015 at 1:26 AM, Ashish Soni  wrote:

> Hi All ,
>
> Please let me know if there are any disadvantages of using Object Mapping
> instead of writing direct CQL queries.
>
> Ashish
>


Data Streamed successfully but not queryable

2015-10-22 Thread Jason Turner
Apologies for the long post but it's a complicated story...

I am running Cassandra in 3 datacenters: 1, 2 & 3. DCs 1 and 2 are working fine,
but I am having trouble selecting any data at all in DC 3.

I've stripped DC3 down from 4 nodes to a single brand new node to make 
debugging logs/traces etc easier. It's an AWS t2.medium, Cassandra 2.1.10 
running on Amazon Linux AMI 2015.09. Java 1.8.0.

I am using the CQLSSTableWriter/BulkLoader utility to stream up a very small 
amount of test data to an empty table. The Bulkloader output reports no errors 
and the System log looks fine - session completed, all bytes received. I can 
see the data.db files in the appropriate data directory flushed to disk, 
however I can't select any of the data via CQL. No exceptions thrown, no rows 
returned. It's like it's simply ignoring the data which I know exists.
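
For context, the writer side follows the stock CQLSSTableWriter pattern — a
cut-down sketch rather than the actual job; schema, path, and values are
illustrative:

import java.io.File;
import java.util.UUID;
import org.apache.cassandra.dht.Murmur3Partitioner;
import org.apache.cassandra.io.sstable.CQLSSTableWriter;

public class WriteSSTables {
    public static void main(String[] args) throws Exception {
        String schema = "CREATE TABLE ks.t (id uuid PRIMARY KEY, c001 text)";
        String insert = "INSERT INTO ks.t (id, c001) VALUES (?, ?)";
        CQLSSTableWriter writer = CQLSSTableWriter.builder()
                .inDirectory(new File("/tmp/ks/t"))   // output dir; must exist
                .forTable(schema)
                .using(insert)
                .withPartitioner(new Murmur3Partitioner())
                .build();
        writer.addRow(UUID.randomUUID(), "Value1");   // one test row
        writer.close();
        // then stream with: sstableloader -d <node-in-DC3> /tmp/ks/t
    }
}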

My first thought was that the timestamps on the data might somehow be set in the
future, but sstable2json shows the relevant data exactly as I'd expect it, with
timestamps that match the upload times and should be readable. It's a fresh
install with NTP running, so I am ruling out any time drift across the nodes.

It's worth noting that the CQLSSTableWriter/BulkLoader utility works fine 
streaming in data centers 1 & 2. Data streamed from DCs 1 & 2 into DC 3 is 
immediately selectable. Data streamed from DC3 into DCs 1&2 is also immediately 
selectable. Only data streamed from DC3 into the DC3 cluster is ignored by 
select statements.

As an example, I've streamed the same test row into DC3 from DC1 and from DC3.
There are now 2 rows/partitions in the table in DC3; here they are in JSON via
sstable2json -

Doesn't Show (Streamed from same datacenter):

{"key": "e3589ff7-f753-4ff3-9809-365322306825",
"cells": [["bdff0420-78d2-11e5-a0cc-0607dbae485d:","",1445528225152000],
   
["bdff0420-78d2-11e5-a0cc-0607dbae485d:c001","Value1",1445528225152000],
   
["bdff0420-78d2-11e5-a0cc-0607dbae485d:c002","Value2",1445528225152000],
   
["bdff0420-78d2-11e5-a0cc-0607dbae485d:c003","Value3",1445528225152000],
   ["bdff0420-78d2-11e5-a0cc-0607dbae485d:c004","",1445528225152000],
   ["bdff0420-78d2-11e5-a0cc-0607dbae485d:c005","",1445528225152000],
   ["bdff0420-78d2-11e5-a0cc-0607dbae485d:c006","",1445528225152000],
   ["bdff0420-78d2-11e5-a0cc-0607dbae485d:c007","",1445528225152000],
   
["bdff0420-78d2-11e5-a0cc-0607dbae485d:c008","Value4",1445528225152000],
   
["bdff0420-78d2-11e5-a0cc-0607dbae485d:c009","Value5",1445528225152000],
   ["bdff0420-78d2-11e5-a0cc-0607dbae485d:c010","",1445528225152000],
   ["bdff0420-78d2-11e5-a0cc-0607dbae485d:c011","",1445528225152000],
   
["bdff0420-78d2-11e5-a0cc-0607dbae485d:c012","Value6",1445528225152000],
   ["bdff0420-78d2-11e5-a0cc-0607dbae485d:c013","",1445528225152000],
   ["bdff0420-78d2-11e5-a0cc-0607dbae485d:c014","",1445528225152000],
   ["bdff0420-78d2-11e5-a0cc-0607dbae485d:c015","",1445528225152000],
   ["bdff0420-78d2-11e5-a0cc-0607dbae485d:c016","",1445528225152000],
   ["bdff0420-78d2-11e5-a0cc-0607dbae485d:c017","",1445528225152000],
   ["bdff0420-78d2-11e5-a0cc-0607dbae485d:c018","",1445528225152000],
   ["bdff0420-78d2-11e5-a0cc-0607dbae485d:c019","",1445528225152000],
   
["bdff0420-78d2-11e5-a0cc-0607dbae485d:c020","Value7",1445528225152000]]}

Shows Correctly (Streamed over VPN from different datacenter):

{"key": "1448b4de-7ebe-4d46-a2a4-365443970109",
"cells": [["ccaba8f0-78d4-11e5-ae6a-12a3a8f7b9be:","",1445529108746000],
   
["ccaba8f0-78d4-11e5-ae6a-12a3a8f7b9be:c001","Value1",1445529108746000],
   
["ccaba8f0-78d4-11e5-ae6a-12a3a8f7b9be:c002","Value2",1445529108746000],
   
["ccaba8f0-78d4-11e5-ae6a-12a3a8f7b9be:c003","Value3",1445529108746000],
   ["ccaba8f0-78d4-11e5-ae6a-12a3a8f7b9be:c004","",1445529108746000],
   ["ccaba8f0-78d4-11e5-ae6a-12a3a8f7b9be:c005","",1445529108746000],
   ["ccaba8f0-78d4-11e5-ae6a-12a3a8f7b9be:c006","",1445529108746000],
   ["ccaba8f0-78d4-11e5-ae6a-12a3a8f7b9be:c007","",1445529108746000],
   
["ccaba8f0-78d4-11e5-ae6a-12a3a8f7b9be:c008","Value4",1445529108746000],
   
["ccaba8f0-78d4-11e5-ae6a-12a3a8f7b9be:c009","Value5",1445529108746000],
   ["ccaba8f0-78d4-11e5-ae6a-12a3a8f7b9be:c010","",1445529108746000],
   ["ccaba8f0-78d4-11e5-ae6a-12a3a8f7b9be:c011","",1445529108746000],
   
["ccaba8f0-78d4-11e5-ae6a-12a3a8f7b9be:c012","Value6",1445529108746000],
   ["ccaba8f0-78d4-11e5-ae6a-12a3a8f7b9be:c013","",1445529108746000],
   ["ccaba8f0-78d4-11e5-ae6a-12a3a8f7b9be:c014","",1445529108746000],
   ["ccaba8f0-78d4-11e5-ae6a-12a3a8f7b9be:c015","",1445529108746000],
   ["ccaba8f0-78d4-11e5-ae6a-12a3a8f7b9be:c016","",1445529108746000],
   

RE: Hyper-V snapshot and Cassandra

2015-10-22 Thread Raul D'Opazo
Hi, so what is the usual way to take care of backups:

- Take Cassandra snapshots

- Copy those snapshots to a backup system?


From: Robert Coli [mailto:rc...@eventbrite.com]
Sent: Tuesday, October 20, 2015 5:07 PM
To: user@cassandra.apache.org
Subject: Re: Hyper-V snapshot and Cassandra

On Tue, Oct 20, 2015 at 4:22 AM, Raul D'Opazo wrote:
I only have one node, in one server (Windows 2012), and Cassandra will grow to
approximately 4 TB. It is a Hyper-V virtual machine, with enough resources.

This is an extremely unusual and probably degenerate use of Cassandra.

I have done snapshots and it is OK, because we don't double the size with each
snapshot, but I need another solution in case of disk problems.

I have no idea how snapshots work on Windows; if they work as on Linux, each
snapshot is a set of hard links to the actual data files.

I am wondering if Hyper-V virtual machine snapshots can be used to recover
Cassandra in a consistent way. Is it possible?

Sure, if you quiesce writes to the system, or if you don't care about the delta
in the commit log between the Cassandra snapshot and the Hyper-V snapshot, your
snapshot will contain all the immutable data files you need to restore.
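
For the mechanics Raul asked about, the native route is roughly the sketch
below (tag and keyspace names illustrative):

nodetool snapshot -t backup-2015-10-22 mykeyspace
# hard-linked files appear under
#   <data_dir>/mykeyspace/<table>/snapshots/backup-2015-10-22/
# copy those directories to your backup system, then reclaim the links:
nodetool clearsnapshot -t backup-2015-10-22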

Finally, I reiterate my confusion as to why you wish to do this unusual thing.

=Rob



Re: C* Table Changed and Data Migration with new primary key

2015-10-22 Thread DuyHai Doan
Use Spark to distribute the job of copying data all over the cluster and
help accelerate the migration. The Spark connector does auto paging in
the background with the Java driver.
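
A rough sketch of that copy with the connector's Java API (hedged: connector
1.x era, bean trimmed to three of the columns, contact point and app name
illustrative):

import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;
import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapRowTo;
import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapToRow;

import java.io.Serializable;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class CopyTable {
    // Bean mirroring a subset of test1's columns; property names match columns.
    public static class Row3 implements Serializable {
        private String attribute;
        private Long timestamp;
        private String event;
        public String getAttribute() { return attribute; }
        public void setAttribute(String v) { attribute = v; }
        public Long getTimestamp() { return timestamp; }
        public void setTimestamp(Long v) { timestamp = v; }
        public String getEvent() { return event; }
        public void setEvent(String v) { event = v; }
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("copy-test1")
                .set("spark.cassandra.connection.host", "127.0.0.1");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // The connector splits the token ring into Spark partitions, so the
        // read and re-write run in parallel across the cluster and page
        // automatically.
        javaFunctions(javaFunctions(sc)
                .cassandraTable("ks", "test1", mapRowTo(Row3.class))
                .select("attribute", "timestamp", "event"))
            .writerBuilder("ks", "test_global", mapToRow(Row3.class))
            .saveToCassandra();
        sc.stop();
    }
}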
On 22 Oct 2015 at 11:03, "qihuang.zheng"  wrote:

> I tried using the Java driver with *auto paging (setFetchSize)* instead
> of the token function, as Cassandra has this feature already.
> Reference:
> http://www.datastax.com/dev/blog/client-side-improvements-in-cassandra-2-0
>
> But I tried it in a test environment with only 1 million rows read and then
> inserted into 3 tables, and it's too slow.
> After running 20 min, an exception like NoHostAvailableException happens, and
> of course the data didn't finish syncing.
> And our production env has nearly 25 billion rows, which is unacceptable for
> this case. Is there another way?
>
> --
> Thanks & Regards,
> qihuang.zheng
>
>  Original Message
> *From:* Jeff Jirsa
> *To:* user@cassandra.apache.org
> *Sent:* Thu, Oct 22, 2015 13:52
> *Subject:* Re: C* Table Changed and Data Migration with new primary key
>
> Because the data format has changed, you’ll need to read it out and write
> it back in again.
>
> This means using either a driver (java, python, c++, etc), or something
> like spark.
>
> In either case, split up the token range so you can parallelize it for
> significant speed improvements.
>
>
>
> From: "qihuang.zheng"
> Reply-To: "user@cassandra.apache.org"
> Date: Wednesday, October 21, 2015 at 6:18 PM
> To: user
> Subject: C* Table Changed and Data Migration with new primary key
>
> Hi All:
>
>   We have a table defined with only one partition key and some clustering keys.
> CREATE TABLE test1 (
>   attribute text,
>   partner text,
>   app text,
>   "timestamp" bigint,
>   event text,
>   PRIMARY KEY ((attribute), partner, app, "timestamp")
> )
> And now we want to split the original test1 table into 3 tables like this:
> test_global :  PRIMARY KEY ((attribute), "timestamp")
> test_partner:  PRIMARY KEY ((attribute, partner), "timestamp")
> test_app:   PRIMARY KEY ((attribute, partner, app), "timestamp")
>
> Why we split the original table: querying *global data* by timestamp
> desc like this:
> select * from test1 where attribute=? order by timestamp desc
> is not supported in Cassandra, as ORDER BY must follow the
> clustering keys in order.
> And a query like this:
> select * from test1 where attribute=? order by partner desc, app desc,
> timestamp desc
> doesn't return the right global data by ts desc.
> After splitting the table we can do the global data query correctly: select *
> from test_global where attribute=? order by timestamp desc.
>
> Now we have a problem of *data migration*.
> As far as I know, *sstableloader* is the easiest way, but it can't deal with a
> different table name. (Am I right?)
> And the *COPY* command in cqlsh doesn't fit our situation because our data is
> too large. (10 nodes; one node has 400 GB of data.)
> I also tried the Java API, querying the original table and then inserting into
> 3 different split tables, but that seems too slow.
>
> Any solution for quick data migration?
> Thanks!!
>
> PS: Cass version: 2.0.15
>
>
>
> --
> Thanks & Regards,
> qihuang.zheng
>


Re: C* Table Changed and Data Migration with new primary key

2015-10-22 Thread Jack Krupansky
Consider the new 3.0 Materialized Views feature - you keep the existing
table and create three MVs, each with a different primary key. Cassandra
will then populate the new MVs from the existing base table data.

See:
https://issues.apache.org/jira/browse/CASSANDRA-6477
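
Roughly, as a hedged sketch via the Java driver (assuming keyspace ks and the
3.0 syntax; note a view's primary key must repeat every base primary-key
column, so the timestamp-ordered view keeps partner and app as trailing
clustering columns):

// session is an open com.datastax.driver.core.Session
session.execute(
    "CREATE MATERIALIZED VIEW ks.test_global AS " +
    "SELECT * FROM ks.test1 " +
    "WHERE attribute IS NOT NULL AND \"timestamp\" IS NOT NULL " +
    "AND partner IS NOT NULL AND app IS NOT NULL " +
    "PRIMARY KEY ((attribute), \"timestamp\", partner, app)");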

-- Jack Krupansky

On Wed, Oct 21, 2015 at 9:18 PM, qihuang.zheng  wrote:

> Hi All:
>
>   We have a table defined with only one partition key and some clustering keys.
> CREATE TABLE test1 (
>   attribute text,
>   partner text,
>   app text,
>   "timestamp" bigint,
>   event text,
>   PRIMARY KEY ((attribute), partner, app, "timestamp")
> )
> And now we want to split the original test1 table into 3 tables like this:
> test_global :  PRIMARY KEY ((attribute), "timestamp")
> test_partner:  PRIMARY KEY ((attribute, partner), "timestamp")
> test_app:   PRIMARY KEY ((attribute, partner, app), "timestamp")
>
> Why we split the original table: querying *global data* by timestamp
> desc like this:
> select * from test1 where attribute=? order by timestamp desc
> is not supported in Cassandra, as ORDER BY must follow the
> clustering keys in order.
> And a query like this:
> select * from test1 where attribute=? order by partner desc, app desc,
> timestamp desc
> doesn't return the right global data by ts desc.
> After splitting the table we can do the global data query correctly: select *
> from test_global where attribute=? order by timestamp desc.
>
> Now we have a problem of *data migration*.
> As far as I know, *sstableloader* is the easiest way, but it can't deal with a
> different table name. (Am I right?)
> And the *COPY* command in cqlsh doesn't fit our situation because our data is
> too large. (10 nodes; one node has 400 GB of data.)
> I also tried the Java API, querying the original table and then inserting into
> 3 different split tables, but that seems too slow.
>
> Any solution for quick data migration?
> Thanks!!
>
> PS: Cass version: 2.0.15
>
>
>
> --
> Thanks & Regards,
> qihuang.zheng
>


Re: Anything special about upgrading from 2.0 to 2.1

2015-10-22 Thread Robert Coli
On Thu, Oct 22, 2015 at 12:35 PM, Robert Wille  wrote:

> I’m on 2.0.16 and want to upgrade to the latest 2.1.x. I’ve seen some
> comments about issues with counters not migrating properly. I have a lot of
> counters. Any concerns there? Do I need to run nodetool upgradesstables?
> Any other gotchas?
>

The answers to these questions and more in NEWS.txt!
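
For what it's worth, the usual rolling sequence per node is roughly the sketch
below (illustrative only — NEWS.txt has the authoritative, version-specific
steps):

nodetool drain                  # flush memtables; node stops accepting writes
sudo service cassandra stop
# install the 2.1.x package/binaries, review cassandra.yaml changes
sudo service cassandra start
nodetool upgradesstables        # rewrite SSTables into the 2.1 format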

=Rob


Automatic pagination does not get all results

2015-10-22 Thread Sid Tantia
Hello,

Has anyone had a problem with automatic pagination returning different
results every time (this is for a table with ~180,000 rows)? I'm going
through each page and inserting the results into an array, and each time I
go through all the pages the resultant array has a different size.

This happens whether I use a SELECT query with automatic paging using the
Ruby driver or a COPY to CSV command with cqlsh.

-Sid


Re: C* Table Changed and Data Migration with new primary key

2015-10-22 Thread qihuang.zheng
I tried using the Java driver with auto paging (setFetchSize) instead of the
token function, as Cassandra has this feature already.
Reference:
http://www.datastax.com/dev/blog/client-side-improvements-in-cassandra-2-0


But I tried it in a test environment with only 1 million rows read and then
inserted into 3 tables, and it's too slow.
After running 20 min, an exception like NoHostAvailableException happens, and of
course the data didn't finish syncing.
And our production env has nearly 25 billion rows, which is unacceptable for
this case. Is there another way?



Thanks & Regards,
qihuang.zheng


Original Message
From: Jeff Jirsa jeff.ji...@crowdstrike.com
To: user@cassandra.apache.org
Sent: Thu, Oct 22, 2015 13:52
Subject: Re: C* Table Changed and Data Migration with new primary key


Because the data format has changed, you’ll need to read it out and write it 
back in again.


This means using either a driver (java, python, c++, etc), or something like 
spark.


In either case, split up the token range so you can parallelize it for 
significant speed improvements.
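
A minimal sketch of that token-range split with the Java driver (hedged: needs
driver 2.1.5+ for token metadata; error handling and the worker pool that would
actually parallelize the sub-ranges are omitted):

import com.datastax.driver.core.*;

public class RangeCopy {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();
        PreparedStatement scan = session.prepare(
                "SELECT attribute, \"timestamp\", event FROM ks.test1 " +
                "WHERE token(attribute) > ? AND token(attribute) <= ?");
        PreparedStatement ins = session.prepare(
                "INSERT INTO ks.test_global (attribute, \"timestamp\", event) " +
                "VALUES (?, ?, ?)");
        for (TokenRange range : cluster.getMetadata().getTokenRanges()) {
            for (TokenRange sub : range.unwrap()) { // split the wrap-around range
                // Each sub-range is independent, so a real job would hand
                // them to separate workers.
                BoundStatement bs = scan.bind()
                        .setToken(0, sub.getStart())
                        .setToken(1, sub.getEnd());
                for (Row row : session.execute(bs)) {
                    session.execute(ins.bind(row.getString("attribute"),
                            row.getLong("timestamp"), row.getString("event")));
                }
            }
        }
        cluster.close();
    }
}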






From: "qihuang.zheng"
Reply-To: "user@cassandra.apache.org"
Date: Wednesday, October 21, 2015 at 6:18 PM
To: user
Subject: C* Table Changed and Data Migration with new primary key



Hi All:
 We have a table defined with only one partition key and some clustering keys.
CREATE TABLE test1 (
 attribute text,
 partner text,
 app text,
 "timestamp" bigint,
 event text,
 PRIMARY KEY ((attribute), partner, app, "timestamp")
)
And now we want to split the original test1 table into 3 tables like this:
test_global : PRIMARY KEY ((attribute), "timestamp")
test_partner: PRIMARY KEY ((attribute, partner), "timestamp")
test_app:    PRIMARY KEY ((attribute, partner, app), "timestamp")


Why we split the original table: querying global data by timestamp desc like
this:
select * from test1 where attribute=? order by timestamp desc
is not supported in Cassandra, as ORDER BY must follow the clustering keys in
order.
And a query like this:
select * from test1 where attribute=? order by partner desc, app desc,
timestamp desc
doesn't return the right global data by ts desc.
After splitting the table we can do the global data query correctly: select *
from test_global where attribute=? order by timestamp desc.


Now we have a problem of data migration.
As far as I know, sstableloader is the easiest way, but it can't deal with a
different table name. (Am I right?)
And the COPY command in cqlsh doesn't fit our situation because our data is too
large. (10 nodes; one node has 400 GB of data.)
I also tried the Java API, querying the original table and then inserting into
3 different split tables, but that seems too slow.


Any solution for quick data migration?
Thanks!!


PS: Cass version: 2.0.15




Thanks & Regards,
qihuang.zheng

[ANNOUNCE] YCSB 0.4.0 Release

2015-10-22 Thread Robert J. Moore
On behalf of the development community, I am pleased to announce the 
release of YCSB 0.4.0.


Highlights:

* Default measurement changed from histogram to hdrhistogram.
* Users who want the previous behavior can set the 'measurementtype'
property to 'histogram' (see the example below).
* Reported 95th and 99th percentile latencies are now in microseconds
(previously in milliseconds).
* The HBase binding has been split into 3 separate bindings based on your
version of HBase, and the names have changed to hbase10, hbase098, and
hbase094.
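
For example, restoring the old measurement type on the command line (binding
and workload names illustrative):

bin/ycsb run cassandra-10 -P workloads/workloada -p measurementtype=histogram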

Bug Fixes:

* Previously, with hdrhistogram, the 95th percentile actually reported 
the 90th percentile value. It now reports the actual 95th percentile value.

* Fixed a race condition between insert and read/update operations.


Full release notes, including links to source and convenience binaries:

https://github.com/brianfrankcooper/YCSB/releases/tag/0.4.0

This release covers changes from the last 2 months.

--
Rob


Re: Is replication possible with already existing data?

2015-10-22 Thread Ajay Garg
Hi Carlos.


I set up the following ::

CAS11 and CAS12 in DC1
CAS21 and CAS22 in DC2

a)
Brought all 4 up; replication worked perfectly !!!

b)
Thereafter, downed CAS11 via "sudo service cassandra stop".
Replication continued to work fine on CAS12, CAS21 and CAS22.

c)
Thereafter, upped CAS11 via "sudo service cassandra start".


However, CAS11 refuses to come up now.
Following is the error in /var/log/cassandra/system.log ::



ERROR [main] 2015-10-23 03:07:34,242 CassandraDaemon.java:391 - Fatal configuration error
org.apache.cassandra.exceptions.ConfigurationException: Cannot change the number of tokens from 1 to 256
        at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:966) ~[apache-cassandra-2.1.10.jar:2.1.10]
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:734) ~[apache-cassandra-2.1.10.jar:2.1.10]
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:611) ~[apache-cassandra-2.1.10.jar:2.1.10]
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:387) [apache-cassandra-2.1.10.jar:2.1.10]
        at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:562) [apache-cassandra-2.1.10.jar:2.1.10]
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:651) [apache-cassandra-2.1.10.jar:2.1.10]
INFO  [StorageServiceShutdownHook] 2015-10-23 03:07:34,271 Gossiper.java:1442 - Announcing shutdown
INFO  [GossipStage:1] 2015-10-23 03:07:34,282 OutboundTcpConnection.java:97 - OutboundTcpConnection using coalescing strategy DISABLED
ERROR [StorageServiceShutdownHook] 2015-10-23 03:07:34,305 CassandraDaemon.java:227 - Exception in thread Thread[StorageServiceShutdownHook,5,main]
java.lang.NullPointerException: null
        at org.apache.cassandra.service.StorageService.getApplicationStateValue(StorageService.java:1624) ~[apache-cassandra-2.1.10.jar:2.1.10]
        at org.apache.cassandra.service.StorageService.getTokensFor(StorageService.java:1632) ~[apache-cassandra-2.1.10.jar:2.1.10]
        at org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:1686) ~[apache-cassandra-2.1.10.jar:2.1.10]
        at org.apache.cassandra.service.StorageService.onChange(StorageService.java:1510) ~[apache-cassandra-2.1.10.jar:2.1.10]
        at org.apache.cassandra.gms.Gossiper.doOnChangeNotifications(Gossiper.java:1182) ~[apache-cassandra-2.1.10.jar:2.1.10]
        at org.apache.cassandra.gms.Gossiper.addLocalApplicationStateInternal(Gossiper.java:1412) ~[apache-cassandra-2.1.10.jar:2.1.10]
        at org.apache.cassandra.gms.Gossiper.addLocalApplicationStates(Gossiper.java:1427) ~[apache-cassandra-2.1.10.jar:2.1.10]
        at org.apache.cassandra.gms.Gossiper.addLocalApplicationState(Gossiper.java:1417) ~[apache-cassandra-2.1.10.jar:2.1.10]
        at org.apache.cassandra.gms.Gossiper.stop(Gossiper.java:1443) ~[apache-cassandra-2.1.10.jar:2.1.10]
        at org.apache.cassandra.service.StorageService$1.runMayThrow(StorageService.java:678) ~[apache-cassandra-2.1.10.jar:2.1.10]
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-2.1.10.jar:2.1.10]
        at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_60]



Ideas?


Thanks and Regards,
Ajay



On Mon, Oct 12, 2015 at 3:46 PM, Carlos Alonso  wrote:

> Yes Ajay, in your particular scenario, after all hints are delivered, both
> CAS11 and CAS12 will have the exact same data.
>
> Cheers!
>
> Carlos Alonso | Software Engineer | @calonso 
>
> On 11 October 2015 at 05:21, Ajay Garg  wrote:
>
>> Thanks a ton Anuja for the help !!!
>>
>> On Fri, Oct 9, 2015 at 12:38 PM, anuja jain  wrote:
>> > Hi Ajay,
>> >
>> >
>> > On Fri, Oct 9, 2015 at 9:00 AM, Ajay Garg wrote:
>> >>
>> > In this case, it will be the responsibility of APP1 to start a connection
>> > to CAS12. On the other hand, if your APP1 is connecting to Cassandra using
>> > the Java driver, you can add multiple contact points (CAS11 and CAS12
>> > here) so that if CAS11 is down it will directly connect to CAS12.
>>
>> Great .. Java-driver it will be :)
>>
>>
>>
>>
>> >>
>> > In such a case, CAS12 will store hints for the data to be stored on CAS11
>> > (the tokens of which lie within the range of tokens CAS11 holds), and
>> > whenever CAS11 is up again, the hints will be transferred to it and the
>> > data will be distributed evenly.
>> >
>>
>> Evenly?
>>
>> Should not the data be """EXACTLY""" equal after CAS11 comes back up
>> and the sync/transfer/whatever happens?
>> After all, before CAS11 went down, CAS11 and CAS12 were replicating all
>> data.
>>
>>
>> Once again, thanks for your help.
>> I will be even 

Re: Is replication possible with already existing data?

2015-10-22 Thread Michael Shuler

On 10/22/2015 10:14 PM, Ajay Garg wrote:

However, CAS11 refuses to come up now.
Following is the error in /var/log/cassandra/system.log ::



ERROR [main] 2015-10-23 03:07:34,242 CassandraDaemon.java:391 - Fatal
configuration error
org.apache.cassandra.exceptions.ConfigurationException: Cannot change
the number of tokens from 1 to 256


Check your cassandra.yaml - this node has vnodes enabled in the 
configuration when it did not previously. Check all nodes. Something 
changed. Mixed vnode/non-vnode clusters are bad juju.
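
To illustrate the mismatch (values made up, not taken from any node here): a
node that originally joined the ring as a single-token node, e.g.

initial_token: -9223372036854775808
# num_tokens not set

cannot later be restarted with vnodes switched on:

num_tokens: 256

Pick one scheme per node and keep it stable across restarts.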


--
Kind regards,
Michael


Re: Automatic pagination does not get all results

2015-10-22 Thread Jeff Jirsa
It’s possible that it could be different depending on your consistency level 
(on write and on read).

It’s also possible it’s a bug, but you didn’t give us much information – here 
are some questions to help us help you:

What version? 
What results are you seeing? 
What’s the “right” result? 
What CL did you use to write the data? 
What CL did you use to read the data? 
Have you run repair since writing the data?
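
If it does turn out to be consistency, re-reading at a stronger level should
stabilize the count — e.g. with the Java driver (a hedged sketch; table name
made up, and the Ruby driver takes an analogous consistency option):

import com.datastax.driver.core.*;

public class StableCount {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();
        Statement stmt = new SimpleStatement("SELECT id FROM ks.t")
                .setFetchSize(1000)                            // page size
                .setConsistencyLevel(ConsistencyLevel.QUORUM); // stronger than ONE
        long count = 0;
        for (Row row : session.execute(stmt)) {
            count++;   // iterating fetches further pages transparently
        }
        System.out.println(count);
        cluster.close();
    }
}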


From:  Sid Tantia
Reply-To:  "user@cassandra.apache.org"
Date:  Thursday, October 22, 2015 at 5:49 PM
To:  user
Subject:  Automatic pagination does not get all results

Hello,

Has anyone had a problem with automatic pagination returning different results
every time (this is for a table with ~180,000 rows)? I'm going through each page
and inserting the results into an array, and each time I go through all the
pages the resultant array has a different size.

This happens whether I use a SELECT query with automatic paging using the Ruby 
driver or a COPY to CSV command with cqlsh.

-Sid






Re: Is replication possible with already existing data?

2015-10-22 Thread Ajay Garg
Hi Michael.

Please find below the contents of cassandra.yaml for CAS11 (the files on
the other three nodes are exactly the same, except for the
"initial_token" and "listen_address" fields) ::

CAS11 ::


cluster_name: 'InstaMsg Cluster'
num_tokens: 256
initial_token: -9223372036854775808
hinted_handoff_enabled: true
max_hint_window_in_ms: 10800000 # 3 hours
hinted_handoff_throttle_in_kb: 1024
max_hints_delivery_threads: 2
batchlog_replay_throttle_in_kb: 1024
authenticator: AllowAllAuthenticator
authorizer: AllowAllAuthorizer
permissions_validity_in_ms: 2000
partitioner: org.apache.cassandra.dht.Murmur3Partitioner
data_file_directories:
- /var/lib/cassandra/data

commitlog_directory: /var/lib/cassandra/commitlog

disk_failure_policy: stop
commit_failure_policy: stop
key_cache_size_in_mb:
key_cache_save_period: 14400
row_cache_size_in_mb: 0
row_cache_save_period: 0
counter_cache_size_in_mb:
counter_cache_save_period: 7200
saved_caches_directory: /var/lib/cassandra/saved_caches
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000
commitlog_segment_size_in_mb: 32
seed_provider:
- class_name: org.apache.cassandra.locator.SimpleSeedProvider
  parameters:
  - seeds: "104.239.200.33,119.9.92.77"

concurrent_reads: 32
concurrent_writes: 32
concurrent_counter_writes: 32

memtable_allocation_type: heap_buffers

index_summary_capacity_in_mb:
index_summary_resize_interval_in_minutes: 60
trickle_fsync: false
trickle_fsync_interval_in_kb: 10240
storage_port: 7000
ssl_storage_port: 7001
listen_address: 104.239.200.33
start_native_transport: true
native_transport_port: 9042
start_rpc: true
rpc_address: localhost
rpc_port: 9160
rpc_keepalive: true

rpc_server_type: sync
thrift_framed_transport_size_in_mb: 15
incremental_backups: false
snapshot_before_compaction: false
auto_snapshot: true

tombstone_warn_threshold: 1000
tombstone_failure_threshold: 100000

column_index_size_in_kb: 64
batch_size_warn_threshold_in_kb: 5

compaction_throughput_mb_per_sec: 16
compaction_large_partition_warning_threshold_mb: 100

sstable_preemptive_open_interval_in_mb: 50

read_request_timeout_in_ms: 5000
range_request_timeout_in_ms: 10000

write_request_timeout_in_ms: 2000
counter_write_request_timeout_in_ms: 5000
cas_contention_timeout_in_ms: 1000
truncate_request_timeout_in_ms: 60000
request_timeout_in_ms: 10000
cross_node_timeout: false
endpoint_snitch: PropertyFileSnitch

dynamic_snitch_update_interval_in_ms: 100
dynamic_snitch_reset_interval_in_ms: 600000
dynamic_snitch_badness_threshold: 0.1

request_scheduler: org.apache.cassandra.scheduler.NoScheduler

server_encryption_options:
internode_encryption: none
keystore: conf/.keystore
keystore_password: cassandra
truststore: conf/.truststore
truststore_password: cassandra

client_encryption_options:
enabled: false
keystore: conf/.keystore
keystore_password: cassandra

internode_compression: all
inter_dc_tcp_nodelay: false



What changes need to be made, so that whenever a downed server comes back
up, the missing data comes back over to it?

Thanks and Regards,
Ajay



On Fri, Oct 23, 2015 at 9:05 AM, Michael Shuler 
wrote:

> On 10/22/2015 10:14 PM, Ajay Garg wrote:
>
>> However, CAS11 refuses to come up now.
>> Following is the error in /var/log/cassandra/system.log ::
>>
>>
>> 
>> ERROR [main] 2015-10-23 03:07:34,242 CassandraDaemon.java:391 - Fatal
>> configuration error
>> org.apache.cassandra.exceptions.ConfigurationException: Cannot change
>> the number of tokens from 1 to 256
>>
>
> Check your cassandra.yaml - this node has vnodes enabled in the
> configuration when it did not previously. Check all nodes. Something
> changed. Mixed vnode/non-vnode clusters are bad juju.
>
> --
> Kind regards,
> Michael
>



-- 
Regards,
Ajay


cassandra bootstrapping

2015-10-22 Thread Lou Kamenov
Hey everyone,

I keep on seeing that there should be a 2-minute delay when bootstrapping a
cluster, and I have a few questions around that.

For starters, is there any reasoning why this is 2 min and not less or more?
Is this valid mostly for bootstrapping an empty cluster ring, or for
restarting an existing established cluster?

Thank you!
L


Re: cassandra bootstrapping

2015-10-22 Thread Nate McCall
> I keep on seeing that there should be a 2-minute delay when bootstrapping
> a cluster, and I have a few questions around that.
>
> For starters, is there any reasoning why this is 2 min and not less or
> more?
> Is this valid mostly for bootstrapping an empty cluster ring, or for
> restarting an existing established cluster?

There is a good comment at the top of StorageService#joinTokenRing which
explains the process at a high level:
https://github.com/apache/cassandra/blob/cassandra-2.2/src/java/org/apache/cassandra/service/StorageService.java#L791-L803

The method itself is long, but readable, and has a series of comments that
explain some of the decisions taken and even reference some issues which
have been encountered over the years.

You can change this value if you really want by passing
"cassandra.ring_delay_ms" as a system property at startup.


--
-
Nate McCall
Austin, TX
@zznate

Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com