Re: mysql based columnar DB to Cassandra DB - Migration

2014-11-27 Thread Kiran Ayyagari
On Fri, Nov 28, 2014 at 1:06 PM, Akshay Ballarpure <
akshay.ballarp...@tcs.com> wrote:

> Thanks, Kiran, for the reply.
> How about other column-based databases like Infobright or HBase? Can we
> really migrate them to Cassandra?
>
No, Troop only supports migrating data from an RDBMS to Cassandra.

>
>
>
> From:Kiran Ayyagari 
> To:user@cassandra.apache.org
> Date:11/28/2014 08:27 AM
> Subject:Re: mysql based columnar DB to Cassandra DB - Migration
> --
>
>
>
>
>
> On Wed, Nov 26, 2014 at 2:15 PM, Akshay Ballarpure <
> *akshay.ballarp...@tcs.com* > wrote:
> Hello Folks,
> I have a MySQL-based columnar DB that I want to migrate to Cassandra. How
> is this possible?
>
> See if Troop [1] helps; note that it has only been tested with MySQL 5.x
> and Cassandra 2.0.10.
> [1] https://github.com/kayyagari/troop
> 
> Best Regards
> Akshay Ballarpure
> Tata Consultancy Services
> Cell: 9985084075
> Mailto: akshay.ballarp...@tcs.com
> Website: http://www.tcs.com
> Experience certainty. IT Services | Business Solutions | Consulting
>
>
>
> From:Akshay Ballarpure/HYD/TCS
> To:*user@cassandra.apache.org* 
> Date:11/18/2014 09:00 PM
> Subject:mysql based columnar DB to Cassandra DB - Migration
>  --
>
>
>
> I have a MySQL-based columnar DB that I want to migrate to Cassandra. How
> is this possible?
>
> =-=-=
> Notice: The information contained in this e-mail
> message and/or attachments to it may contain
> confidential or privileged information. If you are
> not the intended recipient, any dissemination, use,
> review, distribution, printing or copying of the
> information contained in this e-mail message
> and/or attachments to it are strictly prohibited. If
> you have received this communication in error,
> please notify us by reply e-mail or telephone and
> immediately and permanently delete the message
> and any attachments. Thank you
>
>
>
>



-- 
Kiran Ayyagari
http://keydap.com


Re: mysql based columnar DB to Cassandra DB - Migration

2014-11-27 Thread Akshay Ballarpure
Thanks, Kiran, for the reply.
How about other column-based databases like Infobright or HBase? Can we
really migrate them to Cassandra?




From:   Kiran Ayyagari 
To: user@cassandra.apache.org
Date:   11/28/2014 08:27 AM
Subject:Re: mysql based columnar DB to Cassandra DB - Migration





On Wed, Nov 26, 2014 at 2:15 PM, Akshay Ballarpure <
akshay.ballarp...@tcs.com> wrote:
Hello Folks, 
I have a MySQL-based columnar DB that I want to migrate to Cassandra. How
is this possible?

See if Troop [1] helps; note that it has only been tested with MySQL 5.x
and Cassandra 2.0.10.
[1] https://github.com/kayyagari/troop



From:Akshay Ballarpure/HYD/TCS 
To:user@cassandra.apache.org 
Date:11/18/2014 09:00 PM 
Subject:mysql based columnar DB to Cassandra DB - Migration 



I have a MySQL-based columnar DB that I want to migrate to Cassandra. How
is this possible?




-- 
Kiran Ayyagari
http://keydap.com


Re: mysql based columnar DB to Cassandra DB - Migration

2014-11-27 Thread Kiran Ayyagari
On Wed, Nov 26, 2014 at 2:15 PM, Akshay Ballarpure <
akshay.ballarp...@tcs.com> wrote:

> Hello Folks,
> I have a MySQL-based columnar DB that I want to migrate to Cassandra. How
> is this possible?
>
See if Troop [1] helps; note that it has only been tested with MySQL 5.x
and Cassandra 2.0.10.
[1] https://github.com/kayyagari/troop

>
>
>
> From:Akshay Ballarpure/HYD/TCS
> To:user@cassandra.apache.org
> Date:11/18/2014 09:00 PM
> Subject:mysql based columnar DB to Cassandra DB - Migration
> --
>
>
>
> I have a MySQL-based columnar DB that I want to migrate to Cassandra. How
> is this possible?
>
>
>
>


-- 
Kiran Ayyagari
http://keydap.com


Re: Storing time-series and geospatial data in C*

2014-11-27 Thread Jabbar Azam
Spico,
Here's a link for the time-series data:
http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra

You'll also need to understand the composite key format
http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/refCompositePk.html
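The core idea in both links is to put a coarse time bucket into the composite partition key so a single series is split across partitions of bounded size. A minimal sketch of the bucketing logic (the names and the day granularity are illustrative, not taken from the linked articles):

```python
from datetime import datetime, timezone

def day_bucket(ts: datetime) -> str:
    """Day-granularity bucket used as part of the partition key,
    giving one partition per series per day."""
    return ts.strftime("%Y%m%d")

def partition_key(series_id: str, ts: datetime) -> tuple:
    # Composite partition key (series_id, bucket); the full event
    # timestamp then becomes the clustering column inside it.
    return (series_id, day_bucket(ts))

ts = datetime(2014, 11, 27, 13, 37, tzinfo=timezone.utc)
print(partition_key("sensor-42", ts))  # ('sensor-42', '20141127')
```

A coarser or finer bucket (month, hour) trades partition size against the number of partitions a range query must touch.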

Mike Malone has videos and slides on how SimpleGeo used an older version of
Cassandra for storing geo information:
http://readwrite.com/2011/02/17/video-simplegeo-cassandra

Or you can use Elasticsearch for working with geospatial information:
http://blog.florian-hopf.de/2014/08/use-cases-for-elasticsearch-geospatial.html

A word of warning, though, about Elasticsearch: it does not provide simple
linear scalability like Cassandra, nor is it easy to set up for cross-
datacentre operation.


DataStax Enterprise has Solr integrated, so you could use that:
http://digbigdata.com/geospatial-search-cassandra-datastax-enterprise/

Jabbar Azam



On Thu Nov 27 2014 at 12:39:59 PM Spico Florin 
wrote:

> Hello!
>   Can you please recommend some new articles and case studies where
> Cassandra was used to store time-series and geo-spatial data? I'm
> particularly interested in best practices, data models, and retrieval
> techniques.
>  Thanks.
>  Regards,
>  Florin
>
>


Re: Column family ID mismatch-Error on concurrent schema modifications

2014-11-27 Thread Jens-U. Mozdzen

Hi DuyHai,

Quoting DuyHai Doan:

Hello Peter


I'm working with Peter and am the one initiating the table creation in  
my code.



For safe concurrent table creation, use CREATE TABLE xxx IF NOT EXISTS. It


Unfortunately, my code already has the "IF NOT EXISTS" clause in the
CREATE statement, but we see the exceptions nevertheless.


(I'll leave it to Peter to post his test case statements, if required,  
since he has created an isolated test to reliably reproduce the  
problem.)



will use a lightweight transaction and you'll have to pay some penalty in
terms of performance, but at least the table creation will be linearizable
On 27 Nov 2014 14:26, "Peter Lange" wrote:


Hi,

We use a four-node Cassandra cluster, version 2.1.2. Our client applications
create tables dynamically. At some point, two (or more) of our clients,
connected to two (or more) different Cassandra nodes, will create the same
table simultaneously. We get the "Column family ID mismatch" error messages
on every node. Why is this simultaneous schema modification not possible?
How can we handle this? Any help is appreciated.
[...]


Regards,
Jens



Re: Column family ID mismatch-Error on concurrent schema modifications

2014-11-27 Thread Jens-Uwe Mozdzen

Hi Eric,

Quoting Eric Stevens:

Be careful about creating many column families dynamically unless you're
cleaning up old ones to keep the total number of CFs reasonable. Having
many column families will increase memory pressure and reduce overall
performance.


Will "inactive" CFs be released from C*'s memory after, e.g., a few days,
or when under resource pressure? We do intend to have several hundred
tables, but with only 10% actively used after some time (the rest seeing
no reads or writes).


These CFs are used as "time buckets", but are to be kept for speedy  
recovery. Of course, this design may be substituted by some external  
backup mechanism, but I'd like to keep that as "plan C" ;)


Regards,
Jens
--
Jens-U. Mozdzen voice   : +49-40-559 51 75
NDE Netzdesign und -entwicklung AG  fax : +49-40-559 51 77
Postfach 61 03 15   mobile  : +49-179-4 98 21 98
D-22423 Hamburg e-mail  : jmozd...@nde.ag

Vorsitzende des Aufsichtsrates: Angelika Mozdzen
  Sitz und Registergericht: Hamburg, HRB 90934
  Vorstand: Jens-U. Mozdzen
   USt-IdNr. DE 814 013 983



Re: Column family ID mismatch-Error on concurrent schema modifications

2014-11-27 Thread Eric Stevens
Be careful about creating many column families dynamically unless you're
cleaning up old ones to keep the total number of CFs reasonable. Having
many column families will increase memory pressure and reduce overall
performance.

On Thu Nov 27 2014 at 8:19:35 AM DuyHai Doan  wrote:

> Hello Peter
>
> For safe concurrent table creation, use CREATE TABLE xxx IF NOT EXISTS. It
> will use a lightweight transaction and you'll have to pay some penalty in
> terms of performance, but at least the table creation will be linearizable.
> On 27 Nov 2014 14:26, "Peter Lange" wrote:
>
> Hi,
>>
>> We use a four-node Cassandra cluster, version 2.1.2. Our client
>> applications create tables dynamically. At some point, two (or more) of
>> our clients, connected to two (or more) different Cassandra nodes, will
>> create the same table simultaneously. We get the "Column family ID
>> mismatch" error messages on every node. Why is this simultaneous schema
>> modification not possible? How can we handle this? Any help is
>> appreciated.
>>
>> [The stack traces from both nodes are omitted here; they appear in full
>> in Peter Lange's original message later in this digest.]

Re: Data synchronization between 2 running clusters on different availability zone

2014-11-27 Thread Eric Stevens
There's no reason you can't run on multiple cloud providers as long as you
treat them as logically distinct DCs. It should largely work the same way
as running in several AWS regions, but you'll need to use something like
GossipingPropertyFileSnitch, because the EC2 snitches are specific to AWS.
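In that setup each provider becomes its own logical data center, and replication is declared per DC in the keyspace definition. A hedged sketch (the keyspace name and the DC names 'AWS' and 'FLEX' are made up, and must match the dc= values configured for GossipingPropertyFileSnitch on each node):

```sql
-- Replicate each row to both providers; the DC names must match
-- the dc= entries in cassandra-rackdc.properties on the nodes.
CREATE KEYSPACE app_data
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'AWS': 3,
    'FLEX': 3
  };
```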

On Thu Nov 27 2014 at 2:26:27 AM Spico Florin  wrote:

> Hello!
>   I have another question. What about the following scenario: two
> Cassandra instances installed on different cloud providers (EC2, Flexiant)?
> How do you synchronize them? Are there built-in tools, or do I have to
> implement my own mechanism?
> Thanks.
>  Florin
>
>
> On Thu, Nov 27, 2014 at 11:18 AM, Spico Florin 
> wrote:
>
>> Hello, Rob!
>>   Thank you very much for the detailed support.
>> Regards,
>>  Florin
>>
>> On Wed, Nov 26, 2014 at 12:41 AM, Robert Coli 
>> wrote:
>>
>>> On Tue, Nov 25, 2014 at 7:09 AM, Spico Florin 
>>> wrote:
>>>
 1. For ensuring high availability I would like to install one Cassandra
 cluster on one availability zone
 (on Amazon EC2 US-east) and one Cassandra cluster on other AZ (Amazon
 EC2 US-west).

>>>
>>> One cluster, with a replication factor of 2 and a rack-aware snitch, is
>>> how this is usually done. Well, more accurately, people usually deploy
>>> with at least RF=3 and across 3 AZs. An RF of at least 3 is also required
>>> to use the QUORUM consistency level.
>>>
>>> If you will always operate only out of EC2, you probably want to look
>>> into the EC2Snitch. If you plan to ultimately go multi-region,
>>> EC2MultiRegionSnitch. If maybe hybrid in the future,
>>> GossipingPropertyFileSnitch.
>>>
>>>
>>> http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureSnitchEC2_t.html
>>>
>>> http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureSnitchEC2MultiRegion_c.html
>>>
>>> http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureSnitchGossipPF_c.html
>>>
>>> For some good meta on the internals here :
>>>
>>> https://issues.apache.org/jira/browse/CASSANDRA-3810
>>>
>>> =Rob
>>> http://twitter.com/rcolidba
>>>
>>>
>>
>>
>


Re: Storing time-series and geospatial data in C*

2014-11-27 Thread Jack Krupansky
How you store the data will mostly be a matter of how you wish to access the
data after it is stored. IOW, it depends on what kinds of queries or batch
processing you will run and how you intend to sequence through the data, and
also on what categories of “warmth” you intend to maintain, especially for
data to be queried most frequently. For example, will aged data be deleted,
and with what frequency? It also depends on what types of aggregations, or
rollups, you might want to perform.

-- Jack Krupansky

From: Spico Florin 
Sent: Thursday, November 27, 2014 7:38 AM
To: user@cassandra.apache.org 
Subject: Storing time-series and geospatial data in C*

Hello! 
  Can you please recommend some new articles and case studies where Cassandra
was used to store time-series and geo-spatial data? I'm particularly
interested in best practices, data models, and retrieval techniques.
Thanks.
Regards,
Florin


Re: multiple threads updating result in TransportException

2014-11-27 Thread Eric Stevens
A lot of people do a lot of multi-threaded work with the DataStax Java
driver. It looks like you're using Cassandra driver 2.0.0-RC2; might I
suggest, as a first step, upgrading at least to 2.0.0 final? RC2 wasn't even
the final release candidate for 2.0.0.
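For what it's worth, if the project uses Maven, moving off the release candidate is a one-line dependency bump (the version shown is just the 2.0.0 final suggested above; newer 2.0.x releases existed by late 2014):

```xml
<!-- Replace the 2.0.0-rc2 artifact with the 2.0.0 final release -->
<dependency>
  <groupId>com.datastax.cassandra</groupId>
  <artifactId>cassandra-driver-core</artifactId>
  <version>2.0.0</version>
</dependency>
```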

On Wed Nov 26 2014 at 8:44:07 AM Brian Tarbox  wrote:

> We're running into a problem where things are fine if our client runs
> single-threaded, but we get a TransportException if we use multiple
> threads. The DataStax driver gets an NIO checkBounds error.
>
> Here is a link to a stack overflow question we found that describes the
> problem we're seeing.  This question was asked 7 months ago and got no
> answers.
>
> We're running C* 2.0.9 and see the problem on our single node test cluster.
>
> Here is the stack trace we see:
>
> at java.nio.Buffer.checkBounds(Buffer.java:559) ~[na:1.7.0_55]
>
> at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:143)
> ~[na:1.7.0_55]
>
> at
> org.jboss.netty.buffer.HeapChannelBuffer.setBytes(HeapChannelBuffer.java:136)
> ~[netty-3.7.0.Final.jar:na]
>
> at
> org.jboss.netty.buffer.AbstractChannelBuffer.writeBytes(AbstractChannelBuffer.java:472)
> ~[netty-3.7.0.Final.jar:na]
>
> at com.datastax.driver.core.CBUtil.writeValue(CBUtil.java:272)
> ~[cassandra-driver-core-2.0.0-rc2.jar:na]
>
> at com.datastax.driver.core.CBUtil.writeValueList(CBUtil.java:297)
> ~[cassandra-driver-core-2.0.0-rc2.jar:na]
>
> at
> com.datastax.driver.core.Requests$QueryProtocolOptions.encode(Requests.java:223)
> ~[cassandra-driver-core-2.0.0-rc2.jar:na]
>
> at
> com.datastax.driver.core.Requests$Execute$1.encode(Requests.java:122)
> ~[cassandra-driver-core-2.0.0-rc2.jar:na]
>
> at
> com.datastax.driver.core.Requests$Execute$1.encode(Requests.java:119)
> ~[cassandra-driver-core-2.0.0-rc2.jar:na]
>
> at
> com.datastax.driver.core.Message$ProtocolEncoder.encode(Message.java:184)
> ~[cassandra-driver-core-2.0.0-rc2.jar:na]
>
> at
> org.jboss.netty.handler.codec.oneone.OneToOneEncoder.doEncode(OneToOneEncoder.java:66)
> ~[netty-3.7.0.Final.jar:na]
>
> at
> org.jboss.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:59)
> ~[netty-3.7.0.Final.jar:na]
>
> at org.jboss.netty.channel.Channels.write(Channels.java:704)
> ~[netty-3.7.0.Final.jar:na]
>
> at org.jboss.netty.channel.Channels.write(Channels.java:671)
> ~[netty-3.7.0.Final.jar:na]
>
> at org.jboss.netty.channel.Ab
>
> --
> http://about.me/BrianTarbox
>


Re: Column family ID mismatch-Error on concurrent schema modifications

2014-11-27 Thread DuyHai Doan
Hello Peter

For safe concurrent table creation, use CREATE TABLE xxx IF NOT EXISTS. It
will use a lightweight transaction and you'll have to pay some penalty in
terms of performance, but at least the table creation will be linearizable.
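For reference, the conditional form looks like this (keyspace and table names taken from the error logs quoted in this thread, columns illustrative); note that, as a follow-up elsewhere in this digest reports, it did not eliminate the mismatch error for truly concurrent creations against different nodes:

```sql
-- IF NOT EXISTS turns the creation into a conditional operation
-- instead of blindly racing to create the table twice.
CREATE TABLE IF NOT EXISTS myplayground.test_table (
    id uuid PRIMARY KEY,
    payload text
);
```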
On 27 Nov 2014 14:26, "Peter Lange" wrote:

> Hi,
>
> We use a four-node Cassandra cluster, version 2.1.2. Our client
> applications create tables dynamically. At some point, two (or more) of
> our clients, connected to two (or more) different Cassandra nodes, will
> create the same table simultaneously. We get the "Column family ID
> mismatch" error messages on every node. Why is this simultaneous schema
> modification not possible? How can we handle this? Any help is
> appreciated.
>
> [The stack traces from both nodes are omitted here; they appear in full in
> Peter Lange's original message later in this digest.]

Column family ID mismatch-Error on concurrent schema modifications

2014-11-27 Thread Peter Lange

Hi,

We use a four-node Cassandra cluster, version 2.1.2. Our client
applications create tables dynamically. At some point, two (or more) of
our clients, connected to two (or more) different Cassandra nodes, will
create the same table simultaneously. We get the "Column family ID
mismatch" error messages on every node. Why is this simultaneous schema
modification not possible? How can we handle this? Any help is
appreciated.


The lengthy error messages from the two nodes follow:

On Node1 we got:

INFO  [SharedPool-Worker-2] 2014-11-26 13:37:28,987  
MigrationManager.java:248 - Create new ColumnFamily:  
org.apache.cassandra.config.CFMetaData@7edad3a3[cfId=fbd24eb0-7568-11e4-bd04-b3ae3abaeff4,ksName=myplayground,cfName=test_table,
INFO  [MigrationStage:1] 2014-11-26 13:37:29,607 DefsTables.java:373 -  
Loading  
org.apache.cassandra.config.CFMetaData@7adc8efd[cfId=fbd24eb0-7568-11e4-bd04-b3ae3abaeff4,ksName=myplayground,cfName=test_table,
INFO  [MigrationStage:1] 2014-11-26 13:37:29,629  
ColumnFamilyStore.java:284 - Initializing myplayground.test_table
ERROR [MigrationStage:1] 2014-11-26 13:37:30,282  
CassandraDaemon.java:153 - Exception in thread  
Thread[MigrationStage:1,5,main]
java.lang.RuntimeException:  
org.apache.cassandra.exceptions.ConfigurationException: Column family  
ID mismatch (
found fbd275c0-7568-11e4-b9ea-3934eddce895; expected  
fbd24eb0-7568-11e4-bd04-b3ae3abaeff4)
at  
org.apache.cassandra.config.CFMetaData.reload(CFMetaData.java:1171)  
~[apache-cassandra-2.1.1.jar:2.1.1]
at  
org.apache.cassandra.db.DefsTables.updateColumnFamily(DefsTables.java:422)  
~[apache-cassandra-2.1.1.jar:2.1.1]
at  
org.apache.cassandra.db.DefsTables.mergeColumnFamilies(DefsTables.java:295)  
~[apache-cassandra-2.1.1.jar:2.1.1]
at  
org.apache.cassandra.db.DefsTables.mergeSchemaInternal(DefsTables.java:194)  
~[apache-cassandra-2.1.1.jar:2.1.1]
at  
org.apache.cassandra.db.DefsTables.mergeSchema(DefsTables.java:166)  
~[apache-cassandra-2.1.1.jar:2.1.1]
at  
org.apache.cassandra.db.DefinitionsUpdateVerbHandler$1.runMayThrow(DefinitionsUpdateVerbHandler.java:49)  
~[apache-cassandra-2.1.1.jar:2.1.1]
at  
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)  
~[apache-cassandra-2.1.1.jar:2.1.1]
at  
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)  
~[na:1.8.0_25]
at java.util.concurrent.FutureTask.run(FutureTask.java:266)  
~[na:1.8.0_25]
at  
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)  
~[na:1.8.0_25]
at  
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)  
[na:1.8.0_25]

at java.lang.Thread.run(Thread.java:745) [na:1.8.0_25]
Caused by: org.apache.cassandra.exceptions.ConfigurationException:  
Column family ID mismatch (found fbd275c0-7568-11e4-b9ea-3934eddce895;  
expected fbd24eb0-7568-11e4-bd04-b3ae3abaeff4)
at  
org.apache.cassandra.config.CFMetaData.validateCompatility(CFMetaData.java:1254)  
~[apache-cassandra-2.1.1.jar:2.1.1]
at  
org.apache.cassandra.config.CFMetaData.apply(CFMetaData.java:1186)  
~[apache-cassandra-2.1.1.jar:2.1.1]
at  
org.apache.cassandra.config.CFMetaData.reload(CFMetaData.java:1167)  
~[apache-cassandra-2.1.1.jar:2.1.1]

... 11 common frames omitted

On Node2 we got:

INFO  [SharedPool-Worker-1] 2014-11-26 13:37:28,989  
MigrationManager.java:248 - Create new ColumnFamily:  
org.apache.cassandra.config.CFMetaData@16d0bc0d[cfId=fbd275c0-7568-11e4-b9ea-3934eddce895,ksName=myplayground,cfName=test_table,
INFO  [MigrationStage:1] 2014-11-26 13:37:29,539 DefsTables.java:373 -  
Loading  
org.apache.cassandra.config.CFMetaData@3777e24b[cfId=fbd24eb0-7568-11e4-bd04-b3ae3abaeff4,ksName=myplayground,cfName=test_table,
INFO  [MigrationStage:1] 2014-11-26 13:37:29,541  
ColumnFamilyStore.java:284 - Initializing myplayground.test_table
ERROR [SharedPool-Worker-1] 2014-11-26 13:37:29,984  
QueryMessage.java:130 - Unexpected error during query
java.lang.RuntimeException: java.util.concurrent.ExecutionException:  
java.lang.RuntimeException:  
org.apache.cassandra.exceptions.ConfigurationException: Column family  
ID mismatch (
found fbd275c0-7568-11e4-b9ea-3934eddce895; expected  
fbd24eb0-7568-11e4-bd04-b3ae3abaeff4)
at  
org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:397)  
~[apache-cassandra-2.1.1.jar:2.1.1]
at  
org.apache.cassandra.service.MigrationManager.announce(MigrationManager.java:374)  
~[apache-cassandra-2.1.1.jar:2.1.1]
at  
org.apache.cassandra.service.MigrationManager.announceNewColumnFamily(MigrationManager.java:249)  
~[apache-cassandra-2.1.1.jar:2.1.1]
at  
org.apache.cassandra.cql3.statements.CreateTableStatement.announceMigration(CreateTableStatement.java:114)  
~[apache-cassandra-2.1.1.jar:2.1.1]
at  
org.apache.cassandra.cql3.statements.SchemaAlteringStatement.exe

Re: Cassandra COPY to CSV and DateTieredCompactionStrategy

2014-11-27 Thread Paulo Ricardo Motta Gomes
Regarding the first question, you need to configure your application to
write to both CFs (old and new) during the migration phase.
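The export/transform/re-import step from the quoted scenario can be done as a streaming pass over the CSV dump produced by COPY ... TO, so the file never has to fit in memory. A rough sketch (the column index and the cast are purely illustrative, not tied to the actual schema):

```python
import csv
import io

def transform_rows(reader, col, cast):
    """Apply a type cast to one column of each CSV row, streaming,
    so an export of a huge CF never has to fit in memory."""
    for row in reader:
        row[col] = cast(row[col])
        yield row

# Illustrative input: pretend column 1 was exported as text and the
# new primary key wants it as an integer.
src = io.StringIO("id1,20141127\nid2,20141128\n")
out = io.StringIO()
writer = csv.writer(out)
for row in transform_rows(csv.reader(src), 1, int):
    writer.writerow(row)
```

With dual writes in place during the migration window, rows inserted after the export started land in both CFs, and a final delta pass can pick up the remainder.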

I'm not sure about the second question, but my guess is that only the
writeTime will be taken into account.
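On the second question: DTCS windows SSTables by the write timestamps of their cells (i.e. what writeTime() returns), not by any date value stored in a column. For reference, switching a table to DTCS looks roughly like this (table name and option values are illustrative):

```sql
-- DTCS groups SSTables into time windows based on cell write
-- timestamps; both option values below are illustrative.
ALTER TABLE mykeyspace.events
  WITH compaction = {
    'class': 'DateTieredCompactionStrategy',
    'base_time_seconds': '3600',
    'max_sstable_age_days': '365'
  };
```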

On Thu, Nov 27, 2014 at 10:54 AM, Batranut Bogdan 
wrote:

> Hello all,
>
> I have a few things that I need to understand.
>
> 1. Here is the scenario:
> we have a HUGE CF with daily writes; it is like a time series.
> Now we want to change the type of a column in the primary key. What I think
> we can do is export to CSV, create the new table, and write back the
> transformed data. But here is the catch: the constant writes to the CF. I
> assume that by the time the export finishes, new data will have been
> inserted into the source CF. So is there a tool that will export data
> without having to stop the writes?
>
> 2. I have seen that there is a new compaction strategy, DTCS, that will
> better fit historical data. Will this compaction strategy take into
> account the writeTime() of an entry, or will it be smart enough to detect
> that the column family is a time series and take those timestamps into
> account when creating the time windows? I am asking this because when we
> write to the CF, the time for a particular record is 00:00h of a given
> day, so basically all entries have the same timestamp value in the CF but
> of course different writeTime().
>



-- 
Paulo Motta

Chaordic | Platform
www.chaordic.com.br
+55 48 3232.3200


Cassandra COPY to CSV and DateTieredCompactionStrategy

2014-11-27 Thread Batranut Bogdan
Hello all,
I have a few things that I need to understand.
1. Here is the scenario: we have a HUGE CF with daily writes; it is like a time 
series. Now we want to change the type of a column in the primary key. What I 
think we can do is export to CSV, create the new table, and write back the 
transformed data. But here is the catch: the constant writes to the CF. I 
assume that by the time the export finishes, new data will have been inserted 
into the source CF. So is there a tool that will export the data without having 
to stop the writes?
2. I have seen that there is a new compaction strategy, DTCS, that better fits 
historical data. Will this compaction strategy take into account the 
writeTime() of an entry, or will it be smart enough to detect that the column 
family is a time series and use those timestamps when creating the time 
windows? I am asking this because when we write to the CF, the time for a 
particular record is 00:00h of a given day, so basically all entries have the 
same timestamp value in the CF but of course different writeTime().
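
The offline transform step between `COPY ... TO` and `COPY ... FROM` could be sketched like this — purely illustrative, with a hypothetical column layout and a made-up date-text to int-style key conversion:

```python
# Rough sketch of the offline transform step: read rows exported with
# cqlsh's COPY ... TO, rewrite the primary-key column into its new
# representation, and emit a CSV suitable for COPY ... FROM into the new
# table. Column positions and the conversion below are hypothetical.
import csv
import io

def transform_rows(reader, writer, key_column=0):
    for row in reader:
        # e.g. old text key '2014-11-27' -> new int-style key '20141127'
        row[key_column] = row[key_column].replace("-", "")
        writer.writerow(row)

src = io.StringIO("2014-11-27,sensor1,42\n2014-11-28,sensor2,17\n")
dst = io.StringIO()
transform_rows(csv.reader(src), csv.writer(dst, lineterminator="\n"))
print(dst.getvalue())  # 20141127,sensor1,42 / 20141128,sensor2,17
```

This does not solve the concurrent-writes problem by itself; it only covers the bulk transform, with the dual-write approach from the reply covering writes that arrive during the migration.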

Storing time-series and geospatial data in C*

2014-11-27 Thread Spico Florin
Hello!
  Can you please recommend some recent articles and case studies where
Cassandra was used to store time-series and geospatial data? I'm
particularly interested in best practices, data models and retrieval
techniques.
 Thanks.
 Regards,
 Florin
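
A pattern that recurs throughout that material is bucketing time-series rows by a coarse time unit so that no partition grows without bound. A rough sketch of the bucketing idea (all names are made up):

```python
# Sketch of day-bucketed partition keys for time-series data in
# Cassandra: using (sensor_id, day_bucket) as the partition key keeps
# each partition bounded to one day's worth of rows, and reads for a
# time range touch a predictable set of partitions. Purely illustrative.
from datetime import datetime, timezone

def partition_key(sensor_id, ts):
    """Return the (sensor_id, day_bucket) pair used as the partition key."""
    day = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y%m%d")
    return (sensor_id, day)

print(partition_key("sensor-1", 1417046400))  # ('sensor-1', '20141127')
```

For geospatial data, the analogous trick is bucketing by a geohash prefix instead of (or in addition to) a time bucket.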


RE: A question to adding a new data center

2014-11-27 Thread Lu, Boying
If node-to-node encryption is enabled among all currently connected DCs, how do 
we add a new DC in this case?

After adding the new DC's public key into the trust store file, do the currently 
connected DCs need to be restarted?

Thanks

Boying


From: Mark Reddy [mailto:mark.l.re...@gmail.com]
Sent: 2014年11月21日 18:07
To: user@cassandra.apache.org
Subject: Re: A question to adding a new data center

Hi Boying,

I'm not sure I fully understand your question here, so some clarification may 
be needed. However, if you are asking what steps need to be performed on the 
current datacenter or on the new datacenter:

Step 1 - Current DC
Step 2 - New DC
Step 3 - Depending on the snitch you may need to make changes on both the 
current and new DCs
Step 4 - Client config
Step 5 - Client config
Step 6 - New DC
Step 7 - New DC
Step 8 - New DC


Mark

On 21 November 2014 03:27, Lu, Boying 
mailto:boying...@emc.com>> wrote:
Hi, all,

I read the document about how to add a new data center to existing clusters 
posted at 
http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_add_dc_to_cluster_t.html
But I have a question: are all those steps executed only on the newly added 
cluster, or on the existing clusters as well? (Step 7 is to be executed on the 
new cluster, according to the document.)

Thanks

Boying




Re: Repair completes successfully but data is still inconsistent

2014-11-27 Thread André Cruz
On 26 Nov 2014, at 19:07, Robert Coli  wrote:
> 
> Yes. Do you know if 5748 was created as a result of compaction or via a flush 
> from a memtable?

It was the result of a compaction:

 INFO [CompactionExecutor:22422] 2014-11-13 13:08:41,926 CompactionTask.java 
(line 262) Compacted 2 sstables to 
[/servers/storage/cassandra-data/Disco/NamespaceFile2/Disco-NamespaceFile2-ic-5744,].
  10,732,071 bytes to 10,824,798 (~100% of original) in 2,950ms = 3.499435MB/s. 
 36,632 total rows, 35,545 unique.  Row merge counts were {1:34460, 2:1086, }
 INFO [CompactionExecutor:22528] 2014-11-13 14:55:09,944 CompactionTask.java 
(line 262) Compacted 2 sstables to 
[/servers/storage/cassandra-data/Disco/NamespaceFile2/Disco-NamespaceFile2-ic-5746,].
  15,790,121 bytes to 14,913,599 (~94% of original) in 3,491ms = 4.074110MB/s.  
46,167 total rows, 43,745 unique.  Row merge counts were {1:41323, 2:2422, }
 INFO [CompactionExecutor:22590] 2014-11-13 15:26:50,087 CompactionTask.java 
(line 262) Compacted 2 sstables to 
[/servers/storage/cassandra-data/Disco/NamespaceFile2/Disco-NamespaceFile2-ic-5748,].
  16,088,983 bytes to 16,110,649 (~100% of original) in 2,741ms = 5.605367MB/s. 
 48,332 total rows, 47,392 unique.  Row merge counts were {1:46452, 2:940, } 
<---
 INFO [CompactionExecutor:22718] 2014-11-13 18:05:36,326 CompactionTask.java 
(line 262) Compacted 2 sstables to 
[/servers/storage/cassandra-data/Disco/NamespaceFile2/Disco-NamespaceFile2-ic-5750,].
  21,508,530 bytes to 21,342,786 (~99% of original) in 3,461ms = 5.880979MB/s.  
66,361 total rows, 63,219 unique.  Row merge counts were {1:60077, 2:3142, }
 INFO [CompactionExecutor:22817] 2014-11-13 19:06:04,564 CompactionTask.java 
(line 262) Compacted 2 sstables to 
[/servers/storage/cassandra-data/Disco/NamespaceFile2/Disco-NamespaceFile2-ic-5752,].
  23,232,660 bytes to 23,087,822 (~99% of original) in 3,144ms = 7.003264MB/s.  
68,968 total rows, 67,602 unique.  Row merge counts were {1:66236, 2:1366, }


Regarding flushes I have:

INFO [FlushWriter:3079] 2014-11-13 13:08:38,972 Memtable.java (line 436) 
Completed flushing 
/servers/storage/cassandra-data/Disco/NamespaceFile2/Disco-NamespaceFile2-ic-5743-Data.db
 (4698473 bytes) for commitlog position ReplayPosition(segmentId=1413900469571, 
position=6240)
 INFO [FlushWriter:3093] 2014-11-13 14:55:06,436 Memtable.java (line 436) 
Completed flushing 
/servers/storage/cassandra-data/Disco/NamespaceFile2/Disco-NamespaceFile2-ic-5745-Data.db
 (4965323 bytes) for commitlog position ReplayPosition(segmentId=1413900469603, 
position=2518)
 INFO [FlushWriter:3101] 2014-11-13 15:26:47,336 Memtable.java (line 436) 
Completed flushing 
/servers/storage/cassandra-data/Disco/NamespaceFile2/Disco-NamespaceFile2-ic-5747-Data.db
 (1175384 bytes) for commitlog position ReplayPosition(segmentId=1413900469635, 
position=9984)
 INFO [FlushWriter:3121] 2014-11-13 18:05:32,853 Memtable.java (line 436) 
Completed flushing 
/servers/storage/cassandra-data/Disco/NamespaceFile2/Disco-NamespaceFile2-ic-5749-Data.db
 (5397881 bytes) for commitlog position ReplayPosition(segmentId=1413900469667, 
position=8533)
 INFO [FlushWriter:3134] 2014-11-13 19:06:01,416 Memtable.java (line 436) 
Completed flushing 
/servers/storage/cassandra-data/Disco/NamespaceFile2/Disco-NamespaceFile2-ic-5751-Data.db
 (1889874 bytes) for commitlog position ReplayPosition(segmentId=1413900469699, 
position=108)
 INFO [FlushWriter:3147] 2014-11-13 21:20:58,312 Memtable.java (line 436) 
Completed flushing 
/servers/storage/cassandra-data/Disco/NamespaceFile2/Disco-NamespaceFile2-ic-5753-Data.db
 (3283519 bytes) for commitlog position ReplayPosition(segmentId=1413900469731, 
position=6848)


André

Re: Cassandra backup via snapshots in production

2014-11-27 Thread Jens Rantil
Late answer; You can find my backup script here: 
https://gist.github.com/JensRantil/a8150e998250edfcd1a3


Basically you need to set S3_BUCKET, PGP_KEY_RECIPIENT, configure s3cmd (using 
s3cmd --configure) and then issue `./backup-keyspace.sh your-keyspace` to back 
it up to S3. The script is run periodically on every node.




Regarding “s3cmd --configure”, I executed it once and then copied “~/.s3cfg” to 
all nodes.




Like I said, there’s lots of love that can be put into a backup system. Note 
that the script has the following limitations:

 * It does not checksum the files. However, the s3cmd website states that it 
compares MD5 and file size on upload by default.

 * It does not do purging of files on S3 (which you could configure using 
“Object Lifecycles”).

 * It does not warn you that a backup fails. Check your logs periodically.

 * It does not do any advanced logging. Make sure to pipe the output to a file 
or the `syslog` utility.

 * It does not do continuous/point-in-time backup.




That said, it does its job for us for now.
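
The per-sstable pipeline the script implements (snapshot, then encrypt and upload each file) could be sketched roughly like this — the bucket name, GPG recipient, and paths are placeholders, not the actual values from the linked script:

```python
# Rough sketch of the per-sstable backup pipeline: for each snapshotted
# sstable, encrypt it with GPG and upload the result with s3cmd. This
# builds the command lines only; bucket, recipient, and file names are
# illustrative placeholders.

def backup_commands(sstable, bucket="my-bucket", recipient="ops@example.com"):
    encrypted = sstable + ".gpg"
    return [
        ["gpg", "--encrypt", "--recipient", recipient,
         "--output", encrypted, sstable],
        ["s3cmd", "put", encrypted, "s3://%s/%s" % (bucket, encrypted)],
    ]

for cmd in backup_commands("Disco-NamespaceFile2-ic-5748-Data.db"):
    print(" ".join(cmd))
```

In a real script each command would be run with subprocess, checking return codes, and the snapshotted sstable deleted only after a successful upload.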




Feel free to propose improvements!




Cheers,

Jens


———
Jens Rantil
Backend engineer
Tink AB

Email: jens.ran...@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se

Facebook Linkedin Twitter

On Fri, Nov 21, 2014 at 7:36 PM, William Arbaugh  wrote:

> Jens,
> I'd be interested in seeing your script. We've been thinking of doing exactly 
> that but uploading to Glacier instead.
> Thanks, Bill
>> On Nov 21, 2014, at 11:40 AM, Jens Rantil  wrote:
>> 
>> > The main purpose is to protect us from human errors (eg. unexpected 
>> > manipulations: delete, drop tables, …).
>> 
>> If that is the main purpose, having "auto_snapshot: true” in cassandra.yaml 
>> will be enough to protect you.
>> 
>> Regarding backup, I have a small script that creates a named snapshot and 
>> for each sstable; encrypts, uploads to S3 and deletes the snapshotted 
>> sstable. It took me an hour to write and roll out to all our nodes. The 
>> whole process is currently logged, but eventually I will also send an e-mail 
>> if backup fails.
>> 
>> ——— Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: 
>> +46 708 84 18 32 Web: www.tink.se Facebook Linkedin Twitter
>> 
>> 
>> On Tue, Nov 18, 2014 at 3:52 PM, Ngoc Minh VO  
>> wrote:
>> 
>> Hello all,
>> 
>> 
>> 
>> 
>>  
>> 
>> We are looking for a solution to backup data in our C* cluster (v2.0.x, 16 
>> nodes, 4 x 500GB SSD, RF = 6 over 2 datacenters).
>> 
>> 
>> 
>> The main purpose is to protect us from human errors (eg. unexpected 
>> manipulations: delete, drop tables, …).
>> 
>> 
>> 
>> 
>>  
>> 
>> We are thinking of:
>> 
>> 
>> 
>> -  Backup: add a 2TB HDD on each node for C* daily/weekly snapshots.
>> 
>> 
>> 
>> -  Restore: load the most recent snapshots or latest “non-corrupted” 
>> ones and replay missing data imports from other data source.
>> 
>> 
>> 
>> 
>>  
>> 
>> We would like to know if somebody are using Cassandra’s backup feature in 
>> production and could share your experience with us.
>> 
>> 
>> 
>> 
>>  
>> 
>> Your help would be greatly appreciated.
>> 
>> 
>> 
>> Best regards,
>> 
>> 
>> 
>> Minh
>> 
>> 
>> 
>> 
>> This message and any attachments (the "message") is
>> intended solely for the intended addressees and is confidential. 
>> If you receive this message in error,or are not the intended recipient(s), 
>> please delete it and any copies from your systems and immediately notify
>> the sender. Any unauthorized view, use that does not comply with its 
>> purpose, 
>> dissemination or disclosure, either whole or partial, is prohibited. Since 
>> the internet 
>> cannot guarantee the integrity of this message which may not be reliable, 
>> BNP PARIBAS 
>> (and its subsidiaries) shall not be liable for the message if modified, 
>> changed or falsified. 
>> Do not print this message unless it is necessary,consider the environment.
>> 
>> 
>> 

RE: Cassandra backup via snapshots in production

2014-11-27 Thread Ngoc Minh VO
Thanks a lot for your answers!

What we plan to do is:

-  auto_snapshot = true

-  if a human error happened on day D-5:

o   we will bring the cluster offline

o   purge all data

o   import snapshots taken prior to D-5 (and delete snapshots after D-5)

o   upload all missing data between D-5 and D

o   bring the cluster online

Do you think it would work?
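
One way to drive the "import snapshots" step is to copy the chosen snapshot's sstables back into each table directory and then run `nodetool refresh`. A sketch of the path layout involved (assumes the default pre-2.1 data directory layout; adjust for your installation):

```python
# Sketch of locating a named snapshot for one table: snapshots live
# under <data_dir>/<keyspace>/<table>/snapshots/<name>, and their files
# are copied back into the table directory before running
# `nodetool refresh <keyspace> <table>`. Paths assume the default
# layout of Cassandra 2.0.x; the snapshot name is hypothetical.
import os

def snapshot_dir(data_dir, keyspace, table, snapshot_name):
    return os.path.join(data_dir, keyspace, table, "snapshots", snapshot_name)

print(snapshot_dir("/var/lib/cassandra/data", "Disco", "NamespaceFile2",
                   "pre_d5"))
# /var/lib/cassandra/data/Disco/NamespaceFile2/snapshots/pre_d5
```

Note that auto_snapshot only captures data at the moment of a TRUNCATE or DROP; for a plan keyed on "D-5" you still need snapshots taken on a schedule, as discussed earlier in this thread.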

From: Jens Rantil [mailto:jens.ran...@tink.se]
Sent: mardi 25 novembre 2014 10:03
To: user@cassandra.apache.org
Subject: Re: Cassandra backup via snapshots in production

> Truncate does trigger snapshot creation though

Doesn’t it? With “auto_snapshot: true” it should.

——— Jens Rantil Backend engineer Tink AB Email: 
jens.ran...@tink.se Phone: +46 708 84 18 32 Web: 
www.tink.se Facebook Linkedin Twitter


On Tue, Nov 25, 2014 at 9:21 AM, DuyHai Doan 
mailto:doanduy...@gmail.com>> wrote:

True

Delete in CQL just creates a tombstone, so from the storage engine's point of 
view it's just adding some physical columns

Truncate does trigger snapshot creation though
Le 21 nov. 2014 19:29, "Robert Coli" 
mailto:rc...@eventbrite.com>> a écrit :
On Fri, Nov 21, 2014 at 8:40 AM, Jens Rantil 
mailto:jens.ran...@tink.se>> wrote:
> The main purpose is to protect us from human errors (eg. unexpected 
> manipulations: delete, drop tables, …).

If that is the main purpose, having "auto_snapshot: true” in cassandra.yaml 
will be enough to protect you.

OP includes "delete" in their list of "unexpected manipulations", and 
auto_snapshot: true will not protect you in any way from DELETE.

=Rob
http://twitter.com/rcolidba





Re: Data synchronization between 2 running clusters on different availability zone

2014-11-27 Thread Spico Florin
Hello!
  I have another question. What about the following scenario: two Cassandra
instances installed on different cloud providers (EC2, Flexiant)? How do
you synchronize them? Can I use some built-in tools, or do I have to
implement my own mechanism?
Thanks.
 Florin


On Thu, Nov 27, 2014 at 11:18 AM, Spico Florin 
wrote:

> Hello, Rob!
>   Thank you very much for the detailed support.
> Regards,
>  Florin
>
> On Wed, Nov 26, 2014 at 12:41 AM, Robert Coli 
> wrote:
>
>> On Tue, Nov 25, 2014 at 7:09 AM, Spico Florin 
>> wrote:
>>
>>> 1. For ensuring high availability I would like to install one Cassandra
>>> cluster on one availability zone
>>> (on Amazon EC2 US-east) and one Cassandra cluster on another AZ (Amazon
>>> EC2 US-west).
>>>
>>
>> One cluster, replication factor of 2, cluster configured with a rack
>> aware snitch is how this is usually done. Well, more accurately, people
>> usually deploy with at least RF=3 and across 3 AZs. A RF of at least 3 is
>> also required to use QUORUM Consistency Level.
>>
>> If you will always operate only out of EC2, you probably want to look
>> into the EC2Snitch. If you plan to ultimately go multi-region,
>> EC2MultiRegionSnitch. If maybe hybrid in the future,
>> GossipingPropertyFileSnitch.
>>
>>
>> http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureSnitchEC2_t.html
>>
>> http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureSnitchEC2MultiRegion_c.html
>>
>> http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureSnitchGossipPF_c.html
>>
>> For some good meta on the internals here :
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-3810
>>
>> =Rob
>> http://twitter.com/rcolidba
>>
>>
>
>
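
For the GossipingPropertyFileSnitch option mentioned above, each node declares its own DC and rack in conf/cassandra-rackdc.properties, with the snitch selected in cassandra.yaml. A minimal illustration (the dc/rack values below are examples, not recommendations):

```
# cassandra.yaml
endpoint_snitch: GossipingPropertyFileSnitch

# conf/cassandra-rackdc.properties (set per node)
dc=us-east
rack=rack1
```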


Re: Data synchronization between 2 running clusters on different availability zone

2014-11-27 Thread Spico Florin
Hello, Rob!
  Thank you very much for the detailed support.
Regards,
 Florin

On Wed, Nov 26, 2014 at 12:41 AM, Robert Coli  wrote:

> On Tue, Nov 25, 2014 at 7:09 AM, Spico Florin 
> wrote:
>
>> 1. For ensuring high availability I would like to install one Cassandra
>> cluster on one availability zone
>> (on Amazon EC2 US-east) and one Cassandra cluster on another AZ (Amazon EC2
>> US-west).
>>
>
> One cluster, replication factor of 2, cluster configured with a rack aware
> snitch is how this is usually done. Well, more accurately, people usually
> deploy with at least RF=3 and across 3 AZs. A RF of at least 3 is also
> required to use QUORUM Consistency Level.
>
> If you will always operate only out of EC2, you probably want to look into
> the EC2Snitch. If you plan to ultimately go multi-region,
> EC2MultiRegionSnitch. If maybe hybrid in the future,
> GossipingPropertyFileSnitch.
>
>
> http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureSnitchEC2_t.html
>
> http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureSnitchEC2MultiRegion_c.html
>
> http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureSnitchGossipPF_c.html
>
> For some good meta on the internals here :
>
> https://issues.apache.org/jira/browse/CASSANDRA-3810
>
> =Rob
> http://twitter.com/rcolidba
>
>