RE: Migrating Cassandra to New Nodes

2014-04-29 Thread Arindam Barua

What you have described below should work just fine.
When I was replacing nodes in my ring, I ended up creating a new datacenter 
with the new nodes, but I was upgrading to vnodes too at the time.

-Arindam

From: nash [mailto:nas...@gmail.com]
Sent: Monday, April 28, 2014 10:52 PM
To: user@cassandra.apache.org
Subject: Migrating Cassandra to New Nodes

I have a new set of nodes and I'd like to migrate my entire cluster onto them
without any downtime. I believe that I can launch the new nodes, have them
join the ring, and then use nodetool to decommission the old nodes one at a
time. But I'm wondering: what is the safest way to update the seeds in the
cassandra.yaml files? AFAICT, there is nothing particularly special about the
choice of seeds, so prior to starting the decommission I was figuring I could
update all the seeds to some subset of the new cluster. Is that reliable?
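
For concreteness, a minimal sketch of what I have in mind (the addresses are
made up):

# cassandra.yaml on every node, old and new
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.0.1.1,10.0.1.2"   # some subset of the *new* nodes

# then, for each old node in turn:
nodetool -h old-node-1 decommission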

TIA,

--nash


Re: Load balancing issue with virtual nodes

2014-04-29 Thread DuyHai Doan
Thank you Ben for the links




On Tue, Apr 29, 2014 at 3:40 AM, Ben Bromhead b...@instaclustr.com wrote:

 Some imbalance is expected and considered normal:

 See http://wiki.apache.org/cassandra/VirtualNodes/Balance

 As well as

 https://issues.apache.org/jira/browse/CASSANDRA-7032

 Ben Bromhead
 Instaclustr | www.instaclustr.com | @instaclustr (http://twitter.com/instaclustr) |
 +61 415 936 359

 On 29 Apr 2014, at 7:30 am, DuyHai Doan doanduy...@gmail.com wrote:

 Hello all

  Some update about the issue.

  After completely wiping all sstable/commitlog/saved_caches folders and
 restarting the cluster from scratch, we still see weird figures. After the
 restart, nodetool status does not show an exact balance of 50% of the data
 for each node:


 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 --  Address  Load      Tokens  Owns (effective)  Host ID                               Rack
 UN  host1    48.57 KB  256     *51.6%*           d00de0d1-836f-4658-af64-3a12c00f47d6  rack1
 UN  host2    48.57 KB  256     *48.4%*           e9d2505b-7ba7-414c-8b17-af3bbe79ed9c  rack1


 As you can see, the % is very close to 50% but not exactly 50%.

  What can explain that? Could it be a network connection issue during the
 initial token shuffle phase?

 P.S: both host1 and host2 are supposed to have exactly the same hardware

 Regards

  Duy Hai DOAN


 On Thu, Apr 24, 2014 at 11:20 PM, Batranut Bogdan batra...@yahoo.com wrote:

 I don't know about Hector, but the DataStax Java driver needs just one IP
 from the cluster and it will discover the rest of the nodes. Then by
 default it will do a round robin when sending requests. So if Hector does
 the same, the pattern will again appear.
 Did you look at the size of the dirs?
 That documentation is for C* 0.8, so it's old, but depending on your boxes
 you might hit a CPU bottleneck. You might want to google for the write path
 in Cassandra. According to that, there is not much to do when writes come
 in...
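
 To make that concrete, a minimal sketch with the 2.0-era DataStax Java
 driver (the contact point address is made up); Hector's configuration
 differs, but the idea is the same:

 import com.datastax.driver.core.Cluster;
 import com.datastax.driver.core.Session;
 import com.datastax.driver.core.policies.RoundRobinPolicy;

 public class OneContactPoint {
     public static void main(String[] args) {
         // One contact point is enough; the driver discovers the other nodes.
         Cluster cluster = Cluster.builder()
                 .addContactPoint("10.0.0.1")
                 // Explicit here, to match the round-robin behavior described above.
                 .withLoadBalancingPolicy(new RoundRobinPolicy())
                 .build();
         Session session = cluster.connect();
         System.out.println("Connected to " + cluster.getMetadata().getClusterName());
         cluster.close();
     }
 }
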
   On Friday, April 25, 2014 12:00 AM, DuyHai Doan doanduy...@gmail.com
 wrote:
  I did some experiments.

  Let's say we have node1 and node2

 First, I configured Hector with node1 and node2 as hosts and I saw that
 only node1 had a high CPU load.

 To eliminate the client connection issue, I re-tested with only node2
 provided as the host for Hector. Same pattern: CPU load is above 50% on node1
 and below 10% on node2.

 It means that node2 is acting as coordinator and forwarding many write/read
 requests to node1.

  Why did I look at CPU load and not iostat et al.?

  Because I have a very write-intensive workload with a read-only-once
 pattern. I've read here (
 http://www.datastax.com/docs/0.8/cluster_architecture/cluster_planning)
 that heavy writes in C* are more CPU bound, but maybe that info is outdated
 and no longer true.

  Regards

  Duy Hai DOAN


 On Thu, Apr 24, 2014 at 10:00 PM, Michael Shuler mich...@pbandjelly.org wrote:

 On 04/24/2014 10:29 AM, DuyHai Doan wrote:

   Client used = Hector 1.1-4
   Default load balancing connection policy
   Both node addresses are provided to Hector, so according to its
 connection policy the client should switch alternately between both
 nodes


 OK, so is only one connection being established to one node for one bulk
 write operation? Or are multiple connections being made to both nodes and
 writes performed on both?

 --
 Michael









Migrating from Snappy to LZ4 on C* 1.2

2014-04-29 Thread Katriel Traum
Hello,

I am running mostly Cassandra 1.2 on my clusters, and wanted to migrate my
current Snappy-compressed CFs to LZ4.

Changing the schema is easy; my questions are:
1. Will the previous Snappy-compressed tables still be readable?
2. Will upgradesstables convert my current CFs from Snappy to LZ4? Or do I
have to run major compaction?

Thanks,
Katriel


Re: JDK 8

2014-04-29 Thread Alain RODRIGUEZ
Looks like it will be like with version 7... Cassandra has been
compatible with that version for a long time, but there were no official
validations, and DataStax recommended for a long time (still now?) to
use Java 6.

The best thing would be to use older versions. If for some reason you use
Java 8, run some tests and let us know how things go :).

Good luck with this.


2014-04-29 1:09 GMT+02:00 Colin co...@clark.ws:

 It seems to run ok, but I haven't seen it yet in production on 8.

 --
 *Colin Clark*
 +1-320-221-9531


 On Apr 28, 2014, at 4:01 PM, Ackerman, Mitchell 
 mitchell.acker...@pgi.com wrote:

  I've been searching around, but cannot find any information as to
 whether Cassandra runs on JRE 8.  Any information on that?



 Thanks, Mitchell




Re: Migrating from Snappy to LZ4 on C* 1.2

2014-04-29 Thread Alain RODRIGUEZ
Hi, I would say:

1 - Yes
2 - Yes (No major compaction needed, upgradesstables should do the job)

As always, in case of doubt, test it. In this case you can even
do it on a local machine.
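
For reference, a minimal sketch of the schema change (keyspace/table names
are placeholders):

ALTER TABLE my_ks.my_cf
    WITH compression = {'sstable_compression': 'LZ4Compressor'};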

Alain


2014-04-29 9:57 GMT+02:00 Katriel Traum katr...@google.com:

 Hello,

 I am running mostly Cassandra 1.2 on my clusters, and wanted to migrate my
 current Snappy-compressed CFs to LZ4.

 Changing the schema is easy; my questions are:
 1. Will the previous Snappy-compressed tables still be readable?
 2. Will upgradesstables convert my current CFs from Snappy to LZ4? Or do I
 have to run major compaction?

 Thanks,
  Katriel



Re: Can the seeds list be changed at runtime?

2014-04-29 Thread Mark Reddy
Hi Boying,

From Datastax documentation:
http://www.datastax.com/documentation/cassandra/1.2/cassandra/architecture/architectureGossipAbout_c.html

 The seed node designation has no purpose other than bootstrapping the
 gossip process for new nodes joining the cluster. Seed nodes are not a
 single point of failure, nor do they have any other special purpose in
 cluster operations beyond the bootstrapping of nodes.


For this reason you can change the seed list on an existing node at any time;
the node itself will already be aware of the cluster and does not need
to rely on the seed list to join. For new nodes that you want to bootstrap
into the cluster you can specify any nodes you wish.


Mark


On Tue, Apr 29, 2014 at 2:57 AM, Lu, Boying boying...@emc.com wrote:

 Hi, All,



  I wonder if I can change the seeds list at runtime, i.e. without changing
 the yaml file and restarting the DB service?



 Thanks



 Boying





Re: JDK 8

2014-04-29 Thread Mark Reddy

 DataStax recommended for a long time (still now?) to use Java 6


Java 6 is recommended for version 1.2
Java 7 is required for version 2.0


Mark


On Tue, Apr 29, 2014 at 10:19 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:

 Looks like it will be like with version 7... Cassandra has been
 compatible with that version for a long time, but there were no official
 validations, and DataStax recommended for a long time (still now?) to
 use Java 6.

 The best thing would be to use older versions. If for some reason you use
 Java 8, run some tests and let us know how things go :).

 Good luck with this.


 2014-04-29 1:09 GMT+02:00 Colin co...@clark.ws:

 It seems to run ok, but I haven't seen it yet in production on 8.

 --
 *Colin Clark*
 +1-320-221-9531


 On Apr 28, 2014, at 4:01 PM, Ackerman, Mitchell 
 mitchell.acker...@pgi.com wrote:

  I’ve been searching around, but cannot find any information as to
 whether Cassandra runs on JRE 8.  Any information on that?



 Thanks, Mitchell





Re: JDK 8

2014-04-29 Thread Alain RODRIGUEZ
Thanks for the update, Mark.


2014-04-29 11:35 GMT+02:00 Mark Reddy mark.re...@boxever.com:

 DataStax recommended for a long time (still now?) to use Java 6


 Java 6 is recommended for version 1.2
 Java 7 is required for version 2.0


 Mark


 On Tue, Apr 29, 2014 at 10:19 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:

 Looks like it will be like with version 7... Cassandra has been
 compatible with that version for a long time, but there were no official
 validations, and DataStax recommended for a long time (still now?) to
 use Java 6.

 The best thing would be to use older versions. If for some reason you use
 Java 8, run some tests and let us know how things go :).

 Good luck with this.


 2014-04-29 1:09 GMT+02:00 Colin co...@clark.ws:

 It seems to run ok, but I haven't seen it yet in production on 8.

 --
 *Colin Clark*
 +1-320-221-9531


 On Apr 28, 2014, at 4:01 PM, Ackerman, Mitchell 
 mitchell.acker...@pgi.com wrote:

  I've been searching around, but cannot find any information as to
 whether Cassandra runs on JRE 8.  Any information on that?



 Thanks, Mitchell






Re: row caching for frequently updated column

2014-04-29 Thread Jonathan Lacefield
Hello,


  Iirc writing a new value to a row will invalidate the row cache for that
value.  Row cache is only populated after a read operation.
http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_configuring_caches_c.html?scroll=concept_ds_n35_nnr_ck

  Cassandra provides the ability to preheat key and page cache, but I
don't believe this is possible for row cache.
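
  For completeness, a minimal sketch of how the row cache is switched on in
the first place (the size and table names are placeholders):

# cassandra.yaml -- the row cache is disabled unless this is > 0
row_cache_size_in_mb: 200

-- per table, CQL (2.0-era string syntax):
ALTER TABLE my_ks.my_table WITH caching = 'rows_only';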

  Hope that helps.

Jonathan


Jonathan Lacefield
Solutions Architect, DataStax
(404) 822 3487
http://www.linkedin.com/in/jlacefield

http://www.datastax.com/cassandrasummit14



On Mon, Apr 28, 2014 at 10:27 PM, Jimmy Lin y2klyf+w...@gmail.com wrote:

 I am wondering if there is any negative impact on Cassandra write
 operations if I turn on row caching for a table that has mostly 'static
 columns' but a few frequently written columns (like a timestamp).

 The application will frequently write to a few columns, and the
 application will also frequently query the entire row.

 How does Cassandra handle updating a column of a cached row?
 Does it update both the memtable value and the cached row's
 column (which is a memory update, so it is very fast)?
 Or, in order to update the cached row, does the entire row need to be read
 back from sstables?


 thanks




Re: JDK 8

2014-04-29 Thread Cyril Scetbon
Hi,

When we look at the wiki, it says:

Cassandra requires the most stable version of Java 7 you can deploy, preferably 
the Oracle/Sun JVM.

And in chapter 4 we see that they are using Cassandra 1.2
Connected to Test Cluster at localhost:9160.
[cqlsh 2.3.0 | Cassandra 1.2.2 | CQL spec 3.0.0 | Thrift protocol 19.35.0]
Use HELP for help.
In the DataStax documentation concerning the installation on Debian they say:
Install the latest version of Oracle Java SE Runtime Environment (JRE) 6 or 7.
See Installing Oracle JRE on Debian or Ubuntu Systems.
The fact that public updates for Java 6 stopped in February 2013 should help
you choose between those two versions :)

FYI, we chose Java 7 two years ago and are happy with it in production!

Regards 
-- 
Cyril SCETBON

On 29 Apr 2014, at 11:35, Mark Reddy mark.re...@boxever.com wrote:

 DataStax recommended for a long time (still now?) to use Java 6
 
 Java 6 is recommended for version 1.2 
 Java 7 is required for version 2.0



Re: Migrating from Snappy to LZ4 on C* 1.2

2014-04-29 Thread Katriel Traum
Thanks for the answer.

I've tested it myself now, and indeed it works.
The only note I have is that you have to run nodetool upgradesstables -a, so
that all sstables are updated.
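
For the archives, the exact command (keyspace/table names are placeholders):

nodetool upgradesstables -a my_ks my_cf

Without -a, sstables that are already on the current sstable version are
skipped, which is why the new compression setting was not applied to them.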

Katriel


On Tue, Apr 29, 2014 at 12:22 PM, Alain RODRIGUEZ arodr...@gmail.com wrote:

 Hi, I would say:

 1 - Yes
 2 - Yes (No major compaction needed, upgradesstables should do the job)

 As always, in case of doubt, test it. In this case you can even
 do it on a local machine.

 Alain


 2014-04-29 9:57 GMT+02:00 Katriel Traum katr...@google.com:

 Hello,

 I am running mostly Cassandra 1.2 on my clusters, and wanted to migrate
 my current Snappy-compressed CFs to LZ4.

 Changing the schema is easy; my questions are:
 1. Will the previous Snappy-compressed tables still be readable?
 2. Will upgradesstables convert my current CFs from Snappy to LZ4? Or do
 I have to run major compaction?

 Thanks,
  Katriel





Re: Point in Time Recovery

2014-04-29 Thread Dennis Schwan
Hi Rob,

I know it has been a while, but we managed to perform a point-in-time recovery.
I am not really sure what the problem was, but I guess it had to do with not
reading the instructions exactly (using GMT rather than the local time zone,
copying archive logs to the wrong place, etc.).

So everything should work as described, but I think there should be a little
more automation in it.
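
For anyone else attempting this, a sketch of the archiving config we ended up
with (the paths are examples from our setup, not defaults):

# conf/commitlog_archiving.properties
archive_command=/bin/cp %path /backup/commitlog/%name
restore_command=/bin/cp -f %from %to
restore_directories=/backup/commitlog
# the restore point is interpreted as GMT -- this is what tripped us up
restore_point_in_time=2014:04:28 23:59:59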

Thanks all,
Dennis

On 11.04.2014 21:11, Robert Coli wrote:
On Fri, Apr 11, 2014 at 1:21 AM, Dennis Schwan dennis.sch...@1und1.de wrote:
The archived commitlogs are copied to the restore directory and afterwards
Cassandra replays those commitlogs, but still we only see the data from the
snapshot, not from the commitlogs.

If you turn up debug log4j settings, you should be able to see whether the 
replay is correctly applying mutations to memtables.

Do you see a flush of memtables to sstables at the end of commitlog replay? If 
not, memtables are not being created by commitlog replay.

=Rob



--
Dennis Schwan

Oracle DBA
Mail Core

1&1 Internet AG | Brauerstraße 48 | 76135 Karlsruhe | Germany
Phone: +49 721 91374-8738
E-Mail: dennis.sch...@1und1.de | Web: www.1und1.de

Hauptsitz Montabaur, Amtsgericht Montabaur, HRB 6484

Vorstand: Ralph Dommermuth, Frank Einhellinger, Robert Hoffmann, Andreas 
Hofmann, Markus Huhn, Hans-Henning Kettler, Uwe Lamnek, Jan Oetjen, Christian 
Würst
Aufsichtsratsvorsitzender: Michael Scheeren

Member of United Internet



RE: JDK 8

2014-04-29 Thread Ackerman, Mitchell
Thanks everyone

From: Alain RODRIGUEZ [mailto:arodr...@gmail.com]
Sent: Tuesday, April 29, 2014 3:47 AM
To: user@cassandra.apache.org
Subject: Re: JDK 8

Thanks for the update, Mark.

2014-04-29 11:35 GMT+02:00 Mark Reddy mark.re...@boxever.com:
DataStax recommended for a long time (still now?) to use Java 6

Java 6 is recommended for version 1.2
Java 7 is required for version 2.0


Mark

On Tue, Apr 29, 2014 at 10:19 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:
Looks like it will be like with version 7... Cassandra has been compatible
with that version for a long time, but there were no official validations, and
DataStax recommended for a long time (still now?) to use Java 6.

The best thing would be to use older versions. If for some reason you use Java
8, run some tests and let us know how things go :).

Good luck with this.

2014-04-29 1:09 GMT+02:00 Colin co...@clark.ws:

It seems to run ok, but I haven't seen it yet in production on 8.

--
Colin Clark
+1-320-221-9531


On Apr 28, 2014, at 4:01 PM, Ackerman, Mitchell mitchell.acker...@pgi.com wrote:
I've been searching around, but cannot find any information as to whether 
Cassandra runs on JRE 8.  Any information on that?

Thanks, Mitchell





Re: row caching for frequently updated column

2014-04-29 Thread Jimmy Lin
hi,
 writing a new value to a row will invalidate the row cache for that
 value
do you mean the entire row will be invalidated? or just the column that was
updated?

I was reading through
http://planetcassandra.org/blog/post/cassandra-11-tuning-for-frequent-column-updates/
which seems to indicate it just writes through and does not invalidate the
entire row.

If Cassandra invalidates the row cache upon a single-column update to that
row, that seems very inefficient.





On Tue, Apr 29, 2014 at 4:43 AM, Jonathan Lacefield jlacefi...@datastax.com
 wrote:

 Hello,


   Iirc writing a new value to a row will invalidate the row cache for that
 value.  Row cache is only populated after a read operation.
 http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_configuring_caches_c.html?scroll=concept_ds_n35_nnr_ck

   Cassandra provides the ability to preheat key and page cache, but I
 don't believe this is possible for row cache.

   Hope that helps.

 Jonathan


 Jonathan Lacefield
 Solutions Architect, DataStax
 (404) 822 3487
 http://www.linkedin.com/in/jlacefield

 http://www.datastax.com/cassandrasummit14



 On Mon, Apr 28, 2014 at 10:27 PM, Jimmy Lin y2klyf+w...@gmail.com wrote:

 I am wondering if there is any negative impact on Cassandra write
 operations if I turn on row caching for a table that has mostly 'static
 columns' but a few frequently written columns (like a timestamp).

 The application will frequently write to a few columns, and the
 application will also frequently query the entire row.

 How does Cassandra handle updating a column of a cached row?
 Does it update both the memtable value and the cached row's
 column (which is a memory update, so it is very fast)?
 Or, in order to update the cached row, does the entire row need to be read
 back from sstables?


 thanks





Re: row caching for frequently updated column

2014-04-29 Thread Nate McCall


 If Cassandra invalidates the row cache upon a single-column update to that
 row, that seems very inefficient.



Yes. For the most recent direction, take a look at:
https://issues.apache.org/jira/browse/CASSANDRA-5357




-- 
-
Nate McCall
Austin, TX
@zznate

Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: row caching for frequently updated column

2014-04-29 Thread Robert Coli
On Tue, Apr 29, 2014 at 9:30 AM, Jimmy Lin y2klyf+w...@gmail.com wrote:

 if Cassandra invalidate the  row cache upon a single column update to that
 row, that seems very inefficient.


https://issues.apache.org/jira/browse/CASSANDRA-5348?focusedCommentId=13794634&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13794634

=Rob


Re: Migrating Cassandra to New Nodes

2014-04-29 Thread Robert Coli
On Mon, Apr 28, 2014 at 10:52 PM, nash nas...@gmail.com wrote:

 I have a new set of nodes and I'd like to migrate my entire cluster onto
 them without any downtime. I believe that I can launch the new nodes and
 have them join the ring, and then use nodetool to decommission the old nodes
 one at a time. But I'm wondering: what is the safest way to update the
 seeds in the cassandra.yaml files? AFAICT, there is nothing particularly
 special about the choice of seeds, so prior to starting the decommission I
 was figuring I could update all the seeds to some subset of the new cluster.
 Is that reliable?


The fastest way to vertically scale a node is:

https://engineering.eventbrite.com/changing-the-ip-address-of-a-cassandra-node-with-auto_bootstrapfalse/

As a minor note, you do lose any hints destined for that node while you are
doing the copy, so use pre-copy techniques (rsync, then re-rsync with
--delete) and then immediately repair to shorten the window of
inconsistency if you read at CL.ONE.
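
A sketch of that pre-copy dance (paths and hostnames are illustrative):

# while the old node is still serving, bulk-copy the data directory
rsync -av /var/lib/cassandra/data/ newhost:/var/lib/cassandra/data/

# just before the cutover: catch up, and drop files compaction has removed
rsync -av --delete /var/lib/cassandra/data/ newhost:/var/lib/cassandra/data/

# once the new node is up, close the window of inconsistency
nodetool -h newhost repair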

=Rob


Re: Point in Time Recovery

2014-04-29 Thread Robert Coli
On Tue, Apr 29, 2014 at 7:46 AM, Dennis Schwan dennis.sch...@1und1.de wrote:

  I know it has been a while, but we managed to perform a point-in-time
 recovery.
 I am not really sure what the problem was, but I guess it had to do with
 not reading the instructions exactly (using GMT rather than the local time
 zone, copying archive logs to the wrong place, etc.).


Glad to hear things are working, thank you for sharing your experience back
with the list community. :)

=Rob


Re: Can the seeds list be changed at runtime?

2014-04-29 Thread Robert Coli
On Mon, Apr 28, 2014 at 6:57 PM, Lu, Boying boying...@emc.com wrote:

 I wonder if I can change the seeds list at runtime, i.e. without changing
 the yaml file and restarting the DB service?


There are dynamic seed providers, Priam for example uses one.

https://issues.apache.org/jira/browse/CASSANDRA-5836

is a JIRA about the current confusion of the yaml-based seed list and what
it means to be a seed, specifically in the context of bootstrapping.

There is a trivial case that illustrates why seed lists need to be dynamic:

1) 3-node cluster, A B C, RF=1.
2) A is a seed, started first. B starts second, C starts third.
3) A and B fail. C does not fail.
4) A and B now have no seed to bootstrap from. C does not consider itself a
seed in its own seed list.
5) C no longer has a node it gossips to once per gossip round, which is one
of the only other seed-related differences.

Of course in practice you can just remove A from its own seed list, put
C in A's seed list, and bootstrap it. But really what you should do
is make C the seed in a dynamic seed provider.
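
A minimal sketch of what such a provider can look like (this is not Priam's
code; the seed_file parameter name is made up, but the interface is
Cassandra's real SeedProvider):

import java.io.IOException;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import org.apache.cassandra.locator.SeedProvider;

public class FileSeedProvider implements SeedProvider {
    private final String path;

    // Cassandra hands the 'parameters' block from cassandra.yaml to this constructor.
    public FileSeedProvider(Map<String, String> params) {
        this.path = params.get("seed_file"); // hypothetical parameter name
    }

    @Override
    public List<InetAddress> getSeeds() {
        // Re-read the file on every call, so the seed list can change at runtime.
        List<InetAddress> seeds = new ArrayList<InetAddress>();
        try {
            for (String line : Files.readAllLines(Paths.get(path), StandardCharsets.UTF_8)) {
                String host = line.trim();
                if (!host.isEmpty())
                    seeds.add(InetAddress.getByName(host));
            }
        } catch (IOException e) {
            // An unreadable file yields an empty list, i.e. no seeds.
        }
        return seeds;
    }
}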

DataStax said:


 The seed node designation has no purpose other than bootstrapping the
 gossip process for new nodes joining the cluster. Seed nodes are not a
 single point of failure, nor do they have any other special purpose in
 cluster operations beyond the bootstrapping of nodes.


Seed nodes are also gossiped to once per round, which some might argue
makes them special.

=Rob


Re: Cassandra data retention policy

2014-04-29 Thread Redmumba
Just a heads up--this is only available in the latest version of Cassandra
2.0.6, and is not available in Cassandra 1.2.


On Mon, Apr 28, 2014 at 12:57 PM, Donald Smith 
donald.sm...@audiencescience.com wrote:

  CQL lets you specify a default TTL per column family/table, with
 default_time_to_live=86400.
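
 For reference, the full statement is along these lines (keyspace/table
 names are placeholders):

 ALTER TABLE my_ks.my_table WITH default_time_to_live = 86400;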



 *From:* Redmumba [mailto:redmu...@gmail.com]
 *Sent:* Monday, April 28, 2014 12:51 PM
 *To:* user@cassandra.apache.org
 *Subject:* Re: Cassandra data retention policy



 Have you looked into using a TTL?  You can set this per insert
 (unfortunately, it can't be set per CF) and values will be tombstoned after
 that amount of time.  I.e.,

 INSERT INTO <table> VALUES ... USING TTL 15552000

 Keep in mind, after the values have expired, they will essentially become
 tombstones--so you will still need to run clean-ups (probably daily) to
 clear up space.

 Does this help?

 One caveat is that this is difficult to apply to existing rows--i.e., you
 can't bulk-update a bunch of rows with this data.  As such, another good
 suggestion is to simply have a secondary index on a date field of some
 kind, and run a bulk remove (and subsequent clean-up) daily/weekly/whatever.



 On Mon, Apr 28, 2014 at 11:31 AM, Han Jia johnideal...@gmail.com wrote:

 Hi guys,





 We have a processing system that just uses the data for the past six
 months in Cassandra. Any suggestions on the best way to manage the old data
 in order to save disk space? We want to keep it as backup but it will not
 be used unless we need to do recovery. Thanks in advance!





 -John





Re: Running hadoop jobs over compressed column familes with datastatx

2014-04-29 Thread marlon hendred
I was able to solve the issue. There was another layer of compression
happening in the DAO, using java.util.zip.Deflater/Inflater, on top of the
Snappy compression defined on the CF. The solution was to extend
CassandraStorage and override the getNext() method. The new implementation
calls super.getNext() and inflates the Tuples where appropriate.
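
Roughly, the override looks like this (a sketch rather than the exact code;
the field position of the deflated blob is an assumption, and the import
path may differ under DSE):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;
import org.apache.cassandra.hadoop.pig.CassandraStorage;
import org.apache.pig.data.DataByteArray;
import org.apache.pig.data.Tuple;

public class InflatingCassandraStorage extends CassandraStorage {
    @Override
    public Tuple getNext() throws IOException {
        Tuple t = super.getNext();
        if (t == null)
            return null; // end of input
        Object v = t.get(1); // assume the deflated JSON blob is field 1
        if (v instanceof DataByteArray)
            t.set(1, new DataByteArray(inflate(((DataByteArray) v).get())));
        return t;
    }

    private static byte[] inflate(byte[] deflated) throws IOException {
        Inflater inflater = new Inflater();
        inflater.setInput(deflated);
        ByteArrayOutputStream out = new ByteArrayOutputStream(deflated.length * 4);
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && inflater.needsInput())
                    break; // truncated or non-deflate input
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IOException("Value is not deflate-compressed", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}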

-Marlon


On Wed, Apr 23, 2014 at 1:39 PM, marlon hendred mhend...@gmail.com wrote:

 Hi,

 I'm attempting to dump a Pig relation of a compressed column family. It's a
 single column whose value is a JSON blob. It's compressed via Snappy
 compression and the value validator is BytesType. After I create the
 relation and dump it, I get garbage. Here is the describe:

 ColumnFamily: CF
   Key Validation Class: org.apache.cassandra.db.marshal.TimeUUIDType
   Default column value validator:
 org.apache.cassandra.db.marshal.BytesType
   Cells sorted by: org.apache.cassandra.db.marshal.UTF8Type
   GC grace seconds: 86400
   Compaction min/max thresholds: 2/32
   Read repair chance: 0.1
   DC Local Read repair chance: 0.0
   Populate IO Cache on flush: false
   Replicate on write: true
   Caching: KEYS_ONLY
   Bloom Filter FP chance: default
   Built indexes: []
   Compaction Strategy:
 org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
   Compression Options:
 sstable_compression:
 org.apache.cassandra.io.compress.SnappyCompressor

 Pig stuff:
 rows = LOAD 'cql://Keyspace/CF' using CqlStorage();

 I've tried to overwrite the schema by adding 'as (key: chararray, col1:
 chararray, value: chararray)' but when I dump this it still looks like it's
 binary.

 Do I need to implement my own CqlStorage() here that uncompresses, or am I
 just missing something? I've done some googling but haven't seen anything
 on the subject. Also, I am using DataStax Enterprise 3.1. Thanks in
 advance!

 -m



Re: row caching for frequently updated column

2014-04-29 Thread Brian Lam
Are these issues 'resolved' only in 2.0 or a later release?

What about the 1.2 version?



On Apr 29, 2014, at 9:40 AM, Robert Coli rc...@eventbrite.com wrote:

On Tue, Apr 29, 2014 at 9:30 AM, Jimmy Lin y2klyf+w...@gmail.com wrote:

 If Cassandra invalidates the row cache upon a single-column update to that
 row, that seems very inefficient.


https://issues.apache.org/jira/browse/CASSANDRA-5348?focusedCommentId=13794634&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13794634

=Rob


Re: row caching for frequently updated column

2014-04-29 Thread Robert Coli
On Tue, Apr 29, 2014 at 1:53 PM, Brian Lam y2k...@gmail.com wrote:

 Are these issues 'resolved' only in 2.0 or a later release?

 What about the 1.2 version?


As I understand it:

1.2 version has the on-heap row cache and off-heap row cache. It does not
have the new partition cache.
2.0 version has only the off-heap row cache. It does not have the on-heap
row cache or the new partition cache.
2.1 version has the new partition cache.

In summary, you probably don't want to use any of these half-baked,
immature internal row/etc. caches unless you are very, very certain that
you have an ideal case for them.

=Rob


Connect Cassandra rings in datacenter and ec2

2014-04-29 Thread Trung Tran
Hi,

We're planning to deploy 3 Cassandra rings: one in our datacenter (with
more nodes/power) and two others in EC2. We don't have enough public IPs to
assign to each individual node in our datacenter, so I wonder how we could
connect the cluster together.

Has anyone tried this before, and is this a good way to deploy
Cassandra?

Thanks,
Trung.


[no subject]

2014-04-29 Thread Ebot Tabi
Hi there,
We are working on an API service that receives arbitrary JSON data; the
data can be nested JSON or just flat JSON. We started using
Astyanax, but we noticed we couldn't use CQL3 to target the arbitrary
columns; in CQL3 those arbitrary columns aren't available. Ad-hoc queries are
to be run against this arbitrary data stored in Cassandra.


-- 
Ebot T.


Re:

2014-04-29 Thread Otávio Gonçalves de Santana
Hi Elder.
Welcome.
We hope to help you.


On Tue, Apr 29, 2014 at 9:28 PM, Ebot Tabi ebot.t...@gmail.com wrote:

 Hi there,
 We are working on an API service that receives arbitrary JSON data; the
 data can be nested JSON or just flat JSON. We started using
 Astyanax, but we noticed we couldn't use CQL3 to target the arbitrary
 columns; in CQL3 those arbitrary columns aren't available. Ad-hoc queries
 are to be run against this arbitrary data stored in Cassandra.


 --
 Ebot T.




-- 
Cheers!.

Otávio Gonçalves de Santana

blog: http://otaviosantana.blogspot.com.br/
twitter: http://twitter.com/otaviojava
site: http://about.me/otaviojava
55 (11) 98255-3513


Re:

2014-04-29 Thread Ebot Tabi
I am hoping as well to get help on how to handle such a scenario; the reason
we chose Cassandra was its performance for heavy writes.


On Wed, Apr 30, 2014 at 12:38 AM, Otávio Gonçalves de Santana 
otaviopolianasant...@gmail.com wrote:

 Hi Elder.
 Welcome.
 We hope to help you.


 On Tue, Apr 29, 2014 at 9:28 PM, Ebot Tabi ebot.t...@gmail.com wrote:

 Hi there,
 We are working on an API service that receives arbitrary JSON data; the
 data can be nested JSON or just flat JSON. We started using
 Astyanax, but we noticed we couldn't use CQL3 to target the arbitrary
 columns; in CQL3 those arbitrary columns aren't available. Ad-hoc queries
 are to be run against this arbitrary data stored in Cassandra.


 --
 Ebot T.




 --
 Cheers!.

 Otávio Gonçalves de Santana

 blog: http://otaviosantana.blogspot.com.br/
 twitter: http://twitter.com/otaviojava
 site: http://about.me/otaviojava
 55 (11) 98255-3513




-- 
Ebot T.


Re: Connect Cassandra rings in datacenter and ec2

2014-04-29 Thread Ben Bromhead
You will need to have the nodes running on AWS in a VPC.

You can then configure a VPN to work with your VPC; see
http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_VPN.html. Also, as you
will have multiple VPN connections (from your private DC and the other AWS
region), AWS CloudHub will be the way to go:
http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPN_CloudHub.html.

Additionally, to access your Cassandra instances from your other VPCs you can
use VPC peering (within the same region). See
http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-peering.html
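
On the Cassandra side, the relevant knobs are roughly these (the addresses
are made up, and the snitch is just one option):

# cassandra.yaml on a node reached through the VPN
listen_address: 10.1.2.3            # private address the node binds to
broadcast_address: 172.16.0.3       # address nodes in other DCs should use
endpoint_snitch: GossipingPropertyFileSnitch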

Ben Bromhead
Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359

On 30 Apr 2014, at 11:38 am, Chris Lohfink clohf...@blackbirdit.com wrote:

 Cassandra will require a different address per node though, or at least one
 unique internal address for the same DC and one unique external address for
 other DCs. You could look into http://aws.amazon.com/vpc/ or some other VPN
 solution.
 
 ---
 Chris Lohfink
 
 On Apr 29, 2014, at 6:56 PM, Trung Tran tr...@brightcloud.com wrote:
 
 Hi,
 
 We're planning to deploy 3 Cassandra rings: one in our datacenter (with more
 nodes/power) and two others in EC2. We don't have enough public IPs to assign
 to each individual node in our datacenter, so I wonder how we could
 connect the cluster together.
 
 Has anyone tried this before, and is this a good way to deploy
 Cassandra?
 
 Thanks,
 Trung.
 



Re: row caching for frequently updated column

2014-04-29 Thread Jimmy Lin
thanks all for the pointers.

let me see if I can put the sequence of events together:

1.2
People misunderstood/misused the row cache: Cassandra cached the entire
row of data even if you were only looking for a small subset of the row. E.g.
select single_column from a_wide_row_table
will result in the entire row being cached even if you are only interested in
one single column of the row.

2.0
And because of the potential misuse of heap memory, Cassandra 2.0 removed the
on-heap cache and only supports the off-heap cache, which has the side effect
that a write will invalidate the row cache (my original question).

2.1
The coming Cassandra 2.1 will offer a true cache-by-query, so the cached data
will be much more efficient even for wide rows (it caches only what it needs).

Do I have it right?
For the new 2.1 row caching, is it still true that a write or update to the
row will invalidate the cached row?




On Tue, Apr 29, 2014 at 3:00 PM, Robert Coli rc...@eventbrite.com wrote:

 On Tue, Apr 29, 2014 at 1:53 PM, Brian Lam y2k...@gmail.com wrote:

 Are these issues 'resolved' only in 2.0 or a later release?

 What about the 1.2 version?


 As I understand it:

 1.2 version has the on-heap row cache and off-heap row cache. It does not
 have the new partition cache.
 2.0 version has only the off-heap row cache. It does not have the on-heap
 row cache or the new partition cache.
 2.1 version has the new partition cache.

 In summary, you probably don't want to use any of these half-baked,
 immature internal row/etc. caches unless you are very, very certain that
 you have an ideal case for them.

 =Rob



Cassandra Client authentication and system table replication question

2014-04-29 Thread Anand Somani
Hi

We have enabled Cassandra client authentication and have set a new user/pass
per keyspace. As I understand it, the user/pass is stored in a system table;
do we need to change the replication factor of that table so this data
is replicated? The cluster is going to be multi-DC.

Thanks
Anand


Re: Cassandra Client authentication and system table replication question

2014-04-29 Thread Anand Somani
Correction: the credentials are stored in the system_auth keyspace. So is it
ok/recommended to change the replication factor of that keyspace?
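
Something along these lines, I assume (the DC names and replica counts are
placeholders):

ALTER KEYSPACE system_auth WITH replication =
    {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};

-- followed by 'nodetool repair system_auth' on each node, so the
-- existing credentials actually get replicated.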


On Tue, Apr 29, 2014 at 10:41 PM, Anand Somani meatfor...@gmail.com wrote:

 Hi

 We have enabled Cassandra client authentication and have set a new user/pass
 per keyspace. As I understand it, the user/pass is stored in a system table;
 do we need to change the replication factor of that table so this data
 is replicated? The cluster is going to be multi-DC.

 Thanks
 Anand