Re: How to model data to achieve specific data locality

2014-12-09 Thread Kai Wang
Some of the sequences grow so fast that sub-partitioning is inevitable. I may
need to try different bucket sizes to get the optimal throughput. Thank you
all for the advice.

On Mon, Dec 8, 2014 at 9:55 AM, Eric Stevens migh...@gmail.com wrote:

 The upper bound for the data size of a single column is 2GB, and the upper
 bound for the number of columns in a row (partition) is 2 billion.  So if
 you wanted to create the largest possible row, you probably can't afford
 enough disks to hold it.
 http://wiki.apache.org/cassandra/CassandraLimitations

 Practically speaking you start running into trouble *way* before you
 reach those thresholds though.  Large columns and large numbers of columns
 create GC pressure in your cluster, and since all data for a given row
 reside on the same primary and replicas, this tends to lead to hot
 spotting.  Repair happens for entire rows, so large rows increase the cost
 of repairs, including GC pressure during the repair.  And rows of this size
 are often arrived at by appending to the same row repeatedly, which will
 cause the data for that row to be scattered across a large number of
 SSTables, which will hurt read performance.  Also, depending on your
 interface, you'll find you start hitting limits that you have to increase,
 each with their own implications (e.g., maximum Thrift message sizes and so
 forth).  The right maximum practical size for a row definitely depends on
 your read and write patterns, as well as your hardware and network.  More
 memory, SSDs, larger SSTables, and faster networks will all raise the
 ceiling for where large rows start to become painful.

 @Kai, if you're familiar with the Thrift paradigm, the partition key
 equates to a Thrift row key, and the clustering key equates to the first
 part of a composite column name.  CQL PRIMARY KEY ((a,b), c, d) equates to
 Thrift where row key is ['a:b'] and all columns begin with ['c:d:'].
 Recommended reading: http://www.datastax.com/dev/blog/thrift-to-cql3
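
 To make that mapping concrete, here is a purely illustrative CQL sketch (the
 keyspace, table and column names are invented, not anyone's actual schema):

     CREATE TABLE my_keyspace.example (
         a     text,
         b     text,
         c     text,
         d     text,
         value text,
         PRIMARY KEY ((a, b), c, d)
     );

 Under the hood that becomes one storage row per (a, b) combination (the
 Thrift row key 'a:b'), and each CQL row's cells are composite columns whose
 names lead with the clustering values, e.g. 'c:d:value'.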

 Whatever your partition key, if you need to sub-partition to maintain
 reasonable row sizes, then the only way to preserve data locality for
 related records is probably to switch to the byte-ordered partitioner and
 compute a blob or long column as part of your partition key that is meant
 to cause the PK to map to the same token.  Just be aware that the
 byte-ordered partitioner comes with a number of caveats, and you'll become
 responsible for maintaining good data load distribution in your cluster.
 But the benefits from being able to tune locality may be worth it.


 On Sun Dec 07 2014 at 3:12:11 PM Jonathan Haddad j...@jonhaddad.com
 wrote:

 I think he mentioned 100MB as the max size - planning for 1MB might make
 your data model difficult to work with.

 On Sun Dec 07 2014 at 12:07:47 PM Kai Wang dep...@gmail.com wrote:

 Thanks for the help. I wasn't clear on how clustering columns work. Coming
 from Thrift experience, it took me a while to understand how clustering
 columns impact partition storage on disk. Now I believe using seq_type as
 the first clustering column solves my problem. As for partition size, I will
 start with some bucketing assumption. If the partition size exceeds the
 threshold I may need to re-bucket using a smaller bucket size.

 On another thread Eric mentions the optimal partition size should be around
 100 KB to 1 MB. I will use that as the starting point to design my bucket
 strategy.
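
 To make my plan concrete, this is roughly the shape of table I have in mind.
 It is only a sketch under my current assumptions; the column names and the
 way the bucket value gets derived are placeholders, nothing settled yet:

     CREATE TABLE sequences (
         seq_id   text,
         bucket   int,      -- derived on the client, e.g. element count / N
         seq_type text,
         pos      bigint,
         value    blob,
         PRIMARY KEY ((seq_id, bucket), seq_type, pos)
     );

     -- typical read: one sub-partition of a sequence, filtered by type
     SELECT seq_type, pos, value
       FROM sequences
      WHERE seq_id = 'seq-123' AND bucket = 0 AND seq_type = 'raw';

 With the default Murmur3Partitioner each (seq_id, bucket) pair hashes to its
 own token, so this bounds the partition size but gives up locality across
 buckets of the same seq_id.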


 On Sun, Dec 7, 2014 at 10:32 AM, Jack Krupansky j...@basetechnology.com
  wrote:

   It would be helpful to look at some specific examples of sequences,
 showing how they grow. I suspect that the term “sequence” is being
 overloaded in some subtly misleading way here.

 Besides, we’ve already answered the headline question – data locality
 is achieved by having a common partition key. So, we need some clarity as
 to what question we are really focusing on.

 And, of course, we should be asking the “Cassandra Data Modeling 101”
 question of what do your queries want to look like, how exactly do you want
 to access your data. Only after we have a handle on how you need to read
 your data can we decide how it should be stored.

 My immediate question to get things back on track: When you say “The
 typical read is to load a subset of sequences with the same seq_id”,
 what type of “subset” are you talking about? Again, a few explicit and
 concise example queries (in some concise, easy-to-read pseudo language or
 even plain English, but not belabored with full CQL syntax) would be very
 helpful. I mean, Cassandra has no “subset” concept, nor a “load subset”
 command, so what are we really talking about?

 Also, I presume we are talking CQL, but some of the references seem
 more Thrift/slice oriented.

 -- Jack Krupansky

  *From:* Eric Stevens migh...@gmail.com
 *Sent:* Sunday, December 7, 2014 10:12 AM
 *To:* user@cassandra.apache.org
 *Subject:* Re: How to model data to achieve specific data locality

  Also new seq_types can be added and old seq_types can be 

Re: Cassandra Files Taking up Much More Space than CF

2014-12-09 Thread Ian Rose
Try `nodetool clearsnapshot` which will delete any snapshots you have.  I
have never taken a snapshot with nodetool yet I found several snapshots on
my disk recently (which can take a lot of space).  So perhaps they are
automatically generated by some operation?  No idea.  Regardless, nuking
those freed up a ton of space for me.

- Ian


On Mon, Dec 8, 2014 at 8:12 PM, Nate Yoder n...@whistle.com wrote:

 Hi All,

 I am new to Cassandra so I apologise in advance if I have missed anything
 obvious but this one currently has me stumped.

 I am currently running a 6 node Cassandra 2.1.1 cluster on EC2 using
 C3.2XLarge nodes which overall is working very well for us.  However, after
 letting it run for a while I seem to get into a situation where the amount
 of disk space used far exceeds the total amount of data on each node and I
 haven't been able to get the size to go back down except by stopping and
 restarting the node.

 For example, in my data I have almost all of my data in one table.  On one
 of my nodes right now the total space used (as reported by nodetool
 cfstats) is 57.2 GB and there are no snapshots. However, when I look at the
 size of the data files (using du) the data file for that table is 107GB.
 Because the C3.2XLarge only have 160 GB of SSD you can see why this quickly
 becomes a problem.

 Running nodetool compact didn't reduce the size and neither does running
 nodetool repair -pr on the node.  I also tried nodetool flush and nodetool
 cleanup (even though I have not added or removed any nodes recently) but it
 didn't change anything either.  In order to keep my cluster up I then
 stopped and started that node and the size of the data file dropped to 54GB
 while the total column family size (as reported by nodetool) stayed about
 the same.

 Any suggestions as to what I could be doing wrong?

 Thanks,
 Nate



Re: Cassandra Files Taking up Much More Space than CF

2014-12-09 Thread Nate Yoder
Hi Ian,

Thanks for the suggestion but I had actually already done that prior to the
scenario I described (to get myself some free space) and when I ran
nodetool cfstats it listed 0 snapshots as expected, so unfortunately I
don't think that is where my space went.

One additional piece of information I forgot to point out is that when I
ran nodetool status on the node it included all 6 nodes.

I have also heard it mentioned that I may want to have a prime number of
nodes which may help protect against split-brain.  Is this true?  If so
does it still apply when I am using vnodes?

Thanks again,
Nate

--
*Nathanael Yoder*
Principal Engineer  Data Scientist, Whistle
415-944-7344 // n...@whistle.com

On Tue, Dec 9, 2014 at 7:42 AM, Ian Rose ianr...@fullstory.com wrote:

 Try `nodetool clearsnapshot` which will delete any snapshots you have.  I
 have never taken a snapshot with nodetool yet I found several snapshots on
 my disk recently (which can take a lot of space).  So perhaps they are
 automatically generated by some operation?  No idea.  Regardless, nuking
 those freed up a ton of space for me.

 - Ian


 On Mon, Dec 8, 2014 at 8:12 PM, Nate Yoder n...@whistle.com wrote:

 Hi All,

 I am new to Cassandra so I apologise in advance if I have missed anything
 obvious but this one currently has me stumped.

 I am currently running a 6 node Cassandra 2.1.1 cluster on EC2 using
 C3.2XLarge nodes which overall is working very well for us.  However, after
 letting it run for a while I seem to get into a situation where the amount
 of disk space used far exceeds the total amount of data on each node and I
 haven't been able to get the size to go back down except by stopping and
 restarting the node.

 For example, in my data I have almost all of my data in one table.  On
 one of my nodes right now the total space used (as reported by nodetool
 cfstats) is 57.2 GB and there are no snapshots. However, when I look at the
 size of the data files (using du) the data file for that table is 107GB.
 Because the C3.2XLarge only have 160 GB of SSD you can see why this quickly
 becomes a problem.

 Running nodetool compact didn't reduce the size and neither does running
 nodetool repair -pr on the node.  I also tried nodetool flush and nodetool
 cleanup (even though I have not added or removed any nodes recently) but it
 didn't change anything either.  In order to keep my cluster up I then
 stopped and started that node and the size of the data file dropped to 54GB
 while the total column family size (as reported by nodetool) stayed about
 the same.

 Any suggestions as to what I could be doing wrong?

 Thanks,
 Nate





Re: Cassandra Files Taking up Much More Space than CF

2014-12-09 Thread Jonathan Haddad
You don't need a prime number of nodes in your ring, but it's not a bad
idea to have it be a multiple of your RF when your cluster is small.


On Tue Dec 09 2014 at 8:29:35 AM Nate Yoder n...@whistle.com wrote:

 Hi Ian,

 Thanks for the suggestion but I had actually already done that prior to
 the scenario I described (to get myself some free space) and when I ran
 nodetool cfstats it listed 0 snapshots as expected, so unfortunately I
 don't think that is where my space went.

 One additional piece of information I forgot to point out is that when I
 ran nodetool status on the node it included all 6 nodes.

 I have also heard it mentioned that I may want to have a prime number of
 nodes which may help protect against split-brain.  Is this true?  If so
 does it still apply when I am using vnodes?

 Thanks again,
 Nate

 --
 *Nathanael Yoder*
 Principal Engineer  Data Scientist, Whistle
 415-944-7344 // n...@whistle.com

 On Tue, Dec 9, 2014 at 7:42 AM, Ian Rose ianr...@fullstory.com wrote:

 Try `nodetool clearsnapshot` which will delete any snapshots you have.  I
 have never taken a snapshot with nodetool yet I found several snapshots on
 my disk recently (which can take a lot of space).  So perhaps they are
 automatically generated by some operation?  No idea.  Regardless, nuking
 those freed up a ton of space for me.

 - Ian


 On Mon, Dec 8, 2014 at 8:12 PM, Nate Yoder n...@whistle.com wrote:

 Hi All,

 I am new to Cassandra so I apologise in advance if I have missed
 anything obvious but this one currently has me stumped.

 I am currently running a 6 node Cassandra 2.1.1 cluster on EC2 using
 C3.2XLarge nodes which overall is working very well for us.  However, after
 letting it run for a while I seem to get into a situation where the amount
 of disk space used far exceeds the total amount of data on each node and I
 haven't been able to get the size to go back down except by stopping and
 restarting the node.

 For example, in my data I have almost all of my data in one table.  On
 one of my nodes right now the total space used (as reported by nodetool
 cfstats) is 57.2 GB and there are no snapshots. However, when I look at the
 size of the data files (using du) the data file for that table is 107GB.
 Because the C3.2XLarge only have 160 GB of SSD you can see why this quickly
 becomes a problem.

 Running nodetool compact didn't reduce the size and neither does running
 nodetool repair -pr on the node.  I also tried nodetool flush and nodetool
 cleanup (even though I have not added or removed any nodes recently) but it
 didn't change anything either.  In order to keep my cluster up I then
 stopped and started that node and the size of the data file dropped to 54GB
 while the total column family size (as reported by nodetool) stayed about
 the same.

 Any suggestions as to what I could be doing wrong?

 Thanks,
 Nate






Re: Cassandra Files Taking up Much More Space than CF

2014-12-09 Thread Nate Yoder
Thanks Jonathan.  So there is nothing too idiotic about my current set-up
with 6 boxes each with 256 vnodes each and a RF of 2?

I appreciate the help,
Nate



--
*Nathanael Yoder*
Principal Engineer  Data Scientist, Whistle
415-944-7344 // n...@whistle.com

On Tue, Dec 9, 2014 at 8:31 AM, Jonathan Haddad j...@jonhaddad.com wrote:

 You don't need a prime number of nodes in your ring, but it's not a bad
 idea to have it be a multiple of your RF when your cluster is small.


 On Tue Dec 09 2014 at 8:29:35 AM Nate Yoder n...@whistle.com wrote:

 Hi Ian,

 Thanks for the suggestion but I had actually already done that prior to
 the scenario I described (to get myself some free space) and when I ran
 nodetool cfstats it listed 0 snapshots as expected, so unfortunately I
 don't think that is where my space went.

 One additional piece of information I forgot to point out is that when I
 ran nodetool status on the node it included all 6 nodes.

 I have also heard it mentioned that I may want to have a prime number of
 nodes which may help protect against split-brain.  Is this true?  If so
 does it still apply when I am using vnodes?

 Thanks again,
 Nate

 --
 *Nathanael Yoder*
 Principal Engineer  Data Scientist, Whistle
 415-944-7344 // n...@whistle.com

 On Tue, Dec 9, 2014 at 7:42 AM, Ian Rose ianr...@fullstory.com wrote:

 Try `nodetool clearsnapshot` which will delete any snapshots you have.
 I have never taken a snapshot with nodetool yet I found several snapshots
 on my disk recently (which can take a lot of space).  So perhaps they are
 automatically generated by some operation?  No idea.  Regardless, nuking
 those freed up a ton of space for me.

 - Ian


 On Mon, Dec 8, 2014 at 8:12 PM, Nate Yoder n...@whistle.com wrote:

 Hi All,

 I am new to Cassandra so I apologise in advance if I have missed
 anything obvious but this one currently has me stumped.

 I am currently running a 6 node Cassandra 2.1.1 cluster on EC2 using
 C3.2XLarge nodes which overall is working very well for us.  However, after
 letting it run for a while I seem to get into a situation where the amount
 of disk space used far exceeds the total amount of data on each node and I
 haven't been able to get the size to go back down except by stopping and
 restarting the node.

 For example, in my data I have almost all of my data in one table.  On
 one of my nodes right now the total space used (as reported by nodetool
 cfstats) is 57.2 GB and there are no snapshots. However, when I look at the
 size of the data files (using du) the data file for that table is 107GB.
 Because the C3.2XLarge only have 160 GB of SSD you can see why this quickly
 becomes a problem.

 Running nodetool compact didn't reduce the size and neither does
 running nodetool repair -pr on the node.  I also tried nodetool flush and
 nodetool cleanup (even though I have not added or removed any nodes
 recently) but it didn't change anything either.  In order to keep my
 cluster up I then stopped and started that node and the size of the data
 file dropped to 54GB while the total column family size (as reported by
 nodetool) stayed about the same.

 Any suggestions as to what I could be doing wrong?

 Thanks,
 Nate






Re: Cassandra Files Taking up Much More Space than CF

2014-12-09 Thread Jonathan Haddad
Well, I personally don't like RF=2.  It means if you're using CL=QUORUM and
a node goes down, you're going to have a bad time. (downtime) If you're
using CL=ONE then you'd be ok.  However, I am not wild about losing a node
and having only 1 copy of my data available in prod.
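
If you do move to RF=3 later, it's just a keyspace change plus a repair.  A
minimal sketch (the keyspace name and SimpleStrategy are placeholders; use
whatever matches your actual topology):

    -- QUORUM needs floor(RF/2) + 1 replicas.  At RF=2 that's 2 (same as ALL);
    -- at RF=3 it's still 2, so one node can be down and QUORUM still works.
    ALTER KEYSPACE my_keyspace
      WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

    -- Existing data isn't copied to the new replicas until you run
    -- `nodetool repair my_keyspace` on each node.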

On Tue Dec 09 2014 at 8:40:37 AM Nate Yoder n...@whistle.com wrote:

 Thanks Jonathan.  So there is nothing too idiotic about my current set-up
 with 6 boxes each with 256 vnodes each and a RF of 2?

 I appreciate the help,
 Nate



 --
 *Nathanael Yoder*
 Principal Engineer  Data Scientist, Whistle
 415-944-7344 // n...@whistle.com

 On Tue, Dec 9, 2014 at 8:31 AM, Jonathan Haddad j...@jonhaddad.com wrote:

 You don't need a prime number of nodes in your ring, but it's not a bad
 idea to have it be a multiple of your RF when your cluster is small.


 On Tue Dec 09 2014 at 8:29:35 AM Nate Yoder n...@whistle.com wrote:

 Hi Ian,

 Thanks for the suggestion but I had actually already done that prior to
 the scenario I described (to get myself some free space) and when I ran
 nodetool cfstats it listed 0 snapshots as expected, so unfortunately I
 don't think that is where my space went.

 One additional piece of information I forgot to point out is that when I
 ran nodetool status on the node it included all 6 nodes.

 I have also heard it mentioned that I may want to have a prime number of
 nodes which may help protect against split-brain.  Is this true?  If so
 does it still apply when I am using vnodes?

 Thanks again,
 Nate

 --
 *Nathanael Yoder*
 Principal Engineer  Data Scientist, Whistle
 415-944-7344 // n...@whistle.com

 On Tue, Dec 9, 2014 at 7:42 AM, Ian Rose ianr...@fullstory.com wrote:

 Try `nodetool clearsnapshot` which will delete any snapshots you have.
 I have never taken a snapshot with nodetool yet I found several snapshots
 on my disk recently (which can take a lot of space).  So perhaps they are
 automatically generated by some operation?  No idea.  Regardless, nuking
 those freed up a ton of space for me.

 - Ian


 On Mon, Dec 8, 2014 at 8:12 PM, Nate Yoder n...@whistle.com wrote:

 Hi All,

 I am new to Cassandra so I apologise in advance if I have missed
 anything obvious but this one currently has me stumped.

 I am currently running a 6 node Cassandra 2.1.1 cluster on EC2 using
 C3.2XLarge nodes which overall is working very well for us.  However, 
 after
 letting it run for a while I seem to get into a situation where the amount
 of disk space used far exceeds the total amount of data on each node and I
 haven't been able to get the size to go back down except by stopping and
 restarting the node.

 For example, in my data I have almost all of my data in one table.  On
 one of my nodes right now the total space used (as reported by nodetool
 cfstats) is 57.2 GB and there are no snapshots. However, when I look at 
 the
 size of the data files (using du) the data file for that table is 107GB.
 Because the C3.2XLarge only have 160 GB of SSD you can see why this 
 quickly
 becomes a problem.

 Running nodetool compact didn't reduce the size and neither does
 running nodetool repair -pr on the node.  I also tried nodetool flush and
 nodetool cleanup (even though I have not added or removed any nodes
 recently) but it didn't change anything either.  In order to keep my
 cluster up I then stopped and started that node and the size of the data
 file dropped to 54GB while the total column family size (as reported by
 nodetool) stayed about the same.

 Any suggestions as to what I could be doing wrong?

 Thanks,
 Nate







Re: Cassandra Files Taking up Much More Space than CF

2014-12-09 Thread Nate Yoder
Thanks for the advice.  Totally makes sense.  Once I figure out how to make
my data stop taking up more than 2x more space without being useful I'll
definitely make the change :)

Nate



--
*Nathanael Yoder*
Principal Engineer  Data Scientist, Whistle
415-944-7344 // n...@whistle.com

On Tue, Dec 9, 2014 at 9:02 AM, Jonathan Haddad j...@jonhaddad.com wrote:

 Well, I personally don't like RF=2.  It means if you're using CL=QUORUM
 and a node goes down, you're going to have a bad time. (downtime) If you're
 using CL=ONE then you'd be ok.  However, I am not wild about losing a node
 and having only 1 copy of my data available in prod.


 On Tue Dec 09 2014 at 8:40:37 AM Nate Yoder n...@whistle.com wrote:

 Thanks Jonathan.  So there is nothing too idiotic about my current set-up
 with 6 boxes each with 256 vnodes each and a RF of 2?

 I appreciate the help,
 Nate



 --
 *Nathanael Yoder*
 Principal Engineer  Data Scientist, Whistle
 415-944-7344 // n...@whistle.com

 On Tue, Dec 9, 2014 at 8:31 AM, Jonathan Haddad j...@jonhaddad.com
 wrote:

 You don't need a prime number of nodes in your ring, but it's not a bad
 idea to have it be a multiple of your RF when your cluster is small.


 On Tue Dec 09 2014 at 8:29:35 AM Nate Yoder n...@whistle.com wrote:

 Hi Ian,

 Thanks for the suggestion but I had actually already done that prior to
 the scenario I described (to get myself some free space) and when I ran
 nodetool cfstats it listed 0 snapshots as expected, so unfortunately I
 don't think that is where my space went.

 One additional piece of information I forgot to point out is that when
 I ran nodetool status on the node it included all 6 nodes.

 I have also heard it mentioned that I may want to have a prime number
 of nodes which may help protect against split-brain.  Is this true?  If so
 does it still apply when I am using vnodes?

 Thanks again,
 Nate

 --
 *Nathanael Yoder*
 Principal Engineer  Data Scientist, Whistle
 415-944-7344 // n...@whistle.com

 On Tue, Dec 9, 2014 at 7:42 AM, Ian Rose ianr...@fullstory.com wrote:

 Try `nodetool clearsnapshot` which will delete any snapshots you
 have.  I have never taken a snapshot with nodetool yet I found several
 snapshots on my disk recently (which can take a lot of space).  So perhaps
 they are automatically generated by some operation?  No idea.  Regardless,
 nuking those freed up a ton of space for me.

 - Ian


 On Mon, Dec 8, 2014 at 8:12 PM, Nate Yoder n...@whistle.com wrote:

 Hi All,

 I am new to Cassandra so I apologise in advance if I have missed
 anything obvious but this one currently has me stumped.

 I am currently running a 6 node Cassandra 2.1.1 cluster on EC2 using
 C3.2XLarge nodes which overall is working very well for us.  However, 
 after
 letting it run for a while I seem to get into a situation where the 
 amount
 of disk space used far exceeds the total amount of data on each node and 
 I
 haven't been able to get the size to go back down except by stopping and
 restarting the node.

 For example, in my data I have almost all of my data in one table.
 On one of my nodes right now the total space used (as reported by 
 nodetool
 cfstats) is 57.2 GB and there are no snapshots. However, when I look at 
 the
 size of the data files (using du) the data file for that table is 107GB.
 Because the C3.2XLarge only have 160 GB of SSD you can see why this 
 quickly
 becomes a problem.

 Running nodetool compact didn't reduce the size and neither does
 running nodetool repair -pr on the node.  I also tried nodetool flush and
 nodetool cleanup (even though I have not added or removed any nodes
 recently) but it didn't change anything either.  In order to keep my
 cluster up I then stopped and started that node and the size of the data
 file dropped to 54GB while the total column family size (as reported by
 nodetool) stayed about the same.

 Any suggestions as to what I could be doing wrong?

 Thanks,
 Nate







Re: Cassandra Files Taking up Much More Space than CF

2014-12-09 Thread Reynald Bourtembourg

Hi Nate,

Are you using incremental backups?

Extract from the documentation ( 
http://www.datastax.com/documentation/cassandra/2.1/cassandra/operations/ops_backup_incremental_t.html 
):


When incremental backups are enabled (disabled by default), Cassandra 
hard-links each flushed SSTable to a backups directory under the 
keyspace data directory. This allows storing backups offsite without 
transferring entire snapshots. Also, incremental backups combine with 
snapshots to provide a dependable, up-to-date backup mechanism.

As with snapshots, Cassandra does not automatically clear incremental 
backup files. *DataStax recommends setting up a process to clear 
incremental backup hard-links each time a new snapshot is created.*


These backups are stored in directories named backups at the same 
level as the snapshots' directories.


Reynald

On 09/12/2014 18:13, Nate Yoder wrote:
Thanks for the advice.  Totally makes sense.  Once I figure out how to 
make my data stop taking up more than 2x more space without being 
useful I'll definitely make the change :)


Nate



--
*Nathanael Yoder*
Principal Engineer  Data Scientist, Whistle
415-944-7344 // n...@whistle.com

On Tue, Dec 9, 2014 at 9:02 AM, Jonathan Haddad j...@jonhaddad.com wrote:


Well, I personally don't like RF=2.  It means if you're using
CL=QUORUM and a node goes down, you're going to have a bad time.
(downtime) If you're using CL=ONE then you'd be ok. However, I am
not wild about losing a node and having only 1 copy of my data
available in prod.


On Tue Dec 09 2014 at 8:40:37 AM Nate Yoder n...@whistle.com wrote:

Thanks Jonathan.  So there is nothing too idiotic about my
current set-up with 6 boxes each with 256 vnodes each and a RF
of 2?

I appreciate the help,
Nate



--
*Nathanael Yoder*
Principal Engineer  Data Scientist, Whistle
415-944-7344 // n...@whistle.com

On Tue, Dec 9, 2014 at 8:31 AM, Jonathan Haddad j...@jonhaddad.com wrote:

You don't need a prime number of nodes in your ring, but
it's not a bad idea to have it be a multiple of your RF when
your cluster is small.


On Tue Dec 09 2014 at 8:29:35 AM Nate Yoder n...@whistle.com wrote:

Hi Ian,

Thanks for the suggestion but I had actually already
done that prior to the scenario I described (to get
myself some free space) and when I ran nodetool
cfstats it listed 0 snapshots as expected, so
unfortunately I don't think that is where my space went.

One additional piece of information I forgot to point
out is that when I ran nodetool status on the node it
included all 6 nodes.

I have also heard it mentioned that I may want to have
a prime number of nodes which may help protect against
split-brain.  Is this true?  If so does it still apply
when I am using vnodes?

Thanks again,
Nate

--
*Nathanael Yoder*
Principal Engineer  Data Scientist, Whistle
415-944-7344 // n...@whistle.com

On Tue, Dec 9, 2014 at 7:42 AM, Ian Rose ianr...@fullstory.com wrote:

Try `nodetool clearsnapshot` which will delete any
snapshots you have.  I have never taken a snapshot
with nodetool yet I found several snapshots on my
disk recently (which can take a lot of space).  So
perhaps they are automatically generated by some
operation? No idea.  Regardless, nuking those
freed up a ton of space for me.

- Ian


On Mon, Dec 8, 2014 at 8:12 PM, Nate Yoder n...@whistle.com wrote:

Hi All,

I am new to Cassandra so I apologise in
advance if I have missed anything obvious but
this one currently has me stumped.

I am currently running a 6 node Cassandra
2.1.1 cluster on EC2 using C3.2XLarge nodes
which overall is working very well for us.
However, after letting it run for a while I
seem to get into a situation where the amount
of disk space used far exceeds the total
amount of data on each node and I haven't been

Observations/concerns with repair and hinted handoff

2014-12-09 Thread Robert Wille
I have spent a lot of time working with single-node, RF=1 clusters in my 
development. Before I deploy a cluster to our live environment, I have spent 
some time learning how to work with a multi-node cluster with RF=3. There were 
some surprises. I’m wondering if people here can enlighten me. I don’t exactly 
have that warm, fuzzy feeling.

I created a three-node cluster with RF=3. I then wrote to the cluster pretty 
heavily to cause some dropped mutation messages. The dropped messages didn’t 
trickle in, but came in a burst. I suspect full GC is the culprit, but I don’t 
really know. Anyway, I ended up with 17197 dropped mutation messages on node 1, 
6422 on node 2, and none on node 3. In order to learn about repair, I waited 
for compaction to finish doing its thing, recorded the size and estimated 
number of keys for each table, started up repair (nodetool repair keyspace) 
on all three nodes, and waited for it to complete before doing anything else 
(even reads). When repair and compaction were done, I checked the size and 
estimated number of keys for each table. All tables on all nodes grew in size 
and estimated number of keys. The estimated number of keys for each node grew 
by 65k, 272k and 247k (.2%, .7% and .6%) for nodes 1, 2 and 3 respectively. I 
expected some growth, but that’s significantly more new keys than I had dropped 
mutation messages. I also expected the most new data on node 1, and none on 
node 3, which didn’t come close to what actually happened. Perhaps a mutation 
message contains more than one record? Perhaps the dropped mutation message 
counter is incremented on the coordinator, not the node that was overloaded?

I repeated repair, and the second time around the tables remained unchanged, as 
expected. I would hope that repair wouldn’t do anything to the tables if they 
were in sync. 

Just to be clear, I’m not overly concerned about the unexpected increase in 
number of keys. I’m pretty sure that repair did the needful thing and did bring 
the nodes in sync. The unexpected results more likely indicates that I’m 
ignorant, and it really bothers me when I don’t understand something. If you 
have any insights, I’d appreciate them.

One of the dismaying things about repair was that the first time around it took 
about 4 hours, with a completely idle cluster (except for repairs, of course), 
and only 6 GB of data on each node. I can bootstrap a node with 6 GB of data in 
a couple of minutes. That makes repair something like 50 to 100 times more 
expensive than bootstrapping. I know I should run repair on one node at a time, 
but even if you divide by three, that’s still a horrifically long time for such 
a small amount of data. The second time around, repair only took 30 minutes. 
That’s much better, but best-case is still about 10x longer than bootstrapping. 
Should repair really be taking this long? When I have 300 GB of data, is a 
best-case repair going to take 25 hours, and a repair with a modest amount of 
work more than 100 hours? My records are quite small. Those 6 GB contain almost 
40 million partitions. 

Following my repair experiment, I added a fourth node, and then tried killing a 
node and importing a bunch of data while the node was down. As far as repair is 
concerned, this seems to work fine (although again, glacially). However, I 
noticed that hinted handoff doesn’t seem to be working. I added several million 
records (with consistency=one), and nothing appeared in system.hints (du -hs 
showed a few dozen K bytes), nor did I get any pending Hinted Handoff tasks in 
the Thread Pool Stats. When I started up the down node (less than 3 hours 
later), the missed data didn’t appear to get sent to it. The tables did not 
grow, compaction events didn’t schedule, and there wasn’t any appreciable CPU 
utilization by the cluster. With millions of records that were missed while it 
was down, I should have noticed something if it actually was replaying the 
hints. Is there some magic setting to turn on hinted handoffs? Were there too 
many hints and so it just deleted them? My assumption is that if hinted handoff 
is working, then my need for repair should be much less, which given my 
experience so far, would be a really good thing.
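
One check I still plan to do, rather than just eyeballing du: query the hints
table from cqlsh.  This is only a sketch, and it assumes the 2.x system.hints
schema (target_id / hint_id columns):

    SELECT count(*) FROM system.hints;

    -- or, to see which node the hints target and when they were written:
    SELECT target_id, dateOf(hint_id) AS written_at
      FROM system.hints
     LIMIT 20;

If that count is zero immediately after the writes with a replica down, then
hints aren't being recorded at all, as opposed to being recorded and then
dropped on replay, which would at least narrow down where to look.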

Given the horrifically long time it takes to repair a node, and hinted handoff 
apparently not working, if a node goes down, is it better to bootstrap a new 
one than to repair the node that went down? I would expect that even if I chose 
to bootstrap a new node, it would need to be repaired anyway, since it would 
probably miss writes while bootstrapping.

Thanks in advance

Robert



Re: Cassandra Files Taking up Much More Space than CF

2014-12-09 Thread Nate Yoder
Hi Reynald,

Good idea but I have incremental backups turned off and other than *.db
files nothing else appears to be in the data directory for that table.

Is there any other output that would be helpful in helping you all help me?

Thanks,
Nate

--
*Nathanael Yoder*
Principal Engineer  Data Scientist, Whistle
415-944-7344 // n...@whistle.com

On Tue, Dec 9, 2014 at 9:27 AM, Reynald Bourtembourg 
reynald.bourtembo...@esrf.fr wrote:

  Hi Nate,

 Are you using incremental backups?

 Extract from the documentation (
 http://www.datastax.com/documentation/cassandra/2.1/cassandra/operations/ops_backup_incremental_t.html
 ):

 *When incremental backups are enabled (disabled by default), Cassandra
 hard-links each flushed SSTable to a backups directory under the keyspace
 data directory. This allows storing backups offsite without transferring
 entire snapshots. Also, incremental backups combine with snapshots to
 provide a dependable, up-to-date backup mechanism.*

 *As with snapshots, Cassandra does not automatically clear incremental
 backup files. DataStax recommends setting up a process to clear incremental
 backup hard-links each time a new snapshot is created.*
  These backups are stored in directories named backups at the same level
 as the snapshots' directories.

 Reynald


 On 09/12/2014 18:13, Nate Yoder wrote:

 Thanks for the advice.  Totally makes sense.  Once I figure out how to
 make my data stop taking up more than 2x more space without being useful
 I'll definitely make the change :)

  Nate



   --
 *Nathanael Yoder*
 Principal Engineer  Data Scientist, Whistle
 415-944-7344 // n...@whistle.com

 On Tue, Dec 9, 2014 at 9:02 AM, Jonathan Haddad j...@jonhaddad.com wrote:

 Well, I personally don't like RF=2.  It means if you're using CL=QUORUM
 and a node goes down, you're going to have a bad time. (downtime) If you're
 using CL=ONE then you'd be ok.  However, I am not wild about losing a node
 and having only 1 copy of my data available in prod.


 On Tue Dec 09 2014 at 8:40:37 AM Nate Yoder n...@whistle.com wrote:

 Thanks Jonathan.  So there is nothing too idiotic about my current
 set-up with 6 boxes each with 256 vnodes each and a RF of 2?

  I appreciate the help,
 Nate



   --
 *Nathanael Yoder*
 Principal Engineer  Data Scientist, Whistle
 415-944-7344 // n...@whistle.com

  On Tue, Dec 9, 2014 at 8:31 AM, Jonathan Haddad j...@jonhaddad.com
 wrote:

 You don't need a prime number of nodes in your ring, but it's not a bad
 idea to have it be a multiple of your RF when your cluster is small.


 On Tue Dec 09 2014 at 8:29:35 AM Nate Yoder n...@whistle.com wrote:

 Hi Ian,

  Thanks for the suggestion but I had actually already done that prior
 to the scenario I described (to get myself some free space) and when I ran
 nodetool cfstats it listed 0 snapshots as expected, so unfortunately I
 don't think that is where my space went.

  One additional piece of information I forgot to point out is that
 when I ran nodetool status on the node it included all 6 nodes.

  I have also heard it mentioned that I may want to have a prime
 number of nodes which may help protect against split-brain.  Is this true?
 If so does it still apply when I am using vnodes?

  Thanks again,
 Nate

   --
 *Nathanael Yoder*
 Principal Engineer  Data Scientist, Whistle
 415-944-7344 // n...@whistle.com

 On Tue, Dec 9, 2014 at 7:42 AM, Ian Rose ianr...@fullstory.com
 wrote:

 Try `nodetool clearsnapshot` which will delete any snapshots you
 have.  I have never taken a snapshot with nodetool yet I found several
 snapshots on my disk recently (which can take a lot of space).  So 
 perhaps
 they are automatically generated by some operation?  No idea.  
 Regardless,
 nuking those freed up a ton of space for me.

  - Ian


 On Mon, Dec 8, 2014 at 8:12 PM, Nate Yoder n...@whistle.com wrote:

 Hi All,

  I am new to Cassandra so I apologise in advance if I have missed
 anything obvious but this one currently has me stumped.

  I am currently running a 6 node Cassandra 2.1.1 cluster on EC2
 using C3.2XLarge nodes which overall is working very well for us.  
 However,
 after letting it run for a while I seem to get into a situation where 
 the
 amount of disk space used far exceeds the total amount of data on each 
 node
 and I haven't been able to get the size to go back down except by 
 stopping
 and restarting the node.

  For example, in my data I have almost all of my data in one
 table.  On one of my nodes right now the total space used (as reported 
 by
 nodetool cfstats) is 57.2 GB and there are no snapshots. However, when I
 look at the size of the data files (using du) the data file for that 
 table
 is 107GB.  Because the C3.2XLarge only have 160 GB of SSD you can see 
 why
 this quickly becomes a problem.

  Running nodetool compact didn't reduce the size and neither does
 running nodetool repair -pr on the node.  I also tried nodetool flush 
 and
 nodetool cleanup (even though I have not added or removed any 

Re: Cassandra Files Taking up Much More Space than CF

2014-12-09 Thread Nate Yoder
Hi All,

Thanks for the help but after yet another day of investigation I think I
might be running into this
https://issues.apache.org/jira/browse/CASSANDRA-8061 issue where tmplink
files aren't removed until Cassandra is restarted.

Thanks again for all the suggestions!

Nate

--
*Nathanael Yoder*
Principal Engineer  Data Scientist, Whistle
415-944-7344 // n...@whistle.com

On Tue, Dec 9, 2014 at 10:18 AM, Nate Yoder n...@whistle.com wrote:

 Hi Reynald,

 Good idea but I have incremental backups turned off and other than *.db
 files nothing else appears to be in the data directory for that table.

 Is there any other output that would be helpful in helping you all help me?

 Thanks,
 Nate

 --
 *Nathanael Yoder*
 Principal Engineer  Data Scientist, Whistle
 415-944-7344 // n...@whistle.com

 On Tue, Dec 9, 2014 at 9:27 AM, Reynald Bourtembourg 
 reynald.bourtembo...@esrf.fr wrote:

  Hi Nate,

 Are you using incremental backups?

 Extract from the documentation (
 http://www.datastax.com/documentation/cassandra/2.1/cassandra/operations/ops_backup_incremental_t.html
 ):

 *When incremental backups are enabled (disabled by default), Cassandra
 hard-links each flushed SSTable to a backups directory under the keyspace
 data directory. This allows storing backups offsite without transferring
 entire snapshots. Also, incremental backups combine with snapshots to
 provide a dependable, up-to-date backup mechanism.*

 *As with snapshots, Cassandra does not automatically clear incremental
 backup files. DataStax recommends setting up a process to clear incremental
 backup hard-links each time a new snapshot is created.*
  These backups are stored in directories named backups at the same
 level as the snapshots' directories.

 Reynald


 On 09/12/2014 18:13, Nate Yoder wrote:

 Thanks for the advice.  Totally makes sense.  Once I figure out how to
 make my data stop taking up more than 2x more space without being useful
 I'll definitely make the change :)

  Nate



   --
 *Nathanael Yoder*
 Principal Engineer  Data Scientist, Whistle
 415-944-7344 // n...@whistle.com

 On Tue, Dec 9, 2014 at 9:02 AM, Jonathan Haddad j...@jonhaddad.com
 wrote:

 Well, I personally don't like RF=2.  It means if you're using CL=QUORUM
 and a node goes down, you're going to have a bad time. (downtime) If you're
 using CL=ONE then you'd be ok.  However, I am not wild about losing a node
 and having only 1 copy of my data available in prod.


 On Tue Dec 09 2014 at 8:40:37 AM Nate Yoder n...@whistle.com wrote:

 Thanks Jonathan.  So there is nothing too idiotic about my current
 set-up with 6 boxes each with 256 vnodes each and a RF of 2?

  I appreciate the help,
 Nate



   --
 *Nathanael Yoder*
 Principal Engineer  Data Scientist, Whistle
 415-944-7344 // n...@whistle.com

  On Tue, Dec 9, 2014 at 8:31 AM, Jonathan Haddad j...@jonhaddad.com
 wrote:

 You don't need a prime number of nodes in your ring, but it's not a
 bad idea to have it be a multiple of your RF when your cluster is small.


 On Tue Dec 09 2014 at 8:29:35 AM Nate Yoder n...@whistle.com wrote:

 Hi Ian,

  Thanks for the suggestion but I had actually already done that
 prior to the scenario I described (to get myself some free space) and 
 when
 I ran nodetool cfstats it listed 0 snapshots as expected, so 
 unfortunately
 I don't think that is where my space went.

  One additional piece of information I forgot to point out is that
 when I ran nodetool status on the node it included all 6 nodes.

  I have also heard it mentioned that I may want to have a prime
 number of nodes which may help protect against split-brain.  Is this 
 true?
 If so does it still apply when I am using vnodes?

  Thanks again,
 Nate

   --
 *Nathanael Yoder*
 Principal Engineer  Data Scientist, Whistle
 415-944-7344 // n...@whistle.com

 On Tue, Dec 9, 2014 at 7:42 AM, Ian Rose ianr...@fullstory.com
 wrote:

 Try `nodetool clearsnapshot` which will delete any snapshots you
 have.  I have never taken a snapshot with nodetool yet I found several
 snapshots on my disk recently (which can take a lot of space).  So 
 perhaps
 they are automatically generated by some operation?  No idea.  
 Regardless,
 nuking those freed up a ton of space for me.

  - Ian


 On Mon, Dec 8, 2014 at 8:12 PM, Nate Yoder n...@whistle.com wrote:

 Hi All,

  I am new to Cassandra so I apologise in advance if I have missed
 anything obvious but this one currently has me stumped.

  I am currently running a 6 node Cassandra 2.1.1 cluster on EC2
 using C3.2XLarge nodes which overall is working very well for us.  
 However,
 after letting it run for a while I seem to get into a situation where 
 the
 amount of disk space used far exceeds the total amount of data on each 
 node
 and I haven't been able to get the size to go back down except by 
 stopping
 and restarting the node.

  For example, in my data I have almost all of my data in one
 table.  On one of my nodes right now the total space used (as reported 
 by
 

Best practice for emulating a Cassandra timeout during unit tests?

2014-12-09 Thread Clint Kelly
Hi all,

I'd like to write some tests for my code that uses the Cassandra Java
driver to see how it behaves if there is a read timeout while accessing
Cassandra.  Is there a best-practice for getting this done?  I was thinking
about adjusting the settings in the cluster builder to adjust the timeout
settings to be something impossibly low (like 1ms), but I'd rather do
something to my test Cassandra instance (using the
EmbeddedCassandraService) to temporarily slow it down.

Any suggestions?

Best regards,
Clint


Re: Cassandra Files Taking up Much More Space than CF

2014-12-09 Thread Nate Yoder
Thanks Rob.  Definitely good advice that I wish I had come across a couple
of months ago...  That said, it still definitely points me in the right
direction as to what to do now.

--
*Nathanael Yoder*
Principal Engineer  Data Scientist, Whistle
415-944-7344 // n...@whistle.com

On Tue, Dec 9, 2014 at 12:21 PM, Robert Coli rc...@eventbrite.com wrote:

 On Mon, Dec 8, 2014 at 5:12 PM, Nate Yoder n...@whistle.com wrote:

 I am currently running a 6 node Cassandra 2.1.1 cluster on EC2 using
 C3.2XLarge nodes which overall is working very well for us.  However, after
 letting it run for a while I seem to get into a situation where the amount
 of disk space used far exceeds the total amount of data on each node and I
 haven't been able to get the size to go back down except by stopping and
 restarting the node.



 [... link to rather serious bug in 2.1.1 version in JIRA ...]


 https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/

 =Rob



upgrade cassandra from 2.0.6 to 2.1.2

2014-12-09 Thread wyang
I looked at some upgrade documentation and am a little puzzled.


According to https://github.com/apache/cassandra/blob/cassandra-2.1/NEWS.txt, 
“Rolling upgrades from anything pre-2.0.7 is not supported”. Does that mean we 
should upgrade to 2.0.7 or later first? Can we do a rolling upgrade to 2.0.7? 
Do we need to run upgradesstables after that? There seems to be nothing 
specific to note about upgrading between 2.0.6 and 2.0.7 in NEWS.txt


Any advice will be kindly appreciated

Cassandra Maintenance Best practices

2014-12-09 Thread Neha Trivedi
Hi,
We have a two-node cluster configuration in production with RF=2,
which means the data is written to both nodes. It has been running
for about a month now and has a good amount of data.

Questions:
1. What are the best practices for maintenance?
2. Is OpsCenter required to be installed, or can I manage with the nodetool
utility?
3. Is it necessary to run repair weekly?

thanks
regards
Neha


[Cassandra][SStableLoader Out of Heap Memory]

2014-12-09 Thread 严超
Hi, Everyone:
I'm importing a CSV file into Cassandra using SStableLoader. And
I'm following the example here:
https://github.com/yukim/cassandra-bulkload-example/
When I try to run the sstableloader, it fails with an OOM. I also
changed the sstableloader.sh script (that runs the java -cp ...BulkLoader)
to give it more memory using the -Xms and -Xmx args, but I still keep
hitting the same issue.
Any hints/directions would be really helpful .

*Stack Trace : *
/usr/bin/sstableloader -v -d internal-ip /tmp/nitin_test/nitin_test_load/

Established connection to initial hosts
Opening sstables and calculating sections to stream
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at java.util.ArrayList.<init>(ArrayList.java:144)
        at org.apache.cassandra.db.RowIndexEntry$Serializer.deserialize(RowIndexEntry.java:120)
        at org.apache.cassandra.io.sstable.SSTableReader.buildSummary(SSTableReader.java:457)
        at org.apache.cassandra.io.sstable.SSTableReader.openForBatch(SSTableReader.java:170)
        at org.apache.cassandra.io.sstable.SSTableLoader$1.accept(SSTableLoader.java:112)
        at java.io.File.list(File.java:1155)
        at org.apache.cassandra.io.sstable.SSTableLoader.openSSTables(SSTableLoader.java:73)
        at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:155)
        at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:66)


*Best Regards!*

*Chao Yan*
--
My twitter: Andy Yan @yanchao727 (https://twitter.com/yanchao727)
My Weibo: http://weibo.com/herewearenow


Re: upgrade cassandra from 2.0.6 to 2.1.2

2014-12-09 Thread Jonathan Haddad
Yes.  It is, in general, a best practice to upgrade to the latest bug fix
release before doing an upgrade to the next point release.

On Tue Dec 09 2014 at 6:58:24 PM wyang wy...@v5.cn wrote:

 I looked at some upgrade documentation and am a little puzzled.


 According to
 https://github.com/apache/cassandra/blob/cassandra-2.1/NEWS.txt, “Rolling
 upgrades from anything pre-2.0.7 is not supported”. Does that mean we should
 upgrade to 2.0.7 or later first? Can we do a rolling upgrade to 2.0.7? Do we
 need to run upgradesstables after that?  There seems to be nothing specific
 to note about upgrading between 2.0.6 and 2.0.7 in NEWS.txt


 Any advice will be kindly appreciated





Re: Cassandra Maintenance Best practices

2014-12-09 Thread Jonathan Haddad
I did a presentation on diagnosing performance problems in production at
the US & Euro summits, in which I covered quite a few tools & preventative
measures you should know when running a production cluster.  You may find
it useful:
http://rustyrazorblade.com/2014/09/cassandra-summit-recap-diagnosing-problems-in-production/

On ops center - I recommend it.  It gives you a nice dashboard.  I don't
think it's completely comprehensive (but no tool really is) but it gets you
90% of the way there.

It's a good idea to run repairs, especially if you're doing deletes or
querying at CL=ONE.  I assume you're not using quorum, because on RF=2
that's the same as CL=ALL.

I recommend at least RF=3 because if you lose 1 server, you're on the edge
of data loss.


On Tue Dec 09 2014 at 7:19:32 PM Neha Trivedi nehajtriv...@gmail.com
wrote:

 Hi,
 We have a two-node cluster configuration in production with RF=2,
 which means the data is written to both nodes. It has been running
 for about a month now and has a good amount of data.

 Questions:
 1. What are the best practices for maintenance?
 2. Is OpsCenter required to be installed, or can I manage with the nodetool
 utility?
 3. Is it necessary to run repair weekly?

 thanks
 regards
 Neha