Does Cassandra support running on Java 8?

2014-10-22 Thread Fredrik
Are there any official recommendations, or validations/tests done, with 
Cassandra >= 2.0 on Java 8?


Regards
/Fredrik


Performance Issue: Keeping rows in memory

2014-10-22 Thread Thomas Whiteway
Hi,

I'm working on an application using a Cassandra (2.1.0) cluster where

-  our entire dataset is around 22GB

-  each node has 48GB of memory but only a single (mechanical) hard disk

-  in normal operation we have a low level of writes and no reads

-  very occasionally we need to read rows very fast (1.5K 
rows/second), and only read each row once.

When we try and read the rows it takes up to five minutes before Cassandra is 
able to keep up.  The problem seems to be that it takes a while to get the data 
into the page cache and until then Cassandra can't retrieve the data from disk 
fast enough (e.g. if I drop the page cache mid-test then Cassandra slows down 
for the next 5 minutes).

Given that the total amount of data should fit comfortably in memory I've been 
trying to find a way to keep the rows cached in memory, but there doesn't seem 
to be a particularly great way to achieve this.

I've tried enabling the row cache and pre-populating the test by querying every 
row before starting the load which gives good performance, but the row cache 
isn't really intended to be used this way and we'd be fighting the row cache to 
keep the rows in (e.g. by cyclically reading through all the rows during normal 
operation).

Keeping the page cache warm by running a background task that keeps accessing the 
files for the sstables would be simpler, and currently this is the solution 
we're leaning towards, but we have less control over the page cache: it would 
be vulnerable to other processes knocking Cassandra's files out, and it 
generally feels like a bit of a hack.

Has anyone had any success with trying to do something similar to this or have 
any suggestions for possible solutions?

Thanks,
Thomas



Question on how to run incremental repairs

2014-10-22 Thread Juho Mäkinen
I'm having problems understanding how incremental repairs are supposed to
be run.

If I try to do 'nodetool repair -inc', Cassandra will complain that "It is
not possible to mix sequential repair and incremental repairs". However, it
seems that running 'nodetool repair -inc -par' does the job, but I couldn't
be sure if this is the correct (and only?) way to run incremental repairs.

Previously I ran repairs with 'nodetool repair -pr' on one node at a time,
so that I could minimise the performance hit. I've understood that doing a
single 'nodetool repair -inc -par' command runs it on all machines in the
entire cluster, so doesn't that cause a big performance penalty? Can I run
incremental repairs on one node at a time?

If running 'nodetool repair -inc -par' every night on a single node is
fine, should I still spread them out so that each node takes a turn
executing this command each night?

Last question is a bit deeper: what I've understood is that incremental
repairs don't do repairs on SSTables which have already been repaired, but
doesn't this mean that these repaired SSTables can't be checked for
missing or incorrect data?

Thanks.


Re: Question on how to run incremental repairs

2014-10-22 Thread Marcus Eriksson
On Wed, Oct 22, 2014 at 2:39 PM, Juho Mäkinen juho.maki...@gmail.com
wrote:

 I'm having problems understanding how incremental repairs are supposed to
 be run.

 If I try to do nodetool repair -inc cassandra will complain that It is
 not possible to mix sequential repair and incremental repairs. However it
 seems that running nodetool repair -inc -par does the job, but I couldn't
 be sure if  this is the correct (and only?) way to run incremental repairs?

 yes, you need to run with -par


 Previously I ran repairs with nodetool repair -pr on each node at a
 time, so that I could minimise the performance hit. I've understood that
 doing a single nodetool repair -inc -par command runs it on all machines
 in the entire cluster, so doesn't that cause a big performance penalty? Can
 I run incremental repairs on one node at a time?


Repair still works the same way, you can do it with -pr. And no, repair -inc
-par does not run on all nodes; it repairs all ranges that the node you are
executing it on owns, so if you have RF = 3 you will need to run repair
(without -pr) on every third node.
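For illustration only, a minimal sketch of that schedule, assuming a hypothetical
six-node ring with RF = 3 (host names are made up):

# full-range (no -pr) incremental repair on every third node;
# together these two runs cover all token ranges when RF = 3
ssh node1 'nodetool repair -par -inc'
ssh node4 'nodetool repair -par -inc'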


 If running nodetool repair -inc -par every night in a single node is
 fine, should I still spread them out so that each node takes a turn
 executing this command each night?


Use your old schedule; repair works the same way, except that incremental
repair does not include already-repaired sstables.


 Last question is a bit deeper: What I've understood is that incremental
 repairs don't do repairs on SSTables which have already been repaired, but
 doesn't this mean that these repaired SSTables can't be checked towards
 missing or incorrect data?


no, if you get a corrupt sstable for example, you will need to run an old
style repair on that node (without -inc).




Cluster/node with inconsistent schema

2014-10-22 Thread Jens Rantil
Hi,


I have a table that I dropped, recreated with two clustering primary keys (only 
had a single partition key before), and loaded previous data into the table.


I started noticing that a single node of mine was not able to do `ORDER BY` 
executions on the table (while the other nodes were). What was interesting was 
that `DESCRIBE TABLE mytable` showed correct PRIMARY KEY, and schema version 
was the same on all machines when I looked at system.peers as well as 
system.local.


On the failing node I was seeing exceptions such as 
https://gist.github.com/JensRantil/c6b2df5a5a2e12cdd3df.


I restarted the failing node in the belief that maybe I would force the gossip 
into a consistent state. Now I am, instead, getting RPC timeouts when 
trying to SELECT against the table, while the logs are giving me 
https://gist.github.com/JensRantil/3b238e47dd6cd33732c1.


Any input appreciated. Would you suggest I drain the node, clear all sstables 
(rm -fr /var/lib/cassandra/mykeyspace/mytable/*), boot up Cassandra and run a 
full repair?
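Concretely, I'm thinking something like this (a sketch only; how Cassandra is
stopped and started depends on the platform):

nodetool drain
sudo service cassandra stop          # or however Cassandra is managed here
rm -fr /var/lib/cassandra/mykeyspace/mytable/*
sudo service cassandra start
nodetool repair mykeyspace mytable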


Cheers,
Jens

———
Jens Rantil
Backend engineer
Tink AB

Email: jens.ran...@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se


Re: Cassandra Restore data from snapshots and Different Counts

2014-10-22 Thread Li, George
I assume that you are restoring snapshot data onto a new ring with the same
topology (i.e. if the old ring has n nodes, your new ring has n nodes
also). I discussed this with a consultant from DataStax, and he told me that I
need to make sure each new node in the new ring has the same token
list as the corresponding old node in the old ring. For example, if you are
restoring snapshot from old node 1 onto new node 1, you need to make sure
new node 1's token list is the same as the token list of the old node 1.
This can be done by the following main steps:
1. Run 'nodetool ring' on the old ring to find token list for each old node.
2. Stop Cassandra in each new node.
3. Modify new ring node 1's yaml file so 'initial_token' is the same as the
token list of old node 1. Also, set auto_bootstrap to false.
4. After this is done, start each new node one by one with 2 minutes in between
(not sure if this is necessary, but I was told that Cassandra may have issues if
you start all nodes at once), and install your database schema.
5. Copy over the snapshots. I also restart all new nodes one by one with 2
minutes in between afterwards. I am not sure if this restart is necessary,
but I was being cautious.
6. Do a nodetool repair on the new ring.
I have used these steps many times and the counts always come back identical.
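As a sketch of what steps 1 and 3 look like in practice (the address is a
placeholder and the token list is whatever step 1 produced):

# step 1: list the tokens owned by each old node
nodetool ring | grep <old node 1 address>

# step 3: in new node 1's cassandra.yaml
initial_token: <comma-separated token list copied from old node 1>
auto_bootstrap: false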
Hope this helps.

George.

On Thu, Oct 16, 2014 at 6:10 PM, Robert Coli rc...@eventbrite.com wrote:

 On Thu, Oct 16, 2014 at 4:17 PM, Bosung Seo bos...@brightcloud.com
 wrote:

 I upgraded my Cassandra ring and restored data (copying snapshots) from
 the old ring. I am currently running the nodetool repair.
 I count the tables to check every row is in the table, but the counts have
 different values.
 It contains 571 rows, and counts are 500, 530, 501, and so on. Should I
 wait until nodetool repair is done?


 Are you able to repro the miscount before the repair? What exact type of
 count are you doing?

 My conjecture is that the miscounts are probably being caused by the
 nodetool repair. I understand how perverse this statement is.

 =Rob
 http://twitter.com/rcolidba



Re: Cluster/node with inconsistent schema

2014-10-22 Thread Jens Rantil
Hi again,




Follow-up: The incorrect schema propagated to other servers. Luckily this was a 
smaller table. I dropped the table and noticed that no sstables were removed. I 
then created the table again, and truncated it instead. This removed all the 
sstables and things look good now.




Cheers,

Jens


———
Jens Rantil
Backend engineer
Tink AB

Email: jens.ran...@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se


On Wed, Oct 22, 2014 at 4:05 PM, Jens Rantil jens.ran...@tink.se wrote:

 Hi,
 I have a table that I dropped, recreated with two clustering primary keys 
 (only had a single partition key before), and loaded previous data into the 
 table.
 I started noticing that a single node of mine was not able to do `ORDER BY` 
 executions on the table (while the other nodes were). What was interesting 
 was that `DESCRIBE TABLE mytable` showed correct PRIMARY KEY, and schema 
 version was the same on all machines when I looked at system.peers as well as 
 system.local.
 On the failing node I was seeing exceptions such as 
 https://gist.github.com/JensRantil/c6b2df5a5a2e12cdd3df.
 I restarted the failing node in the belief the maybe I would force the gossip 
 to get into a consistent state. Now I am, instead, getting RPC timeout when 
 trying to SELECT against the table while logs are giving me 
 https://gist.github.com/JensRantil/3b238e47dd6cd33732c1.
 Any input appreciated. Would you suggest I drain the node, clear all sstables 
 (rm -fr /var/lib/cassandra/mykeyspace/mytable/*), boot up Cassandra and run a 
 full repair?
 Cheers,
 Jens
 ———
 Jens Rantil
 Backend engineer
 Tink AB
 Email: jens.ran...@tink.se
 Phone: +46 708 84 18 32
 Web: www.tink.se
 Facebook Linkedin Twitter

Re: Performance Issue: Keeping rows in memory

2014-10-22 Thread DuyHai Doan
If you're using 2.1.0, the row cache has been redesigned. How did you
configure it? There are some new parameters to specify how many CQL rows
you want to keep in the cache:
http://www.datastax.com/dev/blog/row-caching-in-cassandra-2-1
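For reference, a minimal sketch of the 2.1-style configuration (keyspace, table
and the size value are made up; see the blog post for the details):

# cassandra.yaml: give the row cache some capacity
row_cache_size_in_mb: 2048

-- CQL, per table: cache all rows of every partition
ALTER TABLE mykeyspace.mytable
  WITH caching = { 'keys': 'ALL', 'rows_per_partition': 'ALL' };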

On Wed, Oct 22, 2014 at 1:34 PM, Thomas Whiteway 
thomas.white...@metaswitch.com wrote:

  Hi,



 I’m working on an application using a Cassandra (2.1.0) cluster where

 -  our entire dataset is around 22GB

 -  each node has 48GB of memory but only a single (mechanical)
 hard disk

 -  in normal operation we have a low level of writes and no reads

 -  very occasionally we need to read rows very fast (1.5K
 rows/second), and only read each row once.



 When we try and read the rows it takes up to five minutes before Cassandra
 is able to keep up.  The problem seems to be that it takes a while to get
 the data into the page cache and until then Cassandra can’t retrieve the
 data from disk fast enough (e.g. if I drop the page cache mid-test then
 Cassandra slows down for the next 5 minutes).



 Given that the total amount of data should fit comfortably in memory I’ve been
 trying to find a way to keep the rows cached in memory but there doesn’t
 seem to be a particularly great way to achieve this.



 I’ve tried enabling the row cache and pre-populating the test by querying
 every row before starting the load which gives good performance, but the
 row cache isn’t really intended to be used this way and we’d be fighting
 the row cache to keep the rows in (e.g. by cyclically reading through all
 the rows during normal operation).



 Keeping the page cache warm by running a background task to keep accessing
 the files for the sstables would be simpler and currently this is the
 solution we’re leaning towards, but we have less control over the page
 cache, it would be vulnerable to other processes knocking Cassandra’s files
 out, and it generally feels like a bit of a hack.



 Has anyone had any success with trying to do something similar to this or
 have any suggestions for possible solutions?



 Thanks,

 Thomas





RE: stream_throughput_outbound_megabits_per_sec

2014-10-22 Thread Donald Smith
Sorry, I copy-and-pasted the wrong variable name.  I meant to copy and paste 
streaming_socket_timeout_in_ms. So my question should be:

streaming_socket_timeout_in_ms is the timeout per operation on the streaming 
socket.   The docs recommend not to set  it too low (because a timeout causes 
streaming to restart from the beginning). But the default 0 never times out.  
What's a reasonable value?


# Enable socket timeout for streaming operation.
# When a timeout occurs during streaming, streaming is retried from the start
# of the current file. This _can_ involve re-streaming an important amount of
# data, so you should avoid setting the value too low.
# Default value is 0, which never timeout streams.
# streaming_socket_timeout_in_ms: 0

My second question is: does it stream an entire SSTable in one operation? I 
doubt it.  How large is the object it streams in one operation?  I'm tempted to 
set the timeout to 30 seconds or 1 minute. Is that too low?

The entire file (SSTable) is large – several hundred megabytes.  Is the timeout 
for streaming the entire file?  Or only a block of it?
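If we do pick a value, it's a one-line change in cassandra.yaml (the number below
is purely a placeholder, not a recommendation):

# hypothetical: give up (and restart the file) if the streaming socket
# stalls for a full hour
streaming_socket_timeout_in_ms: 3600000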

Don

From: Marcus Eriksson [mailto:krum...@gmail.com]
Sent: Friday, October 17, 2014 4:05 AM
To: user@cassandra.apache.org
Subject: Re: stream_throughput_outbound_megabits_per_sec



On Thu, Oct 16, 2014 at 1:54 AM, Donald Smith 
donald.sm...@audiencescience.com 
wrote:


stream_throughput_outbound_megabits_per_sec  is the timeout per operation on 
the streaming socket.   The docs recommend not to have it too low (because a 
timeout causes streaming to restart from the beginning). But the default 0 
never times out.  What's a reasonable value?

no, it is not a timeout, it states how fast sstables are streamed


Does it stream an entire SSTable in one operation? I doubt it.  How large is 
the object it streams in one operation?  I'm tempted to put the timeout at 30 
seconds or 1 minute. Is that too low?

Unsure what you mean by 'operation' here, but it is one TCP connection, 
streaming the whole file (if that's what we want).


/Marcus


Is cassandra smart enough to serve Read requests entirely from Memtables in some cases?

2014-10-22 Thread Donald Smith
Question about the read path in cassandra.  If a partition/row is in the 
Memtable and is being actively written to by other clients,  will a READ of 
that partition also have to hit SStables on disk (or in the page cache)?  Or 
can it be serviced entirely from the Memtable?

If you select all columns (e.g., "select * from ....") then I can imagine 
that cassandra would need to merge whatever columns are in the Memtable with 
what's in SStables on disk.

But if you select a single column (e.g., "select Name from .... where id= ....") 
and if that column is in the Memtable, I'd hope cassandra could skip 
checking the disk.  Can it do this optimization?

Thanks, Don

Donald A. Smith | Senior Software Engineer
P: 425.201.3900 x 3866
C: (206) 819-5965
F: (646) 443-2333
dona...@audiencescience.com



Re: Performance Issue: Keeping rows in memory

2014-10-22 Thread Jonathan Haddad
First, did you run a query trace?

I recommend Al Tobey's pcstat util to determine if your files are in
the buffer cache: https://github.com/tobert/pcstat
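For example (hypothetical data paths; pcstat just takes a list of files and
reports how much of each is resident in the page cache):

pcstat /var/lib/cassandra/data/mykeyspace/mytable-*/*-Data.db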



On Wed, Oct 22, 2014 at 4:34 AM, Thomas Whiteway
thomas.white...@metaswitch.com wrote:
 Hi,



 I’m working on an application using a Cassandra (2.1.0) cluster where

 -  our entire dataset is around 22GB

 -  each node has 48GB of memory but only a single (mechanical) hard
 disk

 -  in normal operation we have a low level of writes and no reads

 -  very occasionally we need to read rows very fast (1.5K
 rows/second), and only read each row once.



 When we try and read the rows it takes up to five minutes before Cassandra
 is able to keep up.  The problem seems to be that it takes a while to get
 the data into the page cache and until then Cassandra can’t retrieve the
 data from disk fast enough (e.g. if I drop the page cache mid-test then
 Cassandra slows down for the next 5 minutes).



 Given that the total amount of data should fit comfortably in memory I’ve been
 trying to find a way to keep the rows cached in memory but there doesn’t
 seem to be a particularly great way to achieve this.



 I’ve tried enabling the row cache and pre-populating the test by querying
 every row before starting the load which gives good performance, but the row
 cache isn’t really intended to be used this way and we’d be fighting the row
 cache to keep the rows in (e.g. by cyclically reading through all the rows
 during normal operation).



 Keeping the page cache warm by running a background task to keep accessing
 the files for the sstables would be simpler and currently this is the
 solution we’re leaning towards, but we have less control over the page
 cache, it would be vulnerable to other processes knocking Cassandra’s files
 out, and it generally feels like a bit of a hack.



 Has anyone had any success with trying to do something similar to this or
 have any suggestions for possible solutions?



 Thanks,

 Thomas





-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: Is cassandra smart enough to serve Read requests entirely from Memtables in some cases?

2014-10-22 Thread Jonathan Haddad
No.  Consider a scenario where you supply a timestamp a week in the future,
flush it to an sstable, and then do a write with the current timestamp.  The
record on disk will have a timestamp greater than the one in the memtable.
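A concrete illustration of that scenario (keyspace, table and values are all
made up):

-- assuming: CREATE TABLE ks.t (id int PRIMARY KEY, name text);
-- write with a client-supplied timestamp roughly a week in the future
INSERT INTO ks.t (id, name) VALUES (1, 'future') USING TIMESTAMP 1414627200000000;
--   then: nodetool flush ks t
-- a normal write now lands in the memtable with the current timestamp
INSERT INTO ks.t (id, name) VALUES (1, 'now');
-- SELECT name FROM ks.t WHERE id = 1 must still consult the sstable and
-- returns 'future', because the higher timestamp wins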

On Wed, Oct 22, 2014 at 9:18 AM, Donald Smith 
donald.sm...@audiencescience.com wrote:

  Question about the read path in cassandra.  If a partition/row is in the
 Memtable and is being actively written to by other clients,  will a READ of
 that partition also have to hit SStables on disk (or in the page
 cache)?  Or can it be serviced entirely from the Memtable?



 If you select all columns (e.g., “*select * from ….*”)   then I can
 imagine that cassandra would need to merge whatever columns are in the
 Memtable with what’s in SStables on disk.



 But if you select a single column (e.g., “*select Name from ….  where id=
 …*.”) and if that column is in the Memtable, I’d hope cassandra could
 skip checking the disk.  Can it do this optimization?



 Thanks, Don



 *Donald A. Smith* | Senior Software Engineer
 P: 425.201.3900 x 3866
 C: (206) 819-5965
 F: (646) 443-2333
 dona...@audiencescience.com








-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


RE: Performance Issue: Keeping rows in memory

2014-10-22 Thread Thomas Whiteway
I was using the pre-2.1.0 configuration scheme of setting caching to 
‘rows_only’ on the column family.  I’ve tried runs with  row_cache_size_in_mb 
set to both 16384 and 32768.

I don’t think the new settings would have helped in my case.  My understanding 
of the rows_per_partition setting is that it allows you to restrict the number 
of rows which are cached compared to the pre-2.1.0 way of doing things, while 
we want to cache as much as possible.

From: DuyHai Doan [mailto:doanduy...@gmail.com]
Sent: 22 October 2014 16:59
To: user@cassandra.apache.org
Cc: James Lee
Subject: Re: Performance Issue: Keeping rows in memory

If you're using 2.1.0, the row cache has been redesigned. How did you configure 
it? There are some new parameters to specify how many CQL rows you want to 
keep in the cache: http://www.datastax.com/dev/blog/row-caching-in-cassandra-2-1

On Wed, Oct 22, 2014 at 1:34 PM, Thomas Whiteway 
thomas.white...@metaswitch.com wrote:
Hi,

I’m working on an application using a Cassandra (2.1.0) cluster where

-  our entire dataset is around 22GB

-  each node has 48GB of memory but only a single (mechanical) hard disk

-  in normal operation we have a low level of writes and no reads

-  very occasionally we need to read rows very fast (1.5K 
rows/second), and only read each row once.

When we try and read the rows it takes up to five minutes before Cassandra is 
able to keep up.  The problem seems to be that it takes a while to get the data 
into the page cache and until then Cassandra can’t retrieve the data from disk 
fast enough (e.g. if I drop the page cache mid-test then Cassandra slows down 
for the next 5 minutes).

Given that the total amount of data should fit comfortably in memory I’ve been 
trying to find a way to keep the rows cached in memory but there doesn’t seem 
to be a particularly great way to achieve this.

I’ve tried enabling the row cache and pre-populating the test by querying every 
row before starting the load which gives good performance, but the row cache 
isn’t really intended to be used this way and we’d be fighting the row cache to 
keep the rows in (e.g. by cyclically reading through all the rows during normal 
operation).

Keeping the page cache warm by running a background task to keep accessing the 
files for the sstables would be simpler and currently this is the solution 
we’re leaning towards, but we have less control over the page cache, it would 
be vulnerable to other processes knocking Cassandra’s files out, and it 
generally feels like a bit of a hack.

Has anyone had any success with trying to do something similar to this or have 
any suggestions for possible solutions?

Thanks,
Thomas




RE: Is cassandra smart enough to serve Read requests entirely from Memtables in some cases?

2014-10-22 Thread Donald Smith
On the cassandra irc channel I discussed this question.  I learned that the 
timestamp in the Memtable may be OLDER than the timestamp in some SSTable 
(e.g., due to hints or retries).  So there’s no guarantee that the Memtable has 
the most recent version.

But there may be cases, they say, in which the timestamp in the SSTable can be 
used to skip over SSTables that have older data (via metadata on SSTables, I 
presume).

Memtables are like write-through caches and do NOT correspond to SSTables loaded 
from disk.

From: jonathan.had...@gmail.com [mailto:jonathan.had...@gmail.com] On Behalf Of 
Jonathan Haddad
Sent: Wednesday, October 22, 2014 9:24 AM
To: user@cassandra.apache.org
Subject: Re: Is cassandra smart enough to serve Read requests entirely from 
Memtables in some cases?

No.  Consider a scenario where you supply a timestamp a week in the future, 
flush it to sstable, and then do a write, with the current timestamp.  The 
record in disk will have a timestamp greater than the one in the memtable.

On Wed, Oct 22, 2014 at 9:18 AM, Donald Smith 
donald.sm...@audiencescience.com 
wrote:
Question about the read path in cassandra.  If a partition/row is in the 
Memtable and is being actively written to by other clients,  will a READ of 
that partition also have to hit SStables on disk (or in the page cache)?  Or 
can it be serviced entirely from the Memtable?

If you select all columns (e.g., “select * from ….”)   then I can imagine that 
cassandra would need to merge whatever columns are in the Memtable with what’s 
in SStables on disk.

But if you select a single column (e.g., “select Name from ….  where id= ….”) 
and if that column is in the Memtable, I’d hope cassandra could skip checking 
the disk.  Can it do this optimization?

Thanks, Don

Donald A. Smith | Senior Software Engineer
P: 425.201.3900 x 3866
C: (206) 819-5965
F: (646) 443-2333
dona...@audiencescience.com




--
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Copy Error

2014-10-22 Thread Jeremy Franzen
Hey folks,

I am sure that this is a simple oversight on my part, but I just can not see 
the forest for the trees. Any ideas on this one?

copy strevus_data.strevus_metadata_data to 
'c:/temp/strevus/export/strevus_data.strevus_metadata_data.csv';
Bad Request: Undefined name 
0008081000
 in selection clause


Jeremy J. Franzen
VP Operations | Strevus
jeremy.fran...@strevus.com
T: +1.415.649.6234 | M: +1.408.726.4363
Compliance Made Easy.
... . -- .--. . .-. / ..-. ..




Re: Increasing size of Batch of prepared statements

2014-10-22 Thread Jens Rantil
Shabab,

Apologize for the late answer.

On Mon, Oct 6, 2014 at 2:38 PM, shahab shahab.mok...@gmail.com wrote:

 But do you mean that inserting columns with large size (let's say a text
 with 20-30 K) is potentially problematic in Cassandra?


AFAIK, the size _warning_ you are getting relates to the size of the batch
of prepared statements (INSERT INTO mykeyspace.mytable VALUES (?,?,?,?)).
That is, it has nothing to do with the actual content of your row. 20-30 K
shouldn't be a problem. But it's considered good practice to split larger
payloads (say, over 5 MB) into chunks, since that makes operations easier on
your cluster and more likely to spread evenly across it.


 What shall i do if I want columns with large size?


Just don't insert too many rows in a single batch and you should be fine.
Like Shane's JIRA ticket said, the warning is to let you know you are not
following best practice when adding too many rows in a single batch. It can
create bottlenecks in a single Cassandra node.
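As a sketch (keyspace, table and columns are made up), prefer many small batches
or individual statements over one huge batch:

-- instead of one batch containing thousands of rows, send small groups
BEGIN UNLOGGED BATCH
  INSERT INTO mykeyspace.mytable (id, body) VALUES (1, '... 20-30 KB of text ...');
  INSERT INTO mykeyspace.mytable (id, body) VALUES (2, '...');
APPLY BATCH;
-- ...then repeat for the next small group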

Cheers,
Jens

-- 
Jens Rantil
Backend engineer
Tink AB

Email: jens.ran...@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se



Re: Performance Issue: Keeping rows in memory

2014-10-22 Thread Robert Coli
On Wed, Oct 22, 2014 at 4:34 AM, Thomas Whiteway 
thomas.white...@metaswitch.com wrote:

  I’m working on an application using a Cassandra (2.1.0) cluster where

  -  our entire dataset is around 22GB

 -  each node has 48GB of memory but only a single (mechanical)
 hard disk

 -  in normal operation we have a low level of writes and no reads

 -  very occasionally we need to read rows very fast (1.5K
 rows/second), and only read each row once.



 When we try and read the rows it takes up to five minutes before Cassandra
 is able to keep up.  The problem seems to be that it takes a while to get
 the data into the page cache and until then Cassandra can’t retrieve the
 data from disk fast enough (e.g. if I drop the page cache mid-test then
 Cassandra slows down for the next 5 minutes).


Use :

populate_io_cache_on_flush

It's designed for this case. "Flush" in this case also includes the flush
that comes at the end of compaction.

Kevin Burton's (hi! :D) https://code.google.com/p/linux-ftools/ will help
you keep the SSTables in the page cache when, for example, rebooting nodes.

=Rob


Re: Copy Error

2014-10-22 Thread Tyler Hobbs
What's your schema for that table, and what version of Cassandra are you
using?

On Wed, Oct 22, 2014 at 12:25 PM, Jeremy Franzen jeremy.fran...@strevus.com
 wrote:

   Hey folks,

  I am sure that this is a simple oversight on my part, but I just can not
 see the forest for the trees. Any ideas on this one?

  copy strevus_data.strevus_metadata_data to
 'c:/temp/strevus/export/strevus_data.strevus_metadata_data.csv';
 Bad Request: Undefined name
 0008081000
 in selection clause



 *Jeremy J. Franzen *VP Operations |
 *Strevus *jeremy.fran...@strevus.com
 *T:* +1.415.649.6234 | *M:* +1.408.726.4363
 * Compliance Made Easy.*
 ... . -- .--. . .-. / ..-. ..





-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: Cassandra Restore data from snapshots and Different Counts

2014-10-22 Thread Robert Coli
On Wed, Oct 22, 2014 at 7:58 AM, Li, George guangxing...@pearson.com
wrote:

 I assume that you are restoring snapshot data onto a new ring with the
 same topology (i.e. if the old ring has n nodes, your new ring has n nodes
 also). I discussed this a consultant from DataStax, and he told me that I
 need to make sure each new node in the new ring need to have the same token
 list as the corresponding old node in the old ring. For example, if you are
 restoring snapshot from old node 1 onto new node 1, you need to make sure
 new node 1's token list is the same as the token list of the old node 1.
 This can be done by the following main steps:
 1. Run 'nodetool ring' on the old ring to find token list for each old
 node.
 2. Stop Cassandra in each new node.
 3. Modify new ring node 1's yaml file so 'initial_token' is the same as
 the token list of old node 1. Also, set auto_bootstrap to false.


For vnodes, you can use this handy one-liner to get a comma-delimited list
of tokens for the current node :

nodetool info -T | grep ^Token | awk '{ print $3 }' | tr \\n , | sed -e
's/,$/\n/'

=Rob
http://twitter.com/rcolidba


Re: Question on how to run incremental repairs

2014-10-22 Thread Robert Coli
On Wed, Oct 22, 2014 at 5:47 AM, Marcus Eriksson krum...@gmail.com wrote:


 no, if you get a corrupt sstable for example, you will need to run an old
 style repair on that node (without -inc).


As a general statement, if you get a corrupt SSTable, restoring it from a
backup (with the node down) should be done before repair.

=Rob
http://twitter.com/rcolidba


Re: Does Cassandra support running on Java 8?

2014-10-22 Thread Michael Shuler

On 10/22/2014 02:42 AM, Fredrik wrote:

Are there any official recommendations, validations/tests done with
Cassandra >= 2.0 on Java 8?


We've been running JDK8 dtest jenkins jobs on the cassandra-2.1 branch 
for a while, I recently added a trunk_dtest_jdk8 job, and I just now 
added unit test jobs for those branches on JDK8, so they should finish 
in a bit.


https://cassci.datastax.com/search/?q=jdk8

--
Michael



Re: Does Cassandra support running on Java 8?

2014-10-22 Thread Michael Shuler

On 10/22/2014 03:14 PM, Michael Shuler wrote:

On 10/22/2014 02:42 AM, Fredrik wrote:

Are there any official recommendations, validations/tests done with
Cassandra >= 2.0 on Java 8?


We've been running JDK8 dtest jenkins jobs on the cassandra-2.1 branch
for a while, I recently added a trunk_dtest_jdk8 job, and I just now
added unit test jobs for those branches on JDK8, so they should finish
in a bit.

https://cassci.datastax.com/search/?q=jdk8


(drop to http:// or accept the self-signed cert :) )

--
Michael



Re: Copy Error

2014-10-22 Thread Jeremy Franzen
We are dropping the Copy command and just going with the sstableloader command 
instead. Sorry for the noise on the channel.


Jeremy J. Franzen
VP Operations | Strevus
jeremy.fran...@strevus.com
T: +1.415.649.6234 | M: +1.408.726.4363
Compliance Made Easy.
... . -- .--. . .-. / ..-. ..



From: Tyler Hobbs ty...@datastax.com
Reply-To: user@cassandra.apache.org
Date: Wednesday, October 22, 2014 at 12:02 PM
To: user@cassandra.apache.org
Subject: Re: Copy Error

What's your schema for that table, and what version of Cassandra are you using?

On Wed, Oct 22, 2014 at 12:25 PM, Jeremy Franzen 
jeremy.fran...@strevus.com wrote:
Hey folks,

I am sure that this is a simple oversight on my part, but I just can not see 
the forest for the trees. Any ideas on this one?

copy strevus_data.strevus_metadata_data to 
'c:/temp/strevus/export/strevus_data.strevus_metadata_data.csv';
Bad Request: Undefined name 
0008081000
 in selection clause


Jeremy J. Franzen
VP Operations | Strevus
jeremy.fran...@strevus.com
T: +1.415.649.6234 | M: +1.408.726.4363
Compliance Made Easy.
... . -- .--. . .-. / ..-. ..





--
Tyler Hobbs
DataStax http://datastax.com/


Re: Copy Error

2014-10-22 Thread Tyler Hobbs
For the sake of fixing a potential bug, would you mind sharing your schema
and Cassandra version anyway?

On Wed, Oct 22, 2014 at 4:49 PM, Jeremy Franzen jeremy.fran...@strevus.com
wrote:

   We are dropping the Copy command and just going with the sstableloader
 command instead. Sorry for the noise on the channel.



 *Jeremy J. Franzen *VP Operations |
 *Strevus *jeremy.fran...@strevus.com
 *T:* +1.415.649.6234 | *M:* +1.408.726.4363
 * Compliance Made Easy.*
 ... . -- .--. . .-. / ..-. ..



   From: Tyler Hobbs ty...@datastax.com
 Reply-To: user@cassandra.apache.org user@cassandra.apache.org
 Date: Wednesday, October 22, 2014 at 12:02 PM
 To: user@cassandra.apache.org user@cassandra.apache.org
 Subject: Re: Copy Error

   What's your schema for that table, and what version of Cassandra are
 you using?

 On Wed, Oct 22, 2014 at 12:25 PM, Jeremy Franzen 
 jeremy.fran...@strevus.com wrote:

   Hey folks,

  I am sure that this is a simple oversight on my part, but I just can
 not see the forest for the trees. Any ideas on this one?

  copy strevus_data.strevus_metadata_data to
 'c:/temp/strevus/export/strevus_data.strevus_metadata_data.csv';
 Bad Request: Undefined name
 0008081000
 in selection clause



 *Jeremy J. Franzen *VP Operations |
 *Strevus *jeremy.fran...@strevus.com
 *T:* +1.415.649.6234 | *M:* +1.408.726.4363
 * Compliance Made Easy.*
 ... . -- .--. . .-. / ..-. ..





 --
 Tyler Hobbs
 DataStax http://datastax.com/




-- 
Tyler Hobbs
DataStax http://datastax.com/


Cassandra Developer - Choice Hotels

2014-10-22 Thread Jeremiah Anderson
Hi,

I am hoping to get the word out that we are looking for a Cassandra Developer
(http://careers.choicehotels.com/careers/jobDetails.html?jobTitle=Cassandra+Developer)
for a full time position at our office in Scottsdale, AZ. Please let me know
what I can do to let folks know we are looking :)

Thank you!!


Jeremiah Anderson | Sr. Recruiter
Choice Hotels International, Inc. (NYSE: CHH) | www.choicehotels.com
6811 E Mayo Blvd, Ste 100, Phoenix, AZ 85054
602.494.6648 | jeremiah_ander...@choicehotels.com