Re: Consolidating records and TTL
As Tyler says, with atomic batches (which are enabled by default) the cluster will keep trying to replay the inserts / deletes. Nodes check their local batch log for failed batches, ones where the coordinator did not acknowledge successful completion, every 60 seconds. So there is a window in which it's possible for not all mutations in the batch to be completed.

This could happen when a write timeout occurs while processing a batch of 2 rows; the request CL will not have been achieved on one or more of the rows. The coordinator will leave it up to the batch log to replay the request, and the client driver will (with the default config) not retry.

You can use a model like this:

    create table ledger (
        account     int,
        tx_id       timeuuid,
        sub_total   int,
        primary key (account, tx_id)
    );

    create table account (
        account     int,
        total       int,
        last_tx_id  timeuuid,
        primary key (account)
    );

To get the total:

    select * from account where account = X;

Then get the ledger entries you need:

    select * from ledger where account = X and tx_id > last_tx_id;

This query will degrade as the partition size in the ledger table gets bigger, since it will need to read the column index (see column_index_size_in_kb in the yaml). It will use that to find the first page that contains the rows we are interested in and then read forwards to the end of the row. It's not the most efficient type of read, but if you are going to delete ledger entries this *should* be able to skip over the tombstones without reading them.

When you want to update the total in the account, write to the account table and update both the total and the last_tx_id. You can then delete ledger entries if needed. Don't forget to ensure that only one client thread is doing this at a time.

Hope that helps.

Aaron

-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 5/06/2014, at 10:37 am, Tyler Hobbs ty...@datastax.com wrote:

Just use an atomic batch that holds both the insert and deletes: http://www.datastax.com/dev/blog/atomic-batches-in-cassandra-1-2

On Tue, Jun 3, 2014 at 2:13 PM, Charlie Mason charlie@gmail.com wrote:

Hi All,

I have a system that's going to make possibly several concurrent changes to a running total. I know I could use a counter for this. However I have extra metadata I can store with the changes which would allow me to replay the changes. If I use a counter and it loses some writes, I can't recover it, as I will only have its current total, not the extra metadata to know where to replay from.

What I was planning to do was write each change of the value to a CQL table with a TimeUUID as a row-level primary key as well as a partition key. Then when I need to read the running total back, I will do a query for all the changes and add them up to get the total. As there could be tens of thousands of these, I want to have a period after which they are consolidated. Most won't be anywhere near that, but a few will be, which I need to be able to support.

So I was also going to have a consolidated total table which holds the UUID of the values consolidated up to. Since I can bound the query for the recent updates by the UUID, I should be able to avoid all the tombstones. If the read encounters any changes that can be consolidated, it inserts a new consolidated value and deletes the newly consolidated changes.

What I am slightly worried about is what happens if the consolidated value insert fails but the deletes of the change records succeed. I would be left with an inconsistent total indefinitely.
I have come up with a couple of ideas:

1. I could make it require all nodes to acknowledge it before deleting the difference records.
2. Maybe I could have another period after it's consolidated but before it's deleted?
3. Is there any way I could use a TTL to allow it to be deleted after a period of time? Chances are another read would come in and fix the value.

Anyone got any other suggestions on how I could implement this?

Thanks,

Charlie M

--
Tyler Hobbs
DataStax
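For reference, a minimal sketch of the logged (atomic) batch Tyler suggests, applied to the ledger / account model above. The key values and UUID literals are placeholders, and note that in CQL of this era a DELETE needs the full primary key (no range deletes on the clustering column), so each consolidated ledger entry is deleted individually:

    BEGIN BATCH
        UPDATE account
           SET total = 1250, last_tx_id = 50554d6e-29bb-11e5-b345-feff819cdc9f
         WHERE account = 1;
        DELETE FROM ledger
         WHERE account = 1 AND tx_id = 3fa2cd80-29bb-11e5-b345-feff819cdc9f;
        DELETE FROM ledger
         WHERE account = 1 AND tx_id = 44bc5300-29bb-11e5-b345-feff819cdc9f;
    APPLY BATCH;

Because the batch is logged, either all of these mutations eventually apply or none do, which closes the "insert fails but deletes succeed" window Charlie is worried about, at the cost of the replay delay Aaron describes.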
Re: Increased Cassandra connection latency
You'll need to provide some more information, such as:

* Do you have monitoring on the Cassandra cluster that shows the request latency? DataStax OpsCenter is a good starting point.
* Is compaction keeping up? Check with nodetool compactionstats.
* Is the GCInspector logging about long running ParNew? (It only logs when it's longer than 200ms.)

Cheers
Aaron

-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 23/05/2014, at 10:35 pm, Alexey Sverdelov alexey.sverde...@googlemail.com wrote:

Hi all,

I've noticed increased latency on our Tomcat REST service (average 30ms, max 2sec). We are using Cassandra 1.2.16 with the official DataStax Java driver v1.0.3.

Our setup:

* 2 DCs
* each DC: 7 nodes
* RF=5
* Leveled compaction

After a Cassandra restart on all nodes, the latencies are alright again (average 5ms, max 50ms).

Any thoughts are greatly appreciated.

Thanks, Alexey
Re: What % of cassandra developers are employed by Datastax?
The Cassandra Summit Bootcamp, Sep 12-13, immediately following the Summit, might be interesting for potential contributors.

I'll be there to help people get started. Looking forward to it.

While DataStax is the biggest contributor in time and patches, there are several other well known people and companies contributing and committing. IMHO the level of community activity and support over the last 5-ish years has been, and will continue to be, critical to the success of Cassandra, both Apache and DSE. Which is a polite way of saying there is *always* something an individual can do to contribute to the health of the project.

Cheers
Aaron

-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 24/05/2014, at 7:28 am, Michael Shuler mich...@pbandjelly.org wrote:

On 05/23/2014 01:23 PM, Peter Lin wrote:

A separate but important consideration is the long term health of a project. Many Apache projects face this issue. When a project doesn't continually grow the contributors and committers, the project runs into issues in the long term. All open source projects see this; contributors and committers eventually leave, so it's important to continue to invite worthy contributors to become committers.

The Cassandra Summit Bootcamp, Sep 12-13, immediately following the Summit, might be interesting for potential contributors.

--
Michael
Re: Memory issue
As soon as it starts, the JVM gets killed because of a memory issue.

What is the memory issue that kills the JVM? The log message below is simply a warning:

    WARN [main] 2011-06-15 09:58:56,861 CLibrary.java (line 118) Unable to lock JVM memory (ENOMEM). This can result in part of the JVM being swapped out, especially with mmapped I/O enabled. Increase RLIMIT_MEMLOCK or run Cassandra as root.

Is there anything in the system logs?

Cheers
Aaron

-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 24/05/2014, at 9:17 am, Robert Coli rc...@eventbrite.com wrote:

On Fri, May 23, 2014 at 2:08 PM, opensaf dev opensaf...@gmail.com wrote:

I have a different service which controls the cassandra service for high availability.

IMO, starting or stopping a Cassandra node should never be a side effect of another system's properties. YMMV.

https://issues.apache.org/jira/browse/CASSANDRA-2356 has some related comments.

=Rob
Re: Can SSTables overlap with SizeTieredCompactionStrategy?
cold_reads_to_omit defaults to 0.0, which disables the feature, so it may not have been responsible in this case.

There are a couple of things that could explain the difference:

* After nodetool compact there was one SSTable, so one -Filter.db file rather than 8 that each had 700 entries. However 700 entries is not very many, so this would have been a small size on disk.
* Same story with the -Index.db files: they would all have had the same values, but that would not have been very big with 700 entries. However, with the wide rows, column indexes would also have been present in the -Index.db file.
* Compression may have been better. When you have one SSTable, all the columns for the row are stored sequentially, and it may simply have compressed better.

If most of the difference was in the -Data.db files I would guess compression; nodetool cfstats will tell you the compression ratio.

Hope that helps.

Aaron

-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 23/05/2014, at 9:46 am, Phil Luckhurst phil.luckhu...@powerassure.com wrote:

Hi Andreas,

So does that mean it can compact the 'hottest' partitions into a new sstable, but the old sstables may not immediately be removed, so the same data could be in more than one sstable? That would certainly explain the difference we see when we manually run nodetool compact.

Thanks
Phil

Andreas Finke wrote:

Hi Phil,

I found an interesting blog entry that may address your problem: http://www.datastax.com/dev/blog/optimizations-around-cold-sstables

It seems that compaction is skipped for sstables which do not satisfy a certain read rate. Please check.

Kind regards

Andreas Finke
Java Developer
Solvians IT-Solutions GmbH

Phil Luckhurst wrote:

Definitely no TTL, and records are only written once with no deletions.

Phil

DuyHai Doan wrote:

Are you sure there is no TTL set on your data? It might explain the shrink in sstable size after compaction.
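As an aside, if the cold-SSTable optimisation ever is in play, the threshold is a per-table SizeTieredCompactionStrategy option (added around 2.0.3, if memory serves). A sketch with a hypothetical table, treating the 0.05 value as illustrative only:

    -- skip compacting sstables that serve less than 5% of reads
    ALTER TABLE ks.mytable WITH compaction = {
        'class': 'SizeTieredCompactionStrategy',
        'cold_reads_to_omit': 0.05
    };

Setting it back to 0.0 restores the always-compact behaviour described above.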
Re: Cassandra token range support for Hadoop (ColumnFamilyInputFormat)
"between 1.2.6 and 2.0.6 the setInputRange(startToken, endToken) is not working" Can you confirm or disprove?

My reading of the code is that it will consider the part of a token range (from vnodes or initial tokens) that overlaps with the provided token range.

I've already got one confirmation that in the C* version I use (1.2.15) setting limits with setInputRange(startToken, endToken) doesn't work.

Can you be more specific?

works only for ordered partitioners (in 1.2.15).

It will work with ordered and unordered partitioners equally. The difference is probably what you consider "working" to mean. The token ranges are handled the same; it's the rows in them that change.

Cheers
Aaron

-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 20/05/2014, at 11:37 am, Anton Brazhnyk anton.brazh...@genesys.com wrote:

Hi Aaron,

I've seen the code which you describe (working with splits and intersections), but that range is derived from keys and works only for ordered partitioners (in 1.2.15). I've already got one confirmation that in the C* version I use (1.2.15) setting limits with setInputRange(startToken, endToken) doesn't work.

"between 1.2.6 and 2.0.6 the setInputRange(startToken, endToken) is not working" Can you confirm or disprove?

WBR,
Anton

From: Aaron Morton [mailto:aa...@thelastpickle.com]
Sent: Monday, May 19, 2014 1:58 AM
To: Cassandra User
Subject: Re: Cassandra token range support for Hadoop (ColumnFamilyInputFormat)

The limit is just ignored and the entire column family is scanned.

Which limit?

1. Am I right that there is no way to get some data limited by token range with ColumnFamilyInputFormat?

From what I understand, setting the input range is used when calculating the splits. The token ranges in the cluster are iterated, and if they intersect with the supplied range, the overlapping range is used to calculate the split, rather than the full token range.

2. Is there another way to limit the amount of data read from Cassandra with Spark and ColumnFamilyInputFormat, so that this amount is predictable (like 5% of the entire dataset)?

If you supply a token range that is 5% of the possible range of values for the token, that should be close to a random 5% sample.

Hope that helps.

Aaron

-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 14/05/2014, at 10:46 am, Anton Brazhnyk anton.brazh...@genesys.com wrote:

Greetings,

I'm reading data from C* with Spark (via ColumnFamilyInputFormat) and I'd like to read just part of it, something like Spark's sample() function. Cassandra's API seems to allow it with its ConfigHelper.setInputRange(jobConfiguration, startToken, endToken) method, but it doesn't work. The limit is just ignored and the entire column family is scanned. It seems this kind of feature is just not supported, and the sources of AbstractColumnFamilyInputFormat.getSplits confirm that (IMO).

Questions:
1. Am I right that there is no way to get some data limited by token range with ColumnFamilyInputFormat?
2. Is there another way to limit the amount of data read from Cassandra with Spark and ColumnFamilyInputFormat, so that this amount is predictable (like 5% of the entire dataset)?

WBR,
Anton
Re: CQL 3 and wide rows
In a CQL 3 table the only **column** names are the ones defined in the table; in the example below there are three column names.

    CREATE TABLE keyspace.widerow (
        row_key text,
        wide_row_column text,
        data_column text,
        PRIMARY KEY (row_key, wide_row_column));

Check out, for example, http://www.datastax.com/dev/blog/schema-in-cassandra-1-1.

Internally there may be more **cells** (as we now call the internal columns). In the example above, each value for row_key will create a single partition (as we now call internal storage engine rows). In each of those partitions there will be cells for each CQL 3 row that has the same row_key; those cells will use a Composite for the name. The first part of the composite will be the value of the wide_row_column, and the second will be the literal name of the non primary key columns.

IMHO wide partitions (storage engine rows) are more prevalent in CQL 3 than in thrift models.

But still - I do not see Iteration, so it looks to me that CQL 3 is limited when compared to CLI/Hector.

Nowadays you can do pretty much everything you can in the cli. Provide an example and we may be able to help.

Cheers
Aaron

-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 20/05/2014, at 8:18 am, Maciej Miklas mac.mik...@gmail.com wrote:

Hi James,

Clustering is based on rows. I think that you meant not clustering columns, but compound columns. Still, all columns belong to a single table and are stored within a single folder on one computer. And it looks to me (but I'm not sure) that the CQL 3 driver loads all column names into memory, which is confusing to me. From one side we have a wide row, but we load the whole thing into RAM...

My understanding of a wide row is a row that supports millions of columns, or similar things like a map or set. In the CLI you would generate column names (or use compound columns) to simulate a set or map; in CQL 3 you would use some static names plus Map or Set structures, or you could still alter the table and have a large number of columns. But still - I do not see Iteration, so it looks to me that CQL 3 is limited when compared to CLI/Hector.

Regards,
Maciej

On 19 May 2014, at 17:30, James Campbell ja...@breachintelligence.com wrote:

Maciej,

In CQL3 wide rows are expected to be created using clustering columns. So while the schema will have a relatively small number of named columns, the effect is a wide row. For example:

    CREATE TABLE keyspace.widerow (
        row_key text,
        wide_row_column text,
        data_column text,
        PRIMARY KEY (row_key, wide_row_column));

Check out, for example, http://www.datastax.com/dev/blog/schema-in-cassandra-1-1.

James

From: Maciej Miklas mac.mik...@gmail.com
Sent: Monday, May 19, 2014 11:20 AM
To: user@cassandra.apache.org
Subject: CQL 3 and wide rows

Hi *,

I've checked the DataStax driver code for CQL 3, and it looks like the column names for a particular table are fully loaded into memory. Is this true? Cassandra should support wide rows, meaning tables with millions of columns. Knowing that, I would expect some kind of iterator for column names. Am I missing something here?

Regards,
Maciej Miklas
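To make the iteration point concrete, here is a sketch of paging through one wide partition by slicing on the clustering column (schema from the example above; the literal values are placeholders). Newer drivers automate exactly this pattern:

    -- first page
    SELECT wide_row_column, data_column FROM widerow
     WHERE row_key = 'key1'
     LIMIT 1000;

    -- next page: continue after the last clustering value seen
    SELECT wide_row_column, data_column FROM widerow
     WHERE row_key = 'key1' AND wide_row_column > 'last_value_seen'
     LIMIT 1000;

The client never needs every cell name in memory at once; it only holds one page at a time.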
Re: idempotent counters
Does anybody else use another technique for achieving this idempotency with counters?

The idempotency problem with counters has to do with what happens when you get a timeout. If you replay the write, there is a chance of the increment being applied twice. This is inherent in the current design.

Cheers
Aaron

-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 9/05/2014, at 1:07 am, Jabbar Azam aja...@gmail.com wrote:

Hello,

Do people use counters when they want to have idempotent operations in cassandra?

I have a use case for using a counter to check the count of objects in a partition. If the counter is more than some value, then the data in the partition is moved into two different partitions. I can't work out how to do this splitting and recover if a problem happens during modification of the counter.

http://www.ebaytechblog.com/2012/08/14/cassandra-data-modeling-best-practices-part-2 explains that counters shouldn't be used if you want idempotency. I would agree, but the alternative is not very elegant. I would have to manually count the objects in a partition, then move the data, and repeat the operation if something went wrong. It is less resource intensive to read a counter value to see if a partition needs splitting than to read all the objects in a partition. The counter value can be stored in its own table, sorted in descending order of the counter value.

Does anybody else use another technique for achieving this idempotency with counters?

I'm using cassandra 2.0.7.

Thanks

Jabbar Azam
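A minimal sketch of the failure mode Aaron describes, with a hypothetical counter table:

    CREATE TABLE partition_counts (
        part text PRIMARY KEY,
        objects counter
    );

    -- Not idempotent: if the client sees a timeout it cannot tell whether
    -- the increment was applied, so retrying may count the object twice.
    UPDATE partition_counts SET objects = objects + 1 WHERE part = 'p1';

A normal (non-counter) write does not have this problem, because retrying it just rewrites the same cell with the same value, which is why the eBay article steers idempotency-sensitive designs away from counters.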
Re: Effect of number of keyspaces on write-throughput....
Each client is writing to a separate keyspace simultaneously. Hence, is there a lot of switching of keyspaces?

I would think not. If the client app is using one keyspace per connection, there should be no reason for the driver to change keyspaces.

But, I observed that when using a single keyspace, the write throughput reduced slightly to 1800 pkts/sec while I actually expected it to increase since there is no switching of contexts now. Why is this so?

That's a 5% change, which is close enough to be ignored. I would guess that the clients are not doing anything that requires the driver to change the keyspace for the connection.

Can you also kindly explain how factors like using a single v/s multiple keyspaces, distributing write requests to a single cassandra node v/s multiple cassandra nodes, etc. affect the write throughput?

Normally you have one keyspace per application. And the best data models are ones where the throughput improves as the number of nodes increases. This happens when there are no "hot spots" where every / most web requests need to read or write to a particular row.

In general you can improve throughput by having more client threads hitting more machines. You can expect 3,000 to 4,000 non-counter writes per core per node.

Hope that helps.

Aaron

-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 13/05/2014, at 1:02 am, Krishna Chaitanya bnsk1990r...@gmail.com wrote:

Hello,

Thanks for the reply. Currently, each client is writing about 470 packets per second, where each packet is 1500 bytes. I have four clients writing simultaneously to the cluster. Each client is writing to a separate keyspace simultaneously. Hence, is there a lot of switching of keyspaces? The total throughput is coming to around 1900 packets per second when using multiple keyspaces. This is because there are 4 clients and each one is writing around 470 pkts/sec. But, I observed that when using a single keyspace, the write throughput reduced slightly to 1800 pkts/sec while I actually expected it to increase since there is no switching of contexts now. Why is this so? 470 packets is the maximum I can write from each client currently, since it is the limitation of my client program.

I should also mention that these tests are being run on single and double node clusters with all the write requests going only to a single cassandra server. Can you also kindly explain how factors like using a single v/s multiple keyspaces, distributing write requests to a single cassandra node v/s multiple cassandra nodes, etc. affect the write throughput? Are there any other factors that affect write throughput other than these? Because a single cassandra node seems to be able to handle all these write requests, as I am not able to see any significant improvement by distributing write requests among multiple nodes.

Thanking you.

On May 12, 2014 2:39 PM, Aaron Morton aa...@thelastpickle.com wrote:

On the homepage of libQtCassandra, it's mentioned that switching between keyspaces is costly when storing into Cassandra, thereby affecting the write throughput. Is this necessarily true for other libraries like pycassa and hector as well?

When using the thrift connection, the keyspace is a part of the connection state, so changing keyspaces requires a round trip to the server. Not hugely expensive, but it adds up if you do it a lot.
Can I increase the write throughput by configuring all the clients to store in a single keyspace instead of multiple keyspaces?

You should expect to get 3,000 to 4,000 writes per core per node. What are you getting now?

Cheers
A

-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 11/05/2014, at 4:06 pm, Krishna Chaitanya bnsk1990r...@gmail.com wrote:

Hello,

I have an application that writes network packets to a Cassandra cluster from a number of client nodes. It uses the libQtCassandra library to access Cassandra. On the homepage of libQtCassandra, it's mentioned that switching between keyspaces is costly when storing into Cassandra, thereby affecting the write throughput. Is this necessarily true for other libraries like pycassa and hector as well? Can I increase the write throughput by configuring all the clients to store in a single keyspace instead of multiple keyspaces?

Thank you.
Re: Schema errors when bootstrapping / restarting node
I am able to fix this error by clearing out the schema_columns system table on disk. After that, a node can boot successfully. Does anyone have a clue what's going on here?

Something has become corrupted in the system tables, as you say. A less aggressive way to reset the local schema is to use nodetool resetlocalschema on the nodes that you suspect have problems.

    ERROR [InternalResponseStage:5] 2014-05-05 23:56:03,786 CassandraDaemon.java (line 191) Exception in thread Thread[InternalResponseStage:5,5,main]
    org.apache.cassandra.db.marshal.MarshalException: cannot parse 'column1' as hex bytes
        at org.apache.cassandra.db.marshal.BytesType.fromString(BytesType.java:69)
        at org.apache.cassandra.config.ColumnDefinition.fromSchema(ColumnDefinition.java:231)
        at org.apache.cassandra.config.CFMetaData.addColumnDefinitionSchema(CFMetaData.java:1524)
        at org.apache.cassandra.config.CFMetaData.fromSchema(CFMetaData.java:1456)

This looks like a secondary index has been incorrectly defined via thrift. I would guess the comparator for the CF is BytesType and you have defined an index on a column and specified the column name as "column1", which is not a valid hex value. You should be able to fix this by dropping the index or dropping the CF.

Hope that helps.

Aaron

-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 14/05/2014, at 2:18 am, Adam Cramer a...@bn.co wrote:

Hi All,

I'm having some major issues bootstrapping a new node to my cluster. We are running 1.2.16, with vnodes enabled.

When a new node starts up (with auto_bootstrap), it selects a host ID and finds the ring successfully:

    INFO 18:42:29,559 JOINING: waiting for ring information

It successfully selects a set of tokens. Then the weird stuff begins. I get this error once, while the node is reading the system keyspace:

    ERROR 18:42:32,921 Exception in thread Thread[InternalResponseStage:1,5,main]
    java.lang.NullPointerException
        at org.apache.cassandra.utils.ByteBufferUtil.toLong(ByteBufferUtil.java:421)
        at org.apache.cassandra.cql.jdbc.JdbcLong.compose(JdbcLong.java:94)
        at org.apache.cassandra.db.marshal.LongType.compose(LongType.java:34)
        at org.apache.cassandra.cql3.UntypedResultSet$Row.getLong(UntypedResultSet.java:138)
        at org.apache.cassandra.db.SystemTable.migrateKeyAlias(SystemTable.java:199)
        at org.apache.cassandra.db.DefsTable.mergeSchema(DefsTable.java:346)
        at org.apache.cassandra.service.MigrationTask$1.response(MigrationTask.java:66)
        at org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:47)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

But it doesn't stop the bootstrap process.
The node successfully handshakes versions, and pauses before bootstrapping:

    INFO 18:42:59,564 JOINING: schema complete, ready to bootstrap
    INFO 18:42:59,565 JOINING: waiting for pending range calculation
    INFO 18:42:59,565 JOINING: calculation complete, ready to bootstrap
    INFO 18:42:59,565 JOINING: getting bootstrap token
    INFO 18:42:59,705 JOINING: sleeping 30000 ms for pending range setup

After 30 seconds, I get a flood of endless org.apache.cassandra.db.UnknownColumnFamilyException errors, and all other nodes in the cluster log the following endlessly:

    INFO [HANDSHAKE-/x.x.x.x] 2014-05-09 18:44:36,289 OutboundTcpConnection.java (line 418) Handshaking version with /x.x.x.x

I suspect there may be something wrong with my schemas. Sometimes while restarting an existing node, the node will fail to restart with the following error, again while reading the system keyspace:

    ERROR [InternalResponseStage:5] 2014-05-05 23:56:03,786 CassandraDaemon.java (line 191) Exception in thread Thread[InternalResponseStage:5,5,main]
    org.apache.cassandra.db.marshal.MarshalException: cannot parse 'column1' as hex bytes
        at org.apache.cassandra.db.marshal.BytesType.fromString(BytesType.java:69)
        at org.apache.cassandra.config.ColumnDefinition.fromSchema(ColumnDefinition.java:231)
        at org.apache.cassandra.config.CFMetaData.addColumnDefinitionSchema(CFMetaData.java:1524)
        at org.apache.cassandra.config.CFMetaData.fromSchema(CFMetaData.java:1456)
        at org.apache.cassandra.config.KSMetaData.deserializeColumnFamilies(KSMetaData.java:306)
        at org.apache.cassandra.db.DefsTable.mergeColumnFamilies(DefsTable.java:444)
        at org.apache.cassandra.db.DefsTable.mergeSchema(DefsTable.java:356
Re: Query returns incomplete result
Calling execute a second time runs the query a second time, and it looks like the query mutates instance state during pagination. What happens if you only call execute() once?

Cheers
Aaron

-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 8/05/2014, at 8:03 pm, Lu, Boying boying...@emc.com wrote:

Hi, All,

I use astyanax 1.56.48 + Cassandra 2.0.6 in my test code and do a query like this:

    query = keyspace.prepareQuery(..).getKey(…)
        .autoPaginate(true)
        .withColumnRange(new RangeBuilder().setLimit(pageSize).build());

    ColumnList<IndexColumnName> result;
    result = query.execute().getResult();
    while (!result.isEmpty()) {
        // handle result here
        result = query.execute().getResult();
    }

There are 2003 records in the DB. If the pageSize is set to 1100, I get only 2002 records back, and if the pageSize is set to 3000, I can get all 2003 records back.

Does anyone know why? Is it a bug?

Thanks

Boying
Re: Datacenter understanding question
Depends on how you have set up the replication.

If you are using SimpleStrategy with RF 1, there will be a single copy of each row in the cluster.

If you are using NetworkTopologyStrategy with RF 1 in each DC, there will be two copies of each row in the cluster, one in each DC.

Hope that helps.

Aaron

-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 15/05/2014, at 3:55 am, Mark Farnan devm...@petrolink.com wrote:

Yes they will.

From: ng [mailto:pipeli...@gmail.com]
Sent: Tuesday, May 13, 2014 11:07 PM
To: user@cassandra.apache.org
Subject: Datacenter understanding question

If I have a configuration of two data centers with one node each, and the replication factor is also 1, will these 2 nodes be mirrored/replicated?
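A sketch of the two cases in CQL, with hypothetical keyspace and DC names (the DC names must match the snitch configuration):

    -- single copy of each row in the whole cluster
    CREATE KEYSPACE ks_simple
      WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1};

    -- one copy per DC, i.e. two copies in the cluster
    CREATE KEYSPACE ks_nts
      WITH REPLICATION = {'class': 'NetworkTopologyStrategy', 'DC1': 1, 'DC2': 1};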
Re: Cassandra counter column family performance
I get a lot of TExceptions

What are the exceptions? In general counter writes are slower than normal writes, but that does not lead them to fail like that. Check the logs for errors and/or messages from the GCInspector saying that garbage collection is going on.

Cheers
A

-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 13/05/2014, at 9:51 pm, Batranut Bogdan batra...@yahoo.com wrote:

Hello all,

I have a counter CF defined as:

    pk text PRIMARY KEY,
    a counter,
    b counter,
    c counter,
    d counter

After inserting a few million keys... 55 mil, the performance goes down the drain. 2-3 nodes in the cluster are on medium load, and when inserting batches of the same length, writes take longer and longer until the whole cluster becomes loaded and I get a lot of TExceptions... and the cluster becomes unresponsive.

Did anyone have the same problem? Feel free to comment and share experiences about counter CF performance.
Re: Question about READS in a multi DC environment.
In this case I was not thinking about what was happening synchronously to the client request, only that the request was hitting all nodes. You are right: when reading at LOCAL_ONE the coordinator will only be blocking for one response (the data response).

Cheers
Aaron

-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 14/05/2014, at 11:36 am, graham sanderson gra...@vast.com wrote:

Yeah, but all the requests for data/digest are sent at the same time… responses that aren't "needed" to complete the request are dealt with asynchronously (possibly causing repair). In the original trace (which is confusing because I don't think the clocks are in sync)… I don't see anything that makes me believe it is blocking for all 3 responses - it actually does reads on all 3 nodes even if only digests are required.

On May 12, 2014, at 12:37 AM, DuyHai Doan doanduy...@gmail.com wrote:

Isn't read repair supposed to be done asynchronously in the background?

On Mon, May 12, 2014 at 2:07 AM, graham sanderson gra...@vast.com wrote:

You have a read_repair_chance of 1.0, which is probably why your query is hitting all data centers.

On May 11, 2014, at 3:44 PM, Mark Farnan devm...@petrolink.com wrote:

I'm trying to understand READ load in Cassandra across a multi-datacenter cluster (specifically why it seems to be hitting more than one DC) and hope someone can help.

From what I'm seeing here, a READ with consistency LOCAL_ONE seems to be hitting all 3 datacenters, rather than just the one I'm connected to. I see 'Read 101 live and 0 tombstoned cells' from EACH of the 3 DCs in the trace, which seems wrong. I have tried every consistency level, same result. This is also the same from my C# code via the DataStax driver (where I first noticed the issue).

Can someone please shed some light on what is occurring? Specifically, I don't want a query on one DC going anywhere near the other 2 as a rule, as in production these DCs will be across slower links.

Query: (NOTE: Whilst this uses a kairosdb table, I'm just playing with queries against it as it has 100k columns in this key for testing.)

    cqlsh:kairosdb> consistency local_one
    Consistency level set to LOCAL_ONE.
    cqlsh:kairosdb> select * from data_points where key = 0x6d61726c796e2e746573742e74656d7034000145b514a400726f6f6d3d6f6963653a limit 1000;

... Some returned data rows listed here, which I've removed (CassandraQuery.txt) ...

Query Response Trace:

    activity | timestamp | source | source_elapsed
    ---------+-----------+--------+---------------
    execute_cql3_query | 07:18:12,692 | 192.168.25.111 | 0
    Message received from /192.168.25.111 | 07:18:00,706 | 192.168.25.131 | 50
    Executing single-partition query on data_points | 07:18:00,707 | 192.168.25.131 | 760
    Acquiring sstable references | 07:18:00,707 | 192.168.25.131 | 814
    Merging memtable tombstones | 07:18:00,707 | 192.168.25.131 | 924
    Bloom filter allows skipping sstable 191 | 07:18:00,707 | 192.168.25.131 | 1050
    Bloom filter allows skipping sstable 190 | 07:18:00,707 | 192.168.25.131 | 1166
    Key cache hit for sstable 189 | 07:18:00,707 | 192.168.25.131 | 1275
    Seeking to partition beginning in data file | 07:18:00,707 | 192.168.25.131 | 1293
    Skipped 0/3 non-slice-intersecting sstables, included 0 due to tombstones | 07:18:00,708 | 192.168.25.131 | 2173
Re: Disable reads during node rebuild
As of 2.0.7, driftx has added this long-requested feature.

Thanks
A

-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 13/05/2014, at 9:36 am, Robert Coli rc...@eventbrite.com wrote:

On Mon, May 12, 2014 at 10:18 AM, Paulo Ricardo Motta Gomes paulo.mo...@chaordicsystems.com wrote:

Is there a way to disable reads from a node while performing rebuild from another datacenter? I tried starting the node in write survey mode, but the nodetool rebuild command does not work in this mode.

As of 2.0.7, driftx has added this long-requested feature. https://issues.apache.org/jira/browse/CASSANDRA-6961

Note that it is impossible to completely close the race window here as long as writes are incoming; this functionality just dramatically shortens it.

=Rob
Re: How long are expired values actually returned?
Is this normal or am I doing something wrong?

Probably the latter. The TTL is set based on the system clock on the server, so the first thing to check is that the server times are correct. If that fails, send over the schema and the insert.

Cheers
Aaron

-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 9/05/2014, at 2:44 am, Sebastian Schmidt isib...@gmail.com wrote:

Hi,

I'm using the TTL feature for my application. In my tests, when using a TTL of 5, the inserted rows are still returned after 7 seconds, and after 70 seconds. Is this normal or am I doing something wrong?

Kind Regards,
Sebastian
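A quick way to see what the server thinks, sketched against a hypothetical table t with partition key k and column v:

    INSERT INTO t (k, v) VALUES ('a', 'b') USING TTL 5;

    -- TTL(v) reports the remaining seconds according to the server's clock;
    -- if rows keep coming back long after 5 seconds, suspect clock skew
    SELECT k, v, TTL(v) FROM t WHERE k = 'a';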
Re: How to balance this cluster out ?
This is not a problem with the token assignments. Here are the ideal assignments from the tools/bin/token-generator script:

    DC #1:
      Node #1: 0
      Node #2: 56713727820156410577229101238628035242
      Node #3: 113427455640312821154458202477256070484

You are pretty close, but the order of the nodes in the output is a little odd; I would normally expect node 2 to be first.

First step would be to check the logs on node 1 to see if it's failing at compaction, and to check if it's holding a lot of hints. Then make sure repair is running so the data is distributed.

Hope that helps.

Aaron

-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 12/05/2014, at 11:58 pm, Oleg Dulin oleg.du...@gmail.com wrote:

I have a cluster that looks like this:

    Datacenter: us-east
    ==========
    Replicas: 2

    Address  Rack  Status  State   Load       Owns    Token
                                                      113427455640312821154458202477256070484
    *.*.*.1  1b    Up      Normal  141.88 GB  66.67%  56713727820156410577229101238628035242
    *.*.*.2  1a    Up      Normal  113.2 GB   66.67%  210
    *.*.*.3  1d    Up      Normal  102.37 GB  66.67%  113427455640312821154458202477256070484

Obviously, the first node in 1b has 40% more data than the others. If I wanted to rebalance this cluster, how would I go about that? Would shifting the tokens accomplish what I need, and which tokens?

Regards,
Oleg
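For the record, the ideal RandomPartitioner tokens above are just the ring divided evenly: token_i = i * floor(2^127 / 3) for i = 0, 1, 2, which gives 0, 56713727820156410577229101238628035242 and 113427455640312821154458202477256070484.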
Re: Disable reads during node rebuild
I'm not able to replace a dead node using the ordinary procedure (bootstrap+join), and would like to rebuild the replacement node from another DC.

Normally when you want to add a new DC to the cluster, the command to use is nodetool rebuild $DC_NAME (with auto_bootstrap: false). That will get the node to stream data from $DC_NAME.

The problem is that if I start a node with auto_bootstrap=false to perform the rebuild, it automatically starts serving empty reads (CL=LOCAL_ONE).

When adding a new DC the nodes won't be processing reads, but that is not the case for you.

You should disable the client APIs to prevent the clients from calling the new node: use -Dcassandra.start_rpc=false and -Dcassandra.start_native_transport=false in cassandra-env.sh, or the appropriate settings in cassandra.yaml.

Disabling reads from other nodes will be harder. IIRC during bootstrap a different timeout (based on ring_delay) is used to detect if the bootstrapping node is down. However, if the node is running and you use nodetool rebuild, I'm pretty sure the normal gossip failure detectors will kick in. Which means you cannot disable gossip to prevent reads. Also, we would want the node to be up for writes.

What you can do is artificially set the severity of the node high so the dynamic snitch will route around it. See https://github.com/apache/cassandra/blob/cassandra-2.0/src/java/org/apache/cassandra/locator/DynamicEndpointSnitchMBean.java#L37

* Set the value to something high on the node you will be rebuilding; the number of cores on the system should do. (jmxterm is handy for this: http://wiki.cyclopsgroup.org/jmxterm)
* Check nodetool gossipinfo on the other nodes to see that the SEVERITY app state has propagated.
* Watch completed ReadStage tasks on the node you want to rebuild. If you have read repair enabled it will still get some traffic.
* Do the rebuild.
* Reset severity to 0.

Hope that helps.

Aaron

-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 13/05/2014, at 5:18 am, Paulo Ricardo Motta Gomes paulo.mo...@chaordicsystems.com wrote:

Hello,

I'm not able to replace a dead node using the ordinary procedure (bootstrap+join), and would like to rebuild the replacement node from another DC. The problem is that if I start a node with auto_bootstrap=false to perform the rebuild, it automatically starts serving empty reads (CL=LOCAL_ONE).

Is there a way to disable reads from a node while performing rebuild from another datacenter? I tried starting the node in write survey mode, but the nodetool rebuild command does not work in this mode.

Thanks,

--
Paulo Motta

Chaordic | Platform
www.chaordic.com.br
+55 48 3232.3200
Re: Question about READS in a multi DC environment.
read_repair_chance=1.00 AND

There's your problem. When read repair is active for a read request, the coordinator will over-read, sending the request to all UP replicas. Your client request will only block waiting for the one response (the data request); the rest of the repair will happen in the background. Setting this to 1.0 means it's active across the entire cluster for every read.

Change read_repair_chance to 0 and set dclocal_read_repair_chance to 0.1, so that read repair will only happen local to the DC you are connected to.

Hope that helps.

A

-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 12/05/2014, at 5:37 pm, DuyHai Doan doanduy...@gmail.com wrote:

Isn't read repair supposed to be done asynchronously in the background?

On Mon, May 12, 2014 at 2:07 AM, graham sanderson gra...@vast.com wrote:

You have a read_repair_chance of 1.0, which is probably why your query is hitting all data centers.

On May 11, 2014, at 3:44 PM, Mark Farnan devm...@petrolink.com wrote:

I'm trying to understand READ load in Cassandra across a multi-datacenter cluster (specifically why it seems to be hitting more than one DC) and hope someone can help.

From what I'm seeing here, a READ with consistency LOCAL_ONE seems to be hitting all 3 datacenters, rather than just the one I'm connected to. I see 'Read 101 live and 0 tombstoned cells' from EACH of the 3 DCs in the trace, which seems wrong. I have tried every consistency level, same result. This is also the same from my C# code via the DataStax driver (where I first noticed the issue).

Can someone please shed some light on what is occurring? Specifically, I don't want a query on one DC going anywhere near the other 2 as a rule, as in production these DCs will be across slower links.

Query: (NOTE: Whilst this uses a kairosdb table, I'm just playing with queries against it as it has 100k columns in this key for testing.)

    cqlsh:kairosdb> consistency local_one
    Consistency level set to LOCAL_ONE.
    cqlsh:kairosdb> select * from data_points where key = 0x6d61726c796e2e746573742e74656d7034000145b514a400726f6f6d3d6f6963653a limit 1000;

... Some returned data rows listed here, which I've removed (CassandraQuery.txt) ...

Query Response Trace:

    activity | timestamp | source | source_elapsed
    ---------+-----------+--------+---------------
    execute_cql3_query | 07:18:12,692 | 192.168.25.111 | 0
    Message received from /192.168.25.111 | 07:18:00,706 | 192.168.25.131 | 50
    Executing single-partition query on data_points | 07:18:00,707 | 192.168.25.131 | 760
    Acquiring sstable references | 07:18:00,707 | 192.168.25.131 | 814
    Merging memtable tombstones | 07:18:00,707 | 192.168.25.131 | 924
    Bloom filter allows skipping sstable 191 | 07:18:00,707 | 192.168.25.131 | 1050
    Bloom filter allows skipping sstable 190 | 07:18:00,707 | 192.168.25.131 | 1166
    Key cache hit for sstable 189 | 07:18:00,707 | 192.168.25.131 | 1275
    Seeking to partition beginning in data file | 07:18:00,707 | 192.168.25.131 | 1293
    Skipped 0/3 non-slice-intersecting sstables, included 0 due to tombstones | 07:18:00,708 | 192.168.25.131 | 2173
    Merging data from memtables and 1 sstables | 07:18:00,708 | 192.168.25.131 | 2195
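The change Aaron suggests is a table-level setting; a sketch against the kairosdb table from the trace (the 0.1 value is the suggestion from the reply, adjust to taste):

    ALTER TABLE kairosdb.data_points
        WITH read_repair_chance = 0
        AND dclocal_read_repair_chance = 0.1;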
Re: Cassandra MapReduce/Storm/ etc
Is there a good blog/article that describes how to use MapReduce on a Cassandra table?

The best way to get into Cassandra and Hadoop is to play with Cassandra DSE. It's free for development, costs for production, and is an easy way to learn about the Hadoop integration without having to worry about the installation process. http://www.datastax.com/docs/datastax_enterprise3.1/solutions/about_hadoop

If a database table is the input source for MapReduce or Storm, for me, in the simple case, this translates to a full table scan of the input table, which can time out and is generally not a recommended access pattern in Cassandra.

The Hadoop integration is token aware: it splits the tasks to run locally on each node, and the tasks then scan over the token range local to the node.

Hope that helps.

A

-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 9/05/2014, at 9:43 am, Manoj Khangaonkar khangaon...@gmail.com wrote:

Hi,

Searching for Cassandra with MapReduce, I am finding that the search results are really dated -- from version 0.7, 2010/2011. Is there a good blog/article that describes how to use MapReduce on a Cassandra table?

From my naive understanding, Cassandra is all about partitioning. Querying is based on partition key + clustered column(s). Input to MapReduce is a sequence of key/value pairs. For Storm it is a stream of tuples.

If a database table is the input source for MapReduce or Storm, for me, in the simple case, this translates to a full table scan of the input table, which can time out and is generally not a recommended access pattern in Cassandra.

My initial reaction is that if I need to process data with MapReduce or Storm, reading it from Cassandra might not be the optimal way. Storing the output to Cassandra, however, does make sense.

If anyone has links to blogs or personal experience in this area, I would appreciate it if you can share.

regards
Re: Really need some advices on large data considerations
We've learned that compaction strategy is an important point, because we've run into 'no space' trouble with the 'size tiered' compaction strategy.

If you want to get the most out of the raw disk space, LCS is the way to go; remember it uses approximately twice the disk IO.

From our experience changing any settings/schema while a large cluster is online and has been running for some time is really, really a pain.

Which parts in particular? Updating the schema or the config? OpsCenter has a rolling restart feature, which can be handy when chef / puppet is deploying the config changes. Schema / gossip can take a little while to propagate with a high number of nodes.

On a modern version you should be able to run 2 to 3 TB per node, maybe higher. The biggest concerns are going to be repair (the changes in 2.1 will help) and bootstrapping. I'd recommend testing a smaller cluster, say 12 nodes, with a high load per node, 3TB.

Cheers
Aaron

-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 9/05/2014, at 12:09 pm, Yatong Zhang bluefl...@gmail.com wrote:

Hi,

We're going to deploy a large Cassandra cluster at the PB level. Our scenario would be:

1. Lots of writes, about 150 writes/second on average, and about 300K size per write.
2. Relatively very few reads.
3. Our data will never be updated.
4. But we will delete old data periodically to free space for new data.

We've learned that compaction strategy is an important point, because we've run into 'no space' trouble with the 'size tiered' compaction strategy. We've read http://wiki.apache.org/cassandra/LargeDataSetConsiderations - is this enough, and is it up to date? From our experience changing any settings/schema while a large cluster is online and has been running for some time is really, really a pain. So we're gathering more info and expecting some more practical suggestions before we set up the cassandra cluster.

Thanks, and any help is greatly appreciated.
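For what it's worth, the strategy switch is a per-table change; a sketch with a hypothetical table (the sstable_size_in_mb value is an assumption to test; larger sstables generally mean fewer files at this data density):

    ALTER TABLE ks.events WITH compaction = {
        'class': 'LeveledCompactionStrategy',
        'sstable_size_in_mb': 160
    };

Be aware that on an existing table this triggers recompaction of all data into levels, which on multi-TB nodes is a significant chunk of the extra IO Aaron mentions.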
Re: Understanding about Cassandra read repair with QUORUM
I have following understanding about Cassandra read repair:

Read repair is an automatic process that reads from more nodes than necessary during a normal read, and checks and repairs differences in the background. It's different to "repair", or anti-entropy, which you run with nodetool repair.

• If we write with QUORUM and read with QUORUM then we do not need to externally (nodetool) trigger read repair.

You normally still want to run repair, because it's the way to ensure tombstones are distributed.

• Since we are reading + writing with QUORUM then it is safe to set read_repair_chance=0 and dclocal_read_repair_chance=0 in the column family definition.

It's safe; read repair does not affect consistency. It's designed to reduce the chance that the server will need to repair an inconsistency during a read for a client.

Cheers

-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 12/01/2014, at 11:31 am, chovatia jaydeep chovatia_jayd...@yahoo.co.in wrote:

Hi,

I have the following understanding about Cassandra read repair:

• If we write with QUORUM and read with QUORUM then we do not need to externally (nodetool) trigger read repair.
• Since we are reading + writing with QUORUM then it is safe to set read_repair_chance=0 and dclocal_read_repair_chance=0 in the column family definition.

Can someone please clarify?

-jaydeep
Re: Problem in running cassandra-2.0.4 trigger example
But i am getting error: Bad Request: Key may not be empty

My guess is the trigger is trying to create a row with an empty key. Add some logging to the trigger to see what it's doing.

Cheers

-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 12/01/2014, at 11:50 am, Thunder Stumpges thunder.stump...@gmail.com wrote:

I'm not sure if this is your issue, as I have not used these triggers before, but shouldn't the invertedindex table have a different primary key than the primary table (either f2 or f3)?

-Thunder

On Jan 11, 2014, at 12:03 PM, Vidit Asthana vidit.astha...@gmail.com wrote:

I am new to cassandra and am trying to run the trigger example provided by cassandra on a pseudo cluster, using the instructions provided on https://github.com/apache/cassandra/tree/cassandra-2.0/examples/triggers

But I am getting the error: Bad Request: Key may not be empty

Can someone tell me if my CREATE TABLE is proper? What else could be wrong? I am doing the following using cqlsh.

• CREATE KEYSPACE keyspace1 WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
• use keyspace1;
• CREATE TABLE invertedindex ( f1 varchar , f2 varchar, f3 varchar, PRIMARY KEY(f1));
• CREATE TABLE table1 ( f1 varchar , f2 varchar, f3 varchar, PRIMARY KEY(f1));
• CREATE TRIGGER mytrigger ON table1 USING 'org.apache.cassandra.triggers.InvertedIndex';
• insert into table1 (f1,f2,f3) values ('aaa','bbb','ccc');

This is what I get in system.log:

    INFO [Thrift:1] 2014-01-11 14:48:09,875 InvertedIndex.java:67 - loaded property file, InvertedIndex.properties

This is the content of the conf/InvertedIndex.properties file:

    keyspace=keyspace1
    columnfamily=invertedindex

Thanks in advance.
Vidit
Re: Need ur expertise on Cassandra issue!!
Look at the logs for the Cassandra servers. Are nodes going down? Are there any other errors? Check for log messages from the GCInspector; if there is a lot of GC, nodes will start to flap up and down.

It sounds like there is a stability issue with Cassandra; look there first to make sure it is always available.

If you want to load 150GB of data a day from Hadoop into Cassandra, I would suggest creating SSTables in Hadoop and bulk loading them into Cassandra. This article is old but it's still relevant: http://www.datastax.com/dev/blog/bulk-loading

Hope that helps.

-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 12/01/2014, at 3:53 pm, Arun toarun...@gmail.com wrote:

Hi,

I need your help and suggestions for our production issue.

Details:
--------

We have 40 CFs in the Cassandra cluster, one for each datasource, like below:

    MusicData - keyspace
    spotify_1 - column family - active
    spotify_2 - column family - standby

Daily we load data into this cluster using the following process:

1. Astyanax library to delete the inactive version of the CF (here spotify_2)
2. Hadoop bulkload JAR pushes data from Hadoop to Cassandra into spotify_2

Data inflow rate is 150GB per day.

DataStax Community version 1.1.9, with 9 nodes of 4TB, built on OpenStack with a high-end config.

Problem:
--------

We're encountering the problem every week: the Hadoop bulkload program is failing with

    java.io.IOException: Too many hosts failed: [/10.240.171.80, /10.240.171.76, /10.240.171.74, /10.240.171.73]
        at org.apache.cassandra.hadoop.BulkRecordWriter.close(BulkRecordWriter.java:243)

I can provide more details about the error if you need. From our initial analysis we understood that when we delete, the space for tombstoned blocks is reclaimed in the compaction process, so we increased storage capacity by adding new nodes, but the problem still persists.

We need your expertise to comment on this production issue. Please let me know if you need any information!

I will wait for your response!

-Arun
Re: upgrade from cassandra 1.2.3 - 1.2.13 + start using SSL
Can you confirm that, cause we'll add a new DC with version 1.2.13 (read-only) and we'll upgrade other DCs to 1.2.13 weeks later. We made some tests and didn't notice anything. But we didn't test a node failure.

Depending on the other version, you may not be able to run repair. All nodes have to use the same file version; the file versions are here: https://github.com/apache/cassandra/blob/cassandra-1.2/src/java/org/apache/cassandra/io/sstable/Descriptor.java#L52

Cheers

-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 14/01/2014, at 7:30 am, Robert Coli rc...@eventbrite.com wrote:

On Mon, Jan 13, 2014 at 3:38 AM, Cyril Scetbon cyril.scet...@free.fr wrote:

Can you confirm that, cause we'll add a new DC with version 1.2.13 (read-only) and we'll upgrade other DCs to 1.2.13 weeks later. We made some tests and didn't notice anything. But we didn't test a node failure.

In general, adding nodes at a new version is not supported, whether a single node or an entire DC of nodes.

=Rob
Re: various Cassandra performance problems when CQL3 is really used
I don't know. How do I find out? The only mention of query plans in Cassandra I found is the article on your site, from 2011, considering version 0.8.

See the help for TRACE in cqlsh.

My general approach is to solve problems with the read path by making changes to the write path. So I would normally say: make a new table to store the data you want to read, or change the layout of a table to be more flexible.

Can you provide the table definition and the query you are using?

Cheers

-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 15/01/2014, at 9:48 am, Ondřej Černoš cern...@gmail.com wrote:

Hi,

thanks for the answer and sorry for the delay. Let me answer inline.

On Wed, Dec 18, 2013 at 4:53 AM, Aaron Morton aa...@thelastpickle.com wrote:

* select id from table where token(id) > token(some_value) and secondary_index = other_val limit 2 allow filtering;

Filtering absolutely kills the performance. On a table populated with 130,000 records, a single node Cassandra server (on my i7 notebook, 2GB of JVM heap) and a secondary index built on a column with low cardinality of its value set, this query takes 156 seconds to finish.

Yes, this is why you have to add allow_filtering. You are asking the nodes to read all the data that matches and filter in memory; that's a SQL type operation. Your example query is somewhat complex and I doubt it could get decent performance; what does the query plan look like?

I don't know. How do I find out? The only mention of query plans in Cassandra I found is the article on your site, from 2011, considering version 0.8.

The example query gets computed in a fraction of the time if I perform just the fetch of all rows matching the token function and perform the filtering client side.

IMHO you need to do further de-normalisation; you will get the best performance when you select rows by their full or partial primary key.

I denormalise all the way I can. The problem is I need to support paging and filtering at the same time. The API I must support allows filtering by example and paging - so how should I denormalise? Should I somehow manage pages of primary row keys manually? Or should I have a manual secondary index and page somehow in the denormalised wide row?

The trouble goes even further; even this doesn't perform well:

    select id from table where token(id) > token(some_value) and pk_cluster = 'val' limit N;

where id and pk_cluster are the primary key (CQL3 table). I guess this should be an ordered row query and ordered column slice query, so where is the problem with performance?

By the way, the performance is an order of magnitude better if this patch is applied:

That looks like it's tuned to your specific need; it would ignore the max results included in the query.

It is tuned, it only demonstrates that the heuristics don't work well.

* select id from table;

As we saw in the trace log, the query - although it queries just row ids - scans all columns of all the rows and (probably) compares TTL with current time (?) (we saw hundreds of thousands of gettimeofday(2) calls). This means that if the table somehow mixes wide and narrow rows, the performance suffers horribly.

Selecting all rows from a table requires a range scan, which reads all rows from all nodes. It should never be used in production.

The trouble is I just need to perform it, sometimes.
I know what the problem with the query is, but I have just a couple of hundred thousand records - 150.000 - the datasets can all be stored in memory, SSTables can be fully mmapped. There is no reason for this query to be slow in this case. Not sure what you mean by “scans all columns from all rows” a select by column name will use a SliceByNamesReadCommand which will only read the required columns from each SSTable (it normally short circuits though and reads from less). The query should fetch only IDs, it checks TTLs of columns though. That is the point. Why does it do it? If there is a TTL the ExpiringColumn.localExpirationTime must be checked, if there is no TTL it will not be checked. It is a standard CQL3 table with ID, a couple of columns and a CQL3 collection. I didn't do anything with TTL on the table and its columns. As Cassandra checks all the columns in selects, performance suffers badly if the collection is of any interesting size. This is not true, could you provide an example where you think this is happening ? We saw it in the trace log. It happened in the select ID from table query. The table had a collection column. Additionally, we saw various random irreproducible freezes, high CPU consumption when nothing happens (even with trace log level set no activity was reported) and highly unpredictable performance characteristics after nodetool flush and/or major compaction.
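For reference, tracing is how you get the closest thing to a query plan in 1.2+. A minimal cqlsh session (table and values are hypothetical):
cqlsh> TRACING ON;
cqlsh> SELECT id FROM mytable WHERE token(id) > token(5) LIMIT 10;
The trace printed after each query shows which nodes and sstables were touched, how many rows and tombstones were scanned, and where the time went.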
Re: Cassandra mad GC
c3.4xlarge Long ParNew pauses on a machine like this are not normal. Do you have a custom comparator or are you using triggers ? Do you have a data model that creates a lot of tombstones ? Try to return the settings to default and then tune from there, that includes returning to the default JVM GC settings. If for no other reason than other people will be able to offer advice. Have you changed the compaction_throughput ? Put it back if you have. If you have enabled multi_threaded compaction disable it. Consider setting concurrent_compactors to 4 or 8 to reduce compaction churn. If you have increased in_memory_compaction_limit put it back. Cassandra logs Can you provide some of the log messages from GCInspector ? How long are the pauses ? Is there a lot of CMS or ParNew ? Do you have monitoring in place ? Is CMS able to return the heap to a low value e.g. 3GB ? cpu load 1000% Is this all from cassandra ? try jvmtop (https://code.google.com/p/jvmtop/) to see what cassandra threads are doing. It’s a lot easier to tune a system with fewer non default settings. Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 16/01/2014, at 8:22 am, Arya Goudarzi gouda...@gmail.com wrote: It is not a good idea to change settings without identifying the root cause. Chances are what you did masked the problem a bit for you, but the problem is still there, isn't it? On Wed, Jan 15, 2014 at 1:11 AM, Dimetrio dimet...@flysoft.ru wrote: I set G1 because GC started to work wrong (dropped messages) with the standard GC settings. In my opinion, Cassandra started to work more stably with G1 (it's getting fewer timeouts now) but it's not ideal yet. I just want cassandra to work fine. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassandra-mad-GC-tp7592248p7592257.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com. -- Cheers, -Arya
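For reference, the stock 1.2-era GC settings in cassandra-env.sh look roughly like this (check your own copy, exact values vary by version):
JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=1"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
Starting from these defaults makes it much easier for other people to reason about your pauses.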
Re: Nodetool ring
Owns is how much of the entire, cluster wide, data set the node has. In both your examples every node has a full copy of the data. If you have 6 nodes and RF 3 they would each show 50% (in general each node holds roughly RF / number-of-nodes of the data). Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 3/01/2014, at 6:00 pm, Vivek Mishra mishra.v...@gmail.com wrote: Yes. On Fri, Jan 3, 2014 at 12:57 AM, Robert Coli rc...@eventbrite.com wrote: On Thu, Jan 2, 2014 at 10:48 AM, Vivek Mishra mishra.v...@gmail.com wrote: Thanks for your quick reply. Even with 2 data centers with 3 data nodes each I am seeing 100% on both data center nodes. Do you have RF=3 in both? =Rob
Re: Cassandra consuming too much memory in ubuntu as compared to within windows, same machine.
When Xms and Xmx are the same like this the JVM allocates all the memory, and then on Linux cassandra will ask the OS to lock that memory so it cannot be paged out. On Windows it’s probably getting paged out. If you only have 4GB on the box, you probably do not want to run cassandra with 4GB. Try 2GB so there is room for other things. Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 7/01/2014, at 9:03 am, Erik Forkalsud eforkals...@cj.com wrote: On 01/04/2014 08:04 AM, Ertio Lew wrote: ... my dual boot 4GB(RAM) machine. ... -Xms4G -Xmx4G - You are allocating all your ram to the java heap. Are you using the same JVM parameters on the Windows side? You can try to lower the heap size or add ram to your machine. - Erik -
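For reference, the heap is normally set in conf/cassandra-env.sh rather than by editing JVM flags directly; a sketch for a 4GB box (values are illustrative, not a sizing recommendation):
MAX_HEAP_SIZE="2G"
HEAP_NEWSIZE="200M"
cassandra-env.sh passes MAX_HEAP_SIZE as both -Xms and -Xmx, which is why the full amount is allocated at startup.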
Re: massive spikes in read latency
The spikes in latency don’t seem to be correlated to an increase in reads. The cluster is usually handling a maximum workload of 4200 reads/sec per node, with writes being significantly less, at ~200/sec per node. Usually it will be fine with this, with read latencies at around 3.5-10 ms/read, but once or twice an hour the latencies on the 3 nodes will shoot through the roof. Could there be errant requests coming in from the app ? e.g. something asking for 1’000s of columns ? Or something that hits a row that has a lot of tombstones ? Take a look at nodetool cfhistograms to see if you have any outlier wide rows. Also the second column, sstables, will tell you how many sstables were touched by reads. High numbers, above 4, let you know there are some wide rows out there. In 2.0 and later 1.2 releases nodetool cfstats will also include information about the number of tombstones touched in a read. Hope that helps. - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 8/01/2014, at 2:15 am, Jason Wee peich...@gmail.com wrote: /** * Verbs it's okay to drop if the request has been queued longer than the request timeout. These * all correspond to client requests or something triggered by them; we don't want to * drop internal messages like bootstrap or repair notifications. */ public static final EnumSet<Verb> DROPPABLE_VERBS = EnumSet.of(Verb.BINARY, Verb._TRACE, Verb.MUTATION, Verb.READ_REPAIR, Verb.READ, Verb.RANGE_SLICE, Verb.PAGED_RANGE, Verb.REQUEST_RESPONSE); The short term solution would probably be to increase the timeout in your yaml file, but I suggest you get the monitoring graphs (ping internode, block io) ready so it will give a better indication of what the exact problem might be. Jason On Tue, Jan 7, 2014 at 2:30 AM, Blake Eggleston bl...@shift.com wrote: That’s a good point. CPU steal time is very low, but I haven’t observed internode ping times during one of the peaks, I’ll have to check that out. Another thing I’ve noticed is that cassandra starts dropping read messages during the spikes, as reported by tpstats. This indicates that there are too many queries for cassandra to handle. However, as I mentioned earlier, the spikes aren’t correlated to an increase in reads. On Jan 5, 2014, at 3:28 PM, Blake Eggleston bl...@shift.com wrote: Hi, I’ve been having a problem with 3 neighboring nodes in our cluster having their read latencies jump up to 9000ms - 18000ms for a few minutes (as reported by opscenter), then come back down. We’re running a 6 node cluster, on AWS hi1.4xlarge instances, with cassandra reading and writing to 2 raided ssds. I’ve added 2 nodes to the struggling part of the cluster, and aside from the latency spikes shifting onto the new nodes, it has had no effect. I suspect that a single key that lives on the first stressed node may be being read from heavily. The spikes in latency don’t seem to be correlated to an increase in reads. The cluster is usually handling a maximum workload of 4200 reads/sec per node, with writes being significantly less, at ~200/sec per node. Usually it will be fine with this, with read latencies at around 3.5-10 ms/read, but once or twice an hour the latencies on the 3 nodes will shoot through the roof. The disks aren’t showing serious use, with read and write rates on the ssd volume at around 1350 kBps and 3218 kBps, respectively. Each cassandra process is maintaining 1000-1100 open connections. GC logs aren’t showing any serious gc pauses.
Any ideas on what might be causing this? Thanks, Blake
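For reference, the checks suggested above are run per node, e.g. nodetool cfhistograms <keyspace> <column_family> for the sstables-per-read and row size distributions, and nodetool cfstats for the per-CF tombstone information on later 1.2 releases and 2.0.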
Re: nodetool cleanup / TTL
Is there some other mechanism for forcing expired data to be removed without also compacting? (major compaction having obvious problematic side effects, and user defined compaction being significant work to script up). Tombstone compactions may help here https://issues.apache.org/jira/browse/CASSANDRA-3442 They cannot be forced, but if there is nothing else to compact they will look for single sstables to compact. Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 8/01/2014, at 11:18 pm, Sylvain Lebresne sylv...@datastax.com wrote: Is there some other mechanism for forcing expired data to be removed without also compacting? (major compaction having obvious problematic side effects, and user defined compaction being significant work to script up). Online scrubs will, as a side effect, purge expired tombstones *when possible* (even expired data cannot be removed if it possibly overwrites some older data in some other sstable than the one scrubbed). Please don't take that as me saying that this is a guarantee of scrub: it is just one of its current implementation side effects and it might very well change tomorrow. -- Sylvain
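For reference, the single-sstable tombstone compactions from CASSANDRA-3442 are tuned through compaction subproperties; a hedged example, shown with what I believe are the default values:
ALTER TABLE mykeyspace.mytable WITH compaction = {'class': 'SizeTieredCompactionStrategy', 'tombstone_threshold': 0.2, 'tombstone_compaction_interval': 86400};
tombstone_threshold is the estimated droppable-tombstone ratio above which a lone sstable becomes a candidate, and the interval stops the same sstable being recompacted continuously.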
Re: upgrade from cassandra 1.2.3 -> 1.2.13 + start using SSL
We avoid mixing versions for a long time, but we always upgrade one node and check the application is happy before proceeding, e.g. wait for 30 minutes before upgrading the others. If you snapshot before upgrading, and have to roll back after 30 minutes, you can roll back to the snapshot and use repair to fix the data on disk. Hope that helps. - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 9/01/2014, at 7:24 am, Robert Coli rc...@eventbrite.com wrote: On Wed, Jan 8, 2014 at 1:17 AM, Jiri Horky ho...@avast.com wrote: I am specifically interested in whether it is possible to upgrade just one node and keep it running like that for some time, i.e. if the gossip protocol is compatible in both directions. We are a bit afraid to upgrade all nodes to 1.2.13 at once in case we would need to roll back. This is not officially supported. It will probably work for these particular versions, but it is not recommended. The most serious potential issue is an inability to replace the new node if it fails. There's also the problem of not being able to repair until you're back on the same versions. And other, similar, undocumented edge cases... =Rob
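For reference, the per-node snapshot mentioned above is just (tag name hypothetical): nodetool snapshot -t pre-1.2.13 - snapshots are hard links inside each column family's snapshots directory, so they are cheap to take and only grow as compaction replaces the original files.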
Re: MUTATION messages dropped
I ended up changing memtable_flush_queue_size to be large enough to contain the biggest flood I saw. As part of the flush process the “Switch Lock” is taken to synchronise around the commit log. This is a reentrant Read Write lock, the flush path takes the write lock and the write path takes the read part. When flushing a CF the write lock is taken, the commit log is updated, and the memtable is added to the flush queue. If the queue is full then the write lock will be held, blocking the write threads from taking the read lock. There are a few reasons why the queue may be full, the simple one is the disk IO is not fast enough. Others are that the commit log segments are too small, there are lots of CF’s and/or lots of secondary indexes, or nodetool flush is called frequently. Increasing the size of the queue is a good work around, and the correct approach if you have a lot of CF’s and/or secondary indexes. Hope that helps. - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 21/12/2013, at 6:03 am, Ken Hancock ken.hanc...@schange.com wrote: I ended up changing memtable_flush_queue_size to be large enough to contain the biggest flood I saw. I monitored tpstats over time using a collection script and an analysis script that I wrote to figure out what my largest peaks were. In my case, all my mutation drops correlated with hitting the maximum memtable_flush_queue_size and then mutation drops stopped as soon as the queue size dropped below the max. I threw the scripts up on github in case they're useful... https://github.com/hancockks/tpstats On Fri, Dec 20, 2013 at 1:08 AM, Alexander Shutyaev shuty...@gmail.com wrote: Thanks for your answers. srmore, We are using v2.0.0. As for GC I guess it does not correlate in our case, because we had cassandra running 9 days under production load with no dropped messages, and I guess that during this time there were a lot of GCs. Ken, I've checked the values you indicated. Here they are: node1 6498 node2 6476 node3 6642 I guess this is not good :) What can we do to fix this problem? 2013/12/19 Ken Hancock ken.hanc...@schange.com We had issues where the number of CF families that were being flushed would align and then block writes for a very brief period. If that happened when a bunch of writes came in, we'd see a spike in Mutation drops. Check nodetool tpstats for FlushWriter all time blocked. On Thu, Dec 19, 2013 at 7:12 AM, Alexander Shutyaev shuty...@gmail.com wrote: Hi all! We've had a problem with cassandra recently. We had 2 one-minute periods when we got a lot of timeouts on the client side (the only timeouts during the 9 days we have been using cassandra in production). In the logs we've found corresponding messages saying something about MUTATION messages dropped. Now, the official faq [1] says that this is an indicator that the load is too high. We've checked our monitoring and found out that the 1-minute average cpu load had a local peak at the time of the problem, but it was like 0.8 against the usual 0.2, which I guess is nothing for a 2 core virtual machine. We've also checked java threads - there was no peak there and their count was reasonable ~240-250. Can anyone give us a hint - what should we monitor to see this high load and what should we tune to make it acceptable?
Thanks in advance, Alexander [1] http://wiki.apache.org/cassandra/FAQ#dropped_messages -- Ken Hancock | System Architect, Advanced Advertising SeaChange International 50 Nagog Park Acton, Massachusetts 01720 ken.hanc...@schange.com | www.schange.com | NASDAQ:SEAC Office: +1 (978) 889-3329
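For reference, the setting discussed in this thread lives in cassandra.yaml; a hedged example (the right value is workload specific, 4 is the usual default):
memtable_flush_queue_size: 8
The yaml comments suggest it be at least the maximum number of secondary indexes on any single CF, which matches the advice above.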
Re: Astyanax - multiple key search with pagination
You will need to paginate the list of keys to read in your app. Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 21/12/2013, at 12:58 pm, Parag Patel parag.pa...@fusionts.com wrote: Hi, I’m using Astyanax and trying to do a search for multiple keys with pagination. I tried “.getKeySlice” with a list of primary keys, but it doesn’t allow pagination. Does anyone know how to tackle this issue with Astyanax? Parag
Re: Broken pipe with Thrift
One question, which is confusing: is it a server side issue or client side? Check the server log for errors to make sure it’s not a server side issue. Also check if there could be something in the network that is killing long lived connections. Check the thrift lib the client is using is the same as the one in the cassandra lib on the server. Can you do some simple tests using cqlsh from the client machine? That would eliminate the client driver. Hope that helps. - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 25/12/2013, at 4:35 am, Steven A Robenalt srobe...@stanford.edu wrote: In our case, the issue was on the server side, but since you're on the 1.2.x branch, it's not likely to be the same issue. Hopefully, someone else who is using the 1.2.x branch will have more insight than I do. On Mon, Dec 23, 2013 at 11:52 PM, Vivek Mishra mishra.v...@gmail.com wrote: Hi Steven, One question, which is confusing: is it a server side issue or client side? -Vivek On Tue, Dec 24, 2013 at 12:30 PM, Vivek Mishra mishra.v...@gmail.com wrote: Hi Steven, Thanks for your reply. We are using version 1.2.9. -Vivek On Tue, Dec 24, 2013 at 12:27 PM, Steven A Robenalt srobe...@stanford.edu wrote: Hi Vivek, Which release are you using? We had an issue with 2.0.2 that was solved by a fix in 2.0.3. On Mon, Dec 23, 2013 at 10:47 PM, Vivek Mishra mishra.v...@gmail.com wrote: Also to add. It works absolutely fine on a single node. -Vivek On Tue, Dec 24, 2013 at 12:15 PM, Vivek Mishra mishra.v...@gmail.com wrote: Hi, I have a 6 node, 2DC cluster setup. I have configured consistency level to QUORUM. But very often I am getting Broken pipe com.impetus.client.cassandra.CassandraClientBase (CassandraClientBase.java:1926) - Error while executing native CQL query Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe at org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:147) at org.apache.thrift.transport.TFramedTransport.flush(TFramedTransport.java:156) at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:65) at org.apache.cassandra.thrift.Cassandra$Client.send_execute_cql3_query(Cassandra.java:1556) at org.apache.cassandra.thrift.Cassandra$Client.execute_cql3_query(Cassandra.java:1546) I am simply reading a few records from a column family (not a huge amount of data). Connection pooling and socket time out are properly configured. I have even modified read_request_timeout_in_ms request_timeout_in_ms write_request_timeout_in_ms in cassandra.yaml to higher values. Any idea? Is it an issue at the server side or with the client API? -Vivek -- Steve Robenalt Software Architect HighWire | Stanford University 425 Broadway St, Redwood City, CA 94063 srobe...@stanford.edu http://highwire.stanford.edu -- Steve Robenalt Software Architect HighWire | Stanford University 425 Broadway St, Redwood City, CA 94063 srobe...@stanford.edu http://highwire.stanford.edu
Re: querying time series from hadoop
So now I will try to patch my cassandra 1.2.11 installation but I just wanted to ask you guys first, if there is any other solution that does not involve a release. That patch in CASSANDRA-6311 is for 2.0, you cannot apply it to 1.2. but when I am using the java driver, the driver already uses the row key for token statements and I cannot execute the query above, therefore it does a full scan of rows. The ColumnFamilyRecordReader is designed to read lots of rows, not a single row. You should be able to use the java driver from a hadoop task though to read a single row. Can you provide some more info on what you are doing ? Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 26/12/2013, at 9:56 pm, mete efk...@gmail.com wrote: Hello folks, I have come up with a basic time series cql schema based on the articles here: http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra so simply put it's something like: rowkey, timestamp, col3, col4 etc... where rowkey and timestamp are compound keys. Where I am having issues is how to efficiently query this data structure. When I use cqlsh and query it is perfectly fine: select * from table where rowkey='row key' and date > xxx and date <= yyy but when I am using the java driver, the driver already uses the row key for token statements and I cannot execute the query above, therefore it does a full scan of rows. The issue that I am having is discussed here: http://stackoverflow.com/questions/19189649/composite-key-in-cassandra-with-pig I have gone through the relevant jira issues 6151 and 6311. This behaviour is supposed to be fixed in 2.0.x but so far it is not there. So now I will try to patch my cassandra 1.2.11 installation but I just wanted to ask you guys first, if there is any other solution that does not involve a release. I assume that this is somewhat of a common use case, the articles I referred to seem to be old enough and unless I am missing something obvious I cannot query this schema efficiently with the current version (1.2.x or 2.0.x). Does anyone have a similar issue? Any pointers are welcome. Regards Mete
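For reference, a sketch of the kind of schema being discussed, with hypothetical column names (rowkey partitions the data, the time column is the clustering key):
CREATE TABLE events (
rowkey text,
date timestamp,
col3 text,
col4 text,
PRIMARY KEY (rowkey, date)
);
With this layout the cqlsh query above is an ordered slice within a single partition; it is the Hadoop ColumnFamilyRecordReader path that turns it into a full range scan.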
Re: Offline migration: Random -> Murmur
I wrote a small (yet untested) utility, which should be able to read SSTable files from disk and write them into a cassandra cluster using Hector. Consider using the SSTableSimpleUnsortedWriter (see http://www.datastax.com/dev/blog/bulk-loading) to create the SSTables, you can then bulk load them into the destination system. This will be much faster. Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 29/12/2013, at 6:26 am, Edward Capriolo edlinuxg...@gmail.com wrote: Internally we have a tool that does get range slice on the source cluster and replicates to destination. Remember that writes are idempotent. Our tool can optionally only replicate data between two timestamps, allowing incremental transfers. So if you get your application writing new data to both clusters you can run a range scanning program to copy all the data. On Monday, December 23, 2013, horschi hors...@gmail.com wrote: Interesting you even dare to do a live migration :-) Do you do all Murmur-writes with the timestamp from the Random-data? So that all migrated data is written with timestamps from the past. On Mon, Dec 23, 2013 at 3:59 PM, Rahul Menon ra...@apigee.com wrote: Christian, I have been planning to migrate my cluster from random to murmur3 in a similar manner. I intend to use pycassa to read and then write to the newer cluster. My only concern would be ensuring the consistency of already migrated data as the cluster ( with random ) would be constantly serving the production traffic. I was able to do this on a non prod cluster, but production is a different game. I would also like to hear more about this, especially if someone was able to successfully do this. Thanks Rahul On Mon, Dec 23, 2013 at 6:45 PM, horschi hors...@gmail.com wrote: Hi list, has anyone ever tried to migrate a cluster from Random to Murmur? We would like to do so, to have a more standardized setup. I wrote a small (yet untested) utility, which should be able to read SSTable files from disk and write them into a cassandra cluster using Hector. This migration would be offline of course and would only work for smaller clusters. Any thoughts on the topic? kind regards, Christian PS: The reason for doing so is not performance. It is to simplify operational stuff for the years to come. :-) -- Sorry this was sent from mobile. Will do less grammar and spell check than usual.
Re: cassandra monitoring
JMX is doing its thing on the cassandra node and is running on port 8081 Have you set the JMX port for the cluster in Ops Centre ? The default JMX port has been 7199 for a while. Off the top of my head it’s in the same area where you specify the initial nodes in the cluster, maybe behind an “Advanced” button. The Ops Centre agent talks to the server to find out what JMX port it should use to talk to the local Cassandra install. Also check the logs in /var/log/datastax Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 30/12/2013, at 2:21 am, Tim Dunphy bluethu...@gmail.com wrote: Hi all, I'm attempting to configure the datastax agent so that opscenter can monitor cassandra. I am running cassandra 2.0.3 with opscenter-4.0.1-2.noarch. Cassandra is running on a centos 5.9 host and the opscenter host is running on centos 6.5 A ps shows the agent running [root@beta:~] #ps -ef | grep datastax | grep -v grep root 2166 1 0 03:31 ? 00:00:00 /bin/bash /usr/share/datastax-agent/bin/datastax_agent_monitor 106 2187 1 0 03:31 ? 00:01:37 /etc/alternatives/javahome/bin/java -Xmx40M -Xms40M -Djavax.net.ssl.trustStore=/var/lib/datastax-agent/ssl/agentKeyStore -Djavax.net.ssl.keyStore=/var/lib/datastax-agent/ssl/agentKeyStore -Djavax.net.ssl.keyStorePassword=opscenter -Dagent-pidfile=/var/run/datastax-agent/datastax-agent.pid -Dlog4j.configuration=/etc/datastax-agent/log4j.properties -jar datastax-agent-4.0.2-standalone.jar /var/lib/datastax-agent/conf/address.yaml And the service itself claims that it is running: [root@beta:~] #service datastax-agent status datastax-agent (pid 2187) is running... On the cassandra node I have ports 61620 and 61621 open on the firewall. But if I do an lsof and look for those ports I see no activity there. [root@beta:~] #lsof -i :61620 [root@beta:~] #lsof -i :61621 And a netstat turns up nothing either: [root@beta:~] #netstat -tapn | egrep '(datastax|ops)' So I guess it should come as no surprise that the opscenter interface reports the node as down. And trying to reinstall the agent remotely by clicking the 'fix' link errors out: g is null If you need to make changes, you can press Retry and the installations will be retried. And also I got on another attempt: Cannot call method 'getRequstStatus' of null. I'm really wondering what I'm doing wrong here, and how I can work my way out of this quagmire. It would be beyond awesome to actually get this working! I've also attempted to get Cassandra Cluster Admin working. JMX is doing its thing on the cassandra node and is running on port 8081. CCA is running on the same host as the opscenter.
But cca gives me this error once I log in: Cassandra Cluster Admin Logout Fatal error: Uncaught exception 'TTransportException' with message 'TSocket: timed out reading 4 bytes from beta.jokefire.com:9160' in /var/www/Cassandra-Cluster-Admin/include/thrift/transport/TSocket.php:268 Stack trace: #0 /var/www/Cassandra-Cluster-Admin/include/thrift/transport/TTransport.php(87): TSocket->read(4) #1 /var/www/Cassandra-Cluster-Admin/include/thrift/transport/TFramedTransport.php(135): TTransport->readAll(4) #2 /var/www/Cassandra-Cluster-Admin/include/thrift/transport/TFramedTransport.php(102): TFramedTransport->readFrame() #3 /var/www/Cassandra-Cluster-Admin/include/thrift/transport/TTransport.php(87): TFramedTransport->read(4) #4 /var/www/Cassandra-Cluster-Admin/include/thrift/protocol/TBinaryProtocol.php(300): TTransport->readAll(4) #5 /var/www/Cassandra-Cluster-Admin/include/thrift/protocol/TBinaryProtocol.php(192): TBinaryProtocol->readI32(NULL) #6 /var/www/Cassandra-Cluster-Admin/include/thrift/packages/cassandra/cassandra.Cassandra.client.php(1017): TBinaryProtocol->readMessageBegin(NULL, 0, 0) # in /var/www/Cassandra-Cluster-Admin/include/thrift/transport/TSocket.php on line 268 Any advice I could get on my CCA problem and/or my OpsCenter problem would be great and appreciated. Thanks Tim -- GPG me!! gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
Re: Cleanup and old files
Check the SSTable is actually in use by cassandra, if it’s missing a component or otherwise corrupt it will not be opened at run time and so not included in all the fun games the other SSTables get to play. If you have the last startup in the logs check for an “Opening…” message or an ERROR about the file. Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 30/12/2013, at 1:28 pm, David McNelis dmcne...@gmail.com wrote: I am currently running a cluster with 1.2.8. One of my larger column families on one of my nodes has keyspace-tablename-ic--Data.db with a modify date in August. Since August we have added several nodes (with vnodes), with the same number of vnodes as all the existing nodes. As a result (we've since gone from 15 to 21 nodes), ~32% of the data on the original 15 nodes should have been essentially balanced out to the 6 new nodes. (1/15 + 1/16 + 1/21). When I run a cleanup, however, the old data files never get updated, and I can't believe that they all should have remained the same. The only recently updated files in that data directory are secondary index sstable files. Am I doing something wrong here? Am I thinking about this wrong? David
Re: Commitlog replay makes dropped and recreated keyspace and column family rows reappear
mmm, my bad there. First, schema changes are always flushed to disk, so the commit log is not really an issue. Second, when the commit log replays it just processes the mutations; the “Drop keyspace” message comes from MigrationManager.announceKeyspaceDrop() and is not called. If you can reproduce this in a simple way please create a ticket at https://issues.apache.org/jira/browse/CASSANDRA Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 19/12/2013, at 2:42 am, Desimpel, Ignace ignace.desim...@nuance.com wrote: I did the test again to get the log information. There is a “Drop keyspace” message at the time I drop the keyspace. That actually must be working since after the drop, I do not get any records back. But starting from the time of restart, I do not get any “Drop keyspace” message in the log. I get the following lines (only part of the log here): ………. 2013-12-18 14:30:19.385 Initializing system_traces.sessions 2013-12-18 14:30:19.387 Initializing system_traces.events 2013-12-18 14:30:19.394 Replaying ../../../../data/cac.cassandra.cac/dbcommitlog/CommitLog-3-1387372026304.log, ../../../../data/cac.cassandra.cac/dbcommitlog/CommitLog-3-1387372026305.log 2013-12-18 14:30:19.414 Replaying ../../../../data/cac.cassandra.cac/dbcommitlog/CommitLog-3-1387372026304.log 2013-12-18 14:30:20.291 CFS(Keyspace='CodeStructure', ColumnFamily='Labels') liveRatio is 10.79257274718398 (just-counted was 10.79257274718398). calculation took 720ms for 6128 columns 2013-12-18 14:30:20.331 CFS(Keyspace='CodeStructure', ColumnFamily='Class') liveRatio is 9.787147977470557 (just-counted was 9.574295954941116). calculation took 39ms for 1236 columns 2013-12-18 14:30:20.454 CFS(Keyspace='CodeStructure', ColumnFamily='ClassMethod') liveRatio is 10.415524860171194 (just-counted was 10.415524860171194). calculation took 122ms for 6630 columns 2013-12-18 14:30:21.294 Finished reading ../../../../data/cac.cassandra.cac/dbcommitlog/CommitLog-3-1387372026304.log 2013-12-18 14:30:21.294 Replaying ../../../../data/cac.cassandra.cac/dbcommitlog/CommitLog-3-1387372026305.log 2013-12-18 14:30:21.294 Finished reading ../../../../data/cac.cassandra.cac/dbcommitlog/CommitLog-3-1387372026305.log 2013-12-18 14:30:21.298 Enqueuing flush of Memtable-ReverseIntegerFunction@663725448(270/2700 serialized/live bytes, 10 ops) 2013-12-18 14:30:21.298 Writing Memtable-ReverseIntegerFunction@663725448(270/2700 serialized/live bytes, 10 ops) ……more flushing of my memtables ……... Log replay complete, 42237 replayed mutations 2013-12-18 14:30:25.428 Cassandra version: 2.0.2-SNAPSHOT 2013-12-18 14:30:25.428 Thrift API version: 19.38.0 …… Regards, Ignace Desimpel Do you have the logs from after the restart ? Did it include a “Drop keyspace…” INFO level message ? Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com From: Desimpel, Ignace Sent: Tuesday, 3 December 2013 14:45 To: user@cassandra.apache.org Subject: Commitlog replay makes dropped and recreated keyspace and column family rows reappear Hi, I have the impression that there is an issue with dropping a keyspace and then recreating the keyspace (and column families), combined with a restart of the database. My test goes as follows: Create keyspace K and column families C. Insert rows X0 column family C0 Query for X0 : found rows : OK Drop keyspace K Query for X0 : found no rows : OK Create keyspace K and column families C.
Insert rows X1 column family C1 Query for X0 : not found : OK Query for X1 : found : OK Stop the Cassandra database Start the Cassandra database Query for X1 : found : OK Query for X0 : found : NOT OK ! Has anyone tested this scenario? Using : CASSANDRA VERSION 2.0.2, thrift, java 1.7.x, centos Ignace Desimpel
Re: Writes during schema migration
It depends a little on the nature of the change, but you need some coordination between the schema change and your code. e.g. add the new column, then change the code to write to it; or add the new column, change the code to use the new column and not the old one, then remove the old column. Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 19/12/2013, at 3:02 am, Ben Hood 0x6e6...@gmail.com wrote: Hi, I was wondering if anybody knows any best practices for how to apply a schema migration across a cluster. I've been reading this article: http://www.datastax.com/dev/blog/the-schema-management-renaissance to see what is happening under the covers. However the article doesn't seem to talk about concurrent write access during the migration process. I'm naively assuming that you'd need to block all writes to the cluster before the migration is started. This would be firstly because of potential consistency issues amongst the cluster nodes. But this would also be because you'd need two versions of your app running at the same time. Does anybody have any experience with doing this kind of thing? Cheers, Ben
Re: How to tune cassandra to avoid OOM
Cassandra version is : apache-cassandra-1.2.4 The latest 1.2 version is 1.2.13, you really should be on that. commitlog_total_space_in_mb: 16 commitlog_segment_size_in_mb: 16 Reducing the total commit log size to 16 MB is a very bad idea, you should return it to 4096 and the segment size to 32. The commit log is kept on disk and has no impact on the memory footprint. Reducing the size will cause much more disk IO. It’s kind of unusual to go OOM in 1.2+, but I’ve seen it happen with a large number of SSTables (30k+) and LCS. Also wide rows, or lots of tombstones, and bad queries can result in a lot of premature tenuring. Finally custom comparators can create a lot of garbage, or a low powered CPU may not be able to keep up. How many cores do you have ? You may want to make these changes to reduce how quickly objects are tenured, also pay attention to how low the total heap use gets to after CMS. JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=4" JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=2" JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=50" Hope that helps. - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 19/12/2013, at 4:47 pm, Lee Mighdoll l...@underneath.ca wrote: I'd suggest setting some cassandra jvm parameters so that you can analyze a heap dump and peek through the gc logs. That'll give you some clues e.g. whether the memory problem is growing steadily or suddenly, and clues from a peek at which objects are using the memory. -XX:+HeapDumpOnOutOfMemoryError And if you don't want to wait six days for another failure, you can collect a heap sooner with jmap -F. -Xloggc:/path/to/where/to/put/the/gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -XX:+PrintPromotionFailure Cheers, Lee On Wed, Dec 18, 2013 at 6:52 PM, Shammi Jayasinghe sha...@wso2.com wrote: Hi, We are facing a problem with Cassandra tuning. We have hit the following OOM scenario [1] after running the system for 6 days. We have tuned cassandra with the following values, obtained by going through a huge number of testing cycles, but it has still gone OOM. I would like to know if someone can help with identifying tuning parameters. In this server, we have given 6GB for the Xmx value and the total memory in the server is 8GB. Cassandra version is : apache-cassandra-1.2.4 Tuning parameters: flush_largest_memtables_at: 0.5 reduce_cache_sizes_at: 0.85 reduce_cache_capacity_to: 0.6 commitlog_total_space_in_mb: 16 commitlog_segment_size_in_mb: 16 As I mentioned in the above parameters (flush_largest_memtables_at = 0.5), I feel that it has not taken effect on the server. Is there any way we can check whether it is taking effect as expected? [1] WARN 19:16:50,355 Heap is 0.9971737408184552 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory.
Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically WARN 19:18:19,784 Flushing CFS(Keyspace='QpidKeySpace', ColumnFamily='DestinationSubscriptionsCountRow') to relieve memory pressure ERROR 19:20:50,316 Exception in thread Thread[ReadStage:63,5,main] java.lang.OutOfMemoryError: Java heap space at java.nio.ByteBuffer.wrap(ByteBuffer.java:350) at java.nio.ByteBuffer.wrap(ByteBuffer.java:373) at org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:391) at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:392) at org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:371) at org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:84) at org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:73) at org.apache.cassandra.db.columniterator.IndexedSliceReader$IndexedBlockFetcher.getNextBlock(IndexedSliceReader.java:370) at org.apache.cassandra.db.columniterator.IndexedSliceReader$IndexedBlockFetcher.fetchMoreData(IndexedSliceReader.java:325) at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:151) at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:48) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) at org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:90) at org.apache.cassandra.db.filter.QueryFilter$2
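For reference, the defaults being recommended above are cassandra.yaml settings:
commitlog_total_space_in_mb: 4096
commitlog_segment_size_in_mb: 32
With only 16 MB of total commit log space the node has to flush memtables almost continuously to free segments, which adds disk IO and GC pressure.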
Re: Best way to measure write throughput...
nodetool proxyhistograms shows request latency for the node as a whole, nodetool cfhistograms shows it for a single column family. If you want to get an overview install something like Ops Centre http://www.datastax.com/what-we-offer/products-services/datastax-opscenter Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 19/12/2013, at 8:46 pm, Jason Wee peich...@gmail.com wrote: Hello, you could also probably do it in your application? Just sample with an interval of time and that should give some indication of throughput. HTH /Jason On Thu, Dec 19, 2013 at 12:11 AM, Krishna Chaitanya bnsk1990r...@gmail.com wrote: Hello, Could you please suggest to me the best way to measure write throughput in Cassandra. I basically have an application that stores network packets to a Cassandra cluster. Which is the best way to measure write performance, especially write throughput, in terms of the number of packets stored into Cassandra per second or something similar? Can I measure this using nodetool? Thanks. -- Regards, BNSK.
Re: Cassandra python pagination
First approach: Sounds good. Second approach ( I used in production ): If the row gets big enough this will have bad performance. A - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 19/12/2013, at 10:28 am, Kumar Ranjan winnerd...@gmail.com wrote: I am using pycassa. So, here is how I solved this issue. Will discuss 2 approaches. First approach didn't work out for me. Thanks Aaron for your attention. First approach: - Say if column_count = 10 - collect first 11 rows, sort first 10, send it to user (front end) as JSON object and last=11th_column - User then calls for page 2, with prev = 1st_column_id, column_start = 11th_column and column_count = 10 - This way, I can traverse, next page and previous page. - Only issue with this approach is, I don't have all columns in super column sorted. So this did not work. Second approach ( I used in production ): - fetch all super columns for a row key - Sort this in python using sorted and a lambda function based on column values. - Once sorted, I prepare buckets and each bucket size is page size / column count. Also filter out any rogue data if needed - Store page by page results in Redis with keys such as 'row_key|page_1|super_column' and keep refreshing redis periodically. I am sure, there must be a better and brighter approach but for now, the 2nd approach is working. Thoughts ?? On Tue, Dec 17, 2013 at 9:19 PM, Aaron Morton aa...@thelastpickle.com wrote: CQL3 and thrift do not support an offset clause, so you can only really support next / prev page calls to the database. I am trying to use xget with column_count and buffer_size parameters. Can someone explain to me how it works? From the doc, my understanding is that I can do something like, What client are you using ? xget is not a standard cassandra function. Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 13/12/2013, at 4:56 am, Kumar Ranjan winnerd...@gmail.com wrote: Hey Folks, I need some ideas about implementing pagination on the browser, from the backend. So python code (backend) gets a request from the frontend with page=1,2,3,4 and so on and count_per_page=50. I am trying to use xget with column_count and buffer_size parameters. Can someone explain to me how it works? From the doc, my understanding is that I can do something like, total_cols is total columns for that key. count is what the user sends me. .xget('Twitter_search', hh, column_count=total_cols, buffer_size=count): Is my understanding correct? Because it's not working for page 2 and so on? Please enlighten me with suggestions. Thanks.
Re: Cassandra python pagination
Is there something wrong with it? Here 1234555665_53323232 and 2344555665_53323232 are super columns. Also, if I have to represent this data with the new composite comparator, how will I accomplish that? Composite types via pycassa http://pycassa.github.io/pycassa/assorted/composite_types.html?highlight=composite Create a composite where the super column name is the first part and the column name is the second part; this is basically what CQL 3 does. You will have to make all columns the same type though. Or use CQL 3, it works well for these sorts of models. Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 20/12/2013, at 7:22 am, Kumar Ranjan winnerd...@gmail.com wrote: Rob - I got a question following your advice. This is how I define my column family validators = { 'approved':'UTF8Type', 'tid': 'UTF8Type', 'iid': 'UTF8Type', 'score': 'IntegerType', 'likes': 'IntegerType', 'retweet': 'IntegerType', 'favorite':'IntegerType', 'screen_name': 'UTF8Type', 'created_date':'UTF8Type', 'expanded_url':'UTF8Type', 'embedly_data':'BytesType', } SYSTEM_MANAGER.create_column_family('KeySpaceNNN', 'Twitter_Instagram', default_validation_class='UTF8Type', super=True, comparator='UTF8Type', key_validation_class='UTF8Type', column_validation_classes=validators) Actual data representation: 'row_key': {'1234555665_53323232': {'approved': 'false', 'tid': 123, 'iid': 34, 'score': 2, likes: 50, retweets: 45, favorite: 34, screen_name:'goodname'}, '2344555665_53323232': {'approved': 'false', 'tid': 134, 'iid': 34, 'score': 2, likes: 50, retweets: 45, favorite: 34, screen_name:'newname'}. . } Is there something wrong with it? Here 1234555665_53323232 and 2344555665_53323232 are super columns. Also, if I have to represent this data with the new composite comparator, how will I accomplish that? Please let me know. Regards. On Wed, Dec 18, 2013 at 5:32 PM, Robert Coli rc...@eventbrite.com wrote: On Wed, Dec 18, 2013 at 1:28 PM, Kumar Ranjan winnerd...@gmail.com wrote: Second approach ( I used in production ): - fetch all super columns for a row key Stock response mentioning that super columns are anti-advised for use, especially in brand new code. =Rob
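For reference, a sketch of the same model as a CQL 3 table (names taken from the schema above, types illustrative):
CREATE TABLE twitter_instagram (
row_key text,
entry_id text, -- the old super column name, e.g. '1234555665_53323232'
approved text,
tid text,
iid text,
score int,
likes int,
retweet int,
favorite int,
screen_name text,
PRIMARY KEY (row_key, entry_id)
);
Each old super column becomes one clustered row keyed by entry_id, and each sub-column becomes a regular column, which is the composite layout Aaron describes.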
Re: WriteTimeoutException instead of UnavailableException
But in some cases, from one certain node, I get a WriteTimeoutException for a few minutes until an UnavailableException. It's like the coordinator doesn't know the status of the cluster. Any clue why this is happening? Depending on how the node goes down there can be a delay in other nodes knowing it is down. If you stop gossip (nodetool disablegossip) the node will cancel the gossip thread (without interrupting), wait two seconds, then inform other nodes it’s leaving gossip. Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 18/12/2013, at 8:56 am, Demian Berjman dberj...@despegar.com wrote: Question. I have a 5 node cluster (local with ccm). A keyspace with rf: 3. Three nodes are down. I run nodetool ring on the two living nodes and both see the other three nodes down. Then I do an insert with CL QUORUM and get an UnavailableException. It's ok. I am using Datastax java driver v 2.0.0-rc2. But in some cases, from one certain node, I get a WriteTimeoutException for a few minutes until an UnavailableException. It's like the coordinator doesn't know the status of the cluster. Any clue why this is happening? Thanks,
Re: WriteTimeoutException on Lightweight transactions
Some background…. http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0 You can also get a timeout during the prepare phase, well any time you are waiting on other nodes really. The WriteTimeoutException returned from the server includes a writeType (https://github.com/apache/cassandra/blob/cassandra-2.0.0-beta1/src/java/org/apache/cassandra/exceptions/WriteTimeoutException.java#L27) that will say CAS if it timed out during the prepare and propose phases, and SIMPLE when trying to commit. It’s also on the WriteTimeoutException in the driver. If it says CAS then we did not get to start the write. Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 20/12/2013, at 10:05 am, Demian Berjman dberj...@despegar.com wrote: Hi. I am using Cassandra 2.0.3 with the Datastax Java client. I execute an insert query: Insert insert = QueryBuilder.insertInto("demo_cl", "demo_table").value("id", id).value("col1", transactions).ifNotExists(); session.execute(insert.setConsistencyLevel(ConsistencyLevel.QUORUM)); Then, I force a shutdown on one node and get: com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency SERIAL (2 replica were required but only 1 acknowledged the write) Then I read the row and got no results. It seems that it was not inserted. What happened to the "1 acknowledged the write"? Is it lost? Is it like a rollback? Thanks,
Re: Issue upgrading from 1.2 to 2.0.3
If this is still a concern can you post the output from nodetool gossipinfo ? It will give the details of what the nodes think of the other ones. A - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 20/12/2013, at 11:38 am, Parag Patel parag.pa...@fusionts.com wrote: Thanks for that link. Our 1.2 version is 1.2.12. Our 2.0.3 nodes were restarted once. Before the restart it was the 1.2.12 binary, after it was the 2.0.3. Immediately after the node was back in the cluster, we ran nodetool upgradesstables. We haven’t restarted since. Is a restart required for each node? From: Robert Coli [mailto:rc...@eventbrite.com] Sent: Thursday, December 19, 2013 4:17 PM To: user@cassandra.apache.org Subject: Re: Issue upgrading from 1.2 to 2.0.3 On Thu, Dec 19, 2013 at 1:03 PM, Parag Patel parag.pa...@fusionts.com wrote: We are in the process of upgrading 1.2 to 2.0.3. ... Please help as this will prevent us from pushing into production. (as a general commentary : https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/ ) specific feedback on your question : Did the 2.0.3 nodes see the 1.2.x (which 1.2.x?) nodes after the first restart? =Rob
Re: Improving write performance in Cassandra and a few related issues...
Thanks for the reply. By packet drops I mean, the client is not able to read the shared memory as fast as the software switch is writing into it.. What is the error you are getting on the client ? Also, I would like to know if in general, distribution of write requests to different Cassandra nodes instead of only to one, leads to increased write performance in Cassandra. In general yes, clients should distribute their writes. Is there any particular way in which write performance can be measured, preferably from the client? Logging at the client level ? Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 18/12/2013, at 5:02 pm, Krishna Chaitanya bnsk1990r...@gmail.com wrote: Thanks for the reply. By packet drops I mean, the client is not able to read the shared memory as fast as the software switch is writing into it.. I doubt it's an issue with the client, but can you point to particular issues that could cause this type of scenario? Also, I would like to know if in general, distribution of write requests to different Cassandra nodes instead of only to one, leads to increased write performance in Cassandra. Is there any particular way in which write performance can be measured, preferably from the client? On Dec 18, 2013 8:30 AM, Aaron Morton aa...@thelastpickle.com wrote: write throughput is remaining at around 460 pkts/sec or sometimes even falling below that rate as against the expected rate of around 920 pkts/sec. Is it some kind of limitation of Cassandra or am I doing something wrong??? There is nothing in cassandra that would make that happen. Double check your client. I also see an increase in packet drops when I try to store the packets from both the hosts into the same keyspace. The packets are getting collected properly followed by intervals in which they are being dropped in both the systems, at the same time. Could this be some kind of a buffer issue??? What do you mean by packet drops ? Do you mean dropped messages in cassandra ? Also, can write throughput be increased by distributing the write requests between the 2 Cassandra nodes instead of sending the requests to a single node? Currently, I dont see any improvement even if I distribute the write requests to different hosts. How can I improve the write performance overall? Normally we expect 3k to 4k non counter writes per core per node, if you are not seeing that it may be configuration or the client. Hope that helps. - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 15/12/2013, at 7:51 pm, Krishna Chaitanya bnsk1990r...@gmail.com wrote: Hello, I am a newbie to the Cassandra world and have a few doubts which I wanted to clarify. I am having a software switch that stores netflow packets into a shared memory segment and a daemon that reads that memory segment and stores them into a 2-node Cassandra cluster. Currently, I am storing the packets from 2 hosts into 2 different keyspaces, hence only writes and no reads. The write throughput is coming to around 460 pkts/sec in each of the keyspaces. But, when I try to store the packets into the same keyspace, I observe that the write throughput is remaining at around 460 pkts/sec or sometimes even falling below that rate as against the expected rate of around 920 pkts/sec. Is it some kind of limitation of Cassandra or am I doing something wrong???
I also see an increase in packet drops when I try to store the packets from both the hosts into the same keyspace. The packets are getting collected properly followed by intervals in which they are being dropped in both the systems, at the same time. Could this be some kind of a buffer issue??? The write requests from both the systems are sent to the same node which is also the seed node. I am mostly using the default Cassandra configuration with replication_factor set to 1 and without durable_writes. The systems are i5s with 4 gb RAM. The data model is: each second is the row key with all the packets collected in that second as the columns. Also, can write throughput be increased by distributing the write requests between the 2 Cassandra nodes instead of sending the requests to a single node? Currently, I dont see any improvement even if I distribute the write requests to different hosts. How can I improve the write performance overall? Thanks. -- Regards, BNSK.
Re: Unable to create collection inside collection
Could anybody suggest how I can achieve this in Cassandra? It’s not supported. You may want to model the feeschedule as a table. Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 12/12/2013, at 11:09 pm, Santosh Shet santosh.s...@vista-one-solutions.com wrote: Hi, I am not able to create a collection inside another collection in Cassandra. I am trying to create a column named feeschedule with type Map, where the Map values are of type List. Could anybody suggest how I can achieve this in Cassandra? My Cassandra version details are given below: cqlsh version - cqlsh 4.1.0 Cassandra version - 2.0.2 Thanks in advance, Regards Santosh Shet Software Engineer | VistaOne Solutions Direct India : +91 80 30273829 | Mobile India : +91 8105720582 Skype : santushet
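For reference, a sketch of modelling the feeschedule as its own table instead of a nested collection (all names hypothetical):
CREATE TABLE feeschedule (
account_id text,
fee_name text,
amounts list<double>,
PRIMARY KEY (account_id, fee_name)
);
What would have been the outer Map key becomes the fee_name clustering column, and the inner List becomes an ordinary list column, which is supported.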
Re: Cassandra python pagination
CQL3 and thrift do not support an offset clause, so you can only really support next / prev page calls to the database. I am trying to use xget with column_count and buffer_size parameters. Can someone explain to me how it works? From the doc, my understanding is that I can do something like, What client are you using ? xget is not a standard cassandra function. Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 13/12/2013, at 4:56 am, Kumar Ranjan winnerd...@gmail.com wrote: Hey Folks, I need some ideas about implementing pagination on the browser, from the backend. So python code (backend) gets a request from the frontend with page=1,2,3,4 and so on and count_per_page=50. I am trying to use xget with column_count and buffer_size parameters. Can someone explain to me how it works? From the doc, my understanding is that I can do something like, total_cols is total columns for that key. count is what the user sends me. .xget('Twitter_search', hh, column_count=total_cols, buffer_size=count): Is my understanding correct? Because it's not working for page 2 and so on? Please enlighten me with suggestions. Thanks.
Re: Cassandra data update for a row
'twitter_row_key': OrderedDict([('411186035495010304', u'{score: 0, tid: 411186035495010304, created_at: Thu Dec 12 17:29:24 + 2013, favorite: 0, retweet: 0, approved: true}'),]) How can I set approved to 'false' ?? It looks like the value of the 411186035495010304 column is a string; to cassandra that’s an opaque type we do not make partial updates to. If you need to update the values individually they need to be stored in columns. Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 13/12/2013, at 8:18 am, Kumar Ranjan winnerd...@gmail.com wrote: Hey Folks, I have a row like this. 'twitter_row_key' is the row key and 411186035495010304 is a column. The rest is the value for the 411186035495010304 column. See below. 'twitter_row_key': OrderedDict([('411186035495010304', u'{score: 0, tid: 411186035495010304, created_at: Thu Dec 12 17:29:24 + 2013, favorite: 0, retweet: 0, approved: true}'),]) How can I set approved to 'false' ?? When I try an insert for row key 'twitter_row_key' and column 411186035495010304, it overwrites the whole data and the new row becomes like this 'twitter_row_key': OrderedDict([('411186035495010304', u'{approved: true}'),]) Any thoughts guys?
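For reference, a sketch of the column-per-field layout being suggested (table and types hypothetical):
CREATE TABLE tweets (
row_key text,
tweet_id text,
approved boolean,
score int,
favorite int,
retweet int,
created_at text,
PRIMARY KEY (row_key, tweet_id)
);
UPDATE tweets SET approved = false WHERE row_key = 'twitter_row_key' AND tweet_id = '411186035495010304';
Only the approved column is rewritten; the other columns keep their existing values and timestamps.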
Re: Get all the data for x number of seconds from CQL?
Is it possible to get all the data for the last 5 seconds or 10 seconds or 30 seconds by using the id column? Not using the current table. Try this CREATE TABLE test1 ( day int, timestamp int, count int, record_name text, record_value blob, PRIMARY KEY (day, timestamp, record_name) ) Store the day as YYYYMMDD and the timestamp as before; you can then do queries like select * from test1 where day = 20131218 and timestamp > X and timestamp < Y; Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 13/12/2013, at 11:28 am, Techy Teck comptechge...@gmail.com wrote: Below is my CQL table - CREATE TABLE test1 ( id text, record_name text, record_value blob, PRIMARY KEY (id, record_name) ) here the id column will have data like this - timestamp.count And here timestamp is in milliseconds but rounded up to the nearest second. So as an example, data in the `id column` will be like this - 138688293.1 And a single row in the above table will be like this - 138688293.1 | event_name | hello-world Now my question is - Is it possible to get all the data for the last 5 seconds or 10 seconds or 30 seconds by using the id column? I am running Cassandra 1.2.9
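With that layout, "the last 30 seconds" becomes a slice on the timestamp clustering column inside the day partition, with the client computing the lower bound. For example (the values reuse the numbers from the thread and are only illustrative):

SELECT record_name, record_value FROM test1
WHERE day = 20131218
AND timestamp > 138688263 AND timestamp <= 138688293;

One caveat: a window that spans midnight falls into two day partitions, so it has to be issued as two queries, one per day value.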
Re: Write performance with 1.2.12
With a single node I get 3K for cassandra 1.0.12 and 1.2.12. So I suspect there is some network chatter. I have started looking at the sources, hoping to find something. 1.2 is pretty stable, I doubt there is anything in there that makes it run slower than 1.0. It’s probably something in your configuration or network. Compare the local write time from nodetool cfhistograms and the request latency from nodetool proxyhistograms. Write request latency should be a bit below 1 ms and local write latency should be around 0.5 ms or better. If there is a wider difference between the two it’s wait time + network time. As a general rule you should get around 3k to 4k writes per second per core. Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 13/12/2013, at 8:06 pm, Rahul Menon ra...@apigee.com wrote: Quote from http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2 Murmur3Partitioner is NOT compatible with RandomPartitioner, so if you’re upgrading and using the new cassandra.yaml file, be sure to change the partitioner back to RandomPartitioner On Thu, Dec 12, 2013 at 10:57 PM, srmore comom...@gmail.com wrote: On Thu, Dec 12, 2013 at 11:15 AM, J. Ryan Earl o...@jryanearl.us wrote: Why did you switch to RandomPartitioner away from Murmur3Partitioner? Have you tried with Murmur3? # partitioner: org.apache.cassandra.dht.Murmur3Partitioner partitioner: org.apache.cassandra.dht.RandomPartitioner Since I am comparing between the two versions I am keeping all the settings the same. I see Murmur3Partitioner has some performance improvements, but then switching back to RandomPartitioner should not cause performance to tank, right? Or am I missing something? Also, is there an easier way to update the data from RandomPartitioner to Murmur3 ? (upgradesstable ?) On Fri, Dec 6, 2013 at 10:36 AM, srmore comom...@gmail.com wrote: On Fri, Dec 6, 2013 at 9:59 AM, Vicky Kak vicky@gmail.com wrote: You have passed the JVM configurations and not the cassandra configurations which are in cassandra.yaml. Apologies, I was tuning the JVM and that's what was on my mind. Here are the cassandra settings http://pastebin.com/uN42GgYT The spikes are not that significant in our case and we are running the cluster with a 1.7 gb heap. Are these spikes causing any issue at your end? There are no big spikes, the overall performance seems to be about 40% lower. On Fri, Dec 6, 2013 at 9:10 PM, srmore comom...@gmail.com wrote: On Fri, Dec 6, 2013 at 9:32 AM, Vicky Kak vicky@gmail.com wrote: Hard to say much without knowing the cassandra configurations. The cassandra configuration is -Xms8G -Xmx8G -Xmn800m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=4 -XX:MaxTenuringThreshold=2 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly Yes, compactions/GCs could spike the CPU, I had similar behavior with my setup. Were you able to get around it ? -VK On Fri, Dec 6, 2013 at 7:40 PM, srmore comom...@gmail.com wrote: We have a 3 node cluster running cassandra 1.2.12; they are pretty big machines, 64G RAM with 16 cores, cassandra heap is 8G. The interesting observation is that when I send traffic to one node its performance is 2x more than when I send traffic to all the nodes. We ran 1.0.11 on the same box and we observed a slight dip but not half as seen with 1.2.12. In both cases we were writing with LOCAL_QUORUM. Changing CL to ONE makes a slight improvement but not much.
The read_repair_chance is 0.1. We see some compactions running. Following is my iostat -x output; sda is the SSD (for commit log) and sdb is the spinner.

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          66.46   0.00     8.95     0.01    0.00  24.58

Device:  rrqm/s  wrqm/s   r/s   w/s  rsec/s  wsec/s  avgrq-sz  avgqu-sz  await  svctm  %util
sda        0.00   27.60  0.00  4.40    0.00  256.00     58.18      0.01   2.55   1.32   0.58
sda1       0.00    0.00  0.00  0.00    0.00    0.00      0.00      0.00   0.00   0.00   0.00
sda2       0.00   27.60  0.00  4.40    0.00  256.00     58.18      0.01   2.55   1.32   0.58
sdb        0.00    0.00  0.00  0.00    0.00    0.00      0.00      0.00   0.00   0.00   0.00
sdb1       0.00    0.00  0.00  0.00    0.00    0.00      0.00      0.00   0.00   0.00   0.00
dm-0       0.00    0.00  0.00  0.00    0.00    0.00      0.00      0.00   0.00   0.00   0.00
dm-1       0.00    0.00  0.00  0.60    0.00    4.80      8.00      0.00   5.33   2.67   0.16
dm-2       0.00    0.00  0.00  0.00    0.00    0.00      0.00      0.00   0.00   0.00   0.00
dm-3       0.00
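For reference, the two commands Aaron compares above use 1.2-era syntax like the following (substitute your own host, keyspace and column family):

nodetool -h <host> proxyhistograms
nodetool -h <host> cfhistograms <keyspace> <column_family>

proxyhistograms shows coordinator-level request latency, cfhistograms the local read/write latency of a single column family; a large gap between the two points at queueing or network time rather than the storage engine.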
Re: Bulkoutputformat
Request did not complete within rpc_timeout. The node is overloaded and did not return in time. Check the logs for errors or excessive JVM GC and try selecting less data. Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 14/12/2013, at 10:06 am, varun allampalli vshoori.off...@gmail.com wrote: Thanks Rahul, the article was insightful. On Fri, Dec 13, 2013 at 12:25 AM, Rahul Menon ra...@apigee.com wrote: Here you go http://thelastpickle.com/blog/2013/01/11/primary-keys-in-cql.html On Fri, Dec 13, 2013 at 7:19 AM, varun allampalli vshoori.off...@gmail.com wrote: Hi Aaron, It seems like you answered the question here. https://groups.google.com/forum/#!topic/nosql-databases/vjZA5vdycWA Can you give me the link to the blog which you mentioned http://thelastpickle.com/2013/01/11/primary-keys-in-cql/ Thanks in advance Varun On Thu, Dec 12, 2013 at 3:36 PM, varun allampalli vshoori.off...@gmail.com wrote: Thanks Aaron, I was able to generate sstables and load them using sstableloader. But after loading the tables, when I do a select query I get this, and the table has only one record. Is there anything I am missing or any logs I can look at? Request did not complete within rpc_timeout. On Wed, Dec 11, 2013 at 7:58 PM, Aaron Morton aa...@thelastpickle.com wrote: If you don’t need to use Hadoop then try the SSTableSimpleWriter and sstableloader; this post is a little old but still relevant http://www.datastax.com/dev/blog/bulk-loading Otherwise AFAIK BulkOutputFormat is what you want from hadoop http://www.datastax.com/docs/1.1/cluster_architecture/hadoop_integration Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 12/12/2013, at 11:27 am, varun allampalli vshoori.off...@gmail.com wrote: Hi All, I want to bulk insert data into cassandra. I was wondering about using BulkOutputFormat in Hadoop. Is it the best way, or is using the driver and doing batch inserts better? Are there any disadvantages to using BulkOutputFormat? Thanks for helping Varun
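For completeness, the SSTableSimpleWriter + sstableloader flow Aaron points at ends with a command roughly like this (the host and paths are placeholders; the last two path components are expected to be the keyspace and column family so the loader knows where to stream):

sstableloader -d 10.0.0.1 /path/to/MyKeyspace/MyColumnFamily/

If a select immediately afterwards times out, as Varun saw, checking the node's system.log for GC pressure or compaction backlog before retrying is a reasonable first step.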
Re: Improving write performance in Cassandra and a few related issues...
Write throughput is remaining at around 460 pkts/sec or sometimes even falling below that rate, as against the expected rate of around 920 pkts/sec. Is it some kind of limitation of Cassandra or am I doing something wrong??? There is nothing in cassandra that would make that happen. Double check your client. I also see an increase in packet drops when I try to store the packets from both the hosts into the same keyspace. The packets are getting collected properly followed by intervals in which they are being dropped in both the systems, at the same time. Could this be some kind of a buffer issue??? What do you mean by packet drops? Do you mean dropped messages in cassandra? Also, can write throughput be increased by distributing the write requests between the 2 Cassandra nodes instead of sending the requests to a single node? Currently, I don't see any improvement even if I distribute the write requests to different hosts. How can I improve the write performance overall? Normally we expect 3k to 4k non-counter writes per core per node; if you are not seeing that it may be configuration or the client. Hope that helps. - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 15/12/2013, at 7:51 pm, Krishna Chaitanya bnsk1990r...@gmail.com wrote: Hello, I am a newbie to the Cassandra world and have a few doubts which I wanted to clarify. I am having a software switch that stores netflow packets into a shared memory segment and a daemon that reads that memory segment and stores them into a 2-node Cassandra cluster. Currently, I am storing the packets from 2 hosts into 2 different keyspaces, hence only writes and no reads. The write throughput is coming to around 460 pkts/sec in each of the keyspaces. But, when I try to store the packets into the same keyspace, I observe that the write throughput is remaining at around 460 pkts/sec or sometimes even falling below that rate, as against the expected rate of around 920 pkts/sec. Is it some kind of limitation of Cassandra or am I doing something wrong??? I also see an increase in packet drops when I try to store the packets from both the hosts into the same keyspace. The packets are getting collected properly followed by intervals in which they are being dropped in both the systems, at the same time. Could this be some kind of a buffer issue??? The write requests from both the systems are sent to the same node, which is also the seed node. I am mostly using the default Cassandra configuration, with replication_factor set to 1 and without durable_writes. The systems are i5s with 4 GB RAM. The data model is: each second is the row key, with all the packets collected in that second as the columns. Also, can write throughput be increased by distributing the write requests between the 2 Cassandra nodes instead of sending the requests to a single node? Currently, I don't see any improvement even if I distribute the write requests to different hosts. How can I improve the write performance overall? Thanks. -- Regards, BNSK.
Re: Cassandra 1.2 : OutOfMemoryError: unable to create new native thread
Try using jstack to see if there are a lot of threads there. Are you using vnodes and Hadoop? https://issues.apache.org/jira/browse/CASSANDRA-6169 Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 17/12/2013, at 2:48 am, Maciej Miklas mac.mik...@gmail.com wrote: cassandra-env.sh has the option JVM_OPTS="$JVM_OPTS -Xss180k" It will give this error if you start cassandra with Java 7. So increase the value, or remove the option. Regards, Maciej On Mon, Dec 16, 2013 at 2:37 PM, srmore comom...@gmail.com wrote: What is your thread stack size (Xss)? Try increasing that, that could help. Sometimes the limitation is imposed by the host provider (e.g. amazon ec2 etc.) Thanks, Sandeep On Mon, Dec 16, 2013 at 6:53 AM, Oleg Dulin oleg.du...@gmail.com wrote: Hi guys! I believe my limits settings are correct. Here is the output of ulimit -a:

core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 1547135
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 10
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 32768
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

However, I just had a couple of cassandra nodes go down over the weekend for no apparent reason with the following error:

java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:691)
at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:949)
at java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1017)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1163)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)

Any input is greatly appreciated. -- Regards, Oleg Dulin http://www.olegdulin.com
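For anyone hitting this on Java 7: the stack size lives in cassandra-env.sh, and the fix is to raise it, e.g. (256k is the value later Cassandra releases moved to; treat it as a starting point, not gospel):

JVM_OPTS="$JVM_OPTS -Xss256k"

Each Java thread reserves that much native stack, so with thousands of threads it is the per-thread stacks, not the heap, that exhaust native memory and produce "unable to create new native thread".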
Re: Cassandra 1.1.6 - Disk usage and Load displayed in ring doesn't match
-tmp- files will sit in the data dir, if there was an error creating them during compaction or flushing to disk they will sit around until a restart. Check the logs for errors to see if compaction was failing on something. Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 17/12/2013, at 12:28 pm, Narendra Sharma narendra.sha...@gmail.com wrote: No snapshots. I restarted the node and now the Load in ring is in sync with the disk usage. Not sure what caused it to go out of sync. However, the Live SStable count doesn't match exactly with the number of data files on disk. I am going through the Cassandra code to understand what could be the reason for the mismatch in the sstable count and also why there is no reference of some of the data files in system.log. On Mon, Dec 16, 2013 at 2:45 PM, Arindam Barua aba...@247-inc.com wrote: Do you have any snapshots on the nodes where you are seeing this issue? Snapshots will link to sstables which will cause them not be deleted. -Arindam From: Narendra Sharma [mailto:narendra.sha...@gmail.com] Sent: Sunday, December 15, 2013 1:15 PM To: user@cassandra.apache.org Subject: Cassandra 1.1.6 - Disk usage and Load displayed in ring doesn't match We have 8 node cluster. Replication factor is 3. For some of the nodes the Disk usage (du -ksh .) in the data directory for CF doesn't match the Load reported in nodetool ring command. When we expanded the cluster from 4 node to 8 nodes (4 weeks back), everything was okay. Over period of last 2-3 weeks the disk usage has gone up. We increased the RF from 2 to 3 2 weeks ago. I am not sure if increasing the RF is causing this issue. For one of the nodes that I analyzed: 1. nodetool ring reported load as 575.38 GB 2. nodetool cfstats for the CF reported: SSTable count: 28 Space used (live): 572671381955 Space used (total): 572671381955 3. 'ls -1 *Data* | wc -l' in the data folder for CF returned 46 4. 'du -ksh .' in the data folder for CF returned 720G The above numbers indicate that there are some sstables that are obsolete and are still occupying space on disk. What could be wrong? Will restarting the node help? The cassandra process is running for last 45 days with no downtime. However, because the disk usage is high, we are not able to run full compaction. Also, I can't find reference to each of the sstables on disk in the system.log file. For eg I have one data file on disk as (ls -lth): 86G Nov 20 06:14 I have system.log file with first line: INFO [main] 2013-11-18 09:41:56,120 AbstractCassandraDaemon.java (line 101) Logging initialized The 86G file must be a result of some compaction. I see no reference of data file in system.log file between 11/18 to 11/25. What could be the reason for that? The only reference is dated 11/29 when the file was being streamed to another node (new node). How can I identify the obsolete files and remove them? I am thinking about following. Let me know if it make sense. 1. Restart the node and check the state. 2. Move the oldest data files to another location (to another mount point) 3. Restart the node again 4. Run repair on the node so that it can get the missing data from its peers. I compared the numbers of a healthy node for the same CF: 1. nodetool ring reported load as 662.95 GB 2. nodetool cfstats for the CF reported: SSTable count: 16 Space used (live): 670524321067 Space used (total): 670524321067 3. 'ls -1 *Data* | wc -l' in the data folder for CF returned 16 4. 'du -ksh .' 
in the data folder for CF returned 625G -Naren -- Narendra Sharma Software Engineer http://www.aeris.com http://narendrasharma.blogspot.com/ -- Narendra Sharma Software Engineer http://www.aeris.com http://narendrasharma.blogspot.com/
Re: various Cassandra performance problems when CQL3 is really used
* select id from table where token(id) > token(some_value) and secondary_index = other_val limit 2 allow filtering; Filtering absolutely kills the performance. On a table populated with 130,000 records, a single node Cassandra server (on my i7 notebook, 2 GB of JVM heap) and a secondary index built on a column with low cardinality of its value set, this query takes 156 seconds to finish. Yes, this is why you have to add allow filtering. You are asking the nodes to read all the data that matches and filter in memory; that’s a SQL type operation. Your example query is somewhat complex and I doubt it could get decent performance; what does the query plan look like? IMHO you need to do further de-normalisation; you will get the best performance when you select rows by their full or partial primary key. By the way, the performance is an order of magnitude better if this patch is applied: That looks like it’s tuned to your specific need, it would ignore the max results included in the query. * select id from table; As we saw in the trace log, the query - although it queries just row ids - scans all columns of all the rows and (probably) compares the TTL with the current time (?) (we saw hundreds of thousands of gettimeofday(2)). This means that if the table somehow mixes wide and narrow rows, the performance suffers horribly. Selecting all rows from a table requires a range scan, which reads all rows from all nodes. It should never be used in production. Not sure what you mean by “scans all columns from all rows”; a select by column name will use a SliceByNamesReadCommand, which will only read the required columns from each SSTable (it normally short circuits though and reads from fewer). If there is a TTL the ExpiringColumn.localExpirationTime must be checked; if there is no TTL it will not be checked. As Cassandra checks all the columns in selects, performance suffers badly if the collection is of any interesting size. This is not true, could you provide an example of where you think this is happening? Additionally, we saw various random irreproducible freezes, high CPU consumption when nothing happens (even with trace log level set no activity was reported) and highly unpredictable performance characteristics after nodetool flush and/or major compaction. What was the HW platform and what was the load? Typically freezes in the server correlate to JVM GC; the JVM GC can also be using the CPU. If you have wide rows or make large reads you may run into more JVM GC issues. nodetool flush will (as it says) flush all the tables to disk; if you have a lot of tables and/or a lot of secondary indexes this can cause the switch lock to be held, preventing write threads from progressing. Once flush threads stop waiting on the flush queue the lock will be released. See the help for memtable_flush_queue_size in the yaml file. Major compaction is not recommended to be used in production. If you are seeing it cause performance problems I would guess it is related to JVM GC and/or the disk IO not being able to keep up. When used it creates a single SSTable for each table, which will not be compacted again until (by default) 3 other large SSTables are created or you run major compaction again. For this reason it is not recommended. Conclusions: - do not use collections - do not use secondary indexes - do not use filtering - have your rows as narrow as possible if you need any kind of all-row-keys traversal These features all have a use, but it looks like you leaned on them heavily while creating a relational model.
Especially the filtering: you have to explicitly enable it to prevent the client sending queries that will take a long time. The only time row key traversal is used normally is reading data through hadoop. You should always strive to read row(s) from a table by the full or partial primary key. With these conclusions in mind, CQL seems redundant, plain old thrift may be used, joins should be done client side and/or all indexes need to be handled manually. Correct? No. CQL provides a set of functionality not present in the thrift API. Joins and indexes should generally be handled by denormalising the data during writes. It sounds like your data model was too relational; you need to denormalise and read rows by primary key. Secondary indexes are useful when you have a query pattern that is used infrequently. Hope that helps. - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 18/12/2013, at 3:47 am, Ondřej Černoš cern...@gmail.com wrote: Hi all, we are reimplementing a legacy interface of an inventory-like service (currently built on top of mysql) on Cassandra and I thought I would share some findings with the list. The interface semantics is given and cannot be changed. We chose Cassandra due to its multiple datacenter capabilities
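As a concrete illustration of the denormalisation Aaron describes: instead of a low-cardinality secondary index plus filtering, maintain a second table keyed by the value you filter on (a sketch with made-up names):

CREATE TABLE items_by_status (
    status text,
    id timeuuid,
    PRIMARY KEY (status, id)
);

-- written alongside every insert/update of the main table
INSERT INTO items_by_status (status, id) VALUES ('active', now());

-- replaces the 156-second filtering query with a partition slice
SELECT id FROM items_by_status WHERE status = 'active' LIMIT 2;

The read is now a single-partition lookup by primary key, the access path Cassandra is optimised for. The costs are doing the index maintenance yourself at write time and, for genuinely low-cardinality values, adding a bucket component to the partition key so those partitions don't grow too wide.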
Re: AddContactPoint /VIP
What is the good practice to put in the code as addContactPoint, i.e. how many servers? I use the same nodes as the seed list nodes for that DC. The idea of the seed list is that it’s a list of well known nodes, and it’s easier operationally to say we have one list of well known nodes that is used by the servers and the clients. 1) I am also thinking of doing it this way (I am not sure if this is good or bad): if I configure 4 servers into one VIP (virtual IP / virtual DNS) and specify that DNS name in the code as the ContactPoint, that VIP is smart enough to route to different nodes. Too complicated. 2) Is it a problem if I use multiple data centers in the future? You only need to give the client the local seeds, it will discover all the nodes. Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 7/12/2013, at 7:12 am, chandra Varahala hadoopandcassan...@gmail.com wrote: Greetings, I have a 4 node cassandra cluster that will grow up to 10 nodes; we are using the CQL Java client to access the data. What is the good practice to put in the code as addContactPoint, i.e. how many servers? 1) I am also thinking of doing it this way (I am not sure if this is good or bad): if I configure 4 servers into one VIP (virtual IP / virtual DNS) and specify that DNS name in the code as the ContactPoint, that VIP is smart enough to route to different nodes. 2) Is it a problem if I use multiple data centers in the future? thanks Chandra
Re: Write performance with 1.2.12
Changed memtable_total_space_in_mb to 1024, still no luck. Reducing memtable_total_space_in_mb will increase the frequency of flushing to disk, which will create more work for compaction to do and result in increased IO. You should return it to the default. when I send traffic to one node its performance is 2x more than when I send traffic to all the nodes. What are you measuring, request latency or local read/write latency? If it’s write latency it’s probably GC; if it’s read latency it’s probably IO or the data model. Hope that helps. - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 7/12/2013, at 8:05 am, srmore comom...@gmail.com wrote: Changed memtable_total_space_in_mb to 1024, still no luck. On Fri, Dec 6, 2013 at 11:05 AM, Vicky Kak vicky@gmail.com wrote: Can you set the memtable_total_space_in_mb value? It is defaulting to 1/3, which is 8/3 ~ 2.6 gb in capacity http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-improved-memory-and-disk-space-management The flushing of 2.6 gb to the disk might slow the performance if frequently called; maybe you have lots of write operations going on. On Fri, Dec 6, 2013 at 10:06 PM, srmore comom...@gmail.com wrote: On Fri, Dec 6, 2013 at 9:59 AM, Vicky Kak vicky@gmail.com wrote: You have passed the JVM configurations and not the cassandra configurations which are in cassandra.yaml. Apologies, I was tuning the JVM and that's what was on my mind. Here are the cassandra settings http://pastebin.com/uN42GgYT The spikes are not that significant in our case and we are running the cluster with a 1.7 gb heap. Are these spikes causing any issue at your end? There are no big spikes, the overall performance seems to be about 40% lower. On Fri, Dec 6, 2013 at 9:10 PM, srmore comom...@gmail.com wrote: On Fri, Dec 6, 2013 at 9:32 AM, Vicky Kak vicky@gmail.com wrote: Hard to say much without knowing about the cassandra configurations. The cassandra configuration is -Xms8G -Xmx8G -Xmn800m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=4 -XX:MaxTenuringThreshold=2 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly Yes, compactions/GCs could spike the CPU, I had similar behavior with my setup. Were you able to get around it ? -VK On Fri, Dec 6, 2013 at 7:40 PM, srmore comom...@gmail.com wrote: We have a 3 node cluster running cassandra 1.2.12; they are pretty big machines, 64G RAM with 16 cores, cassandra heap is 8G. The interesting observation is that, when I send traffic to one node its performance is 2x more than when I send traffic to all the nodes. We ran 1.0.11 on the same box and we observed a slight dip but not half as seen with 1.2.12. In both cases we were writing with LOCAL_QUORUM. Changing CL to ONE makes a slight improvement but not much. The read_repair_chance is 0.1. We see some compactions running. Following is my iostat -x output; sda is the SSD (for commit log) and sdb is the spinner.
avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          66.46   0.00     8.95     0.01    0.00  24.58

Device:  rrqm/s  wrqm/s   r/s    w/s  rsec/s  wsec/s  avgrq-sz  avgqu-sz  await  svctm  %util
sda        0.00   27.60  0.00   4.40    0.00  256.00     58.18      0.01   2.55   1.32   0.58
sda1       0.00    0.00  0.00   0.00    0.00    0.00      0.00      0.00   0.00   0.00   0.00
sda2       0.00   27.60  0.00   4.40    0.00  256.00     58.18      0.01   2.55   1.32   0.58
sdb        0.00    0.00  0.00   0.00    0.00    0.00      0.00      0.00   0.00   0.00   0.00
sdb1       0.00    0.00  0.00   0.00    0.00    0.00      0.00      0.00   0.00   0.00   0.00
dm-0       0.00    0.00  0.00   0.00    0.00    0.00      0.00      0.00   0.00   0.00   0.00
dm-1       0.00    0.00  0.00   0.60    0.00    4.80      8.00      0.00   5.33   2.67   0.16
dm-2       0.00    0.00  0.00   0.00    0.00    0.00      0.00      0.00   0.00   0.00   0.00
dm-3       0.00    0.00  0.00  24.80    0.00  198.40      8.00      0.24   9.80   0.13   0.32
dm-4       0.00    0.00  0.00   6.60    0.00   52.80      8.00      0.01   1.36   0.55   0.36
dm-5       0.00    0.00  0.00   0.00    0.00    0.00      0.00      0.00   0.00   0.00   0.00
dm-6       0.00    0.00  0.00  24.80    0.00  198.40      8.00      0.29  11.60   0.13   0.32

I can see I am cpu bound here but couldn't figure out exactly what is causing it, is this caused by GC or Compaction ? I am thinking it is compaction, I see a lot of context switches and interrupts in my vmstat output. I don't see GC activity in the logs but see some compaction activity. Has anyone seen
Re: OOMs during high (read?) load in Cassandra 1.2.11
Do you have the back trace from the heap dump, so we can see what the array was and what was using it? Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 10/12/2013, at 4:41 am, Klaus Brunner klaus.brun...@gmail.com wrote: 2013/12/9 Nate McCall n...@thelastpickle.com: Do you have any secondary indexes defined in the schema? That could lead to a 'mega row' pretty easily depending on the cardinality of the value. That's an interesting point - but no, we don't have any secondary indexes anywhere. From the heap dump, it's fairly evident that it's not a single huge row but actually many rows. I'll keep watching to see if this occurs again, or if the compaction fixed it for good. Thanks, Klaus
Re: Data Modelling Information
create table messages( body text, username text, tags set<text> PRIMARY keys(username,tags) ) This statement is syntactically invalid, and you also cannot use a collection type in the primary key. 1) I should be able to query by username and get all the messages for a particular username yes. 2) I should be able to query by tags and username (like: select * from messages where username='xya' and tags in ('awesome','phone')) No. 3) I should be able to query all messages by day and order by desc and limit to some value No. Could you guys please let me know if creating a secondary index on the tags field would work? No, it’s not supported. Or what would be the best way to model this data. You need to describe the problem and how you want to read the data. I suggest taking a look at the data modelling videos from Patrick here http://planetcassandra.org/Learn/CassandraCommunityWebinars Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 10/12/2013, at 8:57 am, Shrikar archak shrika...@gmail.com wrote: Hi Data Model Experts, I have a few questions with data modelling for a particular application. example create table messages( body text, username text, tags set<text> PRIMARY keys(username,tags) ) Requirements 1) I should be able to query by username and get all the messages for a particular username 2) I should be able to query by tags and username (like: select * from messages where username='xya' and tags in ('awesome','phone')) 3) I should be able to query all messages by day and order by desc and limit to some value Could you guys please let me know if creating a secondary index on the tags field would work? Or what would be the best way to model this data. Thanks, Shrikar
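A sketch of what modelling the tags as part of the table can look like for requirement 2 (the names come from the thread, the table shape is assumed; one insert per tag on the message):

CREATE TABLE messages_by_tag (
    username text,
    tag text,
    ts timeuuid,
    body text,
    PRIMARY KEY (username, tag, ts)
);

INSERT INTO messages_by_tag (username, tag, ts, body)
VALUES ('xya', 'awesome', now(), 'my message');

SELECT body FROM messages_by_tag
WHERE username = 'xya' AND tag = 'awesome';

Requirement 3 would be a separate table partitioned by day with ts as a DESC clustering column, written at the same time as this one.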
Re: Nodetool repair exceptions in Cassandra 2.0.2
[2013-12-08 11:04:02,047] Repair session ff16c510-5ff7-11e3-97c0-5973cc397f8f for range (1246984843639507027,1266616572749926276] failed with error org.apache.cassandra.exceptions.RepairException: [repair #ff16c510-5ff7-11e3-97c0-5973cc397f8f on keyspace_name/col_family1, (1246984843639507027,1266616572749926276]] Validation failed in /10.x.x.48
The 10.x.x.48 node sent a tree response (merkle tree) to this node that did not contain the tree. This node then killed the repair session. Look for log messages on 10.x.x.48 that correlate with the repair session ID above. They may look like logger.error("Failed creating a merkle tree for " + desc + ", " + initiator + " (see log for details)"); or logger.info(String.format("[repair #%s] Sending completed merkle tree to %s for %s/%s", desc.sessionId, initiator, desc.keyspace, desc.columnFamily)); Hope that helps. - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 10/12/2013, at 12:57 pm, Laing, Michael michael.la...@nytimes.com wrote: My experience is that you must upgrade to 2.0.3 ASAP to fix this. Michael On Mon, Dec 9, 2013 at 6:39 PM, David Laube d...@stormpath.com wrote: Hi All, We are running Cassandra 2.0.2 and have recently stumbled upon an issue with nodetool repair. Upon running nodetool repair on each of the 5 nodes in the ring (one at a time) we observe the following exceptions returned to standard out;
[2013-12-08 11:04:02,047] Repair session ff16c510-5ff7-11e3-97c0-5973cc397f8f for range (1246984843639507027,1266616572749926276] failed with error org.apache.cassandra.exceptions.RepairException: [repair #ff16c510-5ff7-11e3-97c0-5973cc397f8f on keyspace_name/col_family1, (1246984843639507027,1266616572749926276]] Validation failed in /10.x.x.48
[2013-12-08 11:04:02,063] Repair session 284c8b40-5ff8-11e3-97c0-5973cc397f8f for range (-109256956528331396,-89316884701275697] failed with error org.apache.cassandra.exceptions.RepairException: [repair #284c8b40-5ff8-11e3-97c0-5973cc397f8f on keyspace_name/col_family2, (-109256956528331396,-89316884701275697]] Validation failed in /10.x.x.103
[2013-12-08 11:04:02,070] Repair session 399e7160-5ff8-11e3-97c0-5973cc397f8f for range (8901153810410866970,8915879751739915956] failed with error org.apache.cassandra.exceptions.RepairException: [repair #399e7160-5ff8-11e3-97c0-5973cc397f8f on keyspace_name/col_family1, (8901153810410866970,8915879751739915956]] Validation failed in /10.x.x.103
[2013-12-08 11:04:02,072] Repair session 3ea73340-5ff8-11e3-97c0-5973cc397f8f for range (1149084504576970235,1190026362216198862] failed with error org.apache.cassandra.exceptions.RepairException: [repair #3ea73340-5ff8-11e3-97c0-5973cc397f8f on keyspace_name/col_family1, (1149084504576970235,1190026362216198862]] Validation failed in /10.x.x.103
[2013-12-08 11:04:02,091] Repair session 6f0da460-5ff8-11e3-97c0-5973cc397f8f for range (-5407189524618266750,-5389231566389960750] failed with error org.apache.cassandra.exceptions.RepairException: [repair #6f0da460-5ff8-11e3-97c0-5973cc397f8f on keyspace_name/col_family1, (-5407189524618266750,-5389231566389960750]] Validation failed in /10.x.x.48
[2013-12-09 23:16:36,962] Repair session 7efc2740-6127-11e3-97c0-5973cc397f8f for range (1246984843639507027,1266616572749926276] failed with error org.apache.cassandra.exceptions.RepairException: [repair #7efc2740-6127-11e3-97c0-5973cc397f8f on keyspace_name/col_family1, (1246984843639507027,1266616572749926276]] Validation failed in /10.x.x.48
[2013-12-09 23:16:36,986] Repair session a8c44260-6127-11e3-97c0-5973cc397f8f for range (-109256956528331396,-89316884701275697] failed with error org.apache.cassandra.exceptions.RepairException: [repair #a8c44260-6127-11e3-97c0-5973cc397f8f on keyspace_name/col_family2, (-109256956528331396,-89316884701275697]] Validation failed in /10.x.x.210 The /var/log/cassandra/system.log shows similar info as above with no real explanation as to the root cause behind the exception(s). There also does not appear to be any additional info in /var/log/cassandra/cassandra.log. We have tried restoring a recent snapshot of the keyspace in question to a separate staging ring and the repair runs successfully and without exception there. This is even after we tried insert/delete on the keyspace in the separate staging ring. Has anyone seen this behavior before, and what can we do to resolve this? Any assistance would be greatly appreciated. Best regards, -Dave
Re: setting PIG_INPUT_INITIAL_ADDRESS environment variable in Oozie for cassandra ...¿?
Caused by: java.io.IOException: PIG_INPUT_INITIAL_ADDRESS or PIG_INITIAL_ADDRESS environment variable not set at org.apache.cassandra.hadoop.pig.CassandraStorage.setLocation(CassandraStorage.java:314) at org.apache.cassandra.hadoop.pig.CassandraStorage.getSchema(CassandraStorage.java:358) at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:151) ... 35 more Have you checked these are set ? Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 11/12/2013, at 4:00 am, Miguel Angel Martin junquera mianmarjun.mailingl...@gmail.com wrote: Hi, I have an error with pig action in oozie 4.0.0 using cassandraStorage. (cassandra 1.2.10) I can run pig scripts right with cassandra. but whe I try to use cassandraStorage to load data I have this error: Run pig script using PigRunner.run() for Pig version 0.8+ Apache Pig version 0.10.0 (r1328203) compiled Apr 20 2012, 00:33:25 Run pig script using PigRunner.run() for Pig version 0.8+ 2013-12-10 12:24:39,084 [main] INFO org.apache.pig.Main - Apache Pig version 0.10.0 (r1328203) compiled Apr 20 2012, 00:33:25 2013-12-10 12:24:39,084 [main] INFO org.apache.pig.Main - Apache Pig version 0.10.0 (r1328203) compiled Apr 20 2012, 00:33:25 2013-12-10 12:24:39,095 [main] INFO org.apache.pig.Main - Logging error messages to: /tmp/hadoop-ec2-user/mapred/local/taskTracker/ec2-user/jobcache/job_201312100858_0007/attempt_201312100858_0007_m_00_0/work/pig-job_201312100858_0007.log 2013-12-10 12:24:39,095 [main] INFO org.apache.pig.Main - Logging error messages to: /tmp/hadoop-ec2-user/mapred/local/taskTracker/ec2-user/jobcache/job_201312100858_0007/attempt_201312100858_0007_m_00_0/work/pig-job_201312100858_0007.log 2013-12-10 12:24:39,501 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://10.228.243.18:9000 2013-12-10 12:24:39,501 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://10.228.243.18:9000 2013-12-10 12:24:39,510 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: 10.228.243.18:9001 2013-12-10 12:24:39,510 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: 10.228.243.18:9001 2013-12-10 12:24:40,505 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2245: file testCassandra.pig, line 7, column 7 Cannot get schema from loadFunc org.apache.cassandra.hadoop.pig.CassandraStorage 2013-12-10 12:24:40,505 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2245: file testCassandra.pig, line 7, column 7 Cannot get schema from loadFunc org.apache.cassandra.hadoop.pig.CassandraStorage 2013-12-10 12:24:40,505 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2245: file testCassandra.pig, line 7, column 7 Cannot get schema from loadFunc org.apache.cassandra.hadoop.pig.CassandraStorage at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:155) at org.apache.pig.newplan.logical.relational.LOLoad.getSchema(LOLoad.java:110) at org.apache.pig.newplan.logical.relational.LOStore.getSchema(LOStore.java:68) at org.apache.pig.newplan.logical.visitor.SchemaAliasVisitor.validate(SchemaAliasVisitor.java:60) at org.apache.pig.newplan.logical.visitor.SchemaAliasVisitor.visit(SchemaAliasVisitor.java:84) at 
org.apache.pig.newplan.logical.relational.LOStore.accept(LOStore.java:77) at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75) at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50) at org.apache.pig.PigServer$Graph.compile(PigServer.java:1617) at org.apache.pig.PigServer$Graph.compile(PigServer.java:1611) at org.apache.pig.PigServer$Graph.access$200(PigServer.java:1334) at org.apache.pig.PigServer.execute(PigServer.java:1239) at org.apache.pig.PigServer.executeBatch(PigServer.java:362) at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:132) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:193) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165) at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84) at org.apache.pig.Main.run(Main.java:430) at org.apache.pig.PigRunner.run(PigRunner.java:49) at org.apache.oozie.action.hadoop.PigMain.runPigJob(PigMain.java:283) at org.apache.oozie.action.hadoop.PigMain.run(PigMain.java:223) at org.apache.oozie.action.hadoop.LauncherMain.run
Re: Exactly one wide row per node for a given CF?
Querying the table was fast. What I didn’t do was test the table under load, nor did I try this in a multi-node cluster. As the number of columns in a row increases so does the size of the column index, which is read as part of the read path. For background and comparisons of latency see http://thelastpickle.com/blog/2011/07/04/Cassandra-Query-Plans.html or my talk on performance at the SF summit last year http://thelastpickle.com/speaking/2012/08/08/Cassandra-Summit-SF.html While the column index has been lifted to the -Index.db component, AFAIK it must still be fully loaded. Larger rows take longer to go through compaction, tend to cause more JVM GC and have issues during repair. See the in_memory_compaction_limit_in_mb comments in the yaml file. During repair we detect differences in ranges of rows and stream them between the nodes. If you have wide rows and a single column is out of sync we will create a new copy of that row on the node, which must then be compacted. I’ve seen the load on nodes with very wide rows go down by 150GB just by reducing the compaction settings. IMHO, all things being equal, rows in the few 10’s of MB work better. Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 11/12/2013, at 2:41 am, Robert Wille rwi...@fold3.com wrote: I have a question about this statement: When rows get above a few 10’s of MB things can slow down, when they get above 50 MB they can be a pain, when they get above 100MB it’s a warning sign. And when they get above 1GB, well, you don’t want to know what happens then. I tested a data model that I created. Here’s the schema for the table in question: CREATE TABLE bdn_index_pub ( tree INT, pord INT, hpath VARCHAR, PRIMARY KEY (tree, pord) ); As a test, I inserted 100 million records. tree had the same value for every record, and I had 100 million values for pord. hpath averaged about 50 characters in length. My understanding is that all 100 million strings would have been stored in a single row, since they all had the same value for the first component of the primary key. I didn’t look at the size of the table, but it had to be several gigs (uncompressed). Contrary to what Aaron says, I do want to know what happens, because I didn’t experience any issues with this table during my test. Inserting was fast. The last batch of records inserted in approximately the same amount of time as the first batch. Querying the table was fast. What I didn’t do was test the table under load, nor did I try this in a multi-node cluster. If this is bad, can somebody suggest a better pattern? This table was designed to support a query like this: select hpath from bdn_index_pub where tree = :tree and pord >= :start and pord <= :end. In my application, most trees will have less than a million records. A handful will have 10’s of millions, and one of them will have 100 million. If I need to break up my rows, my first instinct would be to divide each tree into blocks of say 10,000 and change tree to a string that contains the tree and the block number. Something like this: 17:0, 0, ‘/’ … 17:0, , ’/a/b/c’ 17:1,1, ‘/a/b/d’ … I’d then need to issue an extra query for ranges that crossed block boundaries. Any suggestions on a better pattern? Thanks Robert From: Aaron Morton aa...@thelastpickle.com Reply-To: user@cassandra.apache.org Date: Tuesday, December 10, 2013 at 12:33 AM To: Cassandra User user@cassandra.apache.org Subject: Re: Exactly one wide row per node for a given CF?
But this becomes troublesome if I add or remove nodes. What effectively I want is to partition on the unique id of the record modulus N (id % N; where N is the number of nodes). This is exactly the problem consistent hashing (used by cassandra) is designed to solve. If you hash the key and modulo the number of nodes, adding and removing nodes requires a lot of data to move. I want to be able to randomly distribute a large set of records but keep them clustered in one wide row per node. Sounds like you should revisit your data modelling; this is a pretty well known anti pattern. When rows get above a few 10’s of MB things can slow down, when they get above 50 MB they can be a pain, when they get above 100MB it’s a warning sign. And when they get above 1GB, well, you don’t want to know what happens then. It’s a bad idea and you should take another look at the data model. If you have to do it, you can try the ByteOrderedPartitioner, which uses the row key as a token, giving you total control of the row placement. Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 4/12/2013, at 8:32 pm, Vivek Mishra
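Robert's block idea from the previous thread maps naturally onto a CQL3 composite partition key, which keeps the tree id typed instead of string-concatenating it (a sketch; the 10,000 block size is his figure, with block computed client side as pord / 10000):

CREATE TABLE bdn_index_pub (
    tree int,
    block int,
    pord int,
    hpath varchar,
    PRIMARY KEY ((tree, block), pord)
);

SELECT hpath FROM bdn_index_pub
WHERE tree = 17 AND block = 0
AND pord >= 0 AND pord <= 9999;

A range that crosses block boundaries becomes one query per block; since each query hits exactly one partition they can be issued in parallel.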
Re:
SYSTEM_MANAGER.create_column_family('Narrative','Twitter_search_test', comparator_type='CompositeType', default_validation_class='UTF8Type', key_validation_class='UTF8Type', column_validation_classes=validators) CompositeType is a type composed of other types, see http://pycassa.github.io/pycassa/assorted/composite_types.html?highlight=compositetype Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 12/12/2013, at 6:15 am, Kumar Ranjan winnerd...@gmail.com wrote: Hey Folks, So I am creating, column family using pycassaShell. See below: validators = { 'approved': 'BooleanType', 'text': 'UTF8Type', 'favorite_count':'IntegerType', 'retweet_count': 'IntegerType', 'expanded_url': 'UTF8Type', 'tuid': 'LongType', 'screen_name': 'UTF8Type', 'profile_image': 'UTF8Type', 'embedly_data': 'CompositeType', 'created_at':'UTF8Type', } SYSTEM_MANAGER.create_column_family('Narrative','Twitter_search_test', comparator_type='CompositeType', default_validation_class='UTF8Type', key_validation_class='UTF8Type', column_validation_classes=validators) I am getting this error: InvalidRequestException: InvalidRequestException(why='Invalid definition for comparator org.apache.cassandra.db.marshal.CompositeType.' My data will look like this: 'row_key' : { 'tid' : { 'expanded_url': u'http://instagram.com/p/hwDj2BJeBy/', 'text': '#snowinginNYC Makes me so happy\xe2\x9d\x840brittles0 \xe2\x9b\x84 @ Grumman Studios http://t.co/rlOvaYSfKa', 'profile_image': u'https://pbs.twimg.com/profile_images/3262070059/1e82f895559b904945d28cd3ab3947e5_normal.jpeg', 'tuid': 339322611, 'approved': 'true', 'favorite_count': 0, 'screen_name': u'LonaVigi', 'created_at': u'Wed Dec 11 01:10:05 + 2013', 'embedly_data': {u'provider_url': u'http://instagram.com/', u'description': ulonavigi's photo on Instagram, u'title': u'#snwinginNYC Makes me so happy\u2744@0brittles0 \u26c4', u'url': u'http://distilleryimage7.ak.instagram.com/5b880dec61c711e3a50b129314edd3b_8.jpg', u'thumbnail_width': 640, u'height': 640, u'width': 640, u'thumbnail_url': u'http://distilleryimage7.ak.instagram.com/b880dec61c711e3a50b1293d14edd3b_8.jpg', u'author_name': u'lonavigi', u'version': u'1.0', u'provider_name': u'Instagram', u'type': u'poto', u'thumbnail_height': 640, u'author_url': u'http://instagram.com/lonavigi'}, 'tid': 410577192746500096, 'retweet_count': 0 } }
Re: Cyclop - CQL3 web based editor
thanks, looks handy. Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 12/12/2013, at 6:16 am, Parth Patil parthpa...@gmail.com wrote: Hi Maciej, This looks great! Thanks for building this. On Wed, Dec 11, 2013 at 12:45 AM, Murali muralidharan@gmail.com wrote: Hi Maciej, Thanks for sharing it. On Wed, Dec 11, 2013 at 2:09 PM, Maciej Miklas mac.mik...@gmail.com wrote: Hi all, This is the Cassandra mailing list, but I've developed something that is strictly related to Cassandra, and some of you might find it useful, so I've decided to send email to this group. This is a web based CQL3 editor. The idea is to deploy it once and have a simple and comfortable CQL3 interface over the web - without needing to install anything. The editor itself supports code completion, not only based on CQL syntax, but also on database content - so for example the select statement will suggest tables from the active keyspace, or, in the where clause, only columns from the table provided after select from. The results are displayed in a reversed table - rows horizontally and columns vertically. It seems to be more natural for a column oriented database. You can also export query results to CSV, or add a query as a browser bookmark. The whole application is based on wicket + bootstrap + spring and can be deployed in any web 3.0 container. Here is the project (open source): https://github.com/maciejmiklas/cyclop Have fun! Maciej -- Thanks, Murali 99025-5 -- Best, Parth
Re: CLUSTERING ORDER CQL3
You need to specify all the clustering key components in the CLUSTERING ORDER BY clause create table demo(oid int,cid int,ts timeuuid,PRIMARY KEY (oid,cid,ts)) WITH CLUSTERING ORDER BY (cid ASC, ts DESC); cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 12/12/2013, at 10:44 am, Shrikar archak shrika...@gmail.com wrote: Hi All, My use case: I want query results ordered by timestamp DESC. But I don't want timestamp to be the second column in the primary key, as that will take away my querying capability. For example create table demo(oid int,cid int,ts timeuuid,PRIMARY KEY (oid,cid,ts)) WITH CLUSTERING ORDER BY (ts DESC); Queries required: I want the result for all the below queries to be in DESC order of timestamp select * from demo where oid = 100; select * from demo where oid = 100 and cid = 10; select * from demo where oid = 100 and cid = 100 and ts > minTimeuuid('something'); I am trying to create this table with CLUSTERING ORDER in CQL and getting this error cqlsh:viralheat> create table demo(oid int,cid int,ts timeuuid,PRIMARY KEY (oid,cid,ts)) WITH CLUSTERING ORDER BY (ts desc); Bad Request: Missing CLUSTERING ORDER for column cid In this document it mentions that we can have multiple keys for cluster ordering. Anyone know how to do that? Go here: Datastax doc If I make the timestamp the second column then I can't have queries like select * from demo where oid = 100 and cid = 100 and ts > minTimeuuid('something'); Thanks, Shrikar
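Worth noting about the corrected table: rows inside a partition sort by cid ASC first and by ts DESC only within each cid. So a query such as the following (the timestamp is illustrative)

SELECT * FROM demo
WHERE oid = 100 AND cid = 10
AND ts > minTimeuuid('2013-12-01 00:00+0000');

comes back newest-first, while select * from demo where oid = 100 returns groups ordered by cid, each group internally time-descending, not one globally time-sorted list.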
Re: Bulkoutputformat
If you don’t need to use Hadoop then try the SSTableSimpleWriter and sstableloader; this post is a little old but still relevant http://www.datastax.com/dev/blog/bulk-loading Otherwise AFAIK BulkOutputFormat is what you want from hadoop http://www.datastax.com/docs/1.1/cluster_architecture/hadoop_integration Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 12/12/2013, at 11:27 am, varun allampalli vshoori.off...@gmail.com wrote: Hi All, I want to bulk insert data into cassandra. I was wondering about using BulkOutputFormat in Hadoop. Is it the best way, or is using the driver and doing batch inserts better? Are there any disadvantages to using BulkOutputFormat? Thanks for helping Varun
Re: efficient way to store 8-bit or 16-bit value?
What do people recommend I do to store a small binary value in a column? I’d rather not simply use a 32-bit int for a single byte value. blob is a byte array or you could use the varint, a variable length integer, but you probably want the blob. cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 12/12/2013, at 1:33 pm, Andrey Ilinykh ailin...@gmail.com wrote: Column metadata is about 20 bytes. So, there is no big difference if you save 1 or 4 bytes. Thank you, Andrey On Wed, Dec 11, 2013 at 2:42 PM, onlinespending onlinespend...@gmail.com wrote: What do people recommend I do to store a small binary value in a column? I’d rather not simply use a 32-bit int for a single byte value. Can I have a one byte blob? Or should I store it as a single character ASCII string? I imagine each is going to have the overhead of storing the length (or null termination in the case of a string). That overhead may be worse than simply using a 32-bit int. Also is it possible to partition on a single character or substring of characters from a string (or a portion of a blob)? Something like: CREATE TABLE test ( id text, value blob, PRIMARY KEY (string[0:1]) )
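A single-byte blob is straightforward in CQL3, since blob literals are written as hex (a minimal sketch; the table and key names are made up):

CREATE TABLE small_values (
    id text PRIMARY KEY,
    value blob
);

-- stores exactly one byte, 0x2a = 42
INSERT INTO small_values (id, value) VALUES ('k1', 0x2a);

As Andrey notes, the per-column overhead dwarfs the payload either way, so the choice between blob, varint and int is mostly about type safety rather than space.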
Re: Write performance with 1.2.12
It is the write latency; read latency is ok. Interestingly, the latency is low when there is one node. When I join the other nodes the latency drops about 1/3. To be specific, when I start sending traffic to the other nodes the latency for all the nodes increases; if I stop traffic to the other nodes the latency drops again. I checked, this is not node specific, it happens to any node. Is this the local write latency or the cluster wide write request latency ? What sort of numbers are you seeing ? Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 12/12/2013, at 3:39 pm, srmore comom...@gmail.com wrote: Thanks Aaron On Wed, Dec 11, 2013 at 8:15 PM, Aaron Morton aa...@thelastpickle.com wrote: Changed memtable_total_space_in_mb to 1024, still no luck. Reducing memtable_total_space_in_mb will increase the frequency of flushing to disk, which will create more work for compaction to do and result in increased IO. You should return it to the default. You are right, I had to revert it back to the default. when I send traffic to one node its performance is 2x more than when I send traffic to all the nodes. What are you measuring, request latency or local read/write latency ? If it’s write latency it’s probably GC; if it’s read latency it’s probably IO or the data model. It is the write latency; read latency is ok. Interestingly, the latency is low when there is one node. When I join the other nodes the latency drops about 1/3. To be specific, when I start sending traffic to the other nodes the latency for all the nodes increases; if I stop traffic to the other nodes the latency drops again. I checked, this is not node specific, it happens to any node. I don't see any GC activity in the logs. Tried to control the compaction by reducing the number of threads, did not help much. Hope that helps. - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 7/12/2013, at 8:05 am, srmore comom...@gmail.com wrote: Changed memtable_total_space_in_mb to 1024, still no luck. On Fri, Dec 6, 2013 at 11:05 AM, Vicky Kak vicky@gmail.com wrote: Can you set the memtable_total_space_in_mb value? It is defaulting to 1/3, which is 8/3 ~ 2.6 gb in capacity http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-improved-memory-and-disk-space-management The flushing of 2.6 gb to the disk might slow the performance if frequently called; maybe you have lots of write operations going on. On Fri, Dec 6, 2013 at 10:06 PM, srmore comom...@gmail.com wrote: On Fri, Dec 6, 2013 at 9:59 AM, Vicky Kak vicky@gmail.com wrote: You have passed the JVM configurations and not the cassandra configurations which are in cassandra.yaml. Apologies, I was tuning the JVM and that's what was on my mind. Here are the cassandra settings http://pastebin.com/uN42GgYT The spikes are not that significant in our case and we are running the cluster with a 1.7 gb heap. Are these spikes causing any issue at your end? There are no big spikes, the overall performance seems to be about 40% lower. On Fri, Dec 6, 2013 at 9:10 PM, srmore comom...@gmail.com wrote: On Fri, Dec 6, 2013 at 9:32 AM, Vicky Kak vicky@gmail.com wrote: Hard to say much without knowing about the cassandra configurations.
The cassandra configuration is -Xms8G -Xmx8G -Xmn800m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=4 -XX:MaxTenuringThreshold=2 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly Yes, compactions/GCs could spike the CPU, I had similar behavior with my setup. Were you able to get around it ? -VK On Fri, Dec 6, 2013 at 7:40 PM, srmore comom...@gmail.com wrote: We have a 3 node cluster running cassandra 1.2.12; they are pretty big machines, 64G RAM with 16 cores, cassandra heap is 8G. The interesting observation is that, when I send traffic to one node its performance is 2x more than when I send traffic to all the nodes. We ran 1.0.11 on the same box and we observed a slight dip but not half as seen with 1.2.12. In both cases we were writing with LOCAL_QUORUM. Changing CL to ONE makes a slight improvement but not much. The read_repair_chance is 0.1. We see some compactions running. Following is my iostat -x output; sda is the SSD (for commit log) and sdb is the spinner.

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          66.46   0.00     8.95     0.01    0.00  24.58

Device:  rrqm/s  wrqm/s   r/s   w/s  rsec/s  wsec/s  avgrq-sz  avgqu-sz  await  svctm  %util
sda        0.00   27.60  0.00  4.40    0.00  256.00     58.18      0.01   2.55   1.32   0.58
sda1       0.00    0.00  0.00  0.00    0.00    0.00
Re: user / password authentication advice
Not sure if you are asking about authentication / authorisation in cassandra or how to implement the same using cassandra. Info on cassandra authentication and authorisation is here http://www.datastax.com/documentation/cassandra/2.0/webhelp/index.html#cassandra/security/securityTOC.html Hope that helps. - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 12/12/2013, at 4:31 pm, onlinespending onlinespend...@gmail.com wrote: Hi, I’m using Cassandra in an environment where many users can log in to use an application I’m developing. I’m curious if anyone has any advice or links to documentation / blogs where it discusses common implementations or best practices for user and password authentication. My cursory search online didn’t bring much up on the subject. I suppose the information needn’t even be specific to Cassandra. I imagine a few basic steps will be as follows:
- user types in username (e.g. email address) and password
- this is verified against a table storing usernames and passwords (encrypted in some way)
- a token is returned to the app / web browser to allow further transactions using a secure token (e.g. cookie)
Obviously I’m only scratching the surface and it’s the details and best practices of implementing this user / password authentication that I’m curious about. Thank you, Ben
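For the Cassandra-internal half, enabling the built-in password authentication is a yaml change plus ordinary CQL3 statements (2.0-era syntax; the user and keyspace names below are made up):

# cassandra.yaml
authenticator: PasswordAuthenticator
authorizer: CassandraAuthorizer

CREATE USER app_user WITH PASSWORD 'changeme' NOSUPERUSER;
GRANT SELECT ON KEYSPACE myapp TO app_user;

End-user accounts for the application itself are a separate layer: store a salted, strongly hashed password (e.g. bcrypt) in a users table keyed by email, compare hashes at login, and hand back a random session token; never store or compare the raw password.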
Re: Repair hangs - Cassandra 1.2.10
I changed logging to debug level, but still nothing is logged. Again - any help will be appreciated. There is nothing at the ERROR level on any machine ? check nodetool compactionstats to see if a validation compaction is running, the repair may be waiting on this. check nodetool netstats to see if streams are being exchanged, then check the logs on those machines. cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 4/12/2013, at 10:24 pm, Tamar Rosen ta...@correlor.com wrote: Update - I am still experiencing the above issues, but not all the time. I was able to run repair (on this keyspace) from node 2 and from node 4, but now a different keyspace hangs on these nodes, and I am still not able to run repair on node 1. It seems random. I changed logging to debug level, but still nothing is logged. Again - any help will be appreciated. Tamar On Mon, Dec 2, 2013 at 11:53 AM, Tamar Rosen ta...@correlor.com wrote: Hi, On AWS, we had a 2 node cluster with RF 2. We added 2 more nodes, then changed RF to 3 on all our keyspaces. Next step was to run nodetool repair, node by node. (In the meantime, we found that we must use CL quorum, which is affecting our application's performance). Started with node 1, which is one of the old nodes. Ran: nodetool repair -pr It seemed to be progressing fine, running keyspace by keyspace, for about an hour, but then it hung. The last messages in the output are: [2013-12-01 11:18:24,577] Repair command #4 finished [2013-12-01 11:18:24,594] Starting repair command #5, repairing 230 ranges for keyspace correlor_customer_766 It stayed like this for almost 24 hours. Then we read about the possibility of this being related to not upgrading sstables, so we killed the process. We were not sure whether we had run upgrade sstables (we upgraded from 1.2.4 a couple of months ago) So: Ran upgradesstables on a specific table in the keyspace that repair got stuck on. (this was fast) nodetool upgradesstables correlor_customer_766 users Ran repair on that same table. nodetool repair correlor_customer_766 users -pr This is again hanging. The first and only output from this process is: [2013-12-02 08:22:41,221] Starting repair command #6, repairing 230 ranges for keyspace correlor_customer_766 Nothing else happened for more than an hour. Any help and advice will be greatly appreciated. Tamar Rosen correlor.com
Re: Murmur Long.MIN_VALUE token allowed?
AFAIK any value that is a valid output from murmur3 is a valid token. The Murmur3Partitioner sets min and max to Long.MIN_VALUE and Long.MAX_VALUE: public static final LongToken MINIMUM = new LongToken(Long.MIN_VALUE); public static final long MAXIMUM = Long.MAX_VALUE; Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 5/12/2013, at 12:38 am, horschi hors...@gmail.com wrote: Hi, I just realized that I can move a node to Long.MIN_VALUE: 127.0.0.1 rack1 Up Normal 1011.58 KB 100.00% -9223372036854775808 Is that really a valid token for Murmur3Partitioner ? I thought that Long.MIN_VALUE (like -1 for Random) is not a regular token. Shouldn't it only be used for token-range scans? kind regards, Christian
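As a hedged illustration of why a token at the very bottom of the range is addressable like any other, CQL3 lets you scan an explicit token range; the table t and partition key k below are hypothetical:

SELECT k, token(k) FROM t WHERE token(k) >= -9223372036854775808 AND token(k) < 0;

Any row whose key hashes to Long.MIN_VALUE under Murmur3 is returned by this scan like any other row.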
Re: Exactly one wide row per node for a given CF?
But this becomes troublesome if I add or remove nodes. What effectively I want is to partition on the unique id of the record modulus N (id % N; where N is the number of nodes). This is exactly the problem consistent hashing (used by cassandra) is designed to solve. If you hash the key and modulo the number of nodes, adding and removing nodes requires a lot of data to move. I want to be able to randomly distribute a large set of records but keep them clustered in one wide row per node. Sounds like you should revisit your data modelling, this is a pretty well known anti-pattern. When rows get above a few tens of MB things can slow down, when they get above 50 MB they can be a pain, when they get above 100 MB it's a warning sign. And when they get above 1 GB, well, you don't want to know what happens then. It's a bad idea and you should take another look at the data model. If you have to do it, you can try the ByteOrderedPartitioner, which uses the row key as a token, giving you total control of the row placement. Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 4/12/2013, at 8:32 pm, Vivek Mishra mishra.v...@gmail.com wrote: So basically you want to create a cluster of multiple unique keys, but data which belongs to one unique key should be colocated. Correct? -Vivek On Tue, Dec 3, 2013 at 10:39 AM, onlinespending onlinespend...@gmail.com wrote: Subject says it all. I want to be able to randomly distribute a large set of records but keep them clustered in one wide row per node. As an example, let's say I've got a collection of about 1 million records each with a unique id. If I just go ahead and set the primary key (and therefore the partition key) as the unique id, I'll get very good random distribution across my server cluster. However, each record will be its own row. I'd like to have each record belong to one large wide row (per server node) so I can have them sorted or clustered on some other column. If, say, I have 5 nodes in my cluster, I could randomly assign a value of 1 - 5 at the time of creation and have the partition key set to this value. But this becomes troublesome if I add or remove nodes. What effectively I want is to partition on the unique id of the record modulus N (id % N; where N is the number of nodes). I have to imagine there's a mechanism in Cassandra to simply randomize the partitioning without even using a key (and then clustering on some column). Thanks for any help.
Re: Exactly one wide row per node for a given CF?
Basically this desire all stems from wanting efficient use of memory. Do you have any real latency numbers you are trying to tune ? Otherwise this sounds a little like premature optimisation. Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 5/12/2013, at 6:16 am, onlinespending onlinespend...@gmail.com wrote: Pretty much yes. Although I think it'd be nice if Cassandra handled such a case, I've resigned myself to the fact that it cannot at the moment. The workaround will be to partition on the LSB portion of the id (giving 256 rows spread amongst my nodes), which allows room for scaling, and then cluster each row on geohash or something else. Basically this desire all stems from wanting efficient use of memory. Frequently accessed keys' values are kept in RAM through the OS page cache. But the page size is 4KB. This is a problem if you are accessing several small records of data (say 200 bytes), since each record only occupies a small % of a page. This is why it's important to increase the probability that neighboring data on the disk is relevant. Worst case would be to read a full 4KB page into RAM, of which you're only accessing one record that's a couple hundred bytes. All of the other unused data of the page is wastefully occupying RAM. Now project this problem onto a collection of millions of small records all indiscriminately and randomly scattered on the disk, and you can easily see how inefficient your memory usage will become. That's why it's best to cluster data in some meaningful way, all in an effort to increase the probability that when one record is accessed in that 4KB block its neighboring records will also be accessed. This brings me back to the question of this thread. I want to randomly distribute the data amongst the nodes to avoid hot spotting, but within each node I want to cluster the data meaningfully such that the probability that neighboring data is relevant is increased. An example of this would be having a huge collection of small records that store basic user information. If you partition on the unique user id, then you'll get nice random distribution but with no ability to cluster (each record would occupy its own row). You could partition on say geographical region, but then you'll end up with hot spotting when one region is more active than another. So ideally you'd like to randomly assign a node to each record to increase parallelism, but then cluster all records on a node by say geohash, since it is more likely (however small that may be) that when one user from a geographical region is accessed other users from the same region will also need to be accessed. It's certainly better than having some random user record next to the one you are accessing at the moment. On Dec 3, 2013, at 11:32 PM, Vivek Mishra mishra.v...@gmail.com wrote: So basically you want to create a cluster of multiple unique keys, but data which belongs to one unique key should be colocated. Correct? -Vivek On Tue, Dec 3, 2013 at 10:39 AM, onlinespending onlinespend...@gmail.com wrote: Subject says it all. I want to be able to randomly distribute a large set of records but keep them clustered in one wide row per node. As an example, let's say I've got a collection of about 1 million records each with a unique id. If I just go ahead and set the primary key (and therefore the partition key) as the unique id, I'll get very good random distribution across my server cluster.
However, each record will be its own row. I'd like to have each record belong to one large wide row (per server node) so I can have them sorted or clustered on some other column. If, say, I have 5 nodes in my cluster, I could randomly assign a value of 1 - 5 at the time of creation and have the partition key set to this value. But this becomes troublesome if I add or remove nodes. What effectively I want is to partition on the unique id of the record modulus N (id % N; where N is the number of nodes). I have to imagine there's a mechanism in Cassandra to simply randomize the partitioning without even using a key (and then clustering on some column). Thanks for any help.
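For reference, a minimal CQL sketch of the LSB bucketing workaround described in this thread; every name here is hypothetical, and the bucket is computed client side (for example id % 256, or the low byte of the id):

CREATE TABLE users_by_bucket (
    bucket int,
    geohash text,
    user_id bigint,
    name text,
    PRIMARY KEY (bucket, geohash, user_id)
);

SELECT * FROM users_by_bucket WHERE bucket = 42 AND geohash >= 'gbsu' AND geohash < 'gbsv';

Each bucket is a single partition (wide row), the 256 buckets distribute randomly across the cluster regardless of node count, and within a bucket the rows stay clustered by geohash.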
Re: Raid Issue on EC2 Datastax ami, 1.2.11
Thanks for the update Philip, other people have reported high await on a single volume previously but I don't think it's been blamed on noisy neighbours. It's interesting that you can have noisy neighbours for IO only. Out of interest, was there much steal reported in top or iostat ? Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 6/12/2013, at 4:42 am, Philippe Dupont pdup...@teads.tv wrote: Hi again, I have much more information on this case: We did further investigations on the affected nodes and did find some await problems on one of the 4 disks in the raid: http://imageshack.com/a/img824/2391/s7q3.jpg Here was the iostat of the node: http://imageshack.us/a/img7/7282/qq3w.png You can see that the write and read throughput are exactly the same on the 4 disks of the instance, so the raid0 looks good enough. Yet the global await, r_await and w_await are 3 to 5 times bigger on the xvde disk than on the other disks. We reported this to amazon support, and here is their answer: Hello, I deeply apologize for any inconvenience this has been causing you and thank you for the additional information and screenshots. Using the instance you based your iostat on (i-), I have looked into the underlying hardware it is currently using and I can see it appears to have a noisy neighbor leading to the higher await time on that particular device. Since most AWS services are multi-tenant, situations can arise where one customer's resource has the potential to impact the performance of a different customer's resource that resides on the same underlying hardware (a noisy neighbor). While these occurrences are rare, they are nonetheless inconvenient and I am very sorry for any impact it has created. I have also looked into the initial instance referred to when the case was created (i-xxx) and cannot see any existing issues (neighboring or otherwise) as to any I/O performance impacts; however, at the time the case was created, evidence on our end suggests there was a noisy neighbor then as well. Can you verify if you are still experiencing above average await times on this instance? If you would like to mitigate the impact of encountering noisy neighbors, you can look into our Dedicated Instance option; Dedicated Instances launch on hardware dedicated to only a single customer (though this can feasibly lead to a situation where a customer is their own noisy neighbor). However, this is an option available only to instances that are being launched into a VPC and may require modification of the architecture of your use-case. I understand the instances belonging to your cluster in question have been launched into EC2-Classic, I just wanted to bring this to your attention as a possible solution. You can read more about Dedicated Instances here: http://aws.amazon.com/dedicated-instances/ Again, I am very sorry for the performance impact you have been experiencing due to having noisy neighbors. We understand the frustration and are always actively working to increase capacity so the effects of noisy neighbors are lessened. I hope this information has been useful and if you have any additional questions whatsoever, please do not hesitate to ask! To conclude, the only other solution, short of a VPC with Dedicated Instances, is to replace this instance with a new one, hoping not to get another noisy neighbor... I hope that will help someone. Philippe 2013/11/28 Philippe DUPONT pdup...@teads.tv Hi, We have a Cassandra cluster of 28 nodes.
Each one is an EC2 m1.xlarge based on the datastax AMI, with 4 instance-store volumes in raid0 mode. Here is the ticket we opened with amazon support: This raid is created using the datastax public AMI : ami-b2212dc6. Sources are also available here : https://github.com/riptano/ComboAMI As you can see in the screenshot attached (http://imageshack.com/a/img854/4592/xbqc.jpg), randomly but frequently one of the volumes gets fully used (100%) while the 3 others are standing in low use. Because of this, the node becomes slow and the whole cassandra cluster is impacted. We are losing data due to write failures and availability for our customers. It was in this state for one hour, and we decided to restart it. We already removed 3 other instances because of this same issue (see other screenshots): http://imageshack.com/a/img824/2391/s7q3.jpg http://imageshack.com/a/img10/556/zzk8.jpg Amazon support took a close look at the instance as well as its underlying hardware for any potential health issues and both seem to be healthy. Has someone already experienced something like this ? Or should I contact the AMI author instead? Thanks a lot, Philippe.
Re: Unable to run hadoop_cql3_word_count examples
InvalidRequestException(why:consistency level LOCAL_ONE not compatible with replication strategy (org.apache.cassandra.locator.SimpleStrategy)) at The LOCAL_ONE consistency level can only be used with the NetworkTopologyStrategy. I had a quick look and the code does not use LOCAL_ONE, did you make a change? Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 3/12/2013, at 10:03 pm, Parth Patil parthpa...@gmail.com wrote: Hi, I am new to Cassandra and I am exploring the Hadoop integration (MapReduce) provided by Cassandra. I am trying to run the hadoop examples provided in the cassandra repo under examples/hadoop_cql3_word_count. I am using the cassandra-2.0 branch. I have a single node cassandra running locally. I was able to run the ./bin/word_count_setup step successfully, but when I run the ./bin/word_count step I get the following error : java.lang.RuntimeException at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.executeQuery(CqlPagingRecordReader.java:661) at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.<init>(CqlPagingRecordReader.java:297) at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader.initialize(CqlPagingRecordReader.java:163) at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:522) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212) Caused by: InvalidRequestException(why:consistency level LOCAL_ONE not compatible with replication strategy (org.apache.cassandra.locator.SimpleStrategy)) at org.apache.cassandra.thrift.Cassandra$execute_prepared_cql3_query_result$execute_prepared_cql3_query_resultStandardScheme.read(Cassandra.java:52627) at org.apache.cassandra.thrift.Cassandra$execute_prepared_cql3_query_result$execute_prepared_cql3_query_resultStandardScheme.read(Cassandra.java:52604) at org.apache.cassandra.thrift.Cassandra$execute_prepared_cql3_query_result.read(Cassandra.java:52519) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78) at org.apache.cassandra.thrift.Cassandra$Client.recv_execute_prepared_cql3_query(Cassandra.java:1785) at org.apache.cassandra.thrift.Cassandra$Client.execute_prepared_cql3_query(Cassandra.java:1770) at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.executeQuery(CqlPagingRecordReader.java:631) ... 6 more Has anyone seen this before ? Am I missing something ? -- Best, Parth
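A hedged sketch of the usual fix when a client insists on LOCAL_ONE is to switch the keyspace to NetworkTopologyStrategy; the keyspace and data-centre names below are assumptions (with SimpleSnitch the single data centre is reported as datacenter1):

ALTER KEYSPACE wordcount WITH REPLICATION = {'class': 'NetworkTopologyStrategy', 'datacenter1': 1};

If the keyspace already holds data, follow the change with a repair so replica placement matches the new strategy.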
Re: Commitlog replay makes dropped and recreated keyspace and column family rows reappear
Do you have the logs from after the restart ? Did it include a "Drop Keyspace …" INFO level message ? Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 4/12/2013, at 2:44 am, Desimpel, Ignace ignace.desim...@nuance.com wrote: Hi, I have the impression that there is an issue with dropping a keyspace and then recreating the keyspace (and column families), combined with a restart of the database. My test goes as follows:
Create keyspace K and column families C.
Insert rows X0 into column family C0.
Query for X0 : found rows : OK
Drop keyspace K.
Query for X0 : found no rows : OK
Create keyspace K and column families C.
Insert rows X1 into column family C1.
Query for X0 : not found : OK
Query for X1 : found : OK
Stop the Cassandra database.
Start the Cassandra database.
Query for X1 : found : OK
Query for X0 : found : NOT OK !
Did someone test this scenario? Using : Cassandra version 2.0.2, thrift, java 1.7.x, centos Ignace Desimpel
Re: CQL workaround for modifying a primary key
I just tested this with 1.2.9 and DROP TABLE took a snapshot and moved the existing files out of the dir. Do you have some more steps to reproduce ? Cheers A - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 4/12/2013, at 11:23 am, Ike Walker ike.wal...@flite.com wrote: What is the best practice for modifying the primary key definition of a table in Cassandra 1.2.9? Say I have this table: CREATE TABLE temperature ( weatherstation_id text, event_time timestamp, temperature text, PRIMARY KEY (weatherstation_id,event_time) ); I want to add a new column named version and include that column in the primary key. CQL will let me add the column, but you can't change the primary key for an existing table. So I drop the table and recreate it: DROP TABLE temperature; CREATE TABLE temperature ( weatherstation_id text, version int, event_time timestamp, temperature text, PRIMARY KEY (weatherstation_id,version,event_time) ); But then I start getting errors like this: java.io.FileNotFoundException: /var/lib/cassandra/data/test/temperature/test-temperature-ic-8316-Data.db (No such file or directory) So I guess the drop table doesn't actually delete the data, and I end up with a problem like this: https://issues.apache.org/jira/browse/CASSANDRA-4857 What's a good workaround for this, assuming I don't want to change the name of my table? Should I just truncate the table, then drop it and recreate it? Thanks. -Ike Walker
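For what it's worth, a sketch of the truncate-first workaround Ike suggests; TRUNCATE flushes and snapshots before clearing the live sstables, which should avoid dangling file references when the drop follows (untested against the bug above, offered only as a hedge):

TRUNCATE temperature;
DROP TABLE temperature;
CREATE TABLE temperature ( weatherstation_id text, version int, event_time timestamp, temperature text, PRIMARY KEY (weatherstation_id, version, event_time) );

The old rows are gone either way; any data to keep would have to be exported first (for example with cqlsh COPY) and re-inserted with a value for the new version column.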
Re: While inserting data into Cassandra using Hector client
Hector is designed to use Column Families created via the thrift interface, e.g. using cassandra-cli. Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 25/11/2013, at 8:51 pm, Santosh Shet santosh.s...@vista-one-solutions.com wrote: Hi, I am getting the below error while inserting data into Cassandra using the Hector client: me.prettyprint.hector.api.exceptions.HInvalidRequestException: InvalidRequestException(why:Not enough bytes to read value of component 0) I am facing this problem after upgrading Cassandra from 1.2.3 to version 2.0.2. Earlier I was able to insert data using the same code. Below are the scripts used to create the keyspace and table: CREATE KEYSPACE demo_one WITH REPLICATION = {'class' : 'SimpleStrategy', 'replication_factor': 1}; CREATE TABLE investmentvehicle(key text PRIMARY KEY); Could you provide some inputs to troubleshoot this issue? Thanks, Santosh Shet Software Engineer | VistaOne Solutions Direct India : +91 80 30273829 | Mobile India : +91 8105720582 Skype : santushet
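If the Hector code expects a classic dynamic (thrift-style) column family, one hedged workaround is to create the table from CQL3 WITH COMPACT STORAGE, which maps onto a thrift-visible CF; the column names below are illustrative, not from the original post:

CREATE TABLE investmentvehicle ( key text, column1 text, value blob, PRIMARY KEY (key, column1) ) WITH COMPACT STORAGE;

Tables created without COMPACT STORAGE store cells under composite names internally, and accessing them through thrift clients is one known way to hit "Not enough bytes to read value of component 0".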
Re: Multiple writers writing to a cassandra node...
I am a newbie to the Cassandra world. I would like to know if it's possible for two different nodes to write to a single Cassandra node Yes. Currently, I am getting an IllegalRequestException, what(): Default TException on the first system, What is the full error stack ? Occasionally, also hitting a "frame size has negative value" thrift exception when the traffic is high and packets are getting stored very fast. On the client or the server ? Can you post the full error stack ? Currently using Cassandra 2.0.0 with the libQtCassandra library. Please upgrade to 2.0.3. Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 26/11/2013, at 4:42 am, Krishna Chaitanya bnsk1990r...@gmail.com wrote: Hello, I am a newbie to the Cassandra world. I would like to know if it's possible for two different nodes to write to a single Cassandra node. I have a packet collector software which runs on two different systems. I would like both of them to write the packets to a single node (same keyspace and columnfamily). Currently using Cassandra 2.0.0 with the libQtCassandra library. Currently, I am getting an IllegalRequestException, what(): Default TException on the first system the moment I try to store from the second system, but the second system works fine. When I restart the program on the first system, the second system gets the exception and the first one works fine. Occasionally, also hitting a "frame size has negative value" thrift exception when the traffic is high and packets are getting stored very fast. Can someone please point out what I am doing wrong? Thanks in advance..
Re: Nodetool cleanup
I hope I get this right :) Thanks for contributing :) a repair will trigger a major compaction on your node which will take up a lot of CPU and IO performance. It needs to do this to build up the data structure that is used for the repair. After the compaction this is streamed to the different nodes in order to repair them. It does not trigger a major compaction; that's what we call running compaction from the command line and compacting all SSTables into one big one. it will flush all the data to disk that will create some additional compaction. The major concern is that it's a disk IO intensive operation: it reads all the data and writes data to new SSTables (a one to one mapping). If you have all nodes doing this at the same time there may be some degraded performance. And as it's all nodes, it's not possible for the Dynamic Snitch to avoid nodes if they are overloaded. Cleanup is less intensive than repair, but it's still a good idea to stagger it. If you need to run it on all machines (or you have very powerful machines) it's probably going to be OK. Hope that helps. - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 26/11/2013, at 5:14 am, Artur Kronenberg artur.kronenb...@openmarket.com wrote: Hi Julien, I hope I get this right :) a repair will trigger a major compaction on your node which will take up a lot of CPU and IO performance. It needs to do this to build up the data structure that is used for the repair. After the compaction this is streamed to the different nodes in order to repair them. If you trigger this on every node simultaneously you basically take the performance away from your cluster. I would expect cassandra still to function, just way slower than before. Triggering it node after node will leave your cluster with more resources to handle incoming requests. Cheers, Artur On 25/11/13 15:12, Julien Campan wrote: Hi, I'm working with Cassandra 1.2.2 and I have a question about nodetool cleanup. In the documentation it's written "Wait for cleanup to complete on one node before doing the next". I would like to know why we can't perform many cleanups at the same time? Thanks
Re: Intermittent connection error
The inability to truncate is actually my bigger problem. If I could truncate tables, then I wouldn't have to create so many sessions, and the frequency of this error would be at tolerable levels. Can you truncate through cqlsh ? Running this program occasionally produces the following output: Looks like a node is getting evicted from the pool, try turning the logging level up to DEBUG and see if it says anything. For DS driver specific questions you may have better luck using the mail list here https://github.com/datastax/java-driver Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 22/11/2013, at 9:16 am, Robert Wille rwi...@fold3.com wrote: Sure:
package com.footnote.tools.cassandra;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Cluster.Builder;
import com.datastax.driver.core.Session;
public class Test {
    public static void main(String[] args) {
        try {
            Builder builder = Cluster.builder();
            Cluster c = builder.addContactPoint("cas121.devf3.com").withPort(9042).build();
            Session s = c.connect("rwille");
            s.execute("select rhpath from browse_document_tree");
            s.shutdown();
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.exit(0);
    }
}
Running this program occasionally produces the following output: SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/Users/rwille/.m2/repository/org/slf4j/slf4j-log4j12/1.6.1/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/Users/rwille/workspace_fold3/dev-backend/extern/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] - Cannot find LZ4 class, you should make sure the LZ4 library is in the classpath if you intend to use it. LZ4 compression will not be available for the protocol. - [Control connection] Cannot connect to any host, scheduling retry com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (no host was tried) at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:64) at com.datastax.driver.core.ResultSetFuture.extractCauseFromExecutionException(ResultSetFuture.java:271) at com.datastax.driver.core.Session$Manager.setKeyspace(Session.java:461) at com.datastax.driver.core.Cluster.connect(Cluster.java:178) at com.footnote.tools.cassandra.Test.main(Test.java:17) Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (no host was tried) at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:96) at com.datastax.driver.core.Session$Manager.execute(Session.java:513) at com.datastax.driver.core.Session$Manager.executeQuery(Session.java:549) at com.datastax.driver.core.Session$Manager.setKeyspace(Session.java:455) ... 2 more It isn't very often that it fails. I had to run it about 20 times before it got an error. However, because I cannot truncate, I have resorted to dropping and recreating my schema for every unit test. I often have a random test case fail with this same error. The inability to truncate is actually my bigger problem. If I could truncate tables, then I wouldn't have to create so many sessions, and the frequency of this error would be at tolerable levels. Thanks in advance.
Robert From: Turi, Ferenc (GE Power Water, Non-GE) ferenc.t...@ge.com Reply-To: user@cassandra.apache.org Date: Thursday, November 21, 2013 12:26 PM To: user@cassandra.apache.org user@cassandra.apache.org Subject: RE: Intermittent connection error Hi, Please attach the source to have deeper look at it. Ferenc From: Robert Wille [mailto:rwi...@fold3.com] Sent: Thursday, November 21, 2013 7:11 PM To: user@cassandra.apache.org Subject: Intermittent connection error I intermittently get the following error when I try to execute my first query after connecting: Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (no host was tried) at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:64) at com.datastax.driver.core.ResultSetFuture.extractCauseFromExecutionException
Re: 1.1.11: system keyspace is filling up
What happens if they are not being successfully delivered ? Will they eventually TTL out ? They have a TTL set to the gc_grace_seconds of the CF at the time of the write. I've also seen hints build up in multi DC systems due to timeouts on the coordinator, i.e. the remote nodes are up, the coordinator starts the writes, the remote nodes process the request (no dropped messages), but the response is lost. These are tracked as timeouts on the MessagingServiceMBean. Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 22/11/2013, at 6:00 pm, Rahul Menon ra...@apigee.com wrote: Oleg, The system keyspace is not replicated, it is local to the node. You should check your logs to see if there are timeouts from streaming hints; I believe the default timeout to stream hints is 10 seconds. When I ran into this problem I truncated hints to clear out the space and then ran a repair to ensure that all the data was consistent across all nodes, even if there was a failure. -rm On Tue, Nov 5, 2013 at 6:29 PM, Oleg Dulin oleg.du...@gmail.com wrote: What happens if they are not being successfully delivered ? Will they eventually TTL out ? Also, do I need to truncate hints on every node or is it replicated ? Oleg On 2013-11-04 21:34:55 +0000, Robert Coli said: On Mon, Nov 4, 2013 at 11:34 AM, Oleg Dulin oleg.du...@gmail.com wrote: I have a dual DC setup, 4 nodes, RF=4 in each. The one that is used as primary has its system keyspace fill up with 200 gigs of data, the majority of which is hints. Why does this happen ? How can I clean it up ? If you have this many hints, you probably have flapping / frequent network partition, or very overloaded nodes. If you compare the number of hints to the number of dropped messages, that would be informative. If you're hinting because you're dropping, increase capacity. If you're hinting because of partition, figure out why there's so much partition. WRT cleaning up hints, they will automatically be cleaned up eventually, as long as they are successfully being delivered. If you need to manually clean them up you can truncate the system.hints keyspace. =Rob -- Regards, Oleg Dulin http://www.olegdulin.com
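On 1.2 and later, where hints live in the system.hints CQL table, a hedged way to inspect and clear them from cqlsh (column names taken from the 1.2 schema; verify against your version first):

SELECT target_id, dateOf(hint_id) FROM system.hints LIMIT 20;
TRUNCATE system.hints;

On the 1.1 line in this thread the hints storage differs, so inspect the system keyspace column families there rather than assuming this table exists.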
Re: Large system.Migration CF after upgrade to 1.1
We have noticed that a cluster we upgraded to 1.1.6 (from 1.0.*) still has a single large (~4GB) row in system.Migrations on each cluster node. There is some code in there to drop that CF at startup, but I'm not sure of the requirements for it to run. If the file timestamps have not been updated in a while, copy the files out of the way and restart. We are also seeing heap pressure / Full GC issues when we do schema updates to this cluster How much memory does the machine have and how is the JVM configured ? On pre 1.1 that is often a result of memory pressure from the bloom filters and compression metadata being on the JVM heap. Do you have a lot (i.e. 500 million) of rows per node ? Check how small CMS can get the heap; it may be the case that it just cannot reduce it further. As a workaround you can increase the heap, increase bloom_filter_fp_chance (per cf) and index_interval (yaml). My talk called "In case of emergency break glass" at the summit in SF this year talks about this: http://thelastpickle.com/speaking/2013/06/11/Speaking-Cassandra-Summit-SF-2013.html Long term, moving to 1.2 will help. Hope that helps. - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 23/11/2013, at 10:35 am, Andrew Cooper andrew.coo...@nisc.coop wrote: We have noticed that a cluster we upgraded to 1.1.6 (from 1.0.*) still has a single large (~4GB) row in system.Migrations on each cluster node. We are also seeing heap pressure / Full GC issues when we do schema updates to this cluster. If the two are related, is it possible to somehow remove/truncate the system.Migrations CF? If I understand correctly, version 1.1 no longer uses this CF, instead using the system.schema_* CF's. We have multiple clusters, and clusters which were built from scratch at version 1.1 or 1.2 do not have data in system.Migrations. I would appreciate any advice and I can provide more details if needed. -Andrew Andrew Cooper National Information Solutions Cooperative® 3201 Nygren Drive NW Mandan, ND 58554 e-mail: andrew.coo...@nisc.coop phone: 866.999.6472 ext 6824 direct: 701-667-6824
Re: How to set Cassandra config directory path
I noticed when I gave the path directly to cassandra.yaml, it works fine. Can't I give the directory path here, as mentioned in the doc? The documentation is wrong; the -Dcassandra.config param is used for the path of the yaml file, not the config directory. I've emailed d...@datastax.com to let them know. What I really want to do is to give the cassandra-topology.properties path to Cassandra. Set the CASSANDRA_CONF env var in cassandra.in.sh Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 21/11/2013, at 6:15 am, Bhathiya Jayasekara tobhathi...@gmail.com wrote: Hi all, I'm trying to set the conf directory path for Cassandra. According to [1], I can set it using a system variable as cassandra.config=directory But it doesn't seem to work for me when I give the conf directory path. I get the following exception. [2013-11-20 22:24:38,273] ERROR {org.apache.cassandra.config.DatabaseDescriptor} - Fatal configuration error org.apache.cassandra.exceptions.ConfigurationException: Cannot locate /home/bhathiya/cassandra/conf/etc at org.apache.cassandra.config.DatabaseDescriptor.getStorageConfigURL(DatabaseDescriptor.java:117) at org.apache.cassandra.config.DatabaseDescriptor.loadYaml(DatabaseDescriptor.java:134) at org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:126) at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:216) at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:446) at org.wso2.carbon.cassandra.server.CassandraServerController$1.run(CassandraServerController.java:48) at java.lang.Thread.run(Thread.java:662) Cannot locate /home/bhathiya/cassandra/conf/etc Fatal configuration error; unable to start server. See log for stacktrace. I noticed when I gave the path directly to cassandra.yaml, it works fine. Can't I give the directory path here, as mentioned in the doc? What I really want to do is to give the cassandra-topology.properties path to Cassandra. [1] http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html#cassandra/tools/toolsCUtility_t.html Thanks, Bhathiya
Re: Cannot TRUNCATE
If it’s just a test system nuke it and try again :) Was there more than one node at any time ? Does nodetool status show only one node ? Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 21/11/2013, at 7:45 am, Robert Wille rwi...@fold3.com wrote: I've got a single node with all empty tables, and truncate fails with the following error: Unable to complete request: one or more nodes were unavailable. Everything else seems fine. I can insert, update, delete, etc. The only thing in the logs that looks relevant is this: INFO [HANDSHAKE-/192.168.98.121] 2013-11-20 11:36:59,064 OutboundTcpConnection.java (line 386) Handshaking version with /192.168.98.121 INFO [HANDSHAKE-/192.168.98.121] 2013-11-20 11:37:04,064 OutboundTcpConnection.java (line 395) Cannot handshake version with /192.168.98.121 I'm running Cassandra 2.0.2. I get the same error in cqlsh as I do with the java driver. Thanks Robert
Re: Config changes to leverage new hardware
However, for both writes and reads there was virtually no difference in the latencies. What sort of latency were you getting ? I'm still not very sure where the current *write* bottleneck is though. What numbers are you getting ? Could the bottleneck be the client ? Can it send writes fast enough to saturate the nodes ? As a rule of thumb you should get 3,000 to 4,000 (non counter) writes per second per core. Sample iostat data (captured every 10s) for the dedicated disk where commit logs are written is below. Does this seem like a bottleneck? Does not look too bad. Another interesting thing is that the linux disk cache doesn't seem to be growing in spite of a lot of free memory available. Things will only get paged in when they are accessed. Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 21/11/2013, at 12:42 pm, Arindam Barua aba...@247-inc.com wrote: Thanks for the suggestions Aaron. As a follow up, we ran a bunch of tests with different combinations of these changes on a 2-node ring. The load was generated using cassandra-stress, run with default values to write 30 million rows, and read them back. However, for both writes and reads there was virtually no difference in the latencies. The different combinations attempted:
1. Baseline test with none of the below changes.
2. Grabbing the TLAB setting from 1.2.
3. Moving the commit logs too to the 7 disk RAID 0.
4. Increasing concurrent_read to 32, and concurrent_write to 64.
5. (3) + (4), i.e. moving commit logs to the RAID + increasing the concurrent_read and concurrent_write config to 32 and 64.
The write latencies were very similar, except for being ~3x worse at the 99.9th percentile and above for scenario (5) above. The read latencies were also similar, with (3) and (5) being a little worse at the 99.99th percentile. Overall, not making any changes, i.e. (1), performed as well or slightly better than any of the other changes. Running cassandra-stress on both the old and new hardware without making any config changes, the write performance was very similar, but the new hardware did show ~10x improvement in reads at the 99.9th percentile and higher. After thinking about this, the reason we were not seeing any difference with our test framework was perhaps the nature of the test, where we write the rows and then immediately do a bunch of reads for the rows that were just written. The data is read back from the memtables, and never from the disk/sstables, hence the new hardware's increased RAM, larger disk cache, and higher number of disks never help. I'm still not very sure where the current *write* bottleneck is though. The new hardware has 32 cores vs 8 cores of the old hardware. Moving the commit log from a dedicated disk to a 7 disk RAID-0 system (where it would be shared by other data though) didn't make a difference either (unless the extra contention on the RAID nullified the positive effects of the RAID). Sample iostat data (captured every 10s) for the dedicated disk where commit logs are written is below. Does this seem like a bottleneck? When the commit logs are written the await/svctm ratio is high.
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
0.00 8.09 0.04 8.85 0.00 0.07 15.74 0.00 0.12 0.03 0.02
0.00 768.03 0.00 9.49 0.00 3.04 655.41 0.04 4.52 0.33 0.31
0.00 8.10 0.04 8.85 0.00 0.07 15.75 0.00 0.12 0.03 0.02
0.00 752.65 0.00 10.09 0.00 2.98 604.75 0.03 3.00 0.26 0.26
Another interesting thing is that the linux disk cache doesn't seem to be growing in spite of a lot of free memory available. The total disk cache used reported by 'free' is less than the size of the sstables written, with over 100 GB unused RAM. Even in production, where we have the older hardware running with 32 GB RAM for a long time now, looking at 5 hosts in 1 DC, only 2.5 GB to 8 GB was used for the disk cache. The Cassandra java process uses the 8 GB allocated to it, and at least 10-15 GB on all the hosts is not used at all. Thanks, Arindam From: Aaron Morton [mailto:aa...@thelastpickle.com] Sent: Wednesday, November 06, 2013 8:34 PM To: Cassandra User Subject: Re: Config changes to leverage new hardware Running Cassandra 1.1.5 currently, but evaluating to upgrade to 1.2.11 soon. You will make more use of the extra memory moving to 1.2 as it moves bloom filters and compression data off heap. Also grab the TLAB setting from cassandra-env.sh in v1.2 As of now, our performance tests (our application specific as well
Re: Is there any open source software for automatized deploy C* in PRD?
Thanks, But I suppose it's just for Debian? Am I right? There are debian and rpm packages, and people deploy them or the binary packages with chef and similar tools. It may be easier to answer your question if you describe the specific platform / needs. cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 21/11/2013, at 10:35 pm, Boole.Z.Guo (mis.cnsh04.Newegg) 41442 boole.z@newegg.com wrote: Thanks, But I suppose it's just for Debian? Am I right? Any others? Best Regards, Boole Guo Software Engineer, NESC-SH.MIS +86-021-51530666*41442 Floor 19, KaiKai Plaza, 888, Wanhangdu Rd, Shanghai (200042) From: Mike Adamson [mailto:mikeat...@gmail.com] Sent: 21 November 2013 17:16 To: user@cassandra.apache.org Subject: Re: Is there any open source software for automatized deploy C* in PRD? Hi Boole, Have you tried chef? There is this cookbook for deploying cassandra: http://community.opscode.com/cookbooks/cassandra MikeA On 21 November 2013 01:33, Boole.Z.Guo (mis.cnsh04.Newegg) 41442 boole.z@newegg.com wrote: Hi all, Is there any open source software for automatized deploy C* in PRD? Best Regards, Boole Guo Software Engineer, NESC-SH.MIS +86-021-51530666*41442 Floor 19, KaiKai Plaza, 888, Wanhangdu Rd, Shanghai (200042) ONCE YOU KNOW, YOU NEWEGG. CONFIDENTIALITY NOTICE: This email and any files transmitted with it may contain privileged or otherwise confidential information. It is intended only for the person or persons to whom it is addressed. If you received this message in error, you are not authorized to read, print, retain, copy, disclose, disseminate, distribute, or use this message, any part thereof, or any information contained therein. Please notify the sender immediately and delete all copies of this message. Thank you in advance for your cooperation.
Re: Migration Cassandra 2.0 to Cassandra 2.0.2
Mr Coli: What's the difference between deploy binaries and the binary package ? I downloaded the binary package from the Apache Cassandra homepage, am I wrong ? Yes, you can use the instructions here for the binary package http://wiki.apache.org/cassandra/DebianPackaging When you use the binary package it creates the directory locations, installs the init scripts and makes it a lot easier to start and stop cassandra. I recommend using them. Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 21/11/2013, at 11:06 pm, Bonnet Jonathan. jonathan.bon...@externe.bnpparibas.com wrote: Thanks Mr Coli and Mr Wee for your answers. Mr Coli: What's the difference between deploy binaries and the binary package ? I downloaded the binary package from the Apache Cassandra homepage, am I wrong ? Mr Wee: I think you hit on the right answer, because the lib directories in my Cassandra homes are different between the two versions. In the home for the old version, /produits/cassandra/install_cassandra/apache-cassandra-2.0.0/lib, I have:
[cassandra@s00vl9925761 lib]$ ls -ltr
total 14564
-rw-r- 1 cassandra cassandra 123898 Aug 28 15:07 thrift-server-0.3.0.jar
-rw-r- 1 cassandra cassandra 42854 Aug 28 15:07 thrift-python-internal-only-0.7.0.zip
-rw-r- 1 cassandra cassandra 55066 Aug 28 15:07 snaptree-0.1.jar
-rw-r- 1 cassandra cassandra 1251514 Aug 28 15:07 snappy-java-1.0.5.jar
-rw-r- 1 cassandra cassandra 270552 Aug 28 15:07 snakeyaml-1.11.jar
-rw-r- 1 cassandra cassandra 8819 Aug 28 15:07 slf4j-log4j12-1.7.2.jar
-rw-r- 1 cassandra cassandra 26083 Aug 28 15:07 slf4j-api-1.7.2.jar
-rw-r- 1 cassandra cassandra 134133 Aug 28 15:07 servlet-api-2.5-20081211.jar
-rw-r- 1 cassandra cassandra 1128961 Aug 28 15:07 netty-3.5.9.Final.jar
-rw-r- 1 cassandra cassandra 80800 Aug 28 15:07 metrics-core-2.0.3.jar
-rw-r- 1 cassandra cassandra 134748 Aug 28 15:07 lz4-1.1.0.jar
-rw-r- 1 cassandra cassandra 481534 Aug 28 15:07 log4j-1.2.16.jar
-rw-r- 1 cassandra cassandra 347531 Aug 28 15:07 libthrift-0.9.0.jar
-rw-r- 1 cassandra cassandra 16046 Aug 28 15:07 json-simple-1.1.jar
-rw-r- 1 cassandra cassandra 91183 Aug 28 15:07 jline-1.0.jar
-rw-r- 1 cassandra cassandra 17750 Aug 28 15:07 jbcrypt-0.3m.jar
-rw-r- 1 cassandra cassandra 5792 Aug 28 15:07 jamm-0.2.5.jar
-rw-r- 1 cassandra cassandra 765648 Aug 28 15:07 jackson-mapper-asl-1.9.2.jar
-rw-r- 1 cassandra cassandra 228286 Aug 28 15:07 jackson-core-asl-1.9.2.jar
-rw-r- 1 cassandra cassandra 96046 Aug 28 15:07 high-scale-lib-1.1.2.jar
-rw-r- 1 cassandra cassandra 1891110 Aug 28 15:07 guava-13.0.1.jar
-rw-r- 1 cassandra cassandra 66843 Aug 28 15:07 disruptor-3.0.1.jar
-rw-r- 1 cassandra cassandra 91982 Aug 28 15:07 cql-internal-only-1.4.0.zip
-rw-r- 1 cassandra cassandra 54345 Aug 28 15:07 concurrentlinkedhashmap-lru-1.3.jar
-rw-r- 1 cassandra cassandra 25490 Aug 28 15:07 compress-lzf-0.8.4.jar
-rw-r- 1 cassandra cassandra 284220 Aug 28 15:07 commons-lang-2.6.jar
-rw-r- 1 cassandra cassandra 30085 Aug 28 15:07 commons-codec-1.2.jar
-rw-r- 1 cassandra cassandra 36174 Aug 28 15:07 commons-cli-1.1.jar
-rw-r- 1 cassandra cassandra 1695790 Aug 28 15:07 apache-cassandra-thrift-2.0.0.jar
-rw-r- 1 cassandra cassandra 71117 Aug 28 15:07 apache-cassandra-clientutil-2.0.0.jar
-rw-r- 1 cassandra cassandra 3265185 Aug 28 15:07 apache-cassandra-2.0.0.jar
-rw-r- 1 cassandra cassandra 1928009 Aug 28 15:07 antlr-3.2.jar
drwxr-x--- 2 cassandra cassandra 4096 Oct 1 14:16 licenses
In my new home I have
/produits/cassandra/install_cassandra/apache-cassandra-2.0.2/lib:
[cassandra@s00vl9925761 lib]$ ls -ltr
total 9956
-rw-r- 1 cassandra cassandra 123920 Oct 24 09:21 thrift-server-0.3.2.jar
-rw-r- 1 cassandra cassandra 52477 Oct 24 09:21 thrift-python-internal-only-0.9.1.zip
-rw-r- 1 cassandra cassandra 55066 Oct 24 09:21 snaptree-0.1.jar
-rw-r- 1 cassandra cassandra 1251514 Oct 24 09:21 snappy-java-1.0.5.jar
-rw-r- 1 cassandra cassandra 270552 Oct 24 09:21 snakeyaml-1.11.jar
-rw-r- 1 cassandra cassandra 26083 Oct 24 09:21 slf4j-api-1.7.2.jar
-rw-r- 1 cassandra cassandra 22291 Oct 24 09:21 reporter-config-2.1.0.jar
-rw-r- 1 cassandra cassandra 1206119 Oct 24 09:21 netty-3.6.6.Final.jar
-rw-r- 1 cassandra cassandra 82123 Oct 24 09:21 metrics-core-2.2.0.jar
-rw-r- 1 cassandra cassandra 165505 Oct 24 09:21 lz4-1.2.0.jar
-rw-r- 1 cassandra cassandra 217054 Oct 24 09:21 libthrift-0.9.1.jar
-rw-r- 1 cassandra cassandra 16046 Oct 24 09:21 json-simple-1.1.jar
-rw-r- 1 cassandra cassandra 91183 Oct 24 09:21 jline-1.0.jar
-rw-r- 1 cassandra cassandra 17750 Oct 24 09:21 jbcrypt-0.3m.jar
-rwxrwxrwx 1 cassandra
Re: Error: Unable to search across multiple secondary index types
java.lang.RuntimeException: java.lang.RuntimeException: Unable to search across multiple secondary index types A query that uses two secondary indexed columns would require a query planner to determine the most efficient approach. We don't support features like that. I would expect an empty response, but instead I get a "Request did not complete within rpc_timeout." info on the cqlsh interface and there is an error in the cassandra logs: That sounds like a bug, you should have gotten an error. Could you raise a bug on https://issues.apache.org/jira/browse/CASSANDRA Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 15/11/2013, at 10:22 pm, sielski siel...@man.poznan.pl wrote: Hello, I've installed Cassandra 2.0.2 and I'm trying to query a cassandra table using a SELECT statement with two WHERE clauses on columns with secondary indexes, but Cassandra throws an error as in the subject. It's easy to reproduce this problem. My table structure is as follows: CREATE TABLE test (c1 VARCHAR, c2 VARCHAR, c3 VARCHAR, PRIMARY KEY (c1, c2)); CREATE INDEX test_i1 ON test (c2); CREATE INDEX test_i2 ON test (c3); Then I execute a simple query on an empty table: SELECT * FROM test WHERE c2='whatever' AND c3 ='whatever' ALLOW FILTERING; I would expect an empty response, but instead I get a "Request did not complete within rpc_timeout." info on the cqlsh interface and there is an error in the cassandra logs: ERROR 09:57:36,394 Exception in thread Thread[ReadStage:35,5,main] java.lang.RuntimeException: java.lang.RuntimeException: Unable to search across multiple secondary index types at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1931) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.lang.RuntimeException: Unable to search across multiple secondary index types at org.apache.cassandra.db.index.SecondaryIndexManager.search(SecondaryIndexManager.java:535) at org.apache.cassandra.db.ColumnFamilyStore.search(ColumnFamilyStore.java:1649) at org.apache.cassandra.db.RangeSliceCommand.executeLocally(RangeSliceCommand.java:135) at org.apache.cassandra.service.StorageProxy$LocalRangeSliceRunnable.runMayThrow(StorageProxy.java:1414) at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1927) Is it a bug, or is there a reason why I cannot execute such a query on this model? I saw issue https://issues.apache.org/jira/browse/CASSANDRA-5851 which is similar to mine, but it's marked as resolved in 2.0.0 and I'm using the most recent version. — Regards, Krzysztof Sielski
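A hedged alternative to intersecting two secondary indexes is to denormalize into a lookup table keyed by both values; the table name below is made up:

CREATE TABLE test_by_c2_c3 ( c2 varchar, c3 varchar, c1 varchar, PRIMARY KEY ((c2, c3), c1) );

SELECT * FROM test_by_c2_c3 WHERE c2 = 'whatever' AND c3 = 'whatever';

This turns the query into a single-partition read instead of asking Cassandra to plan across multiple indexes, at the cost of writing each row twice.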
Re: DESIGN QUESTION: Need to update only older data in cassandra
The problem occurs during the day, where updates can be sent that possibly contain older data than the nightly batch update. If you have an application-level sequence for updates (I used that term to avoid saying timestamp) you could use it as the cassandra timestamp. As long as you know it increases, it's fine. You can specify the timestamp for a column via either thrift or cql3. When the updates come in during the day, if they have an older timestamp just send the write and it will be ignored. Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 17/11/2013, at 8:45 am, Lawrence Turcotte lawrence.turco...@gmail.com wrote: That is, data consists of an account id with a timestamp column that indicates when the account was updated. This is not to be confused with the row insertion/update timestamp maintained by Cassandra for conflict resolution within the Cassandra nodes. Furthermore, the account has about 200 columns and updates occur nightly in batch mode where roughly 300-400 million updates are sent. The problem occurs during the day, where updates can be sent that possibly contain older data than the nightly batch update. Hence the requirement to first look at the account update timestamp in the database and compare it to the proposed update timestamp to determine whether to update or not. The idea here is that a read before update in Cassandra is generally not a good idea. To alleviate this problem I was thinking of either maintaining a separate Cassandra db with only two columns, account id and update timestamp, and using this as a lookup before updating, or setting a stored procedure within the main database to do the read and update if the data within the database is older: UPDATE Account SET some columns WHERE lastUpdateTimeStamp < proposedUpdateTimeStamp. I am kind of leaning towards the separate database or keyspace as a simple lookup to determine whether to update the data in the main Cassandra database, that is, the database that contains the 200 columns of account data. If this is the best choice then I would like to explore the pros and cons of creating a separate Cassandra node cluster for lookup of account update timestamps vs just adding another keyspace within the main Cassandra database, in terms of performance implications. In this account and timestamp only database I would need to also update the timestamp when the main database would be updated. Any thoughts are welcome Lawrence
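A minimal CQL3 sketch of the approach Aaron describes, with hypothetical table and column names; the application's own sequence number is supplied as the write timestamp (conventionally microseconds):

UPDATE account USING TIMESTAMP 1384640000000000 SET balance = 42 WHERE account_id = 123;

If a request carrying an older TIMESTAMP arrives later, its columns simply lose the last-write-wins comparison and the newer values remain, with no read before write needed.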
Re: Disaster recovery question
The first particular test we tried What was the disk_failure_policy setting ? 1) There were NO errors in the log on the node where we removed the commit log SSD drive - this surprised us (of course our ops monitoring would detect the downed disk too, but we hope to be able to look for ERROR level logging in system.log to cause alerts also) Can you reproduce this without needing to physically pull the drive ? Obviously there should be an error or warning there. Even if the disk_failure_policy says to ignore it, it should still log. 2) The node with no commit log disk just kept writing to memtables, but: 3) This was causing major CMS GC issues which eventually caused the node to appear down (nodetool status) to all other nodes, and indeed it itself saw all other nodes as down. That said, dynamic snitch and latency detection in clients seemed to prevent this being much of a problem, other than it seems potentially undesirable from a server side standpoint. The commit log has a queue that is 1024 * num processors long. If the write thread can get into this queue it will proceed (when using the periodic commit log), so if there was no error I would expect writes to work for a little while. But eventually this queue will get full and the write threads will not be able to proceed. The queue for the Mutation stage is essentially unbounded, so while the other nodes are sending writes it will continue to fill up, leading to the CMS issues. Seeing nodes as down is a side effect of JVM GC preventing the Gossip threads from running frequently enough. that said maybe someone knows off the top of their head if there is a config setting that would start failing writes (due to memtable size) before GC became an issue, and we just have this misconfigured. Nope. Cassandra does not have an explicit back pressure mechanism. The best we have is the dynamic snitch and the gossip to eventually mark a node as down. 5) I guess the question is what is the best way to bring up a failed node: a) delete all data first? b) clear data but restore from a previous sstable backup to minimise subsequent data transfer? c) other suggestions? It depends on the failure. In your example I would have brought it back either with or without the commit log, or with the commit log except the most recently modified file. There is protection in the commit log replay to only replay mutations that match the crc check. When it was back online I would run a repair (without -pr) to repair all the data on the node. I'm not sure the level DB error has to do with the commit log replay. 6) Our experience is that taking nodes down that have problems, then deleting data (subsets if we can see partial corruption) and re-adding is much safer (but our cluster is VERY fast). You should not need to do this, what sort of corruptions ? Hope that helps. - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 17/11/2013, at 3:56 pm, graham sanderson gra...@vast.com wrote: agreed; that was a parallel issue from our ops (I apologize and will try to avoid duplicates) - I was asking the question from the architecture side as to what should happen rather than describing it as a bug. Nonetheless, I/We are still curious if anyone has an answer.
On Nov 16, 2013, at 6:13 PM, Mikhail Stepura mikhail.step...@outlook.com wrote: Looks like someone has the same (1-4) questions: https://issues.apache.org/jira/browse/CASSANDRA-6364 -M

graham sanderson wrote in message news:7161e7e0-cf24-4b30-b9ca-2faafb0c4...@vast.com... We are currently looking to deploy on the 2.0 line of Cassandra, but obviously are watching for bugs (we are currently on 2.0.2) - we are aware of a couple of interesting known bugs to be fixed in 2.0.3 and one in 2.1, but none have been observed (in production use cases) or are likely to affect our current proposed deployment. I have a few general questions. The first particular test we tried was to physically remove the SSD commit drive for one of the nodes whilst under HEAVY write load (maybe a few hundred MB/s of data to be replicated 3 times - 6 node single local data center) and also while running read performance tests. We currently have both node (CQL3) and Astyanax (Thrift) clients. Frankly everything was pretty good (no read/write failures or indeed (observed) latency issues) except, and maybe people can comment on any of these:
1) There were NO errors in the log on the node where we removed the commit log SSD drive - this surprised us (of course our ops monitoring would detect the downed disk too, but we hope to be able to look for ERROR level logging in system.log to cause alerts also)
2) The node with no commit log disk just kept writing to memtables, but:
3) This was causing major CMS GC issues which eventually caused the node to appear down (nodetool status) to all other nodes, and indeed it itself saw all other nodes as down.
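Since disk_failure_policy comes up twice in this thread, here is a minimal cassandra.yaml sketch of the setting for the 1.2/2.0 line (the value descriptions are paraphrased from the shipped yaml comments; treat the exact list as something to verify against your version):

# policy for data disk failures:
# ignore      - log the error and carry on using the failed disk
# stop        - shut down gossip and client transports, leaving the node
#               effectively dead but still inspectable via JMX
# best_effort - stop using the failed disk and serve whatever data remains
disk_failure_policy: stop

Note this policy covers the data volumes; CASSANDRA-6364, linked above, raises the same questions 1-4 for the commit log volume specifically (and, if memory serves, led to a separate commit_failure_policy setting in a later release).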
Re: Nodes not added to existing cluster
- broadcast_address is set to the instance's public address
You only need this if you have a multi-region setup.

I've gisted the results here: https://gist.github.com/skyebook/be5ee75a000a1e6d65d0
This error:
TRACE [HANDSHAKE-/NODE_1_PUBLIC_IP] 2013-11-18 06:57:13,984 OutboundTcpConnection.java (line 393) Cannot handshake version with /NODE_1_PUBLIC_IP
java.nio.channels.AsynchronousCloseException
    at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:205)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:402)
    at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:201)
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
    at java.io.InputStream.read(InputStream.java:101)
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:81)
    at java.io.DataInputStream.readInt(DataInputStream.java:387)
    at org.apache.cassandra.net.OutboundTcpConnection$1.run(OutboundTcpConnection.java:387)
is preventing the node from reading the version, and results in this line being printed (-2147483648 is the "no version" flag):
OutboundTcpConnection.java (line 333) Target max version is -2147483648; no version information yet, will retry
Not really sure why that exception is being thrown; the Javadoc does not make it clear: http://docs.oracle.com/javase/7/docs/api/java/nio/channels/AsynchronousCloseException.html
Check the networking.
Hope that helps.
- Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com

On 18/11/2013, at 8:36 pm, Skye Book skye.b...@gmail.com wrote: Hi there, I'm bringing this thread back as it's something that I thought was solved but is apparently not fixed on my end. To recap, I'm having trouble getting a node to join a cluster. Configuration seems all right using the EC2MultiRegionSnitch, but new nodes are unable to handshake with the seeds.
- Security Group has ports 22 and 1024-65535 open
- Nodes are configured with password authentication using CassandraAuthorizer
- internode_authenticator is commented out in configuration
- rpc_address is set to the instance's private address
- listen_address is set to the instance's private address
- broadcast_address is set to the instance's public address
As was suggested earlier, I've enabled TRACE logging for OutboundTcpConnection and get the following dumped into system.log when the new node is started up without itself in the seed list (if its own IP is in the list it just creates a new single node cluster). I've gisted the results here: https://gist.github.com/skyebook/be5ee75a000a1e6d65d0 It looks like the handshake process completely and utterly fails, as it seems unable to get any information from the other nodes, as evidenced by:
OutboundTcpConnection.java (line 386) Handshaking version with /NODE_1_PUBLIC_IP
OutboundTcpConnection.java (line 386) Handshaking version with /NODE_2_PUBLIC_IP
OutboundTcpConnection.java (line 333) Target max version is -2147483648; no version information yet, will retry
Thanks in advance for any light you all might be able to shed on what's going on.

On Sep 26, 2013, at 9:03 PM, Aaron Morton aa...@thelastpickle.com wrote:
INFO 05:03:49,015 Cannot handshake version with /aa.bb.cc.dd
INFO 05:03:49,017 Handshaking version with /aa.bb.cc.dd
If you can turn up logging to TRACE for org.apache.cassandra.net.OutboundTcpConnection it will include the full error.
The two addresses that it is unable to handshake with are the other two addresses of nodes in the cluster I'm unable to join.
Are you mixing versions?
Cheers
- Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com

On 26/09/2013, at 5:13 PM, Skye Book skye.b...@gmail.com wrote: Hi Aaron, thanks for the clarification. As might be expected, having the broadcast_address fixed hasn't fixed anything. What I did find after writing my last email is that output.log is littered with these:
INFO 05:03:49,015 Cannot handshake version with /aa.bb.cc.dd
INFO 05:03:49,017 Handshaking version with /aa.bb.cc.dd
INFO 05:03:49,803 Cannot handshake version with /ww.xx.yy.zz
INFO 05:03:49,805 Handshaking version with /ww.xx.yy.zz
The two addresses that it is unable to handshake with are the other two addresses of nodes in the cluster I'm unable to join. I started thinking that maybe EC2 was having an un-advertised problem communicating between AZs, but bringing up nodes in both of the other availability zones resulted in the same wrong behavior. I've gisted my cassandra.yaml; it's pretty standard and hasn't caused an issue in the past for me. https://gist.github.com/skyebook/ec9364cdcec02e803ffc
Skye Book http://skyebook.net -- @sbook
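To make the addressing rules in this thread concrete, here is a minimal cassandra.yaml sketch for a node behind the EC2 multi-region snitch (all IPs are placeholders; the snitch spelling and seed_provider layout should be checked against your version's shipped yaml):

endpoint_snitch: Ec2MultiRegionSnitch

listen_address: 10.0.0.5          # instance's private address
rpc_address: 10.0.0.5             # instance's private address
broadcast_address: 54.200.10.5    # instance's public address

seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          # public addresses of the seed nodes; as Skye notes above, a node
          # that lists itself here will start its own single node cluster
          # rather than join the existing one
          - seeds: "54.200.10.6,54.200.10.7"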
Re: Read inconsistency after backup and restore to different cluster
we then take the snapshot archive generated FROM cluster-A_node1 and copy/extract/restore TO cluster-B_node1, then we
Sounds correct.

Depending on what additional comments/recommendations you or another member of the list may have (if any) based on the clarification I've made above,
Also, if you back up the system data it will bring along the tokens. This can be a pain if you want to change the cluster name.
cheers
- Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com

On 15/11/2013, at 10:44 am, David Laube d...@stormpath.com wrote: Thank you for the detailed reply Rob! I have replied to your comments in-line below;

On Nov 14, 2013, at 1:15 PM, Robert Coli rc...@eventbrite.com wrote:
On Thu, Nov 14, 2013 at 12:37 PM, David Laube d...@stormpath.com wrote: It is almost as if the data only exists on some of the nodes, or perhaps the token ranges are dramatically different --again, we are using vnodes so I am not exactly sure how this plays into the equation.
The token ranges are dramatically different, due to vnode random token selection from not setting initial_token and setting num_tokens. You can verify this by listing the tokens per physical node in nodetool gossipinfo or (iirc) nodetool status.
5. Copy 1 of the 5 snapshot archives from cluster-A to each of the five nodes in the new cluster-B ring.
I don't understand this at all. Do you mean that you are using one source node's data to load each of the target nodes? Or are you just saying there's a 1:1 relationship between source snapshots and target nodes to load into? Unless you have RF=N, using one source for 5 target nodes won't work.
We have configured RF=3 for the keyspace in question. Also, from a client perspective, we read with CL=1 and write with CL=QUORUM. Since we have 5 nodes total in cluster-A, we snapshot keyspace_name on each of the five nodes, which results in a snapshot directory on each of the five nodes that we archive and ship off to S3. We then take the snapshot archive generated FROM cluster-A_node1 and copy/extract/restore TO cluster-B_node1, then we take the snapshot archive FROM cluster-A_node2 and copy/extract/restore TO cluster-B_node2, and so on and so forth.
To do what I think you're attempting to do, you have basically two options.
1) don't use vnodes and do a 1:1 copy of snapshots
2) use vnodes and
a) get a list of tokens per node from the source cluster
b) put a comma delimited list of these in initial_token in cassandra.yaml on target nodes
c) probably have to un-set num_tokens (this part is unclear to me, you will have to test..)
d) set auto_bootstrap:false in cassandra.yaml
e) start target nodes; they will take the same ranges as the source cluster without bootstrapping
f) load schema / copy data into datadir (being careful of https://issues.apache.org/jira/browse/CASSANDRA-6245)
g) restart node or use nodetool refresh (I'd probably restart the node to avoid the bulk rename that refresh does) to pick up sstables
h) remove auto_bootstrap:false from cassandra.yaml
I *believe* this *should* work, but have never tried it as I do not currently run with vnodes. It should work because it basically makes implicit vnode tokens explicit in the conf file. If it *does* work, I'd greatly appreciate you sharing details of your experience with the list.
I'll start with parsing out the token ranges that our vnode config ends up assigning in cluster-A, and doing some creative config work on the target cluster-B we are trying to restore to as you have suggested. Depending on what additional comments/recommendation you or another member of the list may have (if any) based on the clarification I've made above, I will absolutely report back my findings here. General reference on tasks of this nature (does not consider vnodes, but treat vnodes as just a lot of physical nodes and it is mostly relevant) : http://www.palominodb.com/blog/2012/09/25/bulk-loading-options-cassandra =Rob
Re: making sense of output from Eclipse Memory Analyzer tool taken from .hprof file
What version of Cassandra are you using? What are the JVM settings? (check with ps aux | grep cassandra) OOM in Cassandra 1.2+ is rare, but there is also https://issues.apache.org/jira/browse/CASSANDRA-5706 and https://issues.apache.org/jira/browse/CASSANDRA-6087

One instance of org.apache.cassandra.db.ColumnFamilyStore loaded by sun.misc.Launcher$AppClassLoader @ 0x613e1bdc8 occupies 984,094,664 (11.64%) bytes.
938MB is a bit of memory; the CFS and data tracker are dealing with the memtable. This may indicate things are not being flushed from memory correctly.

• java.lang.Thread @ 0x73e1f74c8 CompactionExecutor:158 - 839,225,000 (9.92%) bytes.
• java.lang.Thread @ 0x717f08178 MutationStage:31 - 809,909,192 (9.58%) bytes.
• java.lang.Thread @ 0x717f082c8 MutationStage:5 - 649,667,472 (7.68%) bytes.
• java.lang.Thread @ 0x717f083a8 MutationStage:21 - 498,081,544 (5.89%) bytes.
• java.lang.Thread @ 0x71b357e70 MutationStage:11 - 444,931,288 (5.26%) bytes.
Maybe very big rows and/or very big mutations.

hope that helps.
- Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com

On 15/11/2013, at 12:34 pm, Mike Koh defmike...@gmail.com wrote: I am investigating Java out-of-memory heap errors, so I created an .hprof file and loaded it into the Eclipse Memory Analyzer Tool, which gave some Problem Suspects. The first one looks like:
One instance of org.apache.cassandra.db.ColumnFamilyStore loaded by sun.misc.Launcher$AppClassLoader @ 0x613e1bdc8 occupies 984,094,664 (11.64%) bytes. The memory is accumulated in one instance of org.apache.cassandra.db.DataTracker$View loaded by sun.misc.Launcher$AppClassLoader @ 0x613e1bdc8.
If I click around into the verbiage, I believe I can pick out the name of a column family, but that is about it. Can someone explain what the above means in more detail and whether it is indicative of a problem? The next one looks like:
• java.lang.Thread @ 0x73e1f74c8 CompactionExecutor:158 - 839,225,000 (9.92%) bytes.
• java.lang.Thread @ 0x717f08178 MutationStage:31 - 809,909,192 (9.58%) bytes.
• java.lang.Thread @ 0x717f082c8 MutationStage:5 - 649,667,472 (7.68%) bytes.
• java.lang.Thread @ 0x717f083a8 MutationStage:21 - 498,081,544 (5.89%) bytes.
• java.lang.Thread @ 0x71b357e70 MutationStage:11 - 444,931,288 (5.26%) bytes.
If I click into the verbiage, the above compactions and mutations all seem to be referencing the same column family. Are the above related? Is there a way I can tell more exactly what is being compacted and/or mutated, more specifically than which column family?
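Since Aaron's hypothesis is very big rows and/or very big mutations, one hedged way to follow up is with nodetool (commands from the 1.2/2.0-era toolset; keyspace and column family names are placeholders):

# per-CF statistics, including the maximum compacted row size seen so far
nodetool cfstats

# row size and column count histograms for a single column family
nodetool cfhistograms my_keyspace my_columnfamily

A column family whose maximum compacted row size is hundreds of MB would line up with the heavy CompactionExecutor and MutationStage threads in the suspect list above.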