Re: Cassandra API Library.

2012-08-27 Thread Paolo Bernardi

On 08/23/2012 01:40 PM, Thomas Spengler wrote:

4) pelops (Thrift,Java)


I've been using Pelops for quite some time with pretty good results; it 
felt much cleaner than Hector.


Paolo

--
@bernarpa
http://paolobernardi.wordpress.com



Re: Secondary index partially created

2012-08-27 Thread aaron morton
If you are still having problems can you post the query and the output from 
nodetool cfstats on one of the nodes that fails ? 

cfstats will tell us if the secondary index was built. 

Cheers


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 25/08/2012, at 6:02 AM, Roshni Rajagopal roshni.rajago...@wal-mart.com 
wrote:

 What does List my_column_family in CLI show on all the nodes?
 Perhaps the syntax you're using isn't correct?  You should be getting the
 same data on all the nodes irrespective of which node's CLI you use.
 The replication factor is for redundancy to have copies of the data on
 different nodes to help if nodes go down. Even if you had a replication
 factor of 1 you should still get the same data on all nodes.
 
 
 
 On 24/08/12 11:05 PM, Richard Crowley r...@rcrowley.org wrote:
 
 On Thu, Aug 23, 2012 at 6:54 PM, Richard Crowley r...@rcrowley.org wrote:
 I have a three-node cluster running Cassandra 1.0.10.  In this cluster
 is a keyspace with RF=3.  I *updated* a column family via Astyanax to
 add a column definition with an index on that column.  Then I ran a
 backfill to populate the column in every row.  Then I tried to query
 the index from Java and it failed but so did cassandra-cli:
 
get my_column_family where my_column = 'my_value';
 
 Two out of the three nodes are unable to query the new index and throw
 this error:
 
InvalidRequestException(why:No indexed columns present in index
 clause with operator EQ)
 
 The third is able to query the new index happily but doesn't find any
 results, even when I expect it to.
 
 This morning the one node that's able to query the index is also able
 to produce the expected results.  I'm a dummy and didn't use science
 so I don't know if the `nodetool compact` I ran across the cluster had
 anything to do with it.  Regardless, it did not change the situation
 in any other way.
 
 
 `describe cluster;` in cassandra-cli confirms that all three nodes
 have the same schema and `show schema;` confirms that schema includes
 the new column definition and its index.
 
 The my_column_family.my_index-hd-* files only exist on that one node
 that can query the index.
 
 I ran `nodetool repair` on each node and waited for `nodetool
 compactionstats` to report zero pending tasks.  Ditto for `nodetool
 compact`.  The nodes that failed still fail.  The node that succeeded
 still succeeds.
 
 Can anyone shed some light?  How do I convince it to let me query the
 index from any node?  How do I get it to find results?
 
 Thanks,
 
 Richard
 
 This email and any files transmitted with it are confidential and intended 
 solely for the individual or entity to whom they are addressed. If you have 
 received this email in error destroy it immediately. *** Walmart Confidential 
 ***



Re: Commit log periodic sync?

2012-08-27 Thread aaron morton
 Brutally. kill -9.
that's fine. I was thinking about reboot -f -n

 We are wondering if the fsync of the commit log was working.
I would say yes, if only because there are no other reported problems. 

In that case I would not expect to see data loss. If you are still in a test 
scenario, can you try to reproduce the problem? If possible, can you reproduce 
it with a single node?

Cheers


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 25/08/2012, at 11:00 AM, rubbish me rubbish...@googlemail.com wrote:

 Thanks, Aaron, for your reply - please see the inline.
 
 
 On 24 Aug 2012, at 11:04, aaron morton wrote:
 
 - we are running on production linux VMs (not ideal but this is out of our 
 hands)
 Is the VM doing anything wacky with the IO ?
 
 Could be.  But I thought we would ask here first.  This is a bit difficult to 
 prove because we don't have control over these VMs.
 
  
 
 As part of a DR exercise, we killed all 6 nodes in DC1,
 Nice disaster. Out of interest, what was the shutdown process ?
 
 Brutally. kill -9.
 
 
 
 We noticed that data that was written an hour before the exercise, around 
 the time the last memtables were flushed, was not found in DC1. 
 To confirm, data was written to DC 1 at CL LOCAL_QUORUM before the DR 
 exercise. 
 
 Was the missing data written before or after the memtable flush ? I'm trying 
 to understand if the data should have been in the commit log or the 
 memtables. 
 
 The missing data was written after the last flush.  This data was 
 retrievable before the DR exercise.
 
 
 Can you provide some more info on how you are detecting it is not found in 
 DC 1?
 
 
 We tried Hector with consistencylevel=LOCAL_QUORUM.  We saw missing columns or 
 whole rows.  
 
 We tried cassandra-cli on DC1 nodes, same.
 
 However, once we ran the same query on DC2, C* must have then done a 
 read repair. That particular piece of result data would appear in DC1 again.
 
 
 If we understand correctly, writes go to the commit log first, which is 
 synced to disk every 10s. 
 Writes are put into a bounded queue and processed as fast as the IO can keep 
 up. Every 10s a sync message is added to the queue. Note that the commit log 
 segment may rotate at any time, which requires a sync. 
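The periodic-sync path described above can be sketched as a toy model. This is illustrative only — the `SYNC` marker, queue, and lists below are stand-ins, not Cassandra's actual classes — but it shows why a clean crash should lose at most the writes appended after the last sync:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class CommitLogModel {
    static final String SYNC = "#sync#"; // marker a timer would enqueue every 10s

    public static void main(String[] args) throws InterruptedException {
        // Writers drop mutations into a bounded queue; every
        // commitlog_sync_period_in_ms (10s by default) a SYNC marker is added.
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(16);
        queue.put("write-1");
        queue.put("write-2");
        queue.put(SYNC);      // simulated periodic sync
        queue.put("write-3"); // appended after the sync: lost if we crash now

        List<String> durable = new ArrayList<>();
        List<String> pending = new ArrayList<>();
        String item;
        while ((item = queue.poll()) != null) {
            if (item.equals(SYNC)) {
                durable.addAll(pending); // fsync: everything appended so far survives
                pending.clear();
            } else {
                pending.add(item);       // appended to the log but not yet synced
            }
        }
        // A kill -9 at this point loses only what followed the last SYNC.
        System.out.println("durable=" + durable + " at-risk=" + pending);
    }
}
```

Under this model, losing a whole hour of writes (rather than ~10s) is what makes the reported behaviour surprising.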
 
 A loss of data across all nodes in a DC seems odd. If you can provide some 
 more information we may be able to help. 
 
 
 We are wondering if the fsync of the commit log was working.  But we saw no 
 errors / warnings in the logs.  Wondering if there is a way to verify.
 
 
 
 Cheers
 
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 24/08/2012, at 6:01 AM, rubbish me rubbish...@googlemail.com wrote:
 
 Hi all
 
 First off, let's introduce the setup. 
 
 - 6 x C* 1.1.2 in active DC (DC1), another 6 in another (DC2)
 - keyspace's RF=3 in each DC
 - Hector as client.
 - client talks only to DC1 unless DC1 can't serve the request. In which 
 case talks only to DC2
 - commit log was periodically sync with the default setting of 10s. 
 - consistency policy = LOCAL QUORUM for both read and write. 
 - we are running on production linux VMs (not ideal but this is out of our 
 hands)
 -
 As part of a DR exercise, we killed all 6 nodes in DC1, hector starts 
 talking to DC2, all the data was still there, everything continued to work 
 perfectly. 
 
 Then we brought all nodes, one by one, in DC1 up. We saw a message saying 
 all the commit logs were replayed. No errors reported.  We didn't run 
 repair at this time. 
 
 We noticed that data that was written an hour before the exercise, around 
 the time the last memtables were flushed, was not found in DC1. 
 
 If we understand correctly, writes go to the commit log first, which is synced 
 to disk every 10s. At worst we should have lost only the last 10s of data. 
 What could be the cause of this behaviour? 
 
 With the blessing of C* we could recover all this data from DC2. But we 
 would like to understand why. 
 
 Many thanks in advance. 
 
 Amy
 
 
 
 



Re: Cassandra 1.1.4 RPM required

2012-08-27 Thread Marco Schirrmeister

On Aug 23, 2012, at 12:15 PM, Adeel Akbar wrote:

 Dear Aaron, it requires a username and password, which I don't have. Can you 
 share a direct link?
 

There is no username and password for the Datastax rpm repository.
http://rpm.datastax.com/community/

But there is no 1.1.4 version yet from Datastax.


If you really need a 1.1.4 RPM, you can give my build a shot. I just started 
rolling my own packages for various reasons.
Until my public RPM repo goes online, you can grab the Cassandra RPM here:
http://people.ogilvy.de/~mschirrmeister/linux/cassandra/

If you want, test it out. It's just a first build and not heavily tested.


Marco



Re: optimizing use of sstableloader / SSTableSimpleUnsortedWriter

2012-08-27 Thread aaron morton
 After thinking about how
 sstables are done on disk, it seems best (required??) to write out
 each row at once.  
Sort of. We only want one instance of the row per SSTable created. 


 Any other tips to improve load time or reduce the load on the cluster
 or subsequent compaction activity? 

Fewer SSTables mean less compaction. So go as high as you can on the 
bufferSizeInMB param for the SSTableSimpleUnsortedWriter. 

There is also a SSTableSimpleWriter. Because it expects rows to be ordered it 
does not buffer and can create bigger sstables.
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/io/sstable/SSTableSimpleWriter.java


 Right now my Cassandra data store has about 4 months of data and we
 have 5 years of historical 
ingest all the histories!

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 25/08/2012, at 12:56 PM, Aaron Turner synfina...@gmail.com wrote:

 So I've read: http://www.datastax.com/dev/blog/bulk-loading
 
 Are there any tips for using sstableloader /
 SSTableSimpleUnsortedWriter to migrate time series data from a our old
 datastore (PostgreSQL) to Cassandra?  After thinking about how
 sstables are done on disk, it seems best (required??) to write out
 each row at once.  Ie: if each row == 1 years worth of data and you
 have say 30,000 rows, write one full row at a time (a full years worth
 of data points for a given metric) rather than 1 data point for 30,000
 rows.
 
 Any other tips to improve load time or reduce the load on the cluster
 or subsequent compaction activity?   All my CF's I'll be writing to
 use compression and leveled compaction.
 
 Right now my Cassandra data store has about 4 months of data and we
 have 5 years of historical (not sure yet how much we'll actually load
 yet, but minimally 1 years worth).
 
 Thanks!
 
 -- 
 Aaron Turner
 http://synfin.net/ Twitter: @synfinatic
 http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
 Those who would give up essential Liberty, to purchase a little temporary
 Safety, deserve neither Liberty nor Safety.
-- Benjamin Franklin
 carpe diem quam minimum credula postero



Re: unsubscribe

2012-08-27 Thread aaron morton
http://wiki.apache.org/cassandra/FAQ#unsubscribe

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 26/08/2012, at 4:12 AM, Shen szs...@gmail.com wrote:

 



Re: Expanding cluster to include a new DR datacenter

2012-08-27 Thread aaron morton
I did a quick test on a clean 1.1.4 and it worked.

Can you check the logs for errors ? Can you see your schema change in there ?

Also what is the output from show schema; in the cli ? 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 25/08/2012, at 6:53 PM, Bryce Godfrey bryce.godf...@azaleos.com wrote:

 Yes
  
 [default@unknown] describe cluster;
 Cluster Information:
Snitch: org.apache.cassandra.locator.PropertyFileSnitch
Partitioner: org.apache.cassandra.dht.RandomPartitioner
Schema versions:
 9511e292-f1b6-3f78-b781-4c90aeb6b0f6: [10.20.8.4, 10.20.8.5, 
 10.20.8.1, 10.20.8.2, 10.20.8.3]
  
 From: Mohit Anchlia [mailto:mohitanch...@gmail.com] 
 Sent: Friday, August 24, 2012 1:55 PM
 To: user@cassandra.apache.org
 Subject: Re: Expanding cluster to include a new DR datacenter
  
 That's interesting can you do describe cluster?
 
 On Fri, Aug 24, 2012 at 12:11 PM, Bryce Godfrey bryce.godf...@azaleos.com 
 wrote:
 So I’m at the point of updating the keyspaces from Simple to NetworkTopology 
 and I’m not sure if the changes are being accepted using Cassandra-cli.
  
 I issue the change:
  
 [default@EBonding] update keyspace EBonding
 ... with placement_strategy = 
 'org.apache.cassandra.locator.NetworkTopologyStrategy'
 ... and strategy_options={Fisher:2};
 9511e292-f1b6-3f78-b781-4c90aeb6b0f6
 Waiting for schema agreement...
 ... schemas agree across the cluster
  
 Then I do a describe and it still shows the old strategy.  Is there something 
 else that I need to do?  I’ve exited and restarted Cassandra-cli and it still 
 shows the SimpleStrategy for that keyspace.  Other nodes show the same 
 information.
  
 [default@EBonding] describe EBonding;
 Keyspace: EBonding:
   Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
   Durable Writes: true
 Options: [replication_factor:2]
  
  
 From: Bryce Godfrey [mailto:bryce.godf...@azaleos.com] 
 Sent: Thursday, August 23, 2012 11:06 AM
 To: user@cassandra.apache.org
 Subject: RE: Expanding cluster to include a new DR datacenter
  
 Thanks for the information!  Answers my questions.
  
 From: Tyler Hobbs [mailto:ty...@datastax.com] 
 Sent: Wednesday, August 22, 2012 7:10 PM
 To: user@cassandra.apache.org
 Subject: Re: Expanding cluster to include a new DR datacenter
  
 If you didn't see this particular section, you may find it 
 useful:http://www.datastax.com/docs/1.1/operations/cluster_management#adding-a-data-center-to-a-cluster
 
 Some comments inline:
 
 On Wed, Aug 22, 2012 at 3:43 PM, Bryce Godfrey bryce.godf...@azaleos.com 
 wrote:
 We are in the process of building out a new DR system in another Data Center, 
 and we want to mirror our Cassandra environment to that DR.  I have a couple 
 questions on the best way to do this after reading the documentation on the 
 Datastax website.  We didn’t initially plan for this to be a DR setup when 
 first deployed a while ago due to budgeting, but now we need to.  So I’m just 
 trying to nail down the order of doing this as well as any potential issues.
  
 For the nodes, we don’t plan on querying the servers in this DR until we fail 
 over to this data center.   We are going to have 5 similar nodes in the DR, 
 should I join them into the ring at token+1?
 
 Join them at token+10 just to leave a little space.  Make sure you're using 
 LOCAL_QUORUM for your queries instead of regular QUORUM.
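The token+10 suggestion amounts to simple arithmetic: give the DR nodes the same evenly spaced RandomPartitioner tokens as the main DC, each offset by 10 so no two nodes collide. A sketch (the 5-nodes-per-DC count comes from this thread; the formula is the standard even-spacing calculation over the 2^127 token space):

```java
import java.math.BigInteger;

public class DrTokens {
    public static void main(String[] args) {
        int nodesPerDc = 5; // 5 existing nodes, 5 similar nodes planned for the DR
        BigInteger range = BigInteger.valueOf(2).pow(127); // RandomPartitioner token space

        for (int i = 0; i < nodesPerDc; i++) {
            // Evenly spaced tokens for the existing DC...
            BigInteger main = range.multiply(BigInteger.valueOf(i))
                                   .divide(BigInteger.valueOf(nodesPerDc));
            // ...and the paired DR node, offset by +10 to avoid collisions.
            BigInteger dr = main.add(BigInteger.TEN);
            System.out.println("node " + i + ": main=" + main + " dr=" + dr);
        }
    }
}
```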
  
  
 All keyspaces are set to the replication strategy of SimpleStrategy.  Can I 
 change the replication strategy after joining the new nodes in the DR to 
 NetworkTopologyStrategy with the updated replication factor for each DC?
 
 Switch your keyspaces over to NetworkTopologyStrategy before adding the new 
 nodes.  For the strategy options, just list the first dc until the second is 
 up (e.g. {main_dc: 3}).
  
  
 Lastly, is changing the snitch from the default of SimpleSnitch to 
 RackInferringSnitch going to cause any issues?  Since it's in the 
 cassandra.yaml file I assume a rolling restart to pick up the value would be 
 ok?
 
 This is the first thing you'll want to do.  Unless your node IPs would 
 naturally put all nodes in a DC in the same rack, I recommend using 
 PropertyFileSnitch, explicitly using the same rack.  (I tend to prefer 
 PFSnitch regardless; it's harder to accidentally mess up.)  A rolling restart 
 is required to pick up the change.  Make sure to fill out 
 cassandra-topology.properties first if using PFSnitch.
  
  
 This is all on Cassandra 1.1.4, Thanks for any help!
  
  
 
 
 
 -- 
 Tyler Hobbs
 DataStax
 



Re: QUORUM writes, QUORUM reads -- and eventual consistency

2012-08-27 Thread aaron morton
  Doesn't this mean that the read does not reflect the most recent write?
Yes. 
A write that fails is not a write. 

 If it were to have read the newer data from the 1 node and then afterwards 
 read the old data from the other 2 then there is a consistency problem, but 
 in the example you give the second reader seems to still have a consistent 
 view.
In the scenario of a TimedOutException for a write that is entirely possible. 
The write is not considered to be successful at the CL requested. So R + W > N 
does not hold for that datum. 

When in doubt, ask Werner…

when R + W > N we have strong consistency…
Strong consistency. After the update completes, any subsequent access (by A, 
B, or C) will return the updated value.

when R + W <= N we have weak / eventual consistency…
*Eventual consistency. This is a specific form of weak consistency; the 
storage system guarantees that if no new updates are made to the object, 
eventually *all* accesses will return the last updated value.

http://queue.acm.org/detail.cfm?id=1466448
(emphasis added)
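Concretely, Vogels' overlap condition can be checked with a few lines. For RF=3, QUORUM is N/2 + 1 = 2, so QUORUM reads plus QUORUM writes satisfy 2 + 2 > 3 and any read quorum intersects any successful write quorum in at least one replica:

```java
public class ConsistencyCheck {
    // Vogels' condition: R + W > N guarantees read/write quorums overlap.
    static boolean strong(int r, int w, int n) { return r + w > n; }

    public static void main(String[] args) {
        int n = 3;               // replication factor
        int quorum = n / 2 + 1;  // QUORUM = 2 when N = 3

        // 2 + 2 > 3: every read sees at least one replica the write reached.
        System.out.println("quorum/quorum strong: " + strong(quorum, quorum, n));

        // 1 + 1 <= 3: only weak / eventual consistency.
        System.out.println("one/one strong: " + strong(1, 1, n));
    }
}
```

Note the condition only applies to writes that *succeeded* at the requested CL; as above, a timed-out write may still be partially applied.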

In C* this may mean HH or RR or repair or standard CL checks kicking in to make 
the second read return the correct consistent value. 

 Isn't it cheaper to retry the mutation on _any exception_ than to have a 
 transaction in place for the majority of non failing writes?
Yes (with the counter exception). 

if you get an UnavailableException it's from the point of view of the 
coordinator. it may be the case that the coordinator is isolated and all the 
other nodes are UP and happy. 

Hope that helps. 

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 26/08/2012, at 5:03 AM, Guillermo Winkler gwink...@inconcertcc.com wrote:

 Isn't it cheaper to retry the mutation on _any exception_ than to have a 
 transaction in place for the majority of non failing writes?
 
 The special case to be considered is obviously counters which are not 
 idempotent
 
 https://issues.apache.org/jira/browse/CASSANDRA-2495 
 
 
 
 On Sat, Aug 25, 2012 at 4:38 AM, Russell Haering russellhaer...@gmail.com 
 wrote:
 The issue is that it is possible for a quorum write to return an
 error, but for the result of the write to still be reflected in the
 view seen by the client. There is really no performant way around this
 (although reading at ALL can make it much less frequent). Guaranteeing
 complete success or failure would (barring a creative solution I'm
 unaware of) require a transactional commit of some sort across the
 replica nodes for the key being written to. The performance tradeoff
 might be desirable under some circumstances, but if this is a
 requirement you should probably look at other databases.
 
 Some good rules to play by (someone correct me if these aren't 100% true):
 
 1. For writes to a single key, an UnavailableException means the write
 failed totally (clients will never see the data you wrote)
 2. For writes to a single key, a TimedOutException means you cannot
 know whether the write succeeded or failed
 3. For writes to multiple keys, either an UnavailableException or a
 TimedOutException means you cannot know whether the write succeeded or
 failed.
 
 -Russell
 
 On Sat, Aug 25, 2012 at 12:17 AM, Guillermo Winkler
 gwink...@inconcertcc.com wrote:
  Hi Philip,
 
  From http://wiki.apache.org/cassandra/ArchitectureOverview
 
  Quorum write: blocks until quorum is reached
 
  By my understanding if you _did_ a quorum write it means it successfully
  completed.
 
  Guille
 
 
  I *think* we're saying the same thing here. The addition of the word
  successful (or something more suitable) would make the documentation more
  precise, not less.
 



Re: Decreasing the number of nodes in the ring

2012-08-27 Thread Henrik Schröder
Removetoken should only be used when removing a dead node from a cluster,
it's a much slower and more expensive operation since it triggers a repair
so that the remaining nodes can figure out which data they should now have.
Decommission on the other hand is much simpler, the node that's being
decommissioned streams the data it has to the nodes that should have it,
and then removes itself.

I don't know exactly what your load is like, but I think the best way to
accomplish it is like this:
You have nodes: 1 2 3 4 5 6 7 8 9
Add SSD nodes: S1 1 2 3 S2 4 5 6 S3 7 8 9
Decommission 1, 4, 7
Check if you can remove more nodes
Decommission 2, 5, 8
Check if you can remove more nodes
Decommission 3, 6, 9
And when you've stopped, make sure your ring is balanced by using nodetool
move.

It's probably a bad idea to run with a lopsided cluster where some servers
are much faster than the others. If you have a replication factor of 3,
that means that half of your data will be on two slow and one fast machines
(so quorum will be slow) and the other half will be on two fast and one slow
machine (so quorum will be fast). This leads to the somewhat unintuitive
conclusion that you can make the cluster go faster by removing nodes.

But it's your data and your cluster, so you need to measure and benchmark
and figure out what's best for you and your app.


/Henrik

On Mon, Aug 27, 2012 at 4:22 AM, Mohit Anchlia mohitanch...@gmail.com wrote:

 use nodetool decommission and nodetool removetoken


 On Sun, Aug 26, 2012 at 5:31 PM, Senthilvel Rangaswamy 
 senthil...@gmail.com wrote:

 We have a cluster of 9 nodes in the ring. We would like SSD backed boxes.
 But we may not need 9
 nodes in that case. What is the best way to downscale the cluster to 6 or
 3 nodes.

 --
 ..Senthil

 If there's anything more important than my ego around, I want it
  caught and shot now.
 - Douglas Adams.





Understanding Cassandra + MapReduce + Composite Columns

2012-08-27 Thread Víctor Penela
Hi!

I'm trying to use Hadoop's MapReduce on top of a Cassandra environment, and
I've run into some issues while using Composite Columns. I'm currently
using Cassandra 1.1.2 (I wouldn't mind having to update it) and Hadoop
1.0.3 (I'd rather keep this version).

What I would like to do is send slices divided by the first key of the
composite key, and do some processing, taking into account the rest of the
elements of the composite key (as well as other columns).

I've built a sandbox keyspace with some column families in order to test
this:

CREATE TABLE test_1 (
  field1 text,
  field2 text,
  field3 text,
  field4 text,
  PRIMARY KEY (field1)
) ;
CREATE TABLE test_2 (
  field1 text,
  field2 text,
  field3 text,
  field4 text,
  PRIMARY KEY (field1, field2)
) ;

The Job configuration (the relevant elements for Cassandra) is as follows:
// Cassandra config
ConfigHelper.setInputRpcPort(conf, "9160");
ConfigHelper.setInputInitialAddress(conf, "localhost");
ConfigHelper.setInputPartitioner(conf, "ByteOrderedPartitioner");
ConfigHelper.setInputColumnFamily(conf, KEYSPACE, INPUT_COLUMN_FAMILY);

SlicePredicate predicate = new SlicePredicate();
predicate.setSlice_range(new
SliceRange().setStart(ByteBufferUtil.EMPTY_BYTE_BUFFER).setFinish(ByteBufferUtil.EMPTY_BYTE_BUFFER).setCount(5));
ConfigHelper.setInputSlicePredicate(conf, predicate);

My dummy map only tries to log the different keys and values received:
map(ByteBuffer key, SortedMap<ByteBuffer, IColumn> columns,
OutputCollector<Text, IntWritable> output, Reporter reporter) { ... }

With CF test_1 everything seems to work fine.

With CF test_2, I only receive the field1 value inside the ByteBuffer key. The
rest of the composite key seems to be encoded into each key of the
SortedMap along with the particular key of that column (field3, field4, ...),
but I don't know exactly how to extract it (I'm a bit new to ByteBuffers, so
any help there will be welcome :)). Is there any way to specify the schema
of this particular CF at the MR level, in order to be able to extract the
secondary key?

Thanks!
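For reference: CompositeType serializes each component as a 2-byte big-endian length, the raw component bytes, and a trailing end-of-component byte. A minimal decoder sketch in plain java.nio (assuming UTF-8 text components; in a real job you would more likely use Cassandra's own CompositeType class to deserialize):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class CompositeDecoder {
    // Split a composite column name into its components:
    // each component is <2-byte length><bytes><1 end-of-component byte>.
    static List<ByteBuffer> decode(ByteBuffer name) {
        ByteBuffer buf = name.duplicate();
        List<ByteBuffer> parts = new ArrayList<>();
        while (buf.remaining() > 0) {
            int len = buf.getShort() & 0xFFFF;   // component length
            ByteBuffer part = buf.slice();
            part.limit(len);                     // view over just this component
            parts.add(part);
            buf.position(buf.position() + len);
            buf.get();                           // skip end-of-component byte
        }
        return parts;
    }

    public static void main(String[] args) {
        // Encode "field2val" + "field3" the same way the composite is laid out.
        byte[] a = "field2val".getBytes(StandardCharsets.UTF_8);
        byte[] b = "field3".getBytes(StandardCharsets.UTF_8);
        ByteBuffer composite =
            ByteBuffer.allocate(2 + a.length + 1 + 2 + b.length + 1);
        composite.putShort((short) a.length).put(a).put((byte) 0);
        composite.putShort((short) b.length).put(b).put((byte) 0);
        composite.flip();

        for (ByteBuffer part : decode(composite)) {
            System.out.println(StandardCharsets.UTF_8.decode(part));
        }
    }
}
```

Applied to the SortedMap keys in the mapper, this would yield the remaining components of the composite column name.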


Re: Secondary index partially created

2012-08-27 Thread Richard Crowley
On Mon, Aug 27, 2012 at 12:59 AM, aaron morton aa...@thelastpickle.com wrote:
 If you are still having problems can you post the query and the output from
 nodetool cfstats on one of the nodes that fails ?

driftx got me sorted.  It escaped me that a rolling restart was
necessary to build secondary indexes, which was masked by one node
deciding to build its portion without a restart.

Thanks,

Richard


unsubscribe

2012-08-27 Thread Nikolaidis, Christos


This e-mail and files transmitted with it are confidential, and are intended 
solely for the use of the individual or entity to whom this e-mail is 
addressed. If you are not the intended recipient, or the employee or agent 
responsible to deliver it to the intended recipient, you are hereby notified 
that any dissemination, distribution or copying of this communication is 
strictly prohibited. If you are not one of the named recipient(s) or otherwise 
have reason to believe that you received this message in error, please 
immediately notify sender by e-mail, and destroy the original message. Thank 
You.


Re: unsubscribe

2012-08-27 Thread Eric Evans
On Mon, Aug 27, 2012 at 9:50 AM, Nikolaidis, Christos
cnikolai...@epsilon.com wrote:

 This e-mail and files transmitted with it are confidential, and are intended 
 solely for the use of the individual or entity to whom this e-mail is 
 addressed. If you are not the intended recipient, or the employee or agent 
 responsible to deliver it to the intended recipient, you are hereby notified 
 that any dissemination, distribution or copying of this communication is 
 strictly prohibited. If you are not one of the named recipient(s) or 
 otherwise have reason to believe that you received this message in error, 
 please immediately notify sender by e-mail, and destroy the original message. 
 Thank You.

Since I am not in a position to unsubscribe anyone, I can only assume
that I have received this message in error.  As per the frightening
legalese quoted above, I hereby notify you by email, and will now
proceed to destroy the original message.

Please don't sue me.

Love,

-- 
Eric Evans
Acunu | http://www.acunu.com | @acunu


Re: unsubscribe

2012-08-27 Thread André Cruz
On Aug 27, 2012, at 4:11 PM, Eric Evans eev...@acunu.com wrote:
 Since I am not in a position to unsubscribe anyone, I can only assume
 that I have received this message in error.  As per the frightening
 legalese quoted above, I hereby notify you by email, and will now
 proceed to destroy the original message.

I think you have just engaged in dissemination, distribution or copying of 
this communication. Better blow up your computer and make a run for it.

André



RE: unsubscribe

2012-08-27 Thread Nikolaidis, Christos
No worries :-)  I was replying to the list so whoever manages it can 
unsubscribe me.

-Original Message-
From: Eric Evans [mailto:eev...@acunu.com]
Sent: Monday, August 27, 2012 11:12 AM
To: user@cassandra.apache.org
Subject: Re: unsubscribe

On Mon, Aug 27, 2012 at 9:50 AM, Nikolaidis, Christos cnikolai...@epsilon.com 
wrote:

 This e-mail and files transmitted with it are confidential, and are intended 
 solely for the use of the individual or entity to whom this e-mail is 
 addressed. If you are not the intended recipient, or the employee or agent 
 responsible to deliver it to the intended recipient, you are hereby notified 
 that any dissemination, distribution or copying of this communication is 
 strictly prohibited. If you are not one of the named recipient(s) or 
 otherwise have reason to believe that you received this message in error, 
 please immediately notify sender by e-mail, and destroy the original message. 
 Thank You.

Since I am not in a position to unsubscribe anyone, I can only assume that I 
have received this message in error.  As per the frightening legalese quoted 
above, I hereby notify you by email, and will now proceed to destroy the 
original message.

Please don't sue me.

Love,

--
Eric Evans
Acunu | http://www.acunu.com | @acunu



Re: unsubscribe

2012-08-27 Thread André Cruz
On Aug 27, 2012, at 4:16 PM, Nikolaidis, Christos cnikolai...@epsilon.com 
wrote:

 No worries :-)  I was replying to the list so whoever manages it can 
 unsubscribe me.

That's not how you unsubscribe. You need to send an email to 
user-unsubscr...@cassandra.apache.org.

André



can you use hostnames in the topology file?

2012-08-27 Thread Hiller, Dean
In the example, I see all IPs being used, but our machines are on DHCP so I 
would prefer using hostnames for everything (plus, if a machine goes down, I can 
bring it back online on another machine with a different IP but the same 
hostname).

If I use hostnames, does the listen_address have to be hardwired to that same 
EXACT hostname for lookup purposes as well?  Or will localhost grab the 
hostname, though it looks like it grabs the IP?

Thanks,
Dean


Re: optimizing use of sstableloader / SSTableSimpleUnsortedWriter

2012-08-27 Thread Aaron Turner
On Mon, Aug 27, 2012 at 1:19 AM, aaron morton aa...@thelastpickle.com wrote:
 After thinking about how
 sstables are done on disk, it seems best (required??) to write out
 each row at once.

 Sort of. We only want one instance of the row per SSTable created.

Ah, good clarification, although I think for my purposes they're one
and the same.


 Any other tips to improve load time or reduce the load on the cluster
 or subsequent compaction activity?

 Fewer SSTables mean less compaction. So go as high as you can on the
 bufferSizeInMB param for the
 SSTableSimpleUnsortedWriter.

Ok.

 There is also a SSTableSimpleWriter. Because it expects rows to be ordered
 it does not buffer and can create bigger sstables.
 https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/io/sstable/SSTableSimpleWriter.java

Hmmm, probably not realistic in my situation... doing so would likely
thrash the disks on my PG server a lot more and kill my read
throughput, and that server is already hitting a wall.


 Right now my Cassandra data store has about 4 months of data and we
 have 5 years of historical

 ingest all the histories!

Actually, I was a little worried about how much space that would
take... my estimate was ~305GB/year, which is a lot when you consider
the 300-400GB/node limit (something I didn't know about at the time).
However, compression has turned out to be extremely efficient on my
dataset... just under 4 months of data is less than 2GB!  I'm pretty
thrilled.


-- 
Aaron Turner
http://synfin.net/ Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
-- Benjamin Franklin
carpe diem quam minimum credula postero


RE: Expanding cluster to include a new DR datacenter

2012-08-27 Thread Bryce Godfrey
Show schema output still shows the simple strategy:
[default@unknown] show schema EBonding;
create keyspace EBonding
  with placement_strategy = 'SimpleStrategy'
  and strategy_options = {replication_factor : 2}
  and durable_writes = true;

This is the only thing I see in the system log at the time on all the nodes:

INFO [MigrationStage:1] 2012-08-27 10:54:18,608 ColumnFamilyStore.java (line 
659) Enqueuing flush of Memtable-schema_keyspaces@1157216346(183/228 
serialized/live bytes, 4 ops)
INFO [FlushWriter:765] 2012-08-27 10:54:18,612 Memtable.java (line 264) Writing 
Memtable-schema_keyspaces@1157216346(183/228 serialized/live bytes, 4 ops)
INFO [FlushWriter:765] 2012-08-27 10:54:18,627 Memtable.java (line 305) 
Completed flushing 
/opt/cassandra/data/system/schema_keyspaces/system-schema_keyspaces-he-34817-Data.db
 (241 bytes) for commitlog p$


Should I turn the logging level up on something to see some more info maybe?

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Monday, August 27, 2012 1:35 AM
To: user@cassandra.apache.org
Subject: Re: Expanding cluster to include a new DR datacenter

I did a quick test on a clean 1.1.4 and it worked

Can you check the logs for errors ? Can you see your schema change in there ?

Also what is the output from show schema; in the cli ?

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 25/08/2012, at 6:53 PM, Bryce Godfrey bryce.godf...@azaleos.com wrote:


Yes

[default@unknown] describe cluster;
Cluster Information:
   Snitch: org.apache.cassandra.locator.PropertyFileSnitch
   Partitioner: org.apache.cassandra.dht.RandomPartitioner
   Schema versions:
9511e292-f1b6-3f78-b781-4c90aeb6b0f6: [10.20.8.4, 10.20.8.5, 10.20.8.1, 
10.20.8.2, 10.20.8.3]

From: Mohit Anchlia [mailto:mohitanch...@gmail.com]
Sent: Friday, August 24, 2012 1:55 PM
To: user@cassandra.apache.org
Subject: Re: Expanding cluster to include a new DR datacenter

That's interesting can you do describe cluster?
On Fri, Aug 24, 2012 at 12:11 PM, Bryce Godfrey 
bryce.godf...@azaleos.com wrote:
So I'm at the point of updating the keyspaces from Simple to NetworkTopology 
and I'm not sure if the changes are being accepted using Cassandra-cli.

I issue the change:

[default@EBonding] update keyspace EBonding
... with placement_strategy = 
'org.apache.cassandra.locator.NetworkTopologyStrategy'
... and strategy_options={Fisher:2};
9511e292-f1b6-3f78-b781-4c90aeb6b0f6
Waiting for schema agreement...
... schemas agree across the cluster

Then I do a describe and it still shows the old strategy.  Is there something 
else that I need to do?  I've exited and restarted Cassandra-cli and it still 
shows the SimpleStrategy for that keyspace.  Other nodes show the same 
information.

[default@EBonding] describe EBonding;
Keyspace: EBonding:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
  Durable Writes: true
Options: [replication_factor:2]


From: Bryce Godfrey [mailto:bryce.godf...@azaleos.com]
Sent: Thursday, August 23, 2012 11:06 AM
To: user@cassandra.apache.org
Subject: RE: Expanding cluster to include a new DR datacenter

Thanks for the information!  Answers my questions.

From: Tyler Hobbs [mailto:ty...@datastax.com]
Sent: Wednesday, August 22, 2012 7:10 PM
To: user@cassandra.apache.org
Subject: Re: Expanding cluster to include a new DR datacenter

If you didn't see this particular section, you may find it useful: 
http://www.datastax.com/docs/1.1/operations/cluster_management#adding-a-data-center-to-a-cluster

Some comments inline:
On Wed, Aug 22, 2012 at 3:43 PM, Bryce Godfrey 
bryce.godf...@azaleos.com wrote:
We are in the process of building out a new DR system in another Data Center, 
and we want to mirror our Cassandra environment to that DR.  I have a couple 
questions on the best way to do this after reading the documentation on the 
Datastax website.  We didn't initially plan for this to be a DR setup when 
first deployed a while ago due to budgeting, but now we need to.  So I'm just 
trying to nail down the order of doing this as well as any potential issues.

For the nodes, we don't plan on querying the servers in this DR until we fail 
over to this data center.   We are going to have 5 similar nodes in the DR, 
should I join them into the ring at token+1?

Join them at token+10 just to leave a little space.  Make sure you're using 
LOCAL_QUORUM for your queries instead of regular QUORUM.
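The token+10 suggestion can be sketched numerically (a hypothetical Python calculation; a 5-node ring per data center and RandomPartitioner's 0..2^127 range are assumptions, not from the thread):

```python
# Hypothetical sketch: each node in the new DC takes the token of its
# counterpart in the existing DC, offset by 10, so the two rings
# interleave without token collisions (RandomPartitioner assumed).
RING = 2 ** 127

def dc_tokens(n, offset=0):
    # n evenly spaced initial tokens, shifted by `offset`
    return [(i * RING // n + offset) % RING for i in range(n)]

dc1 = dc_tokens(5)             # existing data center
dc2 = dc_tokens(5, offset=10)  # new DR data center
```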


All keyspaces are set to the replication strategy of SimpleStrategy.  Can I 
change the replication strategy after joining the new nodes in the DR to 
NetworkTopologyStrategy, with the updated replication factor for each DC?

Switch your keyspaces over to 

Re: Expanding cluster to include a new DR datacenter

2012-08-27 Thread Mohit Anchlia
In your update command is it possible to specify RF for both DC? You could
just do DC1:2, DC2:0.

On Mon, Aug 27, 2012 at 11:16 AM, Bryce Godfrey
bryce.godf...@azaleos.comwrote:

  Show schema output still shows the simple strategy:

 [default@unknown] show schema EBonding;

 create keyspace EBonding

   with placement_strategy = 'SimpleStrategy'

   and strategy_options = {replication_factor : 2}

   and durable_writes = true;


 This is the only thing I see in the system log at the time on all the
 nodes:


 INFO [MigrationStage:1] 2012-08-27 10:54:18,608 ColumnFamilyStore.java
 (line 659) Enqueuing flush of Memtable-schema_keyspaces@1157216346(183/228
 serialized/live bytes, 4 ops)

 INFO [FlushWriter:765] 2012-08-27 10:54:18,612 Memtable.java (line 264)
 Writing Memtable-schema_keyspaces@1157216346(183/228 serialized/live
 bytes, 4 ops)

 INFO [FlushWriter:765] 2012-08-27 10:54:18,627 Memtable.java (line 305)
 Completed flushing
 /opt/cassandra/data/system/schema_keyspaces/system-schema_keyspaces-he-34817-Data.db
 (241 bytes) for commitlog p$



 Should I turn the logging level up on something to see some more info
 maybe?


 *From:* aaron morton [mailto:aa...@thelastpickle.com]
 *Sent:* Monday, August 27, 2012 1:35 AM

 *To:* user@cassandra.apache.org
 *Subject:* Re: Expanding cluster to include a new DR datacenter


 I did a quick test on a clean 1.1.4 and it worked 


 Can you check the logs for errors ? Can you see your schema change in
 there ?


 Also what is the output from show schema; in the cli ? 


 Cheers


 -

 Aaron Morton

 Freelance Developer

 @aaronmorton

 http://www.thelastpickle.com


 On 25/08/2012, at 6:53 PM, Bryce Godfrey bryce.godf...@azaleos.com
 wrote:



 

  Yes

  

 [default@unknown] describe cluster;

 Cluster Information:

Snitch: org.apache.cassandra.locator.PropertyFileSnitch

Partitioner: org.apache.cassandra.dht.RandomPartitioner

Schema versions:

 9511e292-f1b6-3f78-b781-4c90aeb6b0f6: [10.20.8.4, 10.20.8.5,
 10.20.8.1, 10.20.8.2, 10.20.8.3]

  

 *From:* Mohit Anchlia [mailto:mohitanch...@gmail.com]
 *Sent:* Friday, August 24, 2012 1:55 PM
 *To:* user@cassandra.apache.org
 *Subject:* Re: Expanding cluster to include a new DR datacenter

  

 That's interesting can you do describe cluster?

 On Fri, Aug 24, 2012 at 12:11 PM, Bryce Godfrey bryce.godf...@azaleos.com
 wrote:

  So I’m at the point of updating the keyspaces from Simple to
 NetworkTopology and I’m not sure if the changes are being accepted using
 Cassandra-cli.

  

 I issue the change:

  

 [default@EBonding] update keyspace EBonding

 ... with placement_strategy =
 'org.apache.cassandra.locator.NetworkTopologyStrategy'

 ... and strategy_options={Fisher:2};

 9511e292-f1b6-3f78-b781-4c90aeb6b0f6

 Waiting for schema agreement...

 ... schemas agree across the cluster

  

 Then I do a describe and it still shows the old strategy.  Is there
 something else that I need to do?  I’ve exited and restarted Cassandra-cli
 and it still shows the SimpleStrategy for that keyspace.  Other nodes show
 the same information.

  

 [default@EBonding] describe EBonding;

 Keyspace: EBonding:

   Replication Strategy: org.apache.cassandra.locator.SimpleStrategy

   Durable Writes: true

 Options: [replication_factor:2]

  

  

 *From:* Bryce Godfrey [mailto:bryce.godf...@azaleos.com]
 *Sent:* Thursday, August 23, 2012 11:06 AM
 *To:* user@cassandra.apache.org
 *Subject:* RE: Expanding cluster to include a new DR datacenter

  

 Thanks for the information!  Answers my questions.

  

 *From:* Tyler Hobbs [mailto:ty...@datastax.com ty...@datastax.com]
 *Sent:* Wednesday, August 22, 2012 7:10 PM
 *To:* user@cassandra.apache.org
 *Subject:* Re: Expanding cluster to include a new DR datacenter

  

 If you didn't see this particular section, you may find it useful:
 http://www.datastax.com/docs/1.1/operations/cluster_management#adding-a-data-center-to-a-cluster

 Some comments inline:

 On Wed, Aug 22, 2012 at 3:43 PM, Bryce Godfrey bryce.godf...@azaleos.com
 wrote:

  We are in the process of building out a new DR system in another Data
 Center, and we want to mirror our Cassandra environment to that DR.  I have
 a couple questions on the best way to do this after reading the
 documentation on the Datastax website.  We didn’t initially plan for this
 to be a DR setup when first deployed a while ago due to budgeting, but now
 we need to.  So I’m just trying to nail down the order of doing this as
 well as any potential issues.

  

 For the nodes, we don’t plan on querying the servers in this DR until we
 fail over to this data center.   

Re: one node with very high loads

2012-08-27 Thread Rob Coli
On Mon, Aug 27, 2012 at 9:25 AM, Senthilvel Rangaswamy
senthil...@gmail.com wrote:
 We are running 1.1.2 on m1.xlarge with ephemeral store for data. We are
 seeing very high loads on one of the nodes in the ring, 30+.

My first hunch would be that you are sending all client requests to
this one node, so it is coordinating 30x as many requests as it
should.

If that's not the case, if I were you I would attempt to determine if
the high i/o is high read or write on the node, via a tool like iotop.
You can also compare the tpstats of two nodes with similar uptimes to
see if your node is performing more of any stage than other members of
its cohort.

Once you determine whether it's read or write, determine which files
are being read or written.. :)
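The tpstats comparison Rob describes can be illustrated with a small sketch (hedged: the column layout mirrors the 1.x `nodetool tpstats` text format, and the sample numbers are invented):

```python
# Compare the "Pending" column of `nodetool tpstats` output from two
# nodes to spot a stage that is backed up on only one of them.
def parse_pending(tpstats_text):
    pending = {}
    for line in tpstats_text.splitlines():
        parts = line.split()
        # stage rows look like: Name  Active  Pending  Completed ...
        if len(parts) >= 3 and parts[1].isdigit() and parts[2].isdigit():
            pending[parts[0]] = int(parts[2])
    return pending

# invented sample output from two nodes
node_a = "ReadStage 0 0 1200\nMutationStage 2 450 90000\n"
node_b = "ReadStage 0 1 1180\nMutationStage 0 3 91000\n"
diff = {s: parse_pending(node_a)[s] - parse_pending(node_b).get(s, 0)
        for s in parse_pending(node_a)}
```

A large positive entry (here MutationStage) points at the stage to investigate on the loaded node.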

=Rob

-- 
=Robert Coli
AIMGTALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb


JMX(RMI) dynamic port allocation problem still exists?

2012-08-27 Thread Yang
In my previous job we ran across the issue that JMX allocates RMI ports
dynamically, so nodetool does not work when the environment is in EC2: all
the ports have to be specifically opened, and we can't open a range of
ports, only specific ones.

at the time, we followed this:

https://blogs.oracle.com/jmxetc/entry/connecting_through_firewall_using_jmx


to create a small javaagent jar for cassandra startup, so that we use a
fixed RMI port.



Now, does Cassandra come with an out-of-the-box solution to fix the above
problem, or do I have
to create that little javaagent jar myself?

Thanks
Yang
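One fixed-port approach that avoids the javaagent on sufficiently recent JDKs is pinning the RMI port via a system property; a hypothetical cassandra-env.sh fragment (the property names are standard JDK ones; the hostname value is a placeholder, and whether your JVM supports `jmxremote.rmi.port` needs checking):

```shell
# Hypothetical cassandra-env.sh fragment: pin both the JMX registry port
# and the RMI server port so a firewall only needs fixed ports open.
JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.port=7199"
JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.rmi.port=7199"
JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=node.public.ip"  # placeholder
```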


cassandra twitter ruby client

2012-08-27 Thread Yuhan Zhang
Hi all,

I'm playing with cassandra's ruby client written by twitter,  trying to
perform a simple get.

but it looks like it assumes the value type is a UTF-8 string. However, my
values are doubles (key and column names are UTF8Type).
The values that I got are like:
{Top:?\ufffd\ufffd\ufffd\u\u\u\u, ... }

how do I pass double serializer to the api client?


Thank you.

Yuhan


Re: JMX(RMI) dynamic port allocation problem still exists?

2012-08-27 Thread Hiller, Dean
In cassandra-env.sh, search on JMX_PORT and it is set to 7199 (i.e. fixed), so 
that solves your issue, correct?

Dean

From: Yang tedd...@gmail.com
Reply-To: user@cassandra.apache.org
Date: Monday, August 27, 2012 3:44 PM
To: user@cassandra.apache.org
Subject: JMX(RMI) dynamic port allocation problem still exists?

Now, does Cassandra come with an out-of-the-box solution to fix the above 
problem, or do I have
to create that little javaagent jar myself?


Re: JMX(RMI) dynamic port allocation problem still exists?

2012-08-27 Thread Yang
No, the problem is that JMX listens on 7199; once an incoming connection is
made, it opens up 2 random RMI ports and literally tells the other side
"come and connect to me on these 2 RMI ports".

We used to use the trick in the above link to resolve this.
On Aug 27, 2012 3:04 PM, Hiller, Dean dean.hil...@nrel.gov wrote:

 In cassandra-env.sh, search on JMX_PORT and it is set to 7199 (ie. Fixed)
 so that solves your issue, correct?

 Dean

 From: Yang tedd...@gmail.com
 Reply-To: user@cassandra.apache.org
 Date: Monday, August 27, 2012 3:44 PM
 To: user@cassandra.apache.org
 Subject: JMX(RMI) dynamic port allocation problem still exists?

 Now, does Cassandra come with an out-of-the-box solution to fix the above
 problem, or do I have
 to create that little javaagent jar myself?



Re: Counters and replication factor

2012-08-27 Thread Radim Kolar

On 25.5.2012 2:41, Edward Capriolo wrote:


Also it does not sound like you have run anti entropy repair. You 
should do that when upping rf.
I run anti-entropy repairs and they still do not fix counters. I have some 
reports from users with the same problem, but nobody has discovered a 
repeatable scenario. I am currently migrating to the Infinispan data grid; it 
does not seem to have problems with distributed counters.


RE: Expanding cluster to include a new DR datacenter

2012-08-27 Thread Bryce Godfrey
Same results.  I restarted the node also to see if it just wasn't picking up 
the changes and it still shows Simple.

When I specify the DC for strategy_options I should be using the DC name from 
the property file snitch, right?  Ours is Fisher and TierPoint, so that's what 
I used.

From: Mohit Anchlia [mailto:mohitanch...@gmail.com]
Sent: Monday, August 27, 2012 1:21 PM
To: user@cassandra.apache.org
Subject: Re: Expanding cluster to include a new DR datacenter

In your update command is it possible to specify RF for both DC? You could just 
do DC1:2, DC2:0.
On Mon, Aug 27, 2012 at 11:16 AM, Bryce Godfrey 
bryce.godf...@azaleos.com wrote:
Show schema output still shows the simple strategy:
[default@unknown] show schema EBonding;
create keyspace EBonding
  with placement_strategy = 'SimpleStrategy'
  and strategy_options = {replication_factor : 2}
  and durable_writes = true;

This is the only thing I see in the system log at the time on all the nodes:

INFO [MigrationStage:1] 2012-08-27 10:54:18,608 ColumnFamilyStore.java (line 
659) Enqueuing flush of Memtable-schema_keyspaces@1157216346(183/228 
serialized/live bytes, 4 ops)
INFO [FlushWriter:765] 2012-08-27 10:54:18,612 Memtable.java (line 264) Writing 
Memtable-schema_keyspaces@1157216346(183/228 serialized/live bytes, 4 ops)
INFO [FlushWriter:765] 2012-08-27 10:54:18,627 Memtable.java (line 305) 
Completed flushing 
/opt/cassandra/data/system/schema_keyspaces/system-schema_keyspaces-he-34817-Data.db
 (241 bytes) for commitlog p$


Should I turn the logging level up on something to see some more info maybe?

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Monday, August 27, 2012 1:35 AM

To: user@cassandra.apache.org
Subject: Re: Expanding cluster to include a new DR datacenter

I did a quick test on a clean 1.1.4 and it worked

Can you check the logs for errors ? Can you see your schema change in there ?

Also what is the output from show schema; in the cli ?

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 25/08/2012, at 6:53 PM, Bryce Godfrey 
bryce.godf...@azaleos.com wrote:

Yes

[default@unknown] describe cluster;
Cluster Information:
   Snitch: org.apache.cassandra.locator.PropertyFileSnitch
   Partitioner: org.apache.cassandra.dht.RandomPartitioner
   Schema versions:
9511e292-f1b6-3f78-b781-4c90aeb6b0f6: [10.20.8.4, 10.20.8.5, 10.20.8.1, 
10.20.8.2, 10.20.8.3]

From: Mohit Anchlia [mailto:mohitanch...@gmail.com]
Sent: Friday, August 24, 2012 1:55 PM
To: user@cassandra.apache.org
Subject: Re: Expanding cluster to include a new DR datacenter

That's interesting can you do describe cluster?
On Fri, Aug 24, 2012 at 12:11 PM, Bryce Godfrey 
bryce.godf...@azaleos.com wrote:
So I'm at the point of updating the keyspaces from Simple to NetworkTopology 
and I'm not sure if the changes are being accepted using Cassandra-cli.

I issue the change:

[default@EBonding] update keyspace EBonding
... with placement_strategy = 
'org.apache.cassandra.locator.NetworkTopologyStrategy'
... and strategy_options={Fisher:2};
9511e292-f1b6-3f78-b781-4c90aeb6b0f6
Waiting for schema agreement...
... schemas agree across the cluster

Then I do a describe and it still shows the old strategy.  Is there something 
else that I need to do?  I've exited and restarted Cassandra-cli and it still 
shows the SimpleStrategy for that keyspace.  Other nodes show the same 
information.

[default@EBonding] describe EBonding;
Keyspace: EBonding:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
  Durable Writes: true
Options: [replication_factor:2]


From: Bryce Godfrey [mailto:bryce.godf...@azaleos.com]
Sent: Thursday, August 23, 2012 11:06 AM
To: user@cassandra.apache.org
Subject: RE: Expanding cluster to include a new DR datacenter

Thanks for the information!  Answers my questions.

From: Tyler Hobbs [mailto:ty...@datastax.com]
Sent: Wednesday, August 22, 2012 7:10 PM
To: user@cassandra.apache.org
Subject: Re: Expanding cluster to include a new DR datacenter

If you didn't see this particular section, you may find it useful: 
http://www.datastax.com/docs/1.1/operations/cluster_management#adding-a-data-center-to-a-cluster

Some comments inline:
On Wed, Aug 22, 2012 at 3:43 PM, Bryce Godfrey 
bryce.godf...@azaleos.com wrote:
We are in the process of building out a new DR system in another Data Center, 
and we want to mirror our Cassandra environment to that DR.  I have a couple 
questions on the best way to do this after reading the documentation on the 
Datastax website.  We didn't initially plan for this to 

Re: cassandra twitter ruby client

2012-08-27 Thread Peter Sanford
That library requires you to serialize and deserialize the data
yourself. So to insert a ruby Float you would:

  value = 28.21
  [value].pack('G')
  @client.insert(:somecf, 'key', {'floatval' => [value].pack('G')})

and to read it back out:

  value = @client.get(:somecf, 'key', ['floatval']).unpack('G')[0]

Note that the cassandra-cql library will do (most) typecasts for you.
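For reference, Ruby's pack('G') writes a big-endian IEEE-754 double, which is the wire format DoubleType expects; the same round-trip in Python's struct module (a sketch for comparison only, not part of the Ruby client):

```python
import struct

def pack_double(value):
    # '>d' = big-endian IEEE-754 double, 8 bytes (equivalent to Ruby's pack('G'))
    return struct.pack('>d', value)

def unpack_double(raw):
    return struct.unpack('>d', raw)[0]

encoded = pack_double(28.21)
```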

-psanford

On Mon, Aug 27, 2012 at 2:49 PM, Yuhan Zhang yzh...@onescreen.com wrote:
 Hi all,

 I'm playing with cassandra's ruby client written by twitter,  trying to
 perform a simple get.

 but it looks like it assumes the value type is a UTF-8 string. However, my
 values are doubles (key and column names are UTF8Type).
 The values that I got are like:
 {Top:?\ufffd\ufffd\ufffd\u\u\u\u, ... }

 how do I pass double serializer to the api client?


 Thank you.

 Yuhan


Re: Expanding cluster to include a new DR datacenter

2012-08-27 Thread Mohit Anchlia
Can you describe your schema again with TierPoint in it?

On Mon, Aug 27, 2012 at 3:22 PM, Bryce Godfrey bryce.godf...@azaleos.comwrote:

  Same results.  I restarted the node also to see if it just wasn’t
 picking up the changes and it still shows Simple.  


 When I specify the DC for strategy_options I should be using the DC name
 from the property file snitch, right?  Ours is “Fisher” and “TierPoint” so
 that’s what I used.


 *From:* Mohit Anchlia [mailto:mohitanch...@gmail.com]
 *Sent:* Monday, August 27, 2012 1:21 PM

 *To:* user@cassandra.apache.org
 *Subject:* Re: Expanding cluster to include a new DR datacenter


 In your update command is it possible to specify RF for both DC? You could
 just do DC1:2, DC2:0.

 On Mon, Aug 27, 2012 at 11:16 AM, Bryce Godfrey bryce.godf...@azaleos.com
 wrote:

  Show schema output still shows the simple strategy:

 [default@unknown] show schema EBonding;

 create keyspace EBonding

   with placement_strategy = 'SimpleStrategy'

   and strategy_options = {replication_factor : 2}

   and durable_writes = true;

  

 This is the only thing I see in the system log at the time on all the
 nodes:

  

 INFO [MigrationStage:1] 2012-08-27 10:54:18,608 ColumnFamilyStore.java
 (line 659) Enqueuing flush of Memtable-schema_keyspaces@1157216346(183/228
 serialized/live bytes, 4 ops)

 INFO [FlushWriter:765] 2012-08-27 10:54:18,612 Memtable.java (line 264)
 Writing Memtable-schema_keyspaces@1157216346(183/228 serialized/live
 bytes, 4 ops)

 INFO [FlushWriter:765] 2012-08-27 10:54:18,627 Memtable.java (line 305)
 Completed flushing
 /opt/cassandra/data/system/schema_keyspaces/system-schema_keyspaces-he-34817-Data.db
 (241 bytes) for commitlog p$

  

  

 Should I turn the logging level up on something to see some more info
 maybe?

  

 *From:* aaron morton [mailto:aa...@thelastpickle.com]
 *Sent:* Monday, August 27, 2012 1:35 AM 


 *To:* user@cassandra.apache.org
 *Subject:* Re: Expanding cluster to include a new DR datacenter

  

 I did a quick test on a clean 1.1.4 and it worked 

  

 Can you check the logs for errors ? Can you see your schema change in
 there ?

  

 Also what is the output from show schema; in the cli ? 

  

 Cheers

  

 -

 Aaron Morton

 Freelance Developer

 @aaronmorton

 http://www.thelastpickle.com

  

 On 25/08/2012, at 6:53 PM, Bryce Godfrey bryce.godf...@azaleos.com
 wrote:


  Yes

  

 [default@unknown] describe cluster;

 Cluster Information:

Snitch: org.apache.cassandra.locator.PropertyFileSnitch

Partitioner: org.apache.cassandra.dht.RandomPartitioner

Schema versions:

 9511e292-f1b6-3f78-b781-4c90aeb6b0f6: [10.20.8.4, 10.20.8.5,
 10.20.8.1, 10.20.8.2, 10.20.8.3]

  

 *From:* Mohit Anchlia [mailto:mohitanch...@gmail.com]
 *Sent:* Friday, August 24, 2012 1:55 PM
 *To:* user@cassandra.apache.org
 *Subject:* Re: Expanding cluster to include a new DR datacenter

  

 That's interesting can you do describe cluster?

 On Fri, Aug 24, 2012 at 12:11 PM, Bryce Godfrey bryce.godf...@azaleos.com
 wrote:

  So I’m at the point of updating the keyspaces from Simple to
 NetworkTopology and I’m not sure if the changes are being accepted using
 Cassandra-cli.

  

 I issue the change:

  

 [default@EBonding] update keyspace EBonding

 ... with placement_strategy =
 'org.apache.cassandra.locator.NetworkTopologyStrategy'

 ... and strategy_options={Fisher:2};

 9511e292-f1b6-3f78-b781-4c90aeb6b0f6

 Waiting for schema agreement...

 ... schemas agree across the cluster

  

 Then I do a describe and it still shows the old strategy.  Is there
 something else that I need to do?  I’ve exited and restarted Cassandra-cli
 and it still shows the SimpleStrategy for that keyspace.  Other nodes show
 the same information.

  

 [default@EBonding] describe EBonding;

 Keyspace: EBonding:

   Replication Strategy: org.apache.cassandra.locator.SimpleStrategy

   Durable Writes: true

 Options: [replication_factor:2]

  

  

 *From:* Bryce Godfrey [mailto:bryce.godf...@azaleos.com]
 *Sent:* Thursday, August 23, 2012 11:06 AM
 *To:* user@cassandra.apache.org
 *Subject:* RE: Expanding cluster to include a new DR datacenter

  

 Thanks for the information!  Answers my questions.

  

 *From:* Tyler Hobbs [mailto:ty...@datastax.com ty...@datastax.com]
 *Sent:* Wednesday, August 22, 2012 7:10 PM
 *To:* user@cassandra.apache.org
 *Subject:* Re: Expanding cluster to include a new DR datacenter

  

 If you didn't see this particular section, you may find it useful:
 http://www.datastax.com/docs/1.1/operations/cluster_management#adding-a-data-center-to-a-cluster

 Some comments inline:


Re: Dynamic Column Families in CQLSH v3

2012-08-27 Thread aaron morton
It's not possible to have Dynamic Columns in CQL 3. The CF definition must 
specify the column names you expect to store. 

The COMPACT STORAGE 
(http://www.datastax.com/docs/1.1/references/cql/CREATE_COLUMNFAMILY) clause of 
the Create CF statement means you can have column names that are part dynamic, 
part static. But if you want CFs where the app code controls the column 
names, you need to create the CF using the CLI and stick with the Thrift API 
(because SELECT in CQL 3 does not support arbitrary column slicing).  

Background http://www.mail-archive.com/user@cassandra.apache.org/msg23636.html

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 24/08/2012, at 2:24 PM, Erik Onnen eon...@gmail.com wrote:

 Hello All,
 
 Attempting to create what the Datastax 1.1 documentation calls a
 Dynamic Column Family
 (http://www.datastax.com/docs/1.1/ddl/column_family#dynamic-column-families)
 via CQLSH.
 
 This works in v2 of the shell:
 
 create table data ( key varchar PRIMARY KEY) WITH comparator=LongType;
 
 When defined this way via v2 shell, I can successfully switch to v3
 shell and query the CF fine.
 
 The same syntax in v3 yields:
 
 Bad Request: comparator is not a valid keyword argument for CREATE TABLE
 
 The 1.1 documentation indicates that comparator is a valid option for
 at least ALTER TABLE:
 
 http://www.datastax.com/docs/1.1/configuration/storage_configuration#comparator
 
 This leads me to believe that the correct way to create a dynamic
 column family is to create a table with no named columns and alter the
 table later but that also does not work:
 
 create table data (key varchar PRIMARY KEY);
 
 yields:
 
 Bad Request: No definition found that is not part of the PRIMARY KEY
 
 So, my question is, how do I create a Dynamic Column Family via the CQLSH v3?
 
 Thanks!
 -erik



sstableloader error

2012-08-27 Thread Swathi Vikas
Hi,
 
I had uploaded data using sstableloader to a single node cluster earlier 
without any problem. Now, while trying to upload to a 3 node cluster, it is 
giving me the error below: 
 
localhost:~/apache-cassandra-1.0.7/sstableloader_folder # bin/sstableloader 
DEMO/
Starting client (and waiting 30 seconds for gossip) ...
Streaming revelant part of DEMO/UMD-hc-1-Data.db to [/10.245.28.232, 
/10.245.28.231, /10.245.28.230]
progress: [/10.245.28.232 0/0 (100)] [/10.245.28.231 0/1 (0)] 
[/10.245.28.230progress: [/10.245.28.232 0/0 (100)] [/10.245.28.231 0/1 (0)] 
[/10.245.28.230progress: [/10.245.28.232 0/0 (100)] [/10.245.28.231 0/1 (0)] 
[/10.245.28.230progress: [/10.245.28.232 0/0 (100)] [/10.245.28.231 0/1 (0)] 
[/10.245.28.230progress: [/10.245.28.232 0/0 (100)] [/10.245.28.231 0/1 (0)] 
[/10.245.28.230progress: [/10.245.28.232 0/0 (100)] [/10.245.28.231 0/1 (0)] 
[/10.245.28.230progress: [/10.245.28.232 0/0 (100)] [/10.245.28.231 0/1 (0)] 
[/10.245.28.230progress: [/10.245.28.232 0/0 (100)] [/10.245.28.231 0/1 (0)] 
[/10.245.28.230progress: [/10.245.28.232 0/0 (100)] [/10.245.28.231 0/1 (0)] 
[/10.245.28.230progress: [/10.245.28.232 0/0 (100)] [/10.245.28.231 0/1 (0)] 
[/10.245.28.230progress: [/10.245.28.232 0/0 (100)] [/10.245.28.231 0/1 (0)] 
[/10.245.28.230progress: [/10.245.28.232 0/0 (100)] [/10.245.28.231 0/1 (0)] 
[/10.245.28.230progress: [/10.245.28.232 0/0 (100)]
 [/10.245.28.231 0/1 (0)] [/10.245.28.230progress: [/10.245.28.232 0/0 (100)] 
[/10.245.28.231 0/1 (0)] [/10.245.28.230progress: [/10.245.28.232 0/0 (100)] 
[/10.245.28.231 0/1 (0)] [/10.245.28.230progress: [/10.245.28.232 0/0 (100)] 
[/10.245.28.231 0/1 (0)] [/10.245.28.230progress: [/10.245.28.232 0/0 (100)] 
[/10.245.28.231 0/1 (0)] [/10.245.28.230progress: [/10.245.28.232 0/0 (100)] 
[/10.245.28.231 0/1 (0)] [/10.245.28.230progress: [/10.245.28.232 0/0 (100)] 
[/10.245.28.231 0/1 (0)] [/10.245.28.230progress: [/10.245.28.232 0/0 (100)] 
[/10.245.28.231 0/1 (0)] [/10.245.28.230progress: [/10.245.28.232 0/0 (100)] 
[/10.245.28.231 0/1 (0)] [/10.245.28.230 0/0 (100)] [total: 0 - 0MB/s (avg: 
0MB/s)] WARN 21:41:15,200 Failed attempt 1 to connect to /10.245.28.232 to 
stream null. Retrying in 2 ms. (java.net.ConnectException: Connection timed 
out)
progress: [/10.245.28.232 0/0 (100)] [/10.245.28.231 0/1 (0)] 
[/10.245.28.230progress: [/10.245.28.232 0/0 (100)] [/10.245.28.231 0/1 (0)] 
[/10.245.28.230progress: [/10.245.28.232 0/0 (100)] [/10.245.28.231 0/1 (0)] 
[/10.245.28.230progress: [/10.245.28.232 0/0 (100)] [/10.245.28.231 0/1 (0)] 
[/10.245.28.230progress: [/10.245.28.232 0/0 (100)] [/10.245.28.231 0/1 (0)] 
[/10.245.28.230progress: [/10.245.28.232 0/0 (100)] [/10.245.28.231 0/1 (0)] 
[/10.245.28.230progress: [/10.245.28.232 0/0 (100)] [/10.245.28.231 0/1 (0)] 
[/10.245.28.230 0/0 (100)] [total: 0 - 0MB/s (avg: 
0MB/s)]^Clocalhost:~/apache-cassandra-1.0.7/sstableloader_folder #

I am running cassandra on foreground. So, on all of the cassandra nodes i get 
the below message:
 INFO 21:40:30,335 Node /192.168.11.11 is now part of the cluster
 INFO 21:40:30,336 InetAddress /192.168.11.11 is now UP
 INFO 21:41:55,320 InetAddress /192.168.11.11 is now dead.
 INFO 21:41:55,321 FatClient /192.168.11.11 has been silent for 3ms, 
removing from gossip

I used ByteOrderPartitioner and filled in the initial token on all nodes.
I have set seeds as 10.245.28.230,10.245.28.231
I have properly set listen address, rpc_address(0.0.0.0) and ports
 
One thing I noticed is that when I try to connect to this cluster using a 
client (libQtCassandra) and create a column family, all the nodes respond 
and the column family gets created properly. 
 
Can anyone help me please.
 
Thanks and Regards,
Swat.vikas

Re: cassandra twitter ruby client

2012-08-27 Thread Yuhan Zhang
Hi Peter,

Works well. Thanks a lot!  :D   Will check out cassandra-cql.

Yuhan

On Mon, Aug 27, 2012 at 3:34 PM, Peter Sanford
psanf...@nearbuysystems.comwrote:

 That library requires you to serialize and deserialize the data
 yourself. So to insert a ruby Float you would

   value = 28.21
   [value].pack('G')
    @client.insert(:somecf, 'key', {'floatval' => [value].pack('G')})

 and to read it back out:

   value = @client.get(:somecf, 'key', ['floatval']).unpack('G')[0]

 Note that the cassandra-cql library will do (most) typecasts for you.

 -psanford

 On Mon, Aug 27, 2012 at 2:49 PM, Yuhan Zhang yzh...@onescreen.com wrote:
  Hi all,
 
  I'm playing with cassandra's ruby client written by twitter,  trying to
  perform a simple get.
 
  but it looks like it assumes the value type is a UTF-8 string. However, my
  values are doubles (key and column names are UTF8Type).
  The values that I got are like:
  {Top:?\ufffd\ufffd\ufffd\u\u\u\u, ... }
 
  how do I pass double serializer to the api client?
 
 
  Thank you.
 
  Yuhan



Automating nodetool repair

2012-08-27 Thread Edward Sargisson

Hi all,
So nodetool repair has to be run regularly on all nodes. Does anybody 
have any interesting strategies or tools for doing this or is everybody 
just setting up cron to do it?


For example, one could write some Puppet code to splay the cron times 
around so that only one should be running at once.
Or, perhaps, a central orchestrator that is given some known quiet time 
and works its way through the list, running nodetool repair one at a 
time (using RPC?) until it runs out of time.
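The central-orchestrator idea above could be sketched like this (hypothetical: the node names, the SSH transport, and the window length are all assumptions; it simply runs repairs one at a time until the quiet period runs out):

```python
import subprocess
import time

def run_repairs(nodes, window_seconds, runner=subprocess.call):
    """Repair one node at a time until the maintenance window closes."""
    deadline = time.time() + window_seconds
    repaired = []
    for node in nodes:
        if time.time() >= deadline:
            break  # out of quiet time; pick up the remaining nodes next run
        runner(["ssh", node, "nodetool", "repair"])
        repaired.append(node)
    return repaired
```

e.g. a single cron entry could call run_repairs(['node1', 'node2', 'node3'], 4 * 3600) instead of maintaining a crontab per node.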


Cheers,
Edward
--

Edward Sargisson

senior java developer
Global Relay

edward.sargis...@globalrelay.net


*866.484.6630*
New York | Chicago | Vancouver | London (+44.0800.032.9829) | Singapore 
(+65.3158.1301)


Global Relay Archive supports email, instant messaging, BlackBerry, 
Bloomberg, Thomson Reuters, Pivot, YellowJacket, LinkedIn, Twitter, 
Facebook and more.



Ask about *Global Relay Message* 
(http://www.globalrelay.com/services/message) --- The Future of 
Collaboration in the Financial Services World


All email sent to or from this address will be retained by Global 
Relay's email archiving system. This message is intended only for the 
use of the individual or entity to which it is addressed, and may 
contain information that is privileged, confidential, and exempt from 
disclosure under applicable law.  Global Relay will not be liable for 
any compliance or technical information provided herein. All trademarks 
are the property of their respective owners.




Re: Automating nodetool repair

2012-08-27 Thread Aaron Turner
I use cron.  On one box I just do:

for n in node1 node2 node3 node4 ; do
   nodetool -h $n repair
   sleep 120
done

A lot easier than managing a bunch of individual crontabs IMHO,
although I suppose I could have done it with Puppet, but then you always
have to keep an eye out that your repairs don't overlap over time.

On Mon, Aug 27, 2012 at 4:52 PM, Edward Sargisson
edward.sargis...@globalrelay.net wrote:
 Hi all,
 So nodetool repair has to be run regularly on all nodes. Does anybody have
 any interesting strategies or tools for doing this or is everybody just
 setting up cron to do it?

 For example, one could write some Puppet code to splay the cron times around
 so that only one should be running at once.
 Or, perhaps, a central orchestrator that is given some known quiet time and
 works its way through the list, running nodetool repair one at a time (using
 RPC?) until it runs out of time.

 Cheers,
 Edward



-- 
Aaron Turner
http://synfin.net/ Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
-- Benjamin Franklin
carpe diem quam minimum credula postero


Re: QUORUM writes, QUORUM reads -- and eventual consistency

2012-08-27 Thread Philip O'Toole
Cool - thanks to all for the replies. I believe I have what I need now. 

Philip

On Aug 25, 2012, at 12:17 AM, Guillermo Winkler gwink...@inconcertcc.com 
wrote:

 Hi Philip, 
 
 From http://wiki.apache.org/cassandra/ArchitectureOverview
 
 Quorum write: blocks until quorum is reached
 
 By my understanding if you _did_ a quorum write it means it successfully 
 completed.
 
 Guille
 
 
 I *think* we're saying the same thing here. The addition of the word "successful" (or something more suitable) would make the documentation more precise, not less.
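The guarantee discussed above is arithmetic: with replication factor N, QUORUM is ⌊N/2⌋+1, so a read quorum and a successful write quorum always share at least one replica. A quick check:

```shell
# QUORUM is N/2 + 1 under integer division, so a read quorum and a
# write quorum always overlap: q + q - n >= 1 for any n >= 1.
for n in 1 2 3 4 5 6 7; do
    q=$(( n / 2 + 1 ))
    echo "RF=$n quorum=$q overlap=$(( q + q - n ))"
done
```

That overlapping replica is what lets a QUORUM read observe the latest successfully QUORUM-written value.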


Re: Order of the cyclic group of hashed partitioners

2012-08-27 Thread aaron morton
Sorry, I don't understand your question.

Can you explain it a bit more? Or maybe someone else knows.

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 27/08/2012, at 7:16 PM, Romain HARDOUIN romain.hardo...@urssaf.fr wrote:

 
 Thank you Aaron. 
 This limit was pushed down in RandomPartitioner but the question still 
 exists... 
 
   
 aaron morton aa...@thelastpickle.com a écrit sur 26/08/2012 23:35:50 :
 
  AbstractHashedPartitioner does not exist in the trunk:
  https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=commitdiff;h=a89ef1ffd4cd2ee39a2751f37044dba3015d72f1
  
  
  Cheers
  
  -
  Aaron Morton
  Freelance Developer
  @aaronmorton
  http://www.thelastpickle.com
  
  On 24/08/2012, at 10:51 PM, Romain HARDOUIN romain.hardo...@urssaf.fr 
  wrote:
  
   
   Hi, 
   
   AbstractHashedPartitioner defines a maximum of 2**127, hence an order of (2**127)+1.
   I'd say that tokens of such partitioners are intended to be distributed in Z/(2**127), hence a maximum of (2**127)-1.
   Could there be a mix-up between maximum and order?
   This is a detail, but could someone confirm/invalidate?
   
   Regards, 
   
   Romain
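For what it's worth, the maximum-versus-order distinction Romain draws is plain modular arithmetic. With a small modulus standing in for 2**127 (which overflows shell integers):

```shell
# Z/m contains the m residues {0, ..., m-1}: its order is m, but its
# largest element is m - 1. Here m=8 stands in for 2**127.
m=8
order=$m
maximum=$(( m - 1 ))
echo "Z/$m: order=$order maximum=$maximum"
```

So a partitioner whose tokens live in Z/(2**127) has order 2**127 and maximum (2**127)-1, which is exactly the discrepancy being asked about.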
  



Re: Cassandra 1.1.4 RPM required

2012-08-27 Thread aaron morton
 Dear Aaron, it requires a username and password, which I don't have. Can you share a direct link?
There is no security on the wiki; you should be able to see 
http://wiki.apache.org/cassandra/GettingStarted

What about this page ? http://wiki.apache.org/cassandra/DebianPackaging

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 27/08/2012, at 8:14 PM, Marco Schirrmeister ma...@schirrmeister.net wrote:

 
 On Aug 23, 2012, at 12:15 PM, Adeel Akbar wrote:
 
 Dear Aaron, it requires a username and password, which I don't have. Can you share a direct link?
 
 
 There is no username and password for the Datastax rpm repository.
 http://rpm.datastax.com/community/
 
 But there is no 1.1.4 version yet from Datastax.
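For anyone pointing yum at that community repository, a minimal repo file might look like the following sketch. The repo id and the gpgcheck setting are assumptions; verify against the current DataStax install docs before use:

```ini
# /etc/yum.repos.d/datastax.repo -- sketch only; repo id and gpgcheck
# value are assumptions, not taken from DataStax documentation.
[datastax-community]
name=DataStax Community Repo for Apache Cassandra
baseurl=http://rpm.datastax.com/community
enabled=1
gpgcheck=0
```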
 
 
 If you really need a 1.1.4 RPM, you can give my build a shot.
 I just started rolling my own packages for a few reasons.
 Until my public RPM repo goes online, you can grab the Cassandra RPM here:
 http://people.ogilvy.de/~mschirrmeister/linux/cassandra/
 
 If you want, test it out. It's just a first build and not heavily tested.
 
 
 Marco