Re: Rolling upgrade from 1.1.12 to 1.2.5 visibility issue

2013-06-21 Thread Polytron Feng
Hi Aaron,

Thank you for your reply. We tried increasing the PHI threshold but still hit
the same issue. We used Ec2Snitch and PropertyFileSnitch instead and they work
without this problem. It seems to happen only with the Ec2MultiRegionSnitch
config. Although we can work around this problem with PropertyFileSnitch, we
hit another bug: EOFException in
https://issues.apache.org/jira/browse/CASSANDRA-5476. We will try to
upgrade to 1.1.12 first and wait for the fix for CASSANDRA-5476.

Thank you!




On Thu, Jun 20, 2013 at 5:49 PM, aaron morton aa...@thelastpickle.com wrote:

 I once had something like this; looking at your logs I do not think it's
 the same thing, but here is a post on it:
 http://thelastpickle.com/2011/12/15/Anatomy-of-a-Cassandra-Partition/

 It's a little different in 1.2, but the GossipDigestAckVerbHandler (and
 ACK2) should be calling Gossiper.instance.notifyFailureDetector, which will
 result in the FailureDetector being called. This will keep the remote node
 marked as up. It looks like this is happening.


 TRACE [GossipTasks:1] 2013-06-19 07:44:52,359 FailureDetector.java
 (line 189) PHI for /54.254.xxx.xxx : 8.05616263930532

 The default phi_convict_threshold is 8, so this node thinks the other is
 just sick enough to be marked as down.

 As a workaround, try increasing phi_convict_threshold to 12. I'm not sure
 why the 1.2 node thinks this, or whether anything has changed.

 I used to think there was a way to dump the phi values for nodes, but I
 cannot find it. If you call dumpInterArrivalTimes on
 the org.apache.cassandra.net:type=FailureDetector MBean it will dump a
 file in the temp dir called failuredetector-* with the arrival times for
 messages from the other nodes. That may help.
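
 To make the workaround above concrete, a minimal sketch (the value 12 is only
 a starting point; tune it for your network):

 # conf/cassandra.yaml on the upgraded node
 # default is 8; raise it so slightly-late gossip arrivals are not convicted
 phi_convict_threshold: 12

 # dumpInterArrivalTimes can be invoked from any JMX client (e.g. jconsole)
 # on the MBean org.apache.cassandra.net:type=FailureDetector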

 Cheers

 -
 Aaron Morton
 Freelance Cassandra Consultant
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 19/06/2013, at 8:34 PM, Polytron Feng liqpolyt...@gmail.com wrote:


 Hi,

 We are trying to do a rolling upgrade from 1.0.12 to 1.2.5, but we found that
 the 1.2.5 node cannot see the other old nodes.
 Therefore, we tried upgrading to 1.1.12 first, and that works.
 However, we still saw the same issue when doing a rolling upgrade from 1.1.12
 to 1.2.5.
 This seems to be the issue fixed in
 https://issues.apache.org/jira/browse/CASSANDRA-5332, but we still saw it
 in 1.2.5.

 Environment:
OS: CentOS 6
JDK: 6u31
cluster:3 nodes for testing, in EC2
Snitch: Ec2MultiRegionSnitch
NetworkTopologyStrategy: strategy_options = { ap-southeast:3 }

 We have 3 nodes and we upgraded 122.248.xxx.xxx to 1.2.5 first; the other
 2 nodes are still on 1.1.12.
 When we restarted the upgraded node, it saw the other 2 old nodes as
 UP in the log.
 However, after a few seconds, these 2 nodes were marked as DOWN.
 This is the ring info from the 1.2.5 node - 122.248.xxx.xxx:

 Note: Ownership information does not include topology; for complete
 information, specify a keyspace

 Datacenter: ap-southeast
 ==========
 Address          Rack  Status  State   Load      Owns    Token
                                                          113427455640312821154458202477256070486
 122.248.xxx.xxx  1b    Up      Normal  69.74 GB  33.33%  1
 54.251.xxx.xxx   1b    Down    Normal  69.77 GB  33.33%  56713727820156410577229101238628035243
 54.254.xxx.xxx   1b    Down    Normal  70.28 GB  33.33%  113427455640312821154458202477256070486


 but Old 1.1.12 nodes can see new node:

 Note: Ownership information does not include topology, please specify
 a keyspace.
 Address          DC            Rack  Status  State   Load      Owns    Token
                                                                        113427455640312821154458202477256070486
 122.248.xxx.xxx  ap-southeast  1b    Up      Normal  69.74 GB  33.33%  1
 54.251.xxx.xxx   ap-southeast  1b    Up      Normal  69.77 GB  33.33%  56713727820156410577229101238628035243
 54.254.xxx.xxx   ap-southeast  1b    Up      Normal  70.28 GB  33.33%  113427455640312821154458202477256070486


 We enabled the trace log level to check gossip-related logs. The log below,
 from the 1.2.5 node, shows that the
 other 2 nodes are UP in the beginning. They seem to complete the SYN/ACK/ACK2
 handshake cycle.

 TRACE [GossipStage:1] 2013-06-19 07:44:43,047
 GossipDigestSynVerbHandler.java (line 40) Received a GossipDigestSynMessage
 from /54.254.xxx.xxx
 TRACE [GossipStage:1] 2013-06-19 07:44:43,047
 GossipDigestSynVerbHandler.java (line 71) Gossip syn digests are :
 /54.254.xxx.xxx:1371617084:10967 /54.251.xxx.xxx:1371625851:2055
 TRACE [GossipStage:1] 2013-06-19 07:44:43,048 Gossiper.java (line 945)
 requestAll for /54.254.xxx.xxx
 .

 TRACE [GossipStage:1] 2013-06-19 07:44:43,080
 GossipDigestSynVerbHandler.java (line 84) Sending a GossipDigestAckMessage
 to /54.254.xxx.xxx
 TRACE [GossipStage:1] 2013-06-19 07:44:43,080 MessagingService.java
 (line 601) /122.248.216.142 

Re: Heap is not released and streaming hangs at 0%

2013-06-21 Thread aaron morton
  nodetool -h localhost flush didn't do much good.
Do you have hundreds of millions of rows?
If so, see the recent discussions about reducing bloom_filter_fp_chance and 
index_sampling. 

If this is an old schema you may be using the very old setting of 0.000744, 
which creates a lot of bloom filter data. 
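
For example, a sketch of the knobs involved (keyspace/table names and values
are only illustrative):

-- cqlsh: raise the false positive chance so the filters shrink
ALTER TABLE mykeyspace.mytable WITH bloom_filter_fp_chance = 0.1;

# cassandra.yaml (1.2-era index sampling; a larger interval means less heap)
index_interval: 256

I believe existing sstables keep their old filters until they are rewritten,
e.g. by compaction or nodetool upgradesstables.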

Cheers
 
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 20/06/2013, at 6:36 AM, Wei Zhu wz1...@yahoo.com wrote:

 If you want, you can try to force a GC through JConsole (Memory tab, Perform GC
 button).
 
 It theoretically triggers a full GC, and when it actually happens depends on the
 JVM.
 
 -Wei
 
 From: Robert Coli rc...@eventbrite.com
 To: user@cassandra.apache.org
 Sent: Tuesday, June 18, 2013 10:43:13 AM
 Subject: Re: Heap is not released and streaming hangs at 0%
 
 On Tue, Jun 18, 2013 at 10:33 AM, srmore comom...@gmail.com wrote:
  But then shouldn't the JVM GC it eventually ? I can still see Cassandra alive
  and kicking, but it looks like the heap is locked up even after the traffic is
  long stopped.
 
 No, when GC system fails this hard it is often a permanent failure
 which requires a restart of the JVM.
 
  nodetool -h localhost flush didn't do much good.
 
 This adds support to the idea that your heap is too full, and not full
 of memtables.
 
 You could try nodetool -h localhost invalidatekeycache, but that
 probably will not free enough memory to help you.
 
 =Rob



Re: Joining distinct clusters with the same schema together

2013-06-21 Thread aaron morton
  Question 2: is this a sane strategy?
 
 On its face my answer is not... really? 
I'd go with a solid no. 

Just because the three independent clusters have a schema that looks the 
same does not make them the same. The schema is a versioned document; you will 
not be able to merge them later by merging the DCs without downtime. 

It will be easier to go with a multi DC setup from the start. 

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 20/06/2013, at 6:36 AM, Eric Stevens migh...@gmail.com wrote:

 On its face my answer is not... really? What do you view yourself as
 getting with this technique versus using built in replication? As an
 example, you lose the ability to do LOCAL_QUORUM vs EACH_QUORUM
 consistency level operations?
 
 Doing replication manually sounds like a recipe for the DC's eventually 
 getting subtly out of sync with each other.  If a connection goes down 
 between DC's, and you are taking data at both, how will you catch each other 
 up?  C* already offers that resolution for you, and you'd have to work pretty 
 hard to reproduce it for no obvious benefit that I can see.  
 
 For minimum effort, definitely rely on Cassandra's well-tested codebase for 
 this.
 
 
 
 
 On Wed, Jun 19, 2013 at 2:27 PM, Robert Coli rc...@eventbrite.com wrote:
 On Wed, Jun 19, 2013 at 10:50 AM, Faraaz Sareshwala
 fsareshw...@quantcast.com wrote:
  Each datacenter will have a cassandra cluster with a separate set of seeds
  specific to that datacenter. However, the cluster name will be the same.
 
  Question 1: is this enough to guarantee that the three datacenters will have
  distinct cassandra clusters as well? Or will one node in datacenter A still
  somehow be able to join datacenter B's ring?
 
 If they have network connectivity and the same cluster name, they are
 the same logical cluster. However if your nodes share tokens and you
 have auto_bootstrap=yes (the implicit default) the second node you
 attempt to start will refuse to start because you are trying to
 bootstrap it into the range of a live node.
 
  For now, we are planning on using our own relay mechanism to transfer
  data changes from one datacenter to another.
 
 Are you planning to use the streaming commitlog functionality for
 this? Not sure how you would capture all changes otherwise, except
 having your app just write the same thing to multiple places? Unless
 data timestamps are identical between clusters, otherwise-identical
 data will not merge properly, as cassandra uses data timestamps to
 merge.
 
  Question 2: is this a sane strategy?
 
 On its face my answer is not... really? What do you view yourself as
 getting with this technique versus using built in replication? As an
 example, you lose the ability to do LOCAL_QUORUM vs EACH_QUORUM
 consistency level operations?
 
  Question 3: eventually, we want to turn all these cassandra clusters into 
  one
  large multi-datacenter cluster. What's the best practice to do this? Should 
  I
  just add nodes from all datacenters to the list of seeds and let cassandra
  resolve differences? Is there another way I don't know about?
 
 If you are using NetworkTopologyStrategy and have the same cluster
 name for your isolated clusters, all you need to do is :
 
 1) configure NTS to store replicas on a per-datacenter basis
 2) ensure that your nodes are in different logical data centers (by
 default, all nodes are in DC1/rack1)
 3) ensure that clusters are able to reach each other
 4) ensure that tokens do not overlap between clusters (the common
 technique with manual token assignment is that each node gets a range
 which is off-by-one)
 5) ensure that all nodes' seed lists contain (recommended) 3 seeds from each DC
 6) rolling restart (so the new seed list is picked up)
 7) repair (should only be required if writes have not replicated via
 your out of band mechanism)
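
 To make steps 1 and 2 above concrete, a minimal sketch (keyspace name, DC/rack
 names and addresses are illustrative only):

 # conf/cassandra-topology.properties (PropertyFileSnitch), same on every node
 192.0.2.10=DC1:RAC1
 198.51.100.20=DC2:RAC1

 -- cqlsh: per-datacenter replica placement with NetworkTopologyStrategy
 ALTER KEYSPACE myks WITH replication =
   {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};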
 
 Vnodes change the picture slightly because the chance of your clusters
 having conflicting tokens increases with the number of token ranges
 you have.
 
 =Rob
 



Re: error on startup: unable to find sufficient sources for streaming range

2013-06-21 Thread aaron morton
 On some of my nodes, I'm getting the following exception when cassandra starts
How many nodes? 
Is this a new node, or an old one where this problem has just started? 

What version are you on? 

Do you have this error from system.log? It includes the thread name, which is 
handy for debugging. It also looks like some lines are missing from the 
first error. 
 
It looks like an error that may happen when a node is bootstrapping or 
replacing an existing node. If you can provide some more context we may be able 
to help.

Cheers
 

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 20/06/2013, at 10:36 AM, Faraaz Sareshwala fsareshw...@quantcast.com wrote:

 Hi,
 
 I couldn't find any information on the following error so I apologize if it 
 has
 already been discussed.
 
 On some of my nodes, I'm getting the following exception when cassandra starts
 up:
 
 2013-06-19 22:17:39.480414500 Exception encountered during startup: unable to 
 find sufficient sources for streaming range 
 (-4250921392403750427,-4250887922781325324]
 2013-06-19 22:17:39.482733500 ERROR Exception in thread 
 Thread[StorageServiceShutdownHook,5,main] 
 (CassandraDaemon.java:org.apache.cassandra.service.CassandraDaemon$1:175)
 2013-06-19 22:17:39.482735500 java.lang.NullPointerException
 2013-06-19 22:17:39.482735500   at 
 org.apache.cassandra.service.StorageService.stopRPCServer(StorageService.java:321)
 2013-06-19 22:17:39.482736500   at 
 org.apache.cassandra.service.StorageService.shutdownClientServers(StorageService.java:362)
 2013-06-19 22:17:39.482736500   at 
 org.apache.cassandra.service.StorageService.access$000(StorageService.java:88)
 2013-06-19 22:17:39.482751500   at 
 org.apache.cassandra.service.StorageService$1.runMayThrow(StorageService.java:513)
 2013-06-19 22:17:39.482752500   at 
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
 2013-06-19 22:17:39.482752500   at java.lang.Thread.run(Thread.java:662)
 
 Can someone point me to more information about what could cause this error?
 
 Faraaz



Re: Performance Difference between Cassandra version

2013-06-21 Thread aaron morton
 I am trying to see whether there will be any performance difference between 
 Cassandra 1.0.8 vs Cassandra 1.2.2 for reading the data mainly?
1.0 has key and row caches defined per CF; 1.1 has global ones, which are better 
utilised and easier to manage. 
1.2 moves bloom filters and compression metadata off heap, which reduces GC and 
will help. 
Things normally get faster.

Cheers
 
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 20/06/2013, at 11:24 AM, Franc Carter franc.car...@sirca.org.au wrote:

 On Thu, Jun 20, 2013 at 9:18 AM, Raihan Jamal jamalrai...@gmail.com wrote:
 I am trying to see whether there will be any performance difference between 
 Cassandra 1.0.8 vs Cassandra 1.2.2 for reading the data mainly?
 
 Has anyone seen any major performance difference?
 
 We are part way through a performance comparison between 1.0.9 with Size 
 Tiered Compaction and 1.2.4 with Leveled Compaction - for our use case it 
 looks like a significant performance improvement on the read side.  We are 
 finding compaction lags when we do very large bulk loads, but for us this is 
 an initialisation task and that's a reasonable trade-off
 
 cheers
 
 -- 
 Franc Carter | Systems architect | Sirca Ltd
 franc.car...@sirca.org.au | www.sirca.org.au
 Tel: +61 2 8355 2514 
 Level 4, 55 Harrington St, The Rocks NSW 2000
 PO Box H58, Australia Square, Sydney NSW 1215
 



Re: Unit Testing Cassandra

2013-06-21 Thread aaron morton
  2) Second (in which I am more interested in) is for performance 
  (stress/load) testing. 
Sometimes you can get cassandra-stress (shipped in the bin distro) to 
approximate the expected workload. It's then pretty easy to benchmark and 
test your configuration changes.
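
A rough sketch of what that can look like (the flag names are from the 1.2-era
stress tool and the path and numbers are only examples; check the tool's help
output for your version):

# write 1,000,000 keys against one node with 50 client threads
tools/bin/cassandra-stress -d 127.0.0.1 -o insert -n 1000000 -t 50

# then read them back and compare latencies before/after a config change
tools/bin/cassandra-stress -d 127.0.0.1 -o read -n 1000000 -t 50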

Cheers
 
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 20/06/2013, at 2:25 PM, Shahab Yunus shahab.yu...@gmail.com wrote:

 Thanks Edward, Ben and Dean for the pointers. Yes, I am using Java and these 
 sound promising for unit testing, at least.
 
 Regards,
 Shahab
 
 
 On Wed, Jun 19, 2013 at 9:58 AM, Edward Capriolo edlinuxg...@gmail.com 
 wrote:
 You really do not need much in java, you can use the embedded server. Hector 
 wraps a simple class around this called EmbeddedServerHelper.
 
 
 On Wednesday, June 19, 2013, Ben Boule ben_bo...@rapid7.com wrote:
  Hi Shabab,
 
  Cassandra-Unit has been helpful for us for running unit tests without 
  requiring a real cassandra instance to be running.   We only use this to 
  test our DAO code which interacts with the Cassandra client.  It 
  basically starts up an embedded instance of cassandra and fools your 
  client/driver into using it.  It uses a non-standard port and you just need 
  to make sure you can set the port as a parameter into your client code.
 
  https://github.com/jsevellec/cassandra-unit
 
  One important thing is to either clear out the keyspace in between tests or 
  carefully separate your data so different tests don't collide with each 
  other in the embedded database.
 
  Setup/tear down time is pretty reasonable.
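
  A minimal JUnit sketch along those lines (class and method names are from the
  cassandra-unit project; check its README for the exact API of your version):

  import org.cassandraunit.utils.EmbeddedCassandraServerHelper;
  import org.junit.AfterClass;
  import org.junit.BeforeClass;
  import org.junit.Test;

  public class DaoTest {
      @BeforeClass
      public static void startCassandra() throws Exception {
          // boots an in-JVM Cassandra instance on a non-standard port
          EmbeddedCassandraServerHelper.startEmbeddedCassandra();
      }

      @AfterClass
      public static void cleanUp() {
          // drops the keyspaces created during the tests
          EmbeddedCassandraServerHelper.cleanEmbeddedCassandra();
      }

      @Test
      public void daoRoundTrip() {
          // point the client/driver at the embedded instance's port and
          // exercise the DAO under test here
      }
  }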
 
  Ben
  
  From: Shahab Yunus [shahab.yu...@gmail.com]
  Sent: Wednesday, June 19, 2013 8:46 AM
  To: user@cassandra.apache.org
  Subject: Re: Unit Testing Cassandra
 
  Thanks Stephen for your reply and explanation. My bad that I mixed those up 
  and wasn't clear enough. Yes, I have 2 different requests/questions.
  1) One is for the unit testing.
  2) Second (in which I am more interested in) is for performance 
  (stress/load) testing. Let us keep integration aside for now.
  I do see some stuff out there but wanted to know recommendations from the 
  community given their experience.
  Regards,
  Shahab
 
  On Wed, Jun 19, 2013 at 3:15 AM, Stephen Connolly 
  stephen.alan.conno...@gmail.com wrote:
 
  Unit testing means testing in isolation the smallest part.
  Unit tests should not take more than a few milliseconds to set up and 
  verify their assertions.
  As such, if your code is not factored well for testing, you would 
  typically use mocking (either by hand, or with mocking libraries) to mock 
  out the bits not under test.
  Extensive use of mocks is usually a smell of code that is not well 
  designed *for testing*
  If you intend to test components integrated together... That is 
  integration testing.
  If you intend to test performance of the whole or significant parts of the 
  whole... That is performance testing.
  When searching for the above, you will not get much luck if you are 
  looking for them in the context of unit testing, as those things are 
  *outside the scope of unit testing*.
 
  On Wednesday, 19 June 2013, Shahab Yunus wrote:
 
  Hello,
 
  Can anyone suggest any good/popular unit-test tools/frameworks/utilities out
  there for unit testing Cassandra stores? I am looking for testing from a
  performance/load and monitoring perspective. I am using 1.2.
 
  Thanks a lot.
 
  Regards,
  Shahab
 
 
  --
  Sent from my phone
 
  This electronic message contains information which may be confidential or 
  privileged. The information is intended for the use of the individual or 
  entity named above. If you are not the intended recipient, be aware that 
  any disclosure, copying, distribution or use of the contents of this 
  information is prohibited. If you have received this electronic 
  transmission in error, please notify us by e-mail at 
  (postmas...@rapid7.com) immediately.
 



Re: Get fragments of big files (videos)

2013-06-21 Thread aaron morton
You should split the large blobs into multiple rows, and I would use 10MB per 
row as a good rule of thumb. 

See http://www.datastax.com/dev/blog/cassandra-file-system-design for a 
description of a blob store built on cassandra.
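
A sketch of what the chunked layout can look like in CQL3 (table and column
names are only illustrative):

CREATE TABLE video_chunks (
  video_id text,
  chunk_id int,   -- sequential chunk number, each chunk ~10MB
  data blob,
  PRIMARY KEY (video_id, chunk_id)
);

-- fetch only the chunks covering the byte range you need
SELECT data FROM video_chunks
 WHERE video_id = 'vid-123' AND chunk_id >= 4 AND chunk_id <= 6;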

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 20/06/2013, at 8:54 PM, Simon Majou si...@majou.org wrote:

 Thanks Serge
 
 Simon
 
 
 On Thu, Jun 20, 2013 at 10:48 AM, Serge Fonville
 serge.fonvi...@gmail.com wrote:
 Also, after a quick Google.
 
 http://wiki.apache.org/cassandra/CassandraLimitations states that values cannot
 exceed 2GB; it also answers your offset question.
 
 HTH
 Kind regards/met vriendelijke groet,
 
 Serge Fonville
 
 http://www.sergefonville.nl
 
 Convince Microsoft!
 They need to add TRUNCATE PARTITION in SQL Server
 https://connect.microsoft.com/SQLServer/feedback/details/417926/truncate-partition-of-partitioned-table
 
 
 2013/6/20 Sachin Sinha sinha.sac...@gmail.com
 
 Fragment them in rows, that will help.
 
 
 On 20 June 2013 09:43, Simon Majou si...@majou.org wrote:
 
 Hello,
 
 If I store a video into a column, how can I get a fragment of it
 without having to download it entirely ? Is there a way to give an
 offset on a column ?
 
 Do I have to fragment it over a lot of small fixed-size columns ? Is
 there any disadvantage to doing so ? For example, fragment a 10GB file
 into 1,000 columns of 10 MB ?
 
 Simon
 
 
 



Re: Compaction not running

2013-06-21 Thread aaron morton
 Do you think it's worth posting an issue, or not enough traceable evidence ?
If you can reproduce it then certainly file a bug. 

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 20/06/2013, at 9:41 PM, Franc Carter franc.car...@sirca.org.au wrote:

 On Thu, Jun 20, 2013 at 7:27 PM, aaron morton aa...@thelastpickle.com wrote:
 nodetool compactionstats, gives
 
 pending tasks: 13120
 If there are no errors in the log, I would say this is a bug. 
 
 This happened after the node ran out of file descriptors, so an edge case 
 wouldn't surprise me.
 
 I've rebuilt the node (blown the data way and am running a nodetool rebuild). 
 Do you think it's worth posting an issue, or not enough traceable evidence ?
 
 cheers
  
 
 Cheers
 
 -
 Aaron Morton
 Freelance Cassandra Consultant
 New Zealand
 
 @aaronmorton
 http://www.thelastpickle.com
 
 On 19/06/2013, at 11:41 AM, Franc Carter franc.car...@sirca.org.au wrote:
 
 On Wed, Jun 19, 2013 at 9:34 AM, Bryan Talbot btal...@aeriagames.com wrote:
 Manual compaction for LCS doesn't really do much.  It certainly doesn't 
 compact all those little files into bigger files.  What makes you think that 
 compactions are not occurring? 
 
 Yeah, that's what I thought, however:-
 
 nodetool compactionstats, gives
 
 pending tasks: 13120
Active compaction remaining time :n/a
 
 when I run nodetool compact in a loop the pending tasks goes down gradually.
 
 This node also has vastly higher latencies (x10) than the other nodes. I saw 
 this with a previous CF that I 'manually compacted', and when the pending 
 tasks reached low numbers (stuck on 9) the latencies were back to low 
 milliseconds.
 
 cheers
  
 -Bryan
 
 
 
 On Tue, Jun 18, 2013 at 3:59 PM, Franc Carter franc.car...@sirca.org.au 
 wrote:
 On Sat, Jun 15, 2013 at 11:49 AM, Franc Carter franc.car...@sirca.org.au 
 wrote:
 On Sat, Jun 15, 2013 at 8:48 AM, Robert Coli rc...@eventbrite.com wrote:
 On Wed, Jun 12, 2013 at 3:26 PM, Franc Carter franc.car...@sirca.org.au 
 wrote:
  We are running a test system with Leveled compaction on Cassandra-1.2.4.
  While doing an initial load of the data one of the nodes ran out of file
  descriptors and since then it hasn't been automatically compacting.
 
 You have (at least) two options :
 
 1) increase file descriptors available to Cassandra with ulimit, if possible
 2) increase the size of your sstables with levelled compaction, such
 that you have fewer of them
 
 Oops, I wasn't clear enough.
 
 I have increased the number of file descriptors and no longer have a file 
 descriptor issue. However the node still doesn't compact automatically. If I 
 run a 'nodetool compact' it will do a small amount of compaction and then 
 stop. The Column Family is using LCS
 
 Any ideas on this - compaction is still not automatically running for one of 
 my nodes
 
 thanks
  
 
 cheers
  
 
 =Rob
 
 
 
 -- 
 Franc Carter | Systems architect | Sirca Ltd
 franc.car...@sirca.org.au | www.sirca.org.au
 Tel: +61 2 8355 2514 
 Level 4, 55 Harrington St, The Rocks NSW 2000
 PO Box H58, Australia Square, Sydney NSW 1215
 
 
 
 
 -- 
 Franc Carter | Systems architect | Sirca Ltd
 franc.car...@sirca.org.au | www.sirca.org.au
 Tel: +61 2 8355 2514 
 Level 4, 55 Harrington St, The Rocks NSW 2000
 PO Box H58, Australia Square, Sydney NSW 1215
 
 
 
 
 
 -- 
 Franc Carter | Systems architect | Sirca Ltd
 franc.car...@sirca.org.au | www.sirca.org.au
 Tel: +61 2 8355 2514 
 Level 4, 55 Harrington St, The Rocks NSW 2000
 PO Box H58, Australia Square, Sydney NSW 1215
 
 
 
 
 
 -- 
 Franc Carter | Systems architect | Sirca Ltd
 franc.car...@sirca.org.au | www.sirca.org.au
 Tel: +61 2 8355 2514 
 Level 4, 55 Harrington St, The Rocks NSW 2000
 PO Box H58, Australia Square, Sydney NSW 1215
 



Re: Confirm with cqlsh of Cassandra-1.2.5, the behavior of the export/import

2013-06-21 Thread aaron morton
That looks like it may be a bug, can you raise a ticket at 
https://issues.apache.org/jira/browse/CASSANDRA

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 21/06/2013, at 1:56 AM, hiroshi.kise...@hitachi.com wrote:

 
 Dear everyone.
 
 I'm Hiroshi Kise.
 I have been checking the export/import behavior of data with cqlsh in
 Cassandra 1.2.5.
 When using cqlsh's COPY command on data that contains "{" and "[" (i.e.
 collection types), I think data integrity is compromised during the
 export/import process. Is that right?
 
 If there is a mistake in my CREATE TABLE definitions or in my procedure,
 please tell me the right way.
 
 
 Concrete operation is as follows.
 -*-*-*-*-*-*-*-*
 (1)map type's export/import
 export
 [root@castor bin]# ./cqlsh
 Connected to Test Cluster at localhost:9160.
 [cqlsh 3.0.2 | Cassandra 1.2.5 | CQL spec 3.0.0 | Thrift protocol 19.36.0]
 Use HELP for help.
 cqlsh create keyspace maptestks with replication  = { 'class' : 
 'SimpleStrategy', 'replication_factor' : '1' };
 cqlsh use maptestks;
 cqlsh:maptestks create table maptestcf (rowkey varchar PRIMARY KEY, 
 targetmap map<varchar,varchar>);
 cqlsh:maptestks insert into maptestcf (rowkey, targetmap) values 
 ('rowkey',{'mapkey':'mapvalue'});
 cqlsh:maptestks select * from maptestcf;
 
 rowkey | targetmap
 +
 rowkey | {mapkey: mapvalue}
 cqlsh:maptestks  copy maptestcf to 'maptestcf-20130619.txt';
 1 rows exported in 0.008 seconds.
 cqlsh:maptestks exit;
 
 [root@castor bin]# cat maptestcf-20130619.txt
 rowkey,{mapkey: mapvalue}
    (a)
 import
 [root@castor bin]# ./cqlsh
 Connected to Test Cluster at localhost:9160.
 [cqlsh 3.0.2 | Cassandra 1.2.5 | CQL spec 3.0.0 | Thrift protocol 19.36.0]
 Use HELP for help.
 cqlsh create keyspace mapimptestks with replication  = { 'class' : 
 'SimpleStrategy', 'replication_factor' : '1' };
 cqlsh use mapimptestks;
 cqlsh:mapimptestks create table mapimptestcf (rowkey varchar PRIMARY KEY, 
 targetmap map<varchar,varchar>);
 
 cqlsh:mapimptestks copy mapimptestcf from ' maptestcf-20130619.txt ';
 Bad Request: line 1:83 no viable alternative at input '}'
 Aborting import at record #0 (line 1). Previously-inserted values still 
 present.
 0 rows imported in 0.025 seconds.
 -*-*-*-*-*-*-*-*
 (2)list type's export/import
 export
 [root@castor bin]#./cqlsh
 Connected to Test Cluster at localhost:9160.
 [cqlsh 3.0.2 | Cassandra 1.2.5 | CQL spec 3.0.0 | Thrift protocol 19.36.0]
 Use HELP for help.
 cqlsh create keyspace listtestks with replication  = { 'class' : 
 'SimpleStrategy', 'replication_factor' : '1' };
 cqlsh use listtestks;
 cqlsh:listtestks create table listtestcf (rowkey varchar PRIMARY KEY, value 
 list<varchar>);
 cqlsh:listtestks insert into listtestcf (rowkey,value) values 
 ('rowkey',['value1','value2']);
 cqlsh:listtestks select * from listtestcf;
 
 rowkey | value
 +--
 rowkey | [value1, value2]
 
 cqlsh:listtestks copy listtestcf to 'listtestcf-20130619.txt';
 1 rows exported in 0.014 seconds.
 cqlsh:listtestks exit;
 
 [root@castor bin]# cat listtestcf-20130619.txt
 rowkey,[value1, value2]
    (b)
 import
 [root@castor bin]# ./cqlsh
 Connected to Test Cluster at localhost:9160.
 [cqlsh 3.0.2 | Cassandra 1.2.5 | CQL spec 3.0.0 | Thrift protocol 19.36.0]
 Use HELP for help.
 cqlsh create keyspace listimptestks with replication  = { 'class' : 
 'SimpleStrategy', 'replication_factor' : '1' };
 cqlsh use listimptestks;
 cqlsh:listimptestks create table listimptestcf (rowkey varchar PRIMARY KEY, 
 value list<varchar>);
 cqlsh:listimptestks copy listimptestcf from ' listtestcf-20130619.txt ';
 Bad Request: line 1:79 no viable alternative at input ']'
 Aborting import at record #0 (line 1). Previously-inserted values still 
 present.
 0 rows imported in 0.030 seconds.
 -*-*-*-*-*-*-*-*
 Reference (whether this behavior is correct or an error is a separate question):
 
 Manually, I have rewritten the export file.
 [root@castor bin]# cat nlisttestcf-20130619.txt
 rowkey,['value1',' value2']
 
 
 cqlsh:listimptestks copy listimptestcf from 'nlisttestcf-20130619.txt';
 1 rows imported in 0.035 seconds.
 
 cqlsh:listimptestks select * from implisttestcf;
 rowkey | value
 +--
 rowkey | [value1, value2]
 cqlsh:implisttestks exit;
 
 [root@castor bin]# cat nmaptestcf-20130619.txt
 rowkey,"{'mapkey': 'mapvalue'}"
 
 [root@castor bin]# ./cqlsh
 Connected to Test Cluster at localhost:9160.
 [cqlsh 3.0.2 | Cassandra 1.2.5 | CQL spec 3.0.0 | Thrift protocol 19.36.0]
 Use HELP for help.
 cqlsh use  mapimptestks;
 cqlsh:mapimptestks copy mapimptestcf from 'nmaptestcf-20130619.txt';
 1 rows imported 

Re: block size

2013-06-21 Thread aaron morton
 If I have a data in column of size 500KB, 
 
Also some information here 
http://thelastpickle.com/2011/04/28/Forces-of-Write-and-Read/

The data files are memory mapped, so it's sort of OS dependent. 

A

-

Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 21/06/2013, at 8:29 AM, Shahab Yunus shahab.yu...@gmail.com wrote:

 Ok. Though the closest that I can find is this (Aaron Morton's great blog):
 http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/
 
 I would also like to know the answer, as I haven't come across 
 'block size' as a core concept in Cassandra (or a concept to be considered while 
 developing with it), unlike Hadoop.
 
 Regards,
 Shahab
 
 
 On Thu, Jun 20, 2013 at 3:38 PM, Kanwar Sangha kan...@mavenir.com wrote:
 Yes. Is that not specific to hadoop with CFS ? I want to know: if I have 
 data in a column of size 500KB, how many IOPS are needed to read it ? 
 (assuming we have the key cache enabled)
 
  
 
  
 
 From: Shahab Yunus [mailto:shahab.yu...@gmail.com] 
 Sent: 20 June 2013 14:32
 To: user@cassandra.apache.org
 Subject: Re: block size
 
  
 
 Have you seen this?
 
 http://www.datastax.com/dev/blog/cassandra-file-system-design
 
  
 
 Regards,
 Shahab
 
  
 
 On Thu, Jun 20, 2013 at 3:17 PM, Kanwar Sangha kan...@mavenir.com wrote:
 
 Hi – What is the block size for Cassandra ? Is it taken from the OS defaults ?
 
  
 
 



Re: Compaction not running

2013-06-21 Thread Franc Carter
On Fri, Jun 21, 2013 at 6:16 PM, aaron morton aa...@thelastpickle.com wrote:

 Do you think it's worth posting an issue, or not enough traceable evidence
 ?

 If you can reproduce it then certainly file a bug.


I'll keep my eye on it to see if it happens again and there is a pattern

cheers



 Cheers

-
 Aaron Morton
 Freelance Cassandra Consultant
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 20/06/2013, at 9:41 PM, Franc Carter franc.car...@sirca.org.au wrote:

 On Thu, Jun 20, 2013 at 7:27 PM, aaron morton aa...@thelastpickle.com wrote:

 nodetool compactionstats, gives

 pending tasks: 13120

 If there are no errors in the log, I would say this is a bug.


 This happened after the node ran out of file descriptors, so an edge case
 wouldn't surprise me.

 I've rebuilt the node (blown the data way and am running a nodetool
 rebuild). Do you think it's worth posting an issue, or not enough traceable
 evidence ?

 cheers



 Cheers

-
 Aaron Morton
 Freelance Cassandra Consultant
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 19/06/2013, at 11:41 AM, Franc Carter franc.car...@sirca.org.au
 wrote:

 On Wed, Jun 19, 2013 at 9:34 AM, Bryan Talbot btal...@aeriagames.com wrote:

 Manual compaction for LCS doesn't really do much.  It certainly doesn't
 compact all those little files into bigger files.  What makes you think
 that compactions are not occurring?


 Yeah, that's what I thought, however:-

 nodetool compactionstats, gives

 pending tasks: 13120
Active compaction remaining time :n/a

 when I run nodetool compact in a loop the pending tasks goes down
 gradually.

 This node also has vastly higher latencies (x10) than the other nodes. I
 saw this with a previous CF than I 'manually compacted', and when the
 pending tasks reached low numbers (stuck on 9) then latencies were back to
 low milliseconds

 cheers


 -Bryan



 On Tue, Jun 18, 2013 at 3:59 PM, Franc Carter franc.car...@sirca.org.au
  wrote:

 On Sat, Jun 15, 2013 at 11:49 AM, Franc Carter 
 franc.car...@sirca.org.au wrote:

 On Sat, Jun 15, 2013 at 8:48 AM, Robert Coli rc...@eventbrite.com wrote:

 On Wed, Jun 12, 2013 at 3:26 PM, Franc Carter 
 franc.car...@sirca.org.au wrote:
  We are running a test system with Leveled compaction on
 Cassandra-1.2.4.
  While doing an initial load of the data one of the nodes ran out of
 file
  descriptors and since then it hasn't been automatically compacting.

 You have (at least) two options :

 1) increase file descriptors available to Cassandra with ulimit, if
 possible
 2) increase the size of your sstables with levelled compaction, such
 that you have fewer of them


 Oops, I wasn't clear enough.

 I have increased the number of file descriptors and no longer have a
 file descriptor issue. However the node still doesn't compact
 automatically. If I run a 'nodetool compact' it will do a small amount of
 compaction and then stop. The Column Family is using LCS


 Any ideas on this - compaction is still not automatically running for
 one of my nodes

 thanks



 cheers



 =Rob




 --
 *Franc Carter* | Systems architect | Sirca Ltd
  marc.zianideferra...@sirca.org.au
 franc.car...@sirca.org.au | www.sirca.org.au
 Tel: +61 2 8355 2514
  Level 4, 55 Harrington St, The Rocks NSW 2000
 PO Box H58, Australia Square, Sydney NSW 1215




 --
 *Franc Carter* | Systems architect | Sirca Ltd
  marc.zianideferra...@sirca.org.au
 franc.car...@sirca.org.au | www.sirca.org.au
 Tel: +61 2 8355 2514
  Level 4, 55 Harrington St, The Rocks NSW 2000
 PO Box H58, Australia Square, Sydney NSW 1215





 --
 *Franc Carter* | Systems architect | Sirca Ltd
  marc.zianideferra...@sirca.org.au
 franc.car...@sirca.org.au | www.sirca.org.au
 Tel: +61 2 8355 2514
  Level 4, 55 Harrington St, The Rocks NSW 2000
 PO Box H58, Australia Square, Sydney NSW 1215





 --
 *Franc Carter* | Systems architect | Sirca Ltd
  marc.zianideferra...@sirca.org.au
 franc.car...@sirca.org.au | www.sirca.org.au
 Tel: +61 2 8355 2514
  Level 4, 55 Harrington St, The Rocks NSW 2000
 PO Box H58, Australia Square, Sydney NSW 1215





-- 

*Franc Carter* | Systems architect | Sirca Ltd
 marc.zianideferra...@sirca.org.au

franc.car...@sirca.org.au | www.sirca.org.au

Tel: +61 2 8355 2514

Level 4, 55 Harrington St, The Rocks NSW 2000

PO Box H58, Australia Square, Sydney NSW 1215


Re: nodetool ring showing different 'Load' size

2013-06-21 Thread Rodrigo Felix
Ok. Thank you all you guys.

Att.

*Rodrigo Felix de Almeida*
LSBD - Universidade Federal do Ceará
Project Manager
MBA, CSM, CSPO, SCJP


On Wed, Jun 19, 2013 at 2:26 PM, Robert Coli rc...@eventbrite.com wrote:

 On Wed, Jun 19, 2013 at 5:47 AM, Michal Michalski mich...@opera.com
 wrote:
  You can also perform a major compaction via nodetool compact (for
  SizeTieredCompaction), but - again - you really should not do it unless
  you're really sure what you do, as it compacts all the SSTables together,
  which is not something you might want to achieve in most of the cases.

 If you do that and discover you did not want to :

 https://github.com/pcmanus/cassandra/tree/sstable_split

 Will enable you to split your monolithic sstable back into smaller
 sstables.

 =Rob
 PS - @pcmanus, here's that reminder we discussed @ summit to merge
 this tool into upstream! :D



Re: [Cassandra] Replacing a cassandra node

2013-06-21 Thread Eric Stevens
Is there a way to replace a failed server using vnodes?  I only had
occasion to do this once, on a relatively small cluster.  At the time I
just needed to get the new server online and wasn't concerned about the
performance implications, so I just removed the failed server from the
cluster and bootstrapped a new one.  Of course that caused a bunch of key
reassignments, so I'm sure it would be less work for the cluster if I could
bring a new server online with the same vnodes as the failed server.


On Thu, Jun 20, 2013 at 2:40 PM, Robert Coli rc...@eventbrite.com wrote:

 On Thu, Jun 20, 2013 at 10:40 AM, Emalayan Vairavanathan
 svemala...@yahoo.com wrote:
  In the case where we replace a cassandra node (call it node A) with another
 one
  that has the exact same IP (ie. during a node failure), what exactly
 should
  we do?  Currently I understand that we should at least run nodetool
  repair.

 If you lost the data from the node, then what you want is replace_token.

 If you didn't lose the data from the node (and can tolerate stale
 reads until the repair completes) you want to start the node with
 auto_bootstrap set to false and then repair.
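
 A rough sketch of both options (the token is a placeholder, and the exact
 flag handling may differ with vnodes; check the docs for your version):

 # option 1: conf/cassandra-env.sh on the replacement node (started empty)
 JVM_OPTS="$JVM_OPTS -Dcassandra.replace_token=<token of the dead node>"

 # option 2: conf/cassandra.yaml on the node that still has its data,
 # then run nodetool repair once it is up
 auto_bootstrap: false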

 =Rob



Re: timeuuid and cql3 query

2013-06-21 Thread Eric Stevens
It's my understanding that if cardinality of the first part of the primary
key has low cardinality, you will struggle with cluster balance as (unless
you use WITH COMPACT STORAGE) the first entry of the primary key equates to
the row key from the traditional interface, thus all entries related to a
single value for the counter column will map to the same partition.

So consider the cardinality of this field, if cardinality is low, you might
need to remodel with PRIMARY KEY (counter, ts, key1) then tack on WITH
COMPACT STORAGE (then the entire primary key becomes the row key, but you
can only have one column which is not part of the primary key)  If
cardinality of counter is high, then you have nothing to worry about.


On Wed, Jun 19, 2013 at 3:16 PM, Francisco Andrades Grassi 
bigjoc...@gmail.com wrote:

 Hi,

 I believe what he's recommending is:

 CREATE TABLE count3 (
   counter text,
   ts timeuuid,
   key1 text,
   value int,
   PRIMARY KEY (counter, ts)
 )

 That way *counter* will be your partitioning key, and all the rows that
 have the same *counter* value will be clustered (stored as a single wide
 row sorted by the *ts* value). In this scenario the query:

  where counter = 'test' and ts > minTimeuuid('2013-06-18 16:23:00') and ts
  < minTimeuuid('2013-06-18 16:24:00');

 would actually be a sequential read on a wide row on a single node.

 --
 Francisco Andrades Grassi
 www.bigjocker.com
 @bigjocker

 On Jun 19, 2013, at 12:17 PM, Ryan, Brent br...@cvent.com wrote:

  Tyler,

  You're recommending this schema instead, correct?

  CREATE TABLE count3 (
   counter text,
   ts timeuuid,
   key1 text,
   value int,
   PRIMARY KEY (ts, counter)
 )

  I believe I tried this as well and ran into similar problems but I'll
 try it again.  I'm using the ByteOrderedPartitioner if that helps with
 the latest version of DSE community edition which I believe is Cassandra
 1.2.3.


  Thanks,
 Brent


   From: Tyler Hobbs ty...@datastax.com
 Reply-To: user@cassandra.apache.org user@cassandra.apache.org
 Date: Wednesday, June 19, 2013 11:00 AM
 To: user@cassandra.apache.org user@cassandra.apache.org
 Subject: Re: timeuuid and cql3 query


 On Wed, Jun 19, 2013 at 8:08 AM, Ryan, Brent br...@cvent.com wrote:


  CREATE TABLE count3 (
   counter text,
   ts timeuuid,
   key1 text,
   value int,
   PRIMARY KEY ((counter, ts))
 )


 Instead of doing a composite partition key, remove a set of parens and let
 ts be your clustering key.  That will cause cql rows to be stored in sorted
 order by the ts column (for a given value of counter) and allow you to do
 the kind of query you're looking for.


 --
 Tyler Hobbs
 DataStax http://datastax.com/





Re: Heap is not released and streaming hangs at 0%

2013-06-21 Thread srmore
On Fri, Jun 21, 2013 at 2:53 AM, aaron morton aa...@thelastpickle.com wrote:

  nodetool -h localhost flush didn't do much good.

 Do you have 100's of millions of rows ?
 If so see recent discussions about reducing the bloom_filter_fp_chance and
 index_sampling.

Yes, I have 100's of millions of rows.



 If this is an old schema you may be using the very old setting of 0.000744
 which creates a lot of bloom filters.

 The bloom_filter_fp_chance value was changed from the default to 0.1; I looked
at the filters and they are about 2.5G on disk, and I have around 8G of heap.
I will try increasing the value to 0.7 and report my results.

It also appears to be a case of hard GC failure (as Rob mentioned), as the
heap is never released even after 24+ hours of idle time; the JVM needs to
be restarted to reclaim the heap.

Cheers

 -
 Aaron Morton
 Freelance Cassandra Consultant
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 20/06/2013, at 6:36 AM, Wei Zhu wz1...@yahoo.com wrote:

 If you want, you can try to force the GC through Jconsole. Memory-Perform
 GC.

 It theoretically triggers a full GC and when it will happen depends on the
 JVM

 -Wei

 --
 *From: *Robert Coli rc...@eventbrite.com
 *To: *user@cassandra.apache.org
 *Sent: *Tuesday, June 18, 2013 10:43:13 AM
 *Subject: *Re: Heap is not released and streaming hangs at 0%

 On Tue, Jun 18, 2013 at 10:33 AM, srmore comom...@gmail.com wrote:
  But then shouldn't JVM C G it eventually ? I can still see Cassandra
 alive
  and kicking but looks like the heap is locked up even after the traffic
 is
  long stopped.

 No, when GC system fails this hard it is often a permanent failure
 which requires a restart of the JVM.

  nodetool -h localhost flush didn't do much good.

 This adds support to the idea that your heap is too full, and not full
 of memtables.

 You could try nodetool -h localhost invalidatekeycache, but that
 probably will not free enough memory to help you.

 =Rob





Re: timeuuid and cql3 query

2013-06-21 Thread Ryan, Brent
Yes.  The problem is that I can't use counter as the partition key, otherwise 
I'd wind up with hot spots in my cluster where the majority of the data is being 
written to a single node in the cluster. The only real way around this problem 
with Cassandra is to follow along with what this blog does:

http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra


From: Eric Stevens migh...@gmail.com
Reply-To: user@cassandra.apache.org
Date: Friday, June 21, 2013 8:38 AM
To: user@cassandra.apache.org
Subject: Re: timeuuid and cql3 query

It's my understanding that if cardinality of the first part of the primary key 
has low cardinality, you will struggle with cluster balance as (unless you use 
WITH COMPACT STORAGE) the first entry of the primary key equates to the row key 
from the traditional interface, thus all entries related to a single value for 
the counter column will map to the same partition.

So consider the cardinality of this field, if cardinality is low, you might 
need to remodel with PRIMARY KEY (counter, ts, key1) then tack on WITH COMPACT 
STORAGE (then the entire primary key becomes the row key, but you can only have 
one column which is not part of the primary key)  If cardinality of counter 
is high, then you have nothing to worry about.


On Wed, Jun 19, 2013 at 3:16 PM, Francisco Andrades Grassi 
bigjoc...@gmail.com wrote:
Hi,

I believe what he's recommending is:

CREATE TABLE count3 (
  counter text,
  ts timeuuid,
  key1 text,
  value int,
  PRIMARY KEY (counter, ts)
)

That way counter will be your partitioning key, and all the rows that have the 
same counter value will be clustered (stored as a single wide row sorted by the 
ts value). In this scenario the query:

 where counter = 'test' and ts > minTimeuuid('2013-06-18 16:23:00') and ts < 
minTimeuuid('2013-06-18 16:24:00');

would actually be a sequential read on a wide row on a single node.

--
Francisco Andrades Grassi
www.bigjocker.com
@bigjocker

On Jun 19, 2013, at 12:17 PM, Ryan, Brent br...@cvent.com wrote:

Tyler,

You're recommending this schema instead, correct?

CREATE TABLE count3 (
  counter text,
  ts timeuuid,
  key1 text,
  value int,
  PRIMARY KEY (ts, counter)
)

I believe I tried this as well and ran into similar problems but I'll try it 
again.  I'm using the ByteOrderedPartitioner if that helps with the latest 
version of DSE community edition which I believe is Cassandra 1.2.3.


Thanks,
Brent


From: Tyler Hobbs ty...@datastax.com
Reply-To: user@cassandra.apache.org
Date: Wednesday, June 19, 2013 11:00 AM
To: user@cassandra.apache.org
Subject: Re: timeuuid and cql3 query


On Wed, Jun 19, 2013 at 8:08 AM, Ryan, Brent br...@cvent.com wrote:

CREATE TABLE count3 (
  counter text,
  ts timeuuid,
  key1 text,
  value int,
  PRIMARY KEY ((counter, ts))
)

Instead of doing a composite partition key, remove a set of parens and let ts 
be your clustering key.  That will cause cql rows to be stored in sorted order 
by the ts column (for a given value of counter) and allow you to do the kind 
of query you're looking for.


--
Tyler Hobbs
DataStax http://datastax.com/




NREL has released open source Databus on github for time series data

2013-06-21 Thread Hiller, Dean
NREL has released their open source databus.  They spin it as energy data (and 
a system for campus energy/building energy) but it is very general right now 
and probably will stay pretty general.  More information can be found here

http://www.nrel.gov/analysis/databus/

The source code can be found here
https://github.com/deanhiller/databus

Star the project if you like the idea.  NREL just did a big press release and 
is developing a community around the project.  It is in its early stages but 
there are users using it, and I am helping HP set up an instance this month.  If 
you want to become a committer on the project, let me know as well.

Later,
Dean



Cassandra terminates with OutOfMemory (OOM) error

2013-06-21 Thread Mohammed Guller
We have a 3-node cassandra cluster on AWS. These nodes are running cassandra 
1.2.2 and have 8GB memory. We didn't change any of the default heap or GC 
settings. So each node is allocating 1.8GB of heap space. The rows are wide; 
each row stores around 260,000 columns. We are reading the data using Astyanax. 
If our application tries to read 80,000 columns each from 10 or more rows at 
the same time, some of the nodes run out of heap space and terminate with OOM 
error. Here is the error message:

java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.duplicate(HeapByteBuffer.java:107)
at 
org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:50)
at 
org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:60)
at 
org.apache.cassandra.db.marshal.AbstractCompositeType.split(AbstractCompositeType.java:126)
at 
org.apache.cassandra.db.filter.ColumnCounter$GroupByPrefix.count(ColumnCounter.java:96)
at 
org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:164)
at 
org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:136)
at 
org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:84)
at 
org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:294)
at 
org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:65)
at 
org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1363)
at 
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1220)
at 
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1132)
at org.apache.cassandra.db.Table.getRow(Table.java:355)
at 
org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:70)
   at 
org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1052)
at 
org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1578)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)

ERROR 02:14:05,351 Exception in thread Thread[Thrift:6,5,main]
java.lang.OutOfMemoryError: Java heap space
at java.lang.Long.toString(Long.java:269)
at java.lang.Long.toString(Long.java:764)
at 
org.apache.cassandra.dht.Murmur3Partitioner$1.toString(Murmur3Partitioner.java:171)
at 
org.apache.cassandra.service.StorageService.describeRing(StorageService.java:1068)
at 
org.apache.cassandra.thrift.CassandraServer.describe_ring(CassandraServer.java:1192)
at 
org.apache.cassandra.thrift.Cassandra$Processor$describe_ring.getResult(Cassandra.java:3766)
at 
org.apache.cassandra.thrift.Cassandra$Processor$describe_ring.getResult(Cassandra.java:3754)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
at 
org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:199)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)

The data in each column is less than 50 bytes. After adding all the column 
overheads (column name + metadata), it should not be more than 100 bytes. So 
reading 80,000 columns from 10 rows each means that we are reading 80,000 * 10 
* 100 = 80 MB of data. It is large, but not large enough to fill up the 1.8 GB 
heap. So I wonder why the heap is getting full. If the data request is too big 
to fill in a reasonable amount of time, I would expect Cassandra to return a 
TimeOutException instead of terminating.

One easy solution is to increase the heap size. However, that means Cassandra can 
still crash if someone reads 100 rows.  I wonder if there is some other Cassandra 
setting that I can tweak to prevent the OOM exception?

Thanks,
Mohammed


Re: Cassandra terminates with OutOfMemory (OOM) error

2013-06-21 Thread Jabbar Azam
Hello Mohammed,

You should increase the heap space. You should also tune the garbage
collection so young-generation objects are collected faster, relieving
pressure on the heap. We have been using JDK 7 with the G1 collector, and it
does a better job than my attempts at optimising the JDK 6 GC
collectors.

Bear in mind though that the OS will need memory, as will the row cache and
the file system cache, although memory usage will depend on the workload of
your system.
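
A sketch of where the heap knobs live (the values are only placeholders; size
them for your hardware and workload):

# conf/cassandra-env.sh
MAX_HEAP_SIZE="4G"     # total JVM heap
HEAP_NEWSIZE="400M"    # young generation size

The GC flags themselves (CMS by default in 1.2) are set further down in the
same file.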

I'm sure you'll also get good advice from other members of the mailing list.

Thanks

Jabbar Azam


On 21 June 2013 18:49, Mohammed Guller moham...@glassbeam.com wrote:

  We have a 3-node cassandra cluster on AWS. These nodes are running
 cassandra 1.2.2 and have 8GB memory. We didn't change any of the default
 heap or GC settings. So each node is allocating 1.8GB of heap space. The
 rows are wide; each row stores around 260,000 columns. We are reading the
 data using Astyanax. If our application tries to read 80,000 columns each
 from 10 or more rows at the same time, some of the nodes run out of heap
 space and terminate with OOM error. Here is the error message:

 ** **

 java.lang.OutOfMemoryError: Java heap space

 at java.nio.HeapByteBuffer.duplicate(HeapByteBuffer.java:107)

 at
 org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:50)
 

 at
 org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:60)
 

 at
 org.apache.cassandra.db.marshal.AbstractCompositeType.split(AbstractCompositeType.java:126)
 

 at
 org.apache.cassandra.db.filter.ColumnCounter$GroupByPrefix.count(ColumnCounter.java:96)
 

 at
 org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:164)
 

 at
 org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:136)
 

 at
 org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:84)
 

 at
 org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:294)
 

 at
 org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:65)
 

 at
 org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1363)
 

 at
 org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1220)
 

 at
 org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1132)
 

 at org.apache.cassandra.db.Table.getRow(Table.java:355)

 at
 org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:70)
 

at
 org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1052)
 

 at
 org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1578)
 

 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
 

 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
 

 at java.lang.Thread.run(Thread.java:722)

 ** **

 ERROR 02:14:05,351 Exception in thread Thread[Thrift:6,5,main]

 java.lang.OutOfMemoryError: Java heap space

 at java.lang.Long.toString(Long.java:269)

 at java.lang.Long.toString(Long.java:764)

 at
 org.apache.cassandra.dht.Murmur3Partitioner$1.toString(Murmur3Partitioner.java:171)
 

 at
 org.apache.cassandra.service.StorageService.describeRing(StorageService.java:1068)
 

 at
 org.apache.cassandra.thrift.CassandraServer.describe_ring(CassandraServer.java:1192)
 

 at
 org.apache.cassandra.thrift.Cassandra$Processor$describe_ring.getResult(Cassandra.java:3766)
 

 at
 org.apache.cassandra.thrift.Cassandra$Processor$describe_ring.getResult(Cassandra.java:3754)
 

 at
 org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)

 at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
 

 at
 org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:199)
 

 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
 

 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
 

 at java.lang.Thread.run(Thread.java:722)

 ** **

 The data in each column is less than 50 bytes. After adding all the column
 overheads (column name + metadata), it should not be more than 100 bytes.
 So reading 80,000 columns from 10 rows each means that we are reading
 80,000 * 10 * 100 = 80 MB of data. It is large, but not large enough to
 fill up the 1.8 GB heap. So I wonder why the heap is getting full. If the
 data request is too big to fill 

Cassandra driver performance question...

2013-06-21 Thread Tony Anecito
Hi All,
I am using the jdbc driver and noticed that if I run the same query twice, the 
second time it is much faster.
I set up the row cache and column family cache and it does not seem to make a 
difference.

I am wondering how to set up cassandra such that the first query is always as 
fast as the second one. The second one was 1.8 msec and the first 
28 msec for the same exact parameters. I am using a PreparedStatement.

Thanks!

Re: Cassandra driver performance question...

2013-06-21 Thread Jabbar Azam
Hello Tony,

I would guess that the first query's data is put into the row cache and
the filesystem cache. The second query gets the data from the row cache
and/or the filesystem cache, so it'll be faster.

If you want to make it consistently faster having a key cache will
definitely help. The following advice from Aaron Morton will also help

You can also see what it looks like from the server side.

nodetool proxyhistograms will show you full request latency recorded
by the coordinator.
nodetool cfhistograms will show you the local read latency, this is
just the time it takes
to read data on a replica and does not include network or wait times.

If the proxyhistograms is showing most requests running faster than
your app says it's your
app.


http://mail-archives.apache.org/mod_mbox/cassandra-user/201301.mbox/%3ce3741956-c47c-4b43-ad99-dad8afc3a...@thelastpickle.com%3E
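
On the key cache point above, a sketch of the relevant knobs (keyspace/table
names and sizes are only examples):

-- cqlsh (1.2 syntax): per-table caching mode
ALTER TABLE myks.mytable WITH caching = 'keys_only';

# cassandra.yaml: overall key cache size (leave empty for the auto-sized default)
key_cache_size_in_mb: 100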



Thanks

Jabbar Azam


On 21 June 2013 21:29, Tony Anecito adanec...@yahoo.com wrote:

 Hi All,
 I am using jdbc driver and noticed that if I run the same query twice the
 second time it is much faster.
 I setup the row cache and column family cache and it not seem to make a
 difference.

 I am wondering how to setup cassandra such that the first query is always
 as fast as the second one. The second one was 1.8msec and the first 28msec
 for the same exact paremeters. I am using preparestatement.

 Thanks!



Re: [Cassandra] Replacing a cassandra node with one of the same IP

2013-06-21 Thread Mahony, Robin
Please note that I am currently using version 1.2.2 of Cassandra.  Also we are 
using virtual nodes.

My question mainly stems from the fact that the nodes appear to be aware that 
the node uuid changes for the IP (from reading the logs), so I am just 
wondering if this means the hinted handoffs are also updated to reflect the new 
Cassandra node uuid. If that was the case, I would not think a nodetool cleanup 
would be necessary.

- Forwarded Message -
From: Robert Coli rc...@eventbrite.com
To: user@cassandra.apache.org; Emalayan Vairavanathan svemala...@yahoo.com
Sent: Thursday, 20 June 2013 11:40 AM
Subject: Re: [Cassandra] Replacing a cassandra node

On Thu, Jun 20, 2013 at 10:40 AM, Emalayan Vairavanathan
svemala...@yahoo.com wrote:
 In the case where replace a cassandra node (call it node A) with another one
 that has the exact same IP (ie. during a node failure), what exactly should
 we do?  Currently I understand that we should at least run nodetool
 repair.

If you lost the data from the node, then what you want is replace_token.

If you didn't lose the data from the node (and can tolerate stale
reads until the repair completes) you want to start the node with
auto_bootstrap set to false and then repair.

=Rob


crashed while running repair

2013-06-21 Thread Franc Carter
Hi,

I am experimenting with Cassandra 1.2.4 and got a crash while running
repair. The nodes have 24GB of RAM with an 8GB heap. Any ideas on what I may
have missed in the config? The log is below:

ERROR [Thread-136019] 2013-06-22 06:30:05,861 CassandraDaemon.java (line 174) Exception in thread Thread[Thread-136019,5,main]
FSReadError in /var/lib/cassandra/data/cut3/Price/cut3-Price-ib-44369-Index.db
at org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.createSegments(MmappedSegmentedFile.java:200)
at org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.complete(MmappedSegmentedFile.java:168)
at org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:340)
at org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:319)
at org.apache.cassandra.streaming.IncomingStreamReader.streamIn(IncomingStreamReader.java:194)
at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:122)
at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:238)
at org.apache.cassandra.net.IncomingTcpConnection.handleStream(IncomingTcpConnection.java:178)
at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:78)
Caused by: java.io.IOException: Map failed
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:748)
at org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.createSegments(MmappedSegmentedFile.java:192)
... 8 more
Caused by: java.lang.OutOfMemoryError: Map failed
at sun.nio.ch.FileChannelImpl.map0(Native Method)
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:745)
... 9 more
ERROR [Thread-136019] 2013-06-22 06:30:05,865 FileUtils.java (line 375) Stopping gossiper


thanks

-- 

*Franc Carter* | Systems architect | Sirca Ltd
 marc.zianideferra...@sirca.org.au

franc.car...@sirca.org.au | www.sirca.org.au

Tel: +61 2 8355 2514

Level 4, 55 Harrington St, The Rocks NSW 2000

PO Box H58, Australia Square, Sydney NSW 1215


Re: Heap is not released and streaming hangs at 0%

2013-06-21 Thread Bryan Talbot
bloom_filter_fp_chance = 0.7 is probably way too large to be effective and
you'll probably have issues compacting deleted rows and get poor read
performance with a value that high.  I'd guess that anything larger than
0.1 might as well be 1.0.

-Bryan
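
[For a rough sense of why values above ~0.1 buy almost nothing, here is a small back-of-the-envelope sketch, my own and not Cassandra's actual sizing code, using the textbook Bloom filter relation bits-per-key = -ln(p) / (ln 2)^2. The 100 million key count is only an illustrative figure; real totals are summed across all sstables.]

public class BloomFilterSizing {
    // Standard Bloom filter estimate: bits per key for a target false-positive rate p.
    static double bitsPerKey(double p) {
        return -Math.log(p) / (Math.log(2) * Math.log(2));
    }

    public static void main(String[] args) {
        long keys = 100000000L;  // illustrative only; not taken from this thread
        for (double p : new double[] {0.000744, 0.01, 0.1, 0.7}) {
            double bits = bitsPerKey(p);
            double gb = keys * bits / 8 / (1024.0 * 1024 * 1024);
            System.out.printf("fp_chance=%.6f -> %.2f bits/key, ~%.3f GB for %d keys%n",
                              p, bits, gb, keys);
        }
    }
}

[By that estimate, going from 0.1 to 0.7 only drops from roughly 4.8 to 0.7 bits per key, while at a 70% false-positive rate the filter barely screens out any reads at all.]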



On Fri, Jun 21, 2013 at 5:58 AM, srmore comom...@gmail.com wrote:


 On Fri, Jun 21, 2013 at 2:53 AM, aaron morton aa...@thelastpickle.comwrote:

  nodetool -h localhost flush didn't do much good.

 Do you have 100's of millions of rows ?
 If so see recent discussions about reducing the bloom_filter_fp_chance
 and index_sampling.

 Yes, I have 100's of millions of rows.



 If this is an old schema you may be using the very old setting of
 0.000744 which creates a lot of bloom filters.

 bloom_filter_fp_chance value that was changed from default to 0.1, looked
 at the filters and they are about 2.5G on disk and I have around 8G of heap.
 I will try increasing the value to 0.7 and report my results.

 It also appears to be a case of hard GC failure (as Rob mentioned) as the
 heap is never released, even after 24+ hours of idle time, the JVM needs to
 be restarted to reclaim the heap.

 Cheers

-
 Aaron Morton
 Freelance Cassandra Consultant
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 20/06/2013, at 6:36 AM, Wei Zhu wz1...@yahoo.com wrote:

 If you want, you can try to force the GC through Jconsole.
 Memory-Perform GC.

 It theoretically triggers a full GC and when it will happen depends on
 the JVM

 -Wei
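
[If JConsole is not handy, the same "Perform GC" button can be driven programmatically; the sketch below, not from this thread, connects over JMX and invokes the gc operation on the java.lang:type=Memory MBean, which should be equivalent to what JConsole triggers. The host and port are assumptions (7199 is Cassandra's usual JMX port), and as Rob's reply below notes, a forced GC will not reclaim much if the heap is full of live data.]

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class RemoteGc {
    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost";  // assumed host
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":7199/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // Request a full GC on the remote JVM.
            mbs.invoke(new ObjectName("java.lang:type=Memory"), "gc", null, null);
            System.out.println("Requested full GC on " + host);
        } finally {
            connector.close();
        }
    }
}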

 --
 *From: *Robert Coli rc...@eventbrite.com
 *To: *user@cassandra.apache.org
 *Sent: *Tuesday, June 18, 2013 10:43:13 AM
 *Subject: *Re: Heap is not released and streaming hangs at 0%

 On Tue, Jun 18, 2013 at 10:33 AM, srmore comom...@gmail.com wrote:
  But then shouldn't the JVM GC it eventually? I can still see Cassandra
 alive
  and kicking but looks like the heap is locked up even after the traffic
 is
  long stopped.

 No, when GC system fails this hard it is often a permanent failure
 which requires a restart of the JVM.

  nodetool -h localhost flush didn't do much good.

 This adds support to the idea that your heap is too full, and not full
 of memtables.

 You could try nodetool -h localhost invalidatekeycache, but that
 probably will not free enough memory to help you.

 =Rob






Updated sstable size for LCS, ran upgradesstables, file sizes didn't change

2013-06-21 Thread Andrew Bialecki
We're potentially considering increasing the size of our sstables for some
column families from 10MB to something larger.

In test, we've been trying to verify that the sstable file sizes change and
then doing a bit of benchmarking. However, when we alter the column
family and then run nodetool upgradesstables -a keyspace columnfamily,
the files in the data directory are re-written, but the file sizes
are the same.

Is this the expected behavior? If not, what's the right way to upgrade
them? If this is expected, how can we benchmark the read/write performance
with varying sstable sizes?

Thanks in advance!

Andrew


Re: Updated sstable size for LCS, ran upgradesstables, file sizes didn't change

2013-06-21 Thread Robert Coli
On Fri, Jun 21, 2013 at 4:40 PM, Andrew Bialecki
andrew.biale...@gmail.com wrote:
 However when we run alter the column
 family and then run nodetool upgradesstables -a keyspace columnfamily, the
 files in the data directory have been re-written, but the file sizes are the
 same.

 Is this the expected behavior? If not, what's the right way to upgrade them.
 If this is expected, how can we benchmark the read/write performance with
 varying sstable sizes.

It is expected; upgradesstables/scrub/cleanup compactions work on a
single sstable at a time and are not capable of combining or
splitting them.

In theory you could probably:

1) start out with the largest size you want to test
2) stop your node
3) use sstable_split [1] to split sstables
4) start node, test
5) repeat 2-4

I am not sure if there is anything about level compaction which makes
this infeasible.

=Rob
[1] https://github.com/pcmanus/cassandra/tree/sstable_split


Re: Updated sstable size for LCS, ran upgradesstables, file sizes didn't change

2013-06-21 Thread Wei Zhu
I think the new SSTables will be written at the new size, but to get there you need 
to trigger a compaction so that new SSTables are generated. For LCS, 
there is no major compaction though. You can run a nodetool repair, and 
hopefully that will bring in some new SSTables and compactions will kick in. 
Or you can change the $CFName.json file under your data directory and move 
every SSTable to level 0. You need to stop your node, write a simple script to 
alter that file, and start the node again. 

I think it would be helpful to have a nodetool command to change the SSTable 
size and trigger a rebuild of the SSTables. 

Thanks. 
-Wei 
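
[A sketch of the kind of "simple script" Wei describes, in Java for consistency with the rest of this thread. It assumes the manifest is JSON shaped like {"keyspace": ..., "columnFamily": ..., "generations": [{"generation": N, "members": [...]}]} and that Cassandra's bundled jackson-mapper jar is on the classpath; inspect and back up your own $CFName.json first, and only run anything like this with the node stopped.]

import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import org.codehaus.jackson.map.ObjectMapper;  // jackson-mapper jar from cassandra/lib

public class ResetLeveledManifest {
    @SuppressWarnings("unchecked")
    public static void main(String[] args) throws Exception {
        File manifest = new File(args[0]);  // path to the CF's .json manifest
        ObjectMapper mapper = new ObjectMapper();
        Map<String, Object> root = mapper.readValue(manifest, Map.class);

        // Assumed layout: a "generations" array, each entry with "generation" and "members".
        List<Map<String, Object>> generations =
                (List<Map<String, Object>>) root.get("generations");
        List<Object> allMembers = new ArrayList<Object>();
        Map<String, Object> levelZero = null;
        for (Map<String, Object> gen : generations) {
            allMembers.addAll((List<Object>) gen.get("members"));
            gen.put("members", new ArrayList<Object>());  // empty this level
            if (((Number) gen.get("generation")).intValue() == 0) {
                levelZero = gen;
            }
        }
        if (levelZero == null) {
            throw new IllegalStateException("no generation 0 entry; manifest layout differs");
        }
        // Put every sstable back into level 0 so LCS recompacts them at the new size.
        levelZero.put("members", allMembers);
        mapper.writeValue(manifest, root);
        System.out.println("Moved " + allMembers.size() + " sstables to level 0");
    }
}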

- Original Message -

From: Robert Coli rc...@eventbrite.com 
To: user@cassandra.apache.org 
Sent: Friday, June 21, 2013 4:51:29 PM 
Subject: Re: Updated sstable size for LCS, ran upgradesstables, file sizes 
didn't change 

On Fri, Jun 21, 2013 at 4:40 PM, Andrew Bialecki 
andrew.biale...@gmail.com wrote: 
 However when we run alter the column 
 family and then run nodetool upgradesstables -a keyspace columnfamily, the 
 files in the data directory have been re-written, but the file sizes are the 
 same. 
 
 Is this the expected behavior? If not, what's the right way to upgrade them. 
 If this is expected, how can we benchmark the read/write performance with 
 varying sstable sizes. 

It is expected, upgradesstables/scrub/clean compactions work on a 
single sstable at a time, they are not capable of combining or 
splitting them. 

In theory you could probably : 

1) start out with the largest size you want to test 
2) stop your node 
3) use sstable_split [1] to split sstables 
4) start node, test 
5) repeat 2-4 

I am not sure if there is anything about level compaction which makes 
this infeasible. 

=Rob 
[1] https://github.com/pcmanus/cassandra/tree/sstable_split 



Re: Updated sstable size for LCS, ran upgradesstables, file sizes didn't change

2013-06-21 Thread sankalp kohli
I think you can remove the json file which stores the mapping of which
sstable is in which level. Cassandra will treat this as all sstables
being in level 0, which will trigger a compaction. But if you have a lot of
data, it will be very slow, as you will keep compacting data between L1 and
L0.
This also happens when you write very fast and have a pile-up in L0. A
comment from the code explains what I am saying:
// LevelDB gives each level a score of how much data it contains vs its
ideal amount, and
// compacts the level with the highest score. But this falls apart
spectacularly once you
// get behind.  Consider this set of levels:
// L0: 988 [ideal: 4]
// L1: 117 [ideal: 10]
// L2: 12  [ideal: 100]
//
// The problem is that L0 has a much higher score (almost 250) than
L1 (11), so what we'll
// do is compact a batch of MAX_COMPACTING_L0 sstables with all 117
L1 sstables, and put the
// result (say, 120 sstables) in L1. Then we'll compact the next
batch of MAX_COMPACTING_L0,
// and so forth.  So we spend most of our i/o rewriting the L1 data
with each batch.
//
// If we could just do *all* L0 a single time with L1, that would
be ideal.  But we can't
// -- see the javadoc for MAX_COMPACTING_L0.
//
// LevelDB's way around this is to simply block writes if L0
compaction falls behind.
// We don't have that luxury.
//
// So instead, we
// 1) force compacting higher levels first, which minimizes the i/o
needed to compact
//optimially which gives us a long term win, and
// 2) if L0 falls behind, we will size-tiered compact it to reduce
read overhead until
//we can catch up on the higher levels.
//
// This isn't a magic wand -- if you are consistently writing too
fast for LCS to keep
// up, you're still screwed.  But if instead you have intermittent
bursts of activity,
// it can help a lot.


On Fri, Jun 21, 2013 at 5:42 PM, Wei Zhu wz1...@yahoo.com wrote:

 I think the new SSTable will be in the new size. In order to do that, you
 need to trigger a compaction so that the new SSTables will be generated.
 for LCS, there is no major compaction though. You can run a nodetool repair
 and hopefully you will bring some new SSTables and compactions will kick in.
 Or you can change the $CFName.json file under your data directory and move
 every SSTable to level 0. You need to stop your node,  write a simple
 script to alter that file and start the node again.

 I think it will be helpful to have a nodetool command to change the
 SSTable Size and trigger the rebuild of the SSTables.

 Thanks.
 -Wei

 --
 *From: *Robert Coli rc...@eventbrite.com
 *To: *user@cassandra.apache.org
 *Sent: *Friday, June 21, 2013 4:51:29 PM
 *Subject: *Re: Updated sstable size for LCS, ran upgradesstables, file
 sizes didn't change


 On Fri, Jun 21, 2013 at 4:40 PM, Andrew Bialecki
 andrew.biale...@gmail.com wrote:
  However when we run alter the column
  family and then run nodetool upgradesstables -a keyspace columnfamily,
 the
  files in the data directory have been re-written, but the file sizes are
 the
  same.
 
  Is this the expected behavior? If not, what's the right way to upgrade
 them.
  If this is expected, how can we benchmark the read/write performance with
  varying sstable sizes.

 It is expected, upgradesstables/scrub/clean compactions work on a
 single sstable at a time, they are not capable of combining or
 splitting them.

 In theory you could probably :

 1) start out with the largest size you want to test
 2) stop your node
 3) use sstable_split [1] to split sstables
 4) start node, test
 5) repeat 2-4

 I am not sure if there is anything about level compaction which makes
 this infeasible.

 =Rob
 [1] https://github.com/pcmanus/cassandra/tree/sstable_split




Re: Heap is not released and streaming hangs at 0%

2013-06-21 Thread sankalp kohli
I would take a heap dump and see what's in there rather than guessing.


On Fri, Jun 21, 2013 at 4:12 PM, Bryan Talbot btal...@aeriagames.comwrote:

 bloom_filter_fp_chance = 0.7 is probably way too large to be effective and
 you'll probably have issues compacting deleted rows and get poor read
 performance with a value that high.  I'd guess that anything larger than
 0.1 might as well be 1.0.

 -Bryan



 On Fri, Jun 21, 2013 at 5:58 AM, srmore comom...@gmail.com wrote:


 On Fri, Jun 21, 2013 at 2:53 AM, aaron morton aa...@thelastpickle.comwrote:

  nodetool -h localhost flush didn't do much good.

 Do you have 100's of millions of rows ?
 If so see recent discussions about reducing the bloom_filter_fp_chance
 and index_sampling.

 Yes, I have 100's of millions of rows.



 If this is an old schema you may be using the very old setting of
 0.000744 which creates a lot of bloom filters.

 bloom_filter_fp_chance value that was changed from default to 0.1,
 looked at the filters and they are about 2.5G on disk and I have around 8G
 of heap.
 I will try increasing the value to 0.7 and report my results.

 It also appears to be a case of hard GC failure (as Rob mentioned) as the
 heap is never released, even after 24+ hours of idle time, the JVM needs to
 be restarted to reclaim the heap.

 Cheers

-
 Aaron Morton
 Freelance Cassandra Consultant
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 20/06/2013, at 6:36 AM, Wei Zhu wz1...@yahoo.com wrote:

 If you want, you can try to force the GC through Jconsole.
 Memory-Perform GC.

 It theoretically triggers a full GC and when it will happen depends on
 the JVM

 -Wei

 --
 *From: *Robert Coli rc...@eventbrite.com
 *To: *user@cassandra.apache.org
 *Sent: *Tuesday, June 18, 2013 10:43:13 AM
 *Subject: *Re: Heap is not released and streaming hangs at 0%

 On Tue, Jun 18, 2013 at 10:33 AM, srmore comom...@gmail.com wrote:
  But then shouldn't the JVM GC it eventually? I can still see Cassandra
 alive
  and kicking but looks like the heap is locked up even after the
 traffic is
  long stopped.

 No, when GC system fails this hard it is often a permanent failure
 which requires a restart of the JVM.

  nodetool -h localhost flush didn't do much good.

 This adds support to the idea that your heap is too full, and not full
 of memtables.

 You could try nodetool -h localhost invalidatekeycache, but that
 probably will not free enough memory to help you.

 =Rob







Re: crashed while running repair

2013-06-21 Thread sankalp kohli
Looks like the memory map failed. On a 64-bit system you should have effectively
unlimited virtual memory, but Linux has a limit on the number of memory maps per
process. Look at these two places:

http://stackoverflow.com/questions/8892143/error-when-opening-a-lucene-index-map-failed
https://blog.kumina.nl/2011/04/cassandra-java-io-ioerror-java-io-ioexception-map-failed/
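
[One quick way to see how close a node is to that limit is to compare the mappings listed in /proc/<pid>/maps against vm.max_map_count. A rough, Linux-only sketch, not from this thread; the Cassandra pid is passed on the command line.]

import java.io.BufferedReader;
import java.io.FileReader;

public class MapCountCheck {
    public static void main(String[] args) throws Exception {
        String pid = args[0];
        long maps = countLines("/proc/" + pid + "/maps");        // mappings in use
        long limit = Long.parseLong(
                readFirstLine("/proc/sys/vm/max_map_count").trim());  // kernel limit
        System.out.println("mappings in use: " + maps + ", vm.max_map_count: " + limit);
        if (maps > limit * 0.9) {
            System.out.println("Close to the limit; consider raising vm.max_map_count via sysctl.");
        }
    }

    private static long countLines(String path) throws Exception {
        BufferedReader r = new BufferedReader(new FileReader(path));
        try {
            long n = 0;
            while (r.readLine() != null) n++;
            return n;
        } finally {
            r.close();
        }
    }

    private static String readFirstLine(String path) throws Exception {
        BufferedReader r = new BufferedReader(new FileReader(path));
        try {
            return r.readLine();
        } finally {
            r.close();
        }
    }
}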




On Fri, Jun 21, 2013 at 3:22 PM, Franc Carter franc.car...@sirca.org.auwrote:


 Hi,

 I am experimenting with Cassandra-1.2.4, and got a crash while running
 repair. The nodes has 24GB of ram with an 8GB heap. Any ideas on my I may
 have missed in the config ? Log is below

 ERROR [Thread-136019] 2013-06-22 06:30:05,861 CassandraDaemon.java (line
 174) Exception in thread Thread[Thread-136019,5,main]
 FSReadError in
 /var/lib/cassandra/data/cut3/Price/cut3-Price-ib-44369-Index.db
 at
 org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.createSegments(MmappedSegmentedFile.java:200)
 at
 org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.complete(MmappedSegmentedFile.java:168)
 at
 org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:340)
 at
 org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:319)
 at
 org.apache.cassandra.streaming.IncomingStreamReader.streamIn(IncomingStreamReader.java:194)
 at
 org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:122)
 at
 org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:238)
 at
 org.apache.cassandra.net.IncomingTcpConnection.handleStream(IncomingTcpConnection.java:178)
 at
 org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:78)
 Caused by: java.io.IOException: Map failed
 at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:748)
 at
 org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.createSegments(MmappedSegmentedFile.java:192)
 ... 8 more
 Caused by: java.lang.OutOfMemoryError: Map failed
 at sun.nio.ch.FileChannelImpl.map0(Native Method)
 at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:745)
 ... 9 more
 ERROR [Thread-136019] 2013-06-22 06:30:05,865 FileUtils.java (line 375)
 Stopping gossiper


 thanks

 --

 *Franc Carter* | Systems architect | Sirca Ltd
  marc.zianideferra...@sirca.org.au

 franc.car...@sirca.org.au | www.sirca.org.au

 Tel: +61 2 8355 2514

 Level 4, 55 Harrington St, The Rocks NSW 2000

 PO Box H58, Australia Square, Sydney NSW 1215





Re: Cassandra terminates with OutOfMemory (OOM) error

2013-06-21 Thread sankalp kohli
Looks like you are putting a lot of pressure on the heap by doing a slice
query on a large row.
Do you have a lot of deletes/tombstones on the rows? That might be causing a
problem.
Also, why are you returning so many columns at once? You can use the
auto-paginate feature in Astyanax.

Also, do you see a lot of GC happening?
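
[A sketch of the auto-paginate pattern referred to above, assuming Astyanax 1.x; the column family name, key/column types, page size, and the already-built Keyspace are placeholders rather than anything taken from Mohammed's application.]

import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.connectionpool.exceptions.ConnectionException;
import com.netflix.astyanax.model.Column;
import com.netflix.astyanax.model.ColumnFamily;
import com.netflix.astyanax.model.ColumnList;
import com.netflix.astyanax.query.RowQuery;
import com.netflix.astyanax.serializers.StringSerializer;
import com.netflix.astyanax.util.RangeBuilder;

public class PagedRowReader {
    private static final ColumnFamily<String, String> CF =
            new ColumnFamily<String, String>("wide_cf",
                    StringSerializer.get(), StringSerializer.get());

    // Reads the whole row, but only PAGE_SIZE columns are materialized per round trip.
    public static void readWideRow(Keyspace keyspace, String rowKey) throws ConnectionException {
        final int PAGE_SIZE = 1000;  // placeholder page size
        RowQuery<String, String> query = keyspace.prepareQuery(CF)
                .getKey(rowKey)
                .autoPaginate(true)
                .withColumnRange(new RangeBuilder().setLimit(PAGE_SIZE).build());

        ColumnList<String> page;
        while (!(page = query.execute().getResult()).isEmpty()) {
            for (Column<String> column : page) {
                // Process one column; nothing outside the current page is retained.
                System.out.println(column.getName() + " = " + column.getStringValue());
            }
        }
    }
}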


On Fri, Jun 21, 2013 at 1:13 PM, Jabbar Azam aja...@gmail.com wrote:

 Hello Mohammed,

 You should increase the heap space. You should also tune the garbage
 collection so young generation objects are collected faster, relieving
 pressure on the heap. We have been using JDK 7 with the G1
 collector. It does a better job than me trying to optimise the JDK 6 GC
 collectors.

 Bear in mind though that the OS will need memory, and so will the row cache
 and the file system. Memory usage will also depend on the workload of
 your system.

 I'm sure you'll also get good advice from other members of the mailing
 list.

 Thanks

 Jabbar Azam


 On 21 June 2013 18:49, Mohammed Guller moham...@glassbeam.com wrote:

  We have a 3-node cassandra cluster on AWS. These nodes are running
 cassandra 1.2.2 and have 8GB memory. We didn't change any of the default
 heap or GC settings. So each node is allocating 1.8GB of heap space. The
 rows are wide; each row stores around 260,000 columns. We are reading the
 data using Astyanax. If our application tries to read 80,000 columns each
 from 10 or more rows at the same time, some of the nodes run out of heap
 space and terminate with OOM error. Here is the error message:

 java.lang.OutOfMemoryError: Java heap space
 at java.nio.HeapByteBuffer.duplicate(HeapByteBuffer.java:107)
 at org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:50)
 at org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:60)
 at org.apache.cassandra.db.marshal.AbstractCompositeType.split(AbstractCompositeType.java:126)
 at org.apache.cassandra.db.filter.ColumnCounter$GroupByPrefix.count(ColumnCounter.java:96)
 at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:164)
 at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:136)
 at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:84)
 at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:294)
 at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:65)
 at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1363)
 at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1220)
 at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1132)
 at org.apache.cassandra.db.Table.getRow(Table.java:355)
 at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:70)
 at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1052)
 at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1578)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
 at java.lang.Thread.run(Thread.java:722)

 ERROR 02:14:05,351 Exception in thread Thread[Thrift:6,5,main]
 java.lang.OutOfMemoryError: Java heap space
 at java.lang.Long.toString(Long.java:269)
 at java.lang.Long.toString(Long.java:764)
 at org.apache.cassandra.dht.Murmur3Partitioner$1.toString(Murmur3Partitioner.java:171)
 at org.apache.cassandra.service.StorageService.describeRing(StorageService.java:1068)
 at org.apache.cassandra.thrift.CassandraServer.describe_ring(CassandraServer.java:1192)
 at org.apache.cassandra.thrift.Cassandra$Processor$describe_ring.getResult(Cassandra.java:3766)
 at org.apache.cassandra.thrift.Cassandra$Processor$describe_ring.getResult(Cassandra.java:3754)
 at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
 at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
 at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:199)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
 at java.lang.Thread.run(Thread.java:722)

 The 

Re: Cassandra driver performance question...

2013-06-21 Thread Tony Anecito
Thanks Jabbar,

I ran nodetool as suggested and it showed 0 latency for the row count I have.

I also ran the cli list command for the table hit by my JDBC PreparedStatement.
It was slow, around 121 msec the first time I ran it and 40 msec the second time,
versus a JDBC call of 38 msec to start with, unless I run the JDBC query twice
as well, in which case executeQuery takes 1.5-2.5 msec the second time the
PreparedStatement is called.

I ran describe from the cli for the table and it said caching is ALL, which is
correct.

A real mystery, and I need to understand better what is going on.
 
Regards,
-Tony

From: Jabbar Azam aja...@gmail.com
To: user@cassandra.apache.org; Tony Anecito adanec...@yahoo.com 
Sent: Friday, June 21, 2013 3:32 PM
Subject: Re: Cassandra driver performance question...



Hello Tony, 

I would guess that the first queries data  is put into the row cache and the 
filesystem cache. The second query gets the data from the row cache and or the 
filesystem cache so it'll be faster.

If you want to make it consistently faster having a key cache will definitely 
help. The following advice from Aaron Morton will also help 
You can also see what it looks like from the server side. 

nodetool proxyhistograms will show you full request latency recorded by the 
coordinator. 
nodetool cfhistograms will show you the local read latency, this is just the 
time it takes
to read data on a replica and does not include network or wait times. 

If the proxyhistograms is showing most requests running faster than your app 
says it's your
app.

http://mail-archives.apache.org/mod_mbox/cassandra-user/201301.mbox/%3ce3741956-c47c-4b43-ad99-dad8afc3a...@thelastpickle.com%3E



Thanks

Jabbar Azam



On 21 June 2013 21:29, Tony Anecito adanec...@yahoo.com wrote:

Hi All,
I am using jdbc driver and noticed that if I run the same query twice the 
second time it is much faster.
I setup the row cache and column family cache and it not seem to make a 
difference.


I am wondering how to setup cassandra such that the first query is always as 
fast as the second one. The second one was 1.8msec and the first 28msec for 
the same exact paremeters. I am using preparestatement.


Thanks!

Re: Cassandra driver performance question...

2013-06-21 Thread Tony Anecito
Hi Jabbar,
 
I think I know what is going on. I happened across a change mentioned by the 
JDBC driver developers regarding metadata caching. It seems the metadata caching 
was moved from the connection object to the PreparedStatement object. So I am 
wondering if the time difference I am seeing on the second use of the 
PreparedStatement is because the metadata is cached by then.

So my question is: how do I test this theory? Is there a way to stop the metadata 
from coming across from Cassandra? A 20x performance improvement would be nice 
to have.
 
Thanks,
-Tony
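
[One way to test the theory is to time a reused PreparedStatement against a freshly prepared one for the same CQL. If the gap comes from per-statement metadata caching, the "fresh" loop should stay slow while the "reused" loop speeds up after the first call. A rough sketch; the driver class, JDBC URL, table, and key are placeholders, not taken from this thread.]

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class PreparedStatementTimer {
    private static final String CQL = "SELECT value FROM demo_cf WHERE key = ?";

    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.cassandra.cql.jdbc.CassandraDriver");  // assumed driver class
        Connection conn =
                DriverManager.getConnection("jdbc:cassandra://localhost:9160/demo_ks");

        // Case 1: prepare once, execute repeatedly.
        PreparedStatement reused = conn.prepareStatement(CQL);
        for (int i = 0; i < 5; i++) {
            time("reused #" + i, reused);
        }

        // Case 2: prepare a fresh statement for every execution.
        for (int i = 0; i < 5; i++) {
            PreparedStatement fresh = conn.prepareStatement(CQL);
            time("fresh  #" + i, fresh);
            fresh.close();
        }
    }

    private static void time(String label, PreparedStatement ps) throws Exception {
        long start = System.nanoTime();
        ps.setString(1, "some-key");
        ps.executeQuery().close();
        System.out.printf("%s %.2f ms%n", label, (System.nanoTime() - start) / 1e6);
    }
}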

From: Tony Anecito adanec...@yahoo.com
To: user@cassandra.apache.org user@cassandra.apache.org 
Sent: Friday, June 21, 2013 8:56 PM
Subject: Re: Cassandra driver performance question...



Thanks Jabbar,
 
I ran nodetool as suggested and it 0 latency for the row count I have.
 
I also ran cli list command for the table hit by my JDBC perparedStatement and 
it was slow like 121msecs the first time I ran it and second time I ran it it 
was 40msecs versus jdbc call of 38msecs to start with unless I run it twice 
also and get 1.5-2.5msecs for executeQuery the second time the 
preparedStatement is called.
 
I ran describe from cli for the table and it said caching is ALL which is 
correct.
 
A real mystery and I need to understand better what is going on.
 
Regards,
-Tony

From: Jabbar Azam aja...@gmail.com
To: user@cassandra.apache.org; Tony Anecito adanec...@yahoo.com 
Sent: Friday, June 21, 2013 3:32 PM
Subject: Re: Cassandra driver performance question...



Hello Tony, 

I would guess that the first queries data  is put into the row cache and the 
filesystem cache. The second query gets the data from the row cache and or the 
filesystem cache so it'll be faster.

If you want to make it consistently faster having a key cache will definitely 
help. The following advice from Aaron Morton will also help 
You can also see what it looks like from the server side. 

nodetool proxyhistograms will show you full request latency recorded by the 
coordinator. 
nodetool cfhistograms will show you the local read latency, this is just the 
time it takes
to read data on a replica and does not include network or wait times. 

If the proxyhistograms is showing most requests running faster than your app 
says it's your
app.

http://mail-archives.apache.org/mod_mbox/cassandra-user/201301.mbox/%3ce3741956-c47c-4b43-ad99-dad8afc3a...@thelastpickle.com%3E



Thanks

Jabbar Azam



On 21 June 2013 21:29, Tony Anecito adanec...@yahoo.com wrote:

Hi All,
I am using jdbc driver and noticed that if I run the same query twice the 
second time it is much faster.
I setup the row cache and column family cache and it not seem to make a 
difference.


I am wondering how to setup cassandra such that the first query is always as 
fast as the second one. The second one was 1.8msec and the first 28msec for 
the same exact paremeters. I am using preparestatement.


Thanks!