Custom data types and dynamic tables

2015-03-24 Thread Anishek Agarwal
Hello,

If I have a custom type EventDefinition and I create a table like


create table TestTable (
    user_id bigint,
    ts timestamp,
    definition 'com.anishek.EventDefinition',
    primary key (user_id, ts))
with clustering order by (ts desc)
and compression = {'sstable_compression' : 'SnappyCompressor'}
and compaction = {'class': 'DateTieredCompactionStrategy',
    'base_time_seconds': '3600', 'max_sstable_age_days': '30'};


then how would the data be stored internally? Based on the Dynamic Column Family
section of http://www.datastax.com/dev/blog/thrift-to-cql3, would the data be
laid out as below, given that EventDefinition is stored as the string
representation code,eventName?

Row : Columns

1  :  2015-03-02 14:33:14+0000 = 12,a
      2015-03-02 14:34:14+0000 = 11,b
      2015-03-02 14:35:14+0000 = 15,e
      2015-03-02 14:36:14+0000 = 17,c
      2015-03-02 14:37:14+0000 = 1,d
2  :  2015-03-02 14:33:14+0000 = 12,a
      2015-03-02 14:34:14+0000 = 11,b
      2015-03-02 14:35:14+0000 = 15,e
      2015-03-02 14:36:14+0000 = 17,c
      2015-03-02 14:37:14+0000 = 1,d

Is the above correct ?

We will be getting the above events per day for each user, and new users keep
getting added to the system. We are presently assuming a maximum of about 30
events for a given user.

Given we will add data like

insert into TestTable (user_id, ts, definition) values (a, '2015-03-02
12:30:56', '1,s') using ttl [30 days];
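For reference, a hedged sketch of that insert in well-formed CQL: the 30-day TTL
is written in seconds, the timestamp literal carries an explicit UTC offset, and
it assumes the custom type accepts its string form as a literal (it may instead
require a blob literal or a prepared statement); the values are illustrative:

insert into TestTable (user_id, ts, definition)
values (1, '2015-03-02 12:30:56+0000', '1,s')
using ttl 2592000;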

I am assuming that date tiered compaction will not be very effective if the
timestamps are not in the same timezone across entries.

thanks
Anishek


Re: Disastrous profusion of SSTables

2015-03-26 Thread Anishek Agarwal
Are you frequently updating the same rows? What is the memtable flush size?
Can you post the table create query here, please?

On Thu, Mar 26, 2015 at 1:21 PM, Dave Galbraith david92galbra...@gmail.com
wrote:

 Hey! So I'm running Cassandra 2.1.2 and using the
 SizeTieredCompactionStrategy. I'm doing about 3k writes/sec on a single
 node. My read performance is terrible, all my queries just time out. So I
 do nodetool cfstats:

 Read Count: 42071
 Read Latency: 67.47804242827601 ms.
 Write Count: 131964300
 Write Latency: 0.011721604274792501 ms.
 Pending Flushes: 0
 Table: metrics16513
 SSTable count: 641
 Space used (live): 6366740812
 Space used (total): 6366740812
 Space used by snapshots (total): 0
 SSTable Compression Ratio: 0.25272488401992765
 Memtable cell count: 0
 Memtable data size: 0
 Memtable switch count: 1016
 Local read count: 42071
 Local read latency: 67.479 ms
 Local write count: 131964300
 Local write latency: 0.012 ms
 Pending flushes: 0
 Bloom filter false positives: 994
 Bloom filter false ratio: 0.0
 Bloom filter space used: 37840376
 Compacted partition minimum bytes: 104
 Compacted partition maximum bytes: 24601
 Compacted partition mean bytes: 255
 Average live cells per slice (last five minutes):
 111.67243951154147
 Maximum live cells per slice (last five minutes): 1588.0
 Average tombstones per slice (last five minutes): 0.0
 Maximum tombstones per slice (last five minutes): 0.0

 and nodetool cfhistograms:

 Percentile  SSTables   Write Latency   Read Latency   Partition Size   Cell Count
                              (micros)       (micros)          (bytes)
 50%            46.00            6.99      154844.95              149            1
 75%           430.00            8.53     3518837.53              179            1
 95%           430.00           11.32     7252897.25              215            2
 98%           430.00           15.54    22103886.34              215            3
 99%           430.00           29.86    22290608.19             1597           50
 Min             0.00            1.66          26.91              104            0
 Max           430.00       269795.38    27311364.89            24601          924

 Gross!! There are 641 SSTables in there, and all my reads are hitting
 hundreds of them and timing out. How could this possibly have happened, and
 what can I do about it? Nodetool compactionstats says pending tasks: 0, by
 the way. Thanks!



Re: Replication to second data center with different number of nodes

2015-03-29 Thread Anishek Agarwal
Colin,

When you said a larger number of tokens has a query performance hit, is it read
or write performance? Also, if you have any links you could share to shed some
light on this, that would be great.

Thanks
Anishek

On Sun, Mar 29, 2015 at 2:20 AM, Colin Clark co...@clark.ws wrote:

 I typically use a number a lot lower than 256, usually less than 20, for
 num_tokens, as a larger number has historically had a dramatic impact on
 query performance.
 —
 Colin Clark
 co...@clark.ws
 +1 612-859-6129
 skype colin.p.clark

 On Mar 28, 2015, at 3:46 PM, Eric Stevens migh...@gmail.com wrote:

 If you're curious about how Cassandra knows how to replicate data in the
 remote DC: it's the same as in the local DC; replication is independent in
 each, and you can even set a different replication strategy per keyspace
 per datacenter. Nodes in each DC take up num_tokens positions on a ring,
 each partition key is mapped to a position on that ring, and whoever owns
 that part of the ring is the primary for that data. Then (oversimplified)
 RF-1 adjacent nodes become replicas for that same data.
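 For illustration, a hedged CQL sketch of that per-keyspace, per-datacenter
 setting (the keyspace and data center names below are made up; each DC gets
 its own replication factor):

 create keyspace my_ks with replication =
     {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 2};

 -- or, for an existing keyspace:
 alter keyspace my_ks with replication =
     {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 2};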

 On Fri, Mar 27, 2015 at 6:55 AM, Sibbald, Charles 
 charles.sibb...@bskyb.com wrote:


 http://www.datastax.com/documentation/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html?scroll=reference_ds_qfg_n1r_1k__num_tokens

  So go with a default 256, and leave initial token empty:

  num_tokens: 256

 # initial_token:


  Cassandra will always give each node the same number of tokens; the only
 time you might want to vary this is if your instances are of different
 sizing/capability, which is also a bad scenario.

   From: Björn Hachmann bjoern.hachm...@metrigo.de
 Reply-To: user@cassandra.apache.org user@cassandra.apache.org
 Date: Friday, 27 March 2015 12:11
 To: user user@cassandra.apache.org
 Subject: Re: Replication to second data center with different number of
 nodes


 2015-03-27 11:58 GMT+01:00 Sibbald, Charles charles.sibb...@bskyb.com:

 Cassandra’s Vnodes config


 ​Thank you. Yes, we are using vnodes! The num_token parameter controls
 the number of vnodes assigned to a specific node.​

  Maybe I am seeing problems where there are none.

  Let me rephrase my question: How does Cassandra know it has to
 replicate 1/3 of all keys to each single node in the second DC? I can see
 two ways:
  1. It has to be configured explicitly.
  2. It is derived from the number of nodes available in the data center
 at the time `nodetool rebuild` is started.

  Kind regards
 Björn






Re: write timeout

2015-03-23 Thread Anishek Agarwal
Forgot to mention I am using Cassandra 2.0.13

On Mon, Mar 23, 2015 at 5:59 PM, Anishek Agarwal anis...@gmail.com wrote:

 Hello,

 I am using a single-node, server-class machine with 16 CPUs and 32 GB RAM,
 with a single drive attached to it.

 my table structure is as below

 CREATE TABLE t1 (id bigint, ts timestamp, cat1 set<text>, cat2 set<text>,
 lat float, lon float, a bigint, primary key (id, ts));

 I am trying to insert 300 entries per partition key with 4000 partition
 keys using 25 threads. Configurations

 write_request_timeout_in_ms: 5000
 concurrent_writes: 32
 heap space : 8GB

 Client side timeout is 12 sec using datastax java driver.
 Consistency level: ONE

 With the above configuration I try to run it 10 times to eventually
 generate around

 300 * 4000 * 10 = 12,000,000 entries,

 When I run this, after the first few runs I get a WriteTimeout exception at
 the client with the message 1 replica were required but only 0 acknowledged
 the write.

 There are no errors in the server log. Why does this error occur, and how do
 I know what limit I should cap concurrent writes to a single node at?


 Looking at iostat, disk utilization seems to be at 1-3% when running this.

 Please let me know if anything else is required.

 Regards,
 Anishek




write timeout

2015-03-23 Thread Anishek Agarwal
Hello,

I am using a single-node, server-class machine with 16 CPUs and 32 GB RAM,
with a single drive attached to it.

my table structure is as below

CREATE TABLE t1 (id bigint, ts timestamp, cat1 set<text>, cat2 set<text>,
lat float, lon float, a bigint, primary key (id, ts));

I am trying to insert 300 entries per partition key with 4000 partition
keys using 25 threads. Configurations

write_request_timeout_in_ms: 5000
concurrent_writes: 32
heap space : 8GB

Client side timeout is 12 sec using datastax java driver.
Consistency level: ONE

With the above configuration I try to run it 10 times to eventually
generate around

300 * 4000 * 10 = 12,000,000 entries,

When I run this, after the first few runs I get a WriteTimeout exception at
the client with the message 1 replica were required but only 0 acknowledged
the write.

There are no errors in the server log. Why does this error occur, and how do
I know what limit I should cap concurrent writes to a single node at?


Looking at iostat, disk utilization seems to be at 1-3% when running this.

Please let me know if anything else is required.

Regards,
Anishek


Re: LCS Strategy, compaction pending tasks keep increasing

2015-04-21 Thread Anishek Agarwal
Sorry, I take that back: we will modify different keys across threads, not the
same key; our Storm topology is going to use field grouping to route updates
for the same keys to the same set of bolts.

On Tue, Apr 21, 2015 at 6:17 PM, Anishek Agarwal anis...@gmail.com wrote:

 @Brice: I don't think so, as I am giving each thread a specific key range
 with no overlaps, so this does not seem to be the case now. However, we will
 have to test where we modify the same key across threads -- do you think that
 will cause a problem? As far as I have read, LCS is recommended for such
 cases. Should I just switch back to SizeTieredCompactionStrategy?


 On Tue, Apr 21, 2015 at 6:13 PM, Brice Dutheil brice.duth...@gmail.com
 wrote:

 Could it be that the app is inserting _duplicate_ keys?

 -- Brice

 On Tue, Apr 21, 2015 at 1:52 PM, Marcus Eriksson krum...@gmail.com
 wrote:

 nope, but you can correlate I guess, tools/bin/sstablemetadata gives you
 sstable level information

 and, it is also likely that since you get so many L0 sstables, you will
 be doing size tiered compaction in L0 for a while.

 On Tue, Apr 21, 2015 at 1:40 PM, Anishek Agarwal anis...@gmail.com
 wrote:

 @Marcus, I did look, and that is where I got the above, but it doesn't show
 any detail about moving from L0 -> L1. Any specific arguments I should try
 with?

 On Tue, Apr 21, 2015 at 4:52 PM, Marcus Eriksson krum...@gmail.com
 wrote:

 You need to look at nodetool compactionstats -- there is probably a big
 L0 -> L1 compaction going on that blocks other compactions from starting.

 On Tue, Apr 21, 2015 at 1:06 PM, Anishek Agarwal anis...@gmail.com
 wrote:

 the some_bits column has about 14-15 bytes of data per key.

 On Tue, Apr 21, 2015 at 4:34 PM, Anishek Agarwal anis...@gmail.com
 wrote:

 Hello,

 I am inserting about 100 million entries via datastax-java driver to
 a cassandra cluster of 3 nodes.

 Table structure is as

 create keyspace test with replication = {'class':
 'NetworkTopologyStrategy', 'DC' : 3};

 CREATE TABLE test_bits(id bigint primary key , some_bits text) with
 gc_grace_seconds=0 and compaction = {'class': 
 'LeveledCompactionStrategy'}
 and compression={'sstable_compression' : ''};

 I have 75 threads inserting data into the above table, with each thread
 having non-overlapping keys.

 I see that the number of pending tasks via nodetool compactionstats keeps
 increasing, and from nodetool cfstats it looks like test.test_bits has
 SSTable levels [154/4, 8, 0, 0, 0, 0, 0, 0, 0].

 Why is compaction not kicking in ?

 thanks
 anishek










Re: LCS Strategy, compaction pending tasks keep increasing

2015-04-21 Thread Anishek Agarwal
@Brice: I don't think so, as I am giving each thread a specific key range with
no overlaps, so this does not seem to be the case now. However, we will have to
test where we modify the same key across threads -- do you think that will
cause a problem? As far as I have read, LCS is recommended for such cases.
Should I just switch back to SizeTieredCompactionStrategy?
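If switching back is the route taken, a hedged sketch of the change (the table
name is from this thread; compaction sub-options are left at their defaults):

ALTER TABLE test.test_bits
  WITH compaction = {'class': 'SizeTieredCompactionStrategy'};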


On Tue, Apr 21, 2015 at 6:13 PM, Brice Dutheil brice.duth...@gmail.com
wrote:

 Could it be that the app is inserting _duplicate_ keys?

 -- Brice

 On Tue, Apr 21, 2015 at 1:52 PM, Marcus Eriksson krum...@gmail.com
 wrote:

 nope, but you can correlate I guess, tools/bin/sstablemetadata gives you
 sstable level information

 and, it is also likely that since you get so many L0 sstables, you will
 be doing size tiered compaction in L0 for a while.

 On Tue, Apr 21, 2015 at 1:40 PM, Anishek Agarwal anis...@gmail.com
 wrote:

 @Marcus, I did look, and that is where I got the above, but it doesn't show
 any detail about moving from L0 -> L1. Any specific arguments I should try
 with?

 On Tue, Apr 21, 2015 at 4:52 PM, Marcus Eriksson krum...@gmail.com
 wrote:

 You need to look at nodetool compactionstats -- there is probably a big
 L0 -> L1 compaction going on that blocks other compactions from starting.

 On Tue, Apr 21, 2015 at 1:06 PM, Anishek Agarwal anis...@gmail.com
 wrote:

 the some_bits column has about 14-15 bytes of data per key.

 On Tue, Apr 21, 2015 at 4:34 PM, Anishek Agarwal anis...@gmail.com
 wrote:

 Hello,

 I am inserting about 100 million entries via datastax-java driver to
 a cassandra cluster of 3 nodes.

 Table structure is as

 create keyspace test with replication = {'class':
 'NetworkTopologyStrategy', 'DC' : 3};

 CREATE TABLE test_bits(id bigint primary key , some_bits text) with
 gc_grace_seconds=0 and compaction = {'class': 
 'LeveledCompactionStrategy'}
 and compression={'sstable_compression' : ''};

 I have 75 threads inserting data into the above table, with each thread
 having non-overlapping keys.

 I see that the number of pending tasks via nodetool compactionstats keeps
 increasing, and from nodetool cfstats it looks like test.test_bits has
 SSTable levels [154/4, 8, 0, 0, 0, 0, 0, 0, 0].

 Why is compaction not kicking in ?

 thanks
 anishek









Re: LCS Strategy, compaction pending tasks keep increasing

2015-04-21 Thread Anishek Agarwal
 We have 2 CPUs with 8 hyper-threaded cores per node.

 On a related topic: I'm a bit concerned by DataStax communication. Usually
 people talk about IO as being the weak spot, but in our case it's more about
 CPU. Fortunately Moore's law doesn't really apply vertically anymore; now we
 have multi-core processors *and* the trend is going that way. Yet the
 DataStax terms feel a bit *antiquated* and maybe a bit too Oracle-y:
 http://www.datastax.com/enterprise-terms
 Node licensing is more appropriate for this century.

 -- Brice

 On Tue, Apr 21, 2015 at 11:19 PM, Sebastian Estevez 
 sebastian.este...@datastax.com wrote:

 Do not enable multithreaded compaction. Overhead usually outweighs any
 benefit. It's removed in 2.1 because it harms more than helps:

 https://issues.apache.org/jira/browse/CASSANDRA-6142

 All the best,


 Sebastián Estévez

 Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com


 On Tue, Apr 21, 2015 at 9:06 AM, Brice Dutheil brice.duth...@gmail.com
  wrote:

 I’m not sure I get everything about storm stuff, but my understanding
 of LCS is that compaction count may increase the more one update data
 (that’s why I was wondering about duplicate primary keys).

 Another option is that the code is sending too much write request/s to
 the cassandra cluster. I don’t know haw many nodes you have, but the less
 node there is the more compactions.
 Also I’d look at the CPU / load, maybe the config is too *restrictive*,
 look at the following properties in the cassandra.yaml

- compaction_throughput_mb_per_sec, by default the value is 16,
you may want to increase it but be careful on mechanical drives, if 
 already
in SSD IO is rarely the issue, we have 64 (with SSDs)
- multithreaded_compaction by default it is false, we enabled it.

 Compaction threads are niced, so they shouldn't be much of an issue for
 serving production r/w requests. But you never know; always keep an eye on
 IO and CPU.

 — Brice

 On Tue, Apr 21, 2015 at 2:48 PM, Anishek Agarwal anis...@gmail.com
 wrote:

 Sorry, I take that back: we will modify different keys across threads, not
 the same key; our Storm topology is going to use field grouping to route
 updates for the same keys to the same set of bolts.

 On Tue, Apr 21, 2015 at 6:17 PM, Anishek Agarwal anis...@gmail.com
 wrote:

 @Brice: I don't think so, as I am giving each thread a specific key range
 with no overlaps, so this does not seem to be the case now. However, we will
 have to test where we modify the same key across threads -- do you think
 that will cause a problem? As far as I have read, LCS is recommended for
 such cases. Should I just switch back to SizeTieredCompactionStrategy?


 On Tue, Apr 21, 2015 at 6:13 PM, Brice Dutheil 
 brice.duth...@gmail.com wrote:

 Could it be that the app is inserting _duplicate_ keys?

 -- Brice

 On Tue, Apr 21, 2015 at 1:52 PM, Marcus Eriksson krum...@gmail.com
  wrote:

 nope, but you can correlate I guess, tools/bin/sstablemetadata
 gives you sstable level information

 and, it is also likely that since you get so many L0 sstables, you
 will be doing size tiered compaction in L0 for a while.

 On Tue, Apr 21, 2015 at 1:40 PM, Anishek Agarwal 
 anis...@gmail.com wrote:

 @Marcus, I did look, and that is where I got the above, but it doesn't show
 any detail about moving from L0 -> L1. Any specific arguments I should try
 with?

 On Tue, Apr 21, 2015 at 4:52 PM, Marcus Eriksson 
 krum...@gmail.com wrote:

 You need to look at nodetool compactionstats -- there is probably a big
 L0 -> L1 compaction going on that blocks other compactions from starting.

 On Tue, Apr 21, 2015 at 1:06 PM, Anishek Agarwal 
 anis...@gmail.com wrote:

 the some_bits column has about 14-15 bytes of data per key.

 On Tue, Apr 21, 2015 at 4:34 PM, Anishek Agarwal 
 anis...@gmail.com wrote:

 Hello,

 I am inserting about 100 million entries via datastax-java
 driver to a cassandra cluster of 3 nodes.

 Table structure is as

 create keyspace test with replication = {'class':
 'NetworkTopologyStrategy', 'DC' : 3};

 CREATE TABLE test_bits(id bigint primary key , some_bits text)
 with gc_grace_seconds=0 and compaction = {'class

Re: Reading hundreds of thousands of rows at once?

2015-04-22 Thread Anishek Agarwal
I think these will help speed things up:

- removing compression
- you have a lot of independent columns mentioned; if you are always going to
query all of them together, one other thing that will help is to store a full
JSON (or some custom object representation) of the value data and change the
model to just have survey_id, hour_created, respondent_id, *json_value*
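A hedged sketch of that collapsed model, reusing the column names above; the
table name, types, and key layout are guesses rather than the poster's actual
schema:

CREATE TABLE survey_results_flat (
    survey_id bigint,
    hour_created timestamp,
    respondent_id uuid,
    json_value text,
    PRIMARY KEY ((survey_id, hour_created), respondent_id)
);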

On Wed, Apr 22, 2015 at 1:09 PM, John Anderson son...@gmail.com wrote:

 Hey, I'm looking at querying around 500,000 rows that I need to pull into
 a Pandas data frame for processing.  Currently testing this on a single
 cassandra node it takes around 21 seconds:

 https://gist.github.com/sontek/4ca95f5c5aa539663eaf

 I tried introducing multiprocessing so I could use 4 processes at a time
 to query this and I got it down to 14 seconds:

 https://gist.github.com/sontek/542f13307ef9679c0094

 Although shaving off 7 seconds is great it still isn't really where I
 would like to be in regards to performance, for this many rows I'd really
 like to get down to a max of 1-2 seconds query time.

 What types of optimizations can I make to improve the read performance
 when querying a large set of data?  Will this timing speed up linearly as I
 add more nodes?

 This is what the schema looks like currently:

 https://gist.github.com/sontek/d6fa3fc1b6d085ad3fa4


 I'm not tied to the current schema at all; it's mostly just a replication
 of what we have in SQL Server. I'm more interested in what things I can
 change to make querying it faster.

 Thanks,
 John



Re: Reading hundreds of thousands of rows at once?

2015-04-22 Thread Anishek Agarwal
You might also want to go through a thread here with the subject High latencies
for simple queries.

On Wed, Apr 22, 2015 at 1:55 PM, Anishek Agarwal anis...@gmail.com wrote:

 I think these will help speed things up:

 - removing compression
 - you have a lot of independent columns mentioned; if you are always going to
 query all of them together, one other thing that will help is to store a full
 JSON (or some custom object representation) of the value data and change the
 model to just have survey_id, hour_created, respondent_id, *json_value*

 On Wed, Apr 22, 2015 at 1:09 PM, John Anderson son...@gmail.com wrote:

 Hey, I'm looking at querying around 500,000 rows that I need to pull into
 a Pandas data frame for processing.  Currently testing this on a single
 cassandra node it takes around 21 seconds:

 https://gist.github.com/sontek/4ca95f5c5aa539663eaf

 I tried introducing multiprocessing so I could use 4 processes at a time
 to query this and I got it down to 14 seconds:

 https://gist.github.com/sontek/542f13307ef9679c0094

 Although shaving off 7 seconds is great it still isn't really where I
 would like to be in regards to performance, for this many rows I'd really
 like to get down to a max of 1-2 seconds query time.

 What types of optimizations can I make to improve the read performance
 when querying a large set of data?  Will this timing speed up linearly as I
 add more nodes?

 This is what the schema looks like currently:

 https://gist.github.com/sontek/d6fa3fc1b6d085ad3fa4


 I'm not tied to the current schema at all; it's mostly just a replication
 of what we have in SQL Server. I'm more interested in what things I can
 change to make querying it faster.

 Thanks,
 John





Re: LCS Strategy, compaction pending tasks keep increasing

2015-04-21 Thread Anishek Agarwal
@Marcus, I did look, and that is where I got the above, but it doesn't show any
detail about moving from L0 -> L1. Any specific arguments I should try with?

On Tue, Apr 21, 2015 at 4:52 PM, Marcus Eriksson krum...@gmail.com wrote:

 You need to look at nodetool compactionstats -- there is probably a big
 L0 -> L1 compaction going on that blocks other compactions from starting.

 On Tue, Apr 21, 2015 at 1:06 PM, Anishek Agarwal anis...@gmail.com
 wrote:

 the some_bits column has about 14-15 bytes of data per key.

 On Tue, Apr 21, 2015 at 4:34 PM, Anishek Agarwal anis...@gmail.com
 wrote:

 Hello,

 I am inserting about 100 million entries via datastax-java driver to a
 cassandra cluster of 3 nodes.

 Table structure is as

 create keyspace test with replication = {'class':
 'NetworkTopologyStrategy', 'DC' : 3};

 CREATE TABLE test_bits(id bigint primary key , some_bits text) with
 gc_grace_seconds=0 and compaction = {'class': 'LeveledCompactionStrategy'}
 and compression={'sstable_compression' : ''};

 I have 75 threads inserting data into the above table, with each thread
 having non-overlapping keys.

 I see that the number of pending tasks via nodetool compactionstats keeps
 increasing, and from nodetool cfstats it looks like test.test_bits has
 SSTable levels [154/4, 8, 0, 0, 0, 0, 0, 0, 0].

 Why is compaction not kicking in ?

 thanks
 anishek






Network transfer to one node twice as others

2015-04-21 Thread Anishek Agarwal
Hello,

We are using Cassandra 2.0.14 and have a cluster of 3 nodes. I have a writer
test (written in Java) that runs 50 threads to populate data into a single
table in a single keyspace.

When I look at iftop, I see that the amount of network transfer happening on
two of the nodes is the same, but on one of the nodes it is almost twice that
of the other two. Any reason that would be the case?

Thanks
Anishek


Re: LCS Strategy, compaction pending tasks keep increasing

2015-04-21 Thread Anishek Agarwal
the some_bits column has about 14-15 bytes of data per key.

On Tue, Apr 21, 2015 at 4:34 PM, Anishek Agarwal anis...@gmail.com wrote:

 Hello,

 I am inserting about 100 million entries via datastax-java driver to a
 cassandra cluster of 3 nodes.

 Table structure is as

 create keyspace test with replication = {'class':
 'NetworkTopologyStrategy', 'DC' : 3};

 CREATE TABLE test_bits(id bigint primary key , some_bits text) with
 gc_grace_seconds=0 and compaction = {'class': 'LeveledCompactionStrategy'}
 and compression={'sstable_compression' : ''};

 I have 75 threads inserting data into the above table, with each thread
 having non-overlapping keys.

 I see that the number of pending tasks via nodetool compactionstats keeps
 increasing, and from nodetool cfstats it looks like test.test_bits has
 SSTable levels [154/4, 8, 0, 0, 0, 0, 0, 0, 0].

 Why is compaction not kicking in ?

 thanks
 anishek



LCS Strategy, compaction pending tasks keep increasing

2015-04-21 Thread Anishek Agarwal
Hello,

I am inserting about 100 million entries via datastax-java driver to a
cassandra cluster of 3 nodes.

Table structure is as

create keyspace test with replication = {'class':
'NetworkTopologyStrategy', 'DC' : 3};

CREATE TABLE test_bits(id bigint primary key , some_bits text) with
gc_grace_seconds=0 and compaction = {'class': 'LeveledCompactionStrategy'}
and compression={'sstable_compression' : ''};

I have 75 threads inserting data into the above table, with each thread having
non-overlapping keys.

I see that the number of pending tasks via nodetool compactionstats keeps
increasing, and from nodetool cfstats it looks like test.test_bits has
SSTable levels [154/4, 8, 0, 0, 0, 0, 0, 0, 0].

Why is compaction not kicking in ?

thanks
anishek


Re: LCS Strategy, compaction pending tasks keep increasing

2015-04-21 Thread Anishek Agarwal
I am on version 2.0.14; I will update once I get the stats up for the writes
again.


On Tue, Apr 21, 2015 at 4:46 PM, Carlos Rolo r...@pythian.com wrote:

 Are you on version 2.1.x?

 Regards,

 Carlos Juzarte Rolo
 Cassandra Consultant

 Pythian - Love your data

 rolo@pythian | Twitter: cjrolo | LinkedIn: linkedin.com/in/carlosjuzarterolo
 Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649
 www.pythian.com

 On Tue, Apr 21, 2015 at 1:06 PM, Anishek Agarwal anis...@gmail.com
 wrote:

 the some_bits column has about 14-15 bytes of data per key.

 On Tue, Apr 21, 2015 at 4:34 PM, Anishek Agarwal anis...@gmail.com
 wrote:

 Hello,

 I am inserting about 100 million entries via datastax-java driver to a
 cassandra cluster of 3 nodes.

 Table structure is as

 create keyspace test with replication = {'class':
 'NetworkTopologyStrategy', 'DC' : 3};

 CREATE TABLE test_bits(id bigint primary key , some_bits text) with
 gc_grace_seconds=0 and compaction = {'class': 'LeveledCompactionStrategy'}
 and compression={'sstable_compression' : ''};

 I have 75 threads inserting data into the above table, with each thread
 having non-overlapping keys.

 I see that the number of pending tasks via nodetool compactionstats keeps
 increasing, and from nodetool cfstats it looks like test.test_bits has
 SSTable levels [154/4, 8, 0, 0, 0, 0, 0, 0, 0].

 Why is compaction not kicking in ?

 thanks
 anishek










Re: Unable to connect via cqlsh or datastax-driver

2015-05-06 Thread Anishek Agarwal
Did you set the CQLSH_HOST environment variable to the node's IP so cqlsh uses
that?

On Tue, May 5, 2015 at 8:50 PM, Björn Hachmann bjoern.hachm...@metrigo.de
wrote:

 Hello,

 I am unable to connect to the nodes of our second datacenter, not even
 from localhost.

 The error message I receive is:

 Connection error: ('Unable to connect to any servers', {'...':
 OperationTimedOut('errors=None, last_host=None',)})


 I already checked some things:

- The node starts to listen for cql clients on the expected port
(extract from the log):
Starting listening for CQL clients on .../192.168.1.23:9042
- The port is open and accepts connections via telnet.
- nodetool info works and returns:
Gossip active  : true
Thrift active  : true
Native Transport active: true
- nodetool netstats:
Mode: NORMAL
- nodetool statusbinary
running

 Any help would be highly appreciated! Thank you very much.

 Kind regards
 Björn



Re: Read performance

2015-05-11 Thread Anishek Agarwal
How many SSTables were there? What compaction are you using? These properties
define how many disk reads Cassandra may have to do to get all the data you
need, depending on which SSTables have data for your partition key.

On Fri, May 8, 2015 at 6:25 PM, Alprema alpr...@alprema.com wrote:

 I was planning on using a more server-friendly strategy anyway (by
 parallelizing my workload on multiple metrics) but my concern here is more
 about the raw numbers.

 According to the trace and my estimation of the data size, the read from
 disk was done at about 30 MByte/s and the transfer between the responsible
 node and the coordinator at about 120 Mbit/s, which doesn't seem right
 given that the cluster was not busy and the network is Gbit-capable.

 I know that there is some overhead, but these numbers seem odd to me, do
 they seem normal to you ?

 On Fri, May 8, 2015 at 2:34 PM, Bryan Holladay holla...@longsight.com
 wrote:

 Try breaking it up into smaller chunks using multiple threads and token
 ranges. 86400 is pretty large. I found ~1000 results per query is good.
 This will spread the burden across all servers a little more evenly.
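 A hedged sketch of one such token-range slice, borrowing the table and
 partition-key columns from the query quoted later in this thread; the bounds
 are illustrative values from splitting the Murmur3 token space, and each
 thread would scan a different slice:

 SELECT UtcDate, Value
 FROM Metric_OneSec
 WHERE token(MetricId, Day) > -9223372036854775808
   AND token(MetricId, Day) <= -4611686018427387904;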

 On Thu, May 7, 2015 at 4:27 AM, Alprema alpr...@alprema.com wrote:

 Hi,

 I am writing an application that will periodically read big amounts of
 data from Cassandra and I am experiencing odd performances.

 My column family is a classic time series one, with series ID and Day as
 partition key and a timestamp as clustering key, the value being a double.

 The query I run gets all the values for a given time series for a given
 day (so about 86400 points):

 SELECT UtcDate, Value
 FROM Metric_OneSec
 WHERE MetricId = 12215ece-6544-4fcf-a15d-4f9e9ce1567e
   AND Day = '2015-05-05 00:00:00+0000'
 LIMIT 86400;


 This takes about 450ms to run and when I trace the query I see that it
 takes about 110ms to read the data from disk and 224ms to send the data
 from the responsible node to the coordinator (full trace in attachment).

 I did a quick estimation of the requested data (correct me if I'm wrong):
 86400 * (column name + column value + timestamp + ttl)
 = 86400 * (8 + 8 + 8 + 8?)
 = 2.6 MB

 Let's say about 3 MB with misc. overhead, so these timings seem pretty
 slow to me for a modern SSD and a 1 Gb/s NIC.

 Do those timings seem normal? Am I missing something?

 Thank you,

 Kévin







Re: error='Cannot allocate memory' (errno=12)

2015-05-11 Thread Anishek Agarwal
The memory Cassandra is trying to allocate is pretty small. Are you sure there
is no hardware failure on the machine? What is the free RAM on the box?

On Mon, May 11, 2015 at 3:28 PM, Rahul Bhardwaj 
rahul.bhard...@indiamart.com wrote:

 Hi All,

 We have a cluster of 3 nodes with 64 GB RAM each. The cluster was running in
 a healthy state; suddenly one machine's Cassandra daemon stopped working and
 shut down.

 On restarting, after 2 minutes it stops again, after returning the error
 below in cassandra.log:

 Java HotSpot(TM) 64-Bit Server VM warning: INFO:
 os::commit_memory(0x7fd064dc6000, 12288, 0) failed; error='Cannot
 allocate memory' (errno=12)
 #
 # There is insufficient memory for the Java Runtime Environment to
 continue.
 # Native memory allocation (malloc) failed to allocate 12288 bytes for
 committing reserved memory.
 # An error report file with more information is saved as:
 # /tmp/hs_err_pid23215.log
 INFO  09:50:41 Loading settings from
 file:/etc/cassandra/default.conf/cassandra.yaml
 INFO  09:50:41 Node configuration:[authenticator=AllowAllAuthenticator;
 authorizer=AllowAllAuthorizer; auto_snapshot=true;
 batch_size_warn_threshold_in_kb=5; batchlog_replay_throttle_in_kb=1024;
 cas_contention_timeout_in_ms=1000; client_encryption_options=REDACTED;
 cluster_name=Test Cluster; column_index_size_in_kb=64;
 commit_failure_policy=stop;
 commitlog_directory=/var/lib/cassandra/commitlog;
 commitlog_segment_size_in_mb=64; commitlog_sync=periodic;
 commitlog_sync_period_in_ms=1; compaction_throughput_mb_per_sec=16;
 concurrent_compactors=4; concurrent_counter_writes=32; concurrent_reads=32;
 concurrent_writes=32; counter_cache_save_period=7200;
 counter_cache_size_in_mb=null; counter_write_request_timeout_in_ms=5000;
 cross_node_timeout=false; data_file_directories=[/var/lib/cassandra/data];
 disk_failure_policy=stop; dynamic_snitch_badness_threshold=0.1;
 dynamic_snitch_reset_interval_in_ms=60;
 dynamic_snitch_update_interval_in_ms=100;
 endpoint_snitch=GossipingPropertyFileSnitch; hinted_handoff_enabled=true;
 hinted_handoff_throttle_in_kb=1024; incremental_backups=false;
 index_summary_capacity_in_mb=null;
 index_summary_resize_interval_in_minutes=60; inter_dc_tcp_nodelay=false;
 internode_compression=all; key_cache_save_period=14400;
 key_cache_size_in_mb=null; listen_address=null;
 max_hint_window_in_ms=1080; max_hints_delivery_threads=2;
 memtable_allocation_type=heap_buffers; native_transport_port=9042;
 num_tokens=256; partitioner=org.apache.cassandra.dht.Murmur3Partitioner;
 permissions_validity_in_ms=2000; range_request_timeout_in_ms=100;
 read_request_timeout_in_ms=9;
 request_scheduler=org.apache.cassandra.scheduler.NoScheduler;
 request_timeout_in_ms=9; row_cache_save_period=0;
 row_cache_size_in_mb=0; rpc_address=null; rpc_keepalive=true;
 rpc_port=9160; rpc_server_type=sync;
 saved_caches_directory=/var/lib/cassandra/saved_caches;
 seed_provider=[{class_name=org.apache.cassandra.locator.SimpleSeedProvider,
 parameters=[{seeds=206.191.151.199}]}];
 server_encryption_options=REDACTED; snapshot_before_compaction=false;
 ssl_storage_port=7001; sstable_preemptive_open_interval_in_mb=50;
 start_native_transport=true; start_rpc=true; storage_port=7000;
 thrift_framed_transport_size_in_mb=15; tombstone_failure_threshold=10;
 tombstone_warn_threshold=1000; trickle_fsync=false;
 trickle_fsync_interval_in_kb=10240; truncate_request_timeout_in_ms=6;
 write_request_timeout_in_ms=9]
 ERROR 09:50:41 Exception encountered during startup
 java.lang.OutOfMemoryError: unable to create new native thread
 at java.lang.Thread.start0(Native Method) ~[na:1.7.0_60]
 at java.lang.Thread.start(Thread.java:714) ~[na:1.7.0_60]
 at
 java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:949)
 ~[na:1.7.0_60]
 at
 java.util.concurrent.ThreadPoolExecutor.ensurePrestart(ThreadPoolExecutor.java:1590)
 ~[na:1.7.0_60]
 at
 java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:333)
 ~[na:1.7.0_60]
 at
 java.util.concurrent.ScheduledThreadPoolExecutor.scheduleWithFixedDelay(ScheduledThreadPoolExecutor.java:594)
 ~[na:1.7.0_60]
 at
 org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor.scheduleWithFixedDelay(DebuggableScheduledThreadPoolExecutor.java:61)
 ~[apache-cassandra-2.1.2.jar:2.1.2-SNAPSHOT]
 at org.apache.cassandra.gms.Gossiper.start(Gossiper.java:1188)
 ~[apache-cassandra-2.1.2.jar:2.1.2-SNAPSHOT]
 at
 org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:721)
 ~[apache-cassandra-2.1.2.jar:2.1.2-SNAPSHOT]
 at
 org.apache.cassandra.service.StorageService.initServer(StorageService.java:643)
 ~[apache-cassandra-2.1.2.jar:2.1.2-SNAPSHOT]
 at
 org.apache.cassandra.service.StorageService.initServer(StorageService.java:535)
 ~[apache-cassandra-2.1.2.jar:2.1.2-SNAPSHOT]
  

Re: error='Cannot allocate memory' (errno=12)

2015-05-11 Thread Anishek Agarwal
Well, I haven't used 2.1.x Cassandra or Java 8, but is there any reason for not
using the Oracle JDK, as I thought that is what is recommended? I saw a thread
earlier stating that Java 8 with 2.0.14+ Cassandra is tested, but I am not sure
about the 2.1.x versions.


On Mon, May 11, 2015 at 4:04 PM, Rahul Bhardwaj 
rahul.bhard...@indiamart.com wrote:

 Please find attached the error log:
 hs_err_pid9656.log
 https://docs.google.com/a/indiamart.com/file/d/0B0hlSlesIPVfaU9peGwxSXdsZGc/edit?usp=drive_web

 On Mon, May 11, 2015 at 3:58 PM, Rahul Bhardwaj 
 rahul.bhard...@indiamart.com wrote:

 free RAM:


 free -m
  total   used   free sharedbuffers cached
 Mem: 64398  23753  40644  0108   8324
 -/+ buffers/cache:  15319  49078
 Swap: 2925 15   2909


  ulimit -a
 core file size  (blocks, -c) 0
 data seg size   (kbytes, -d) unlimited
 scheduling priority (-e) 0
 file size   (blocks, -f) unlimited
 pending signals (-i) 515041
 max locked memory   (kbytes, -l) 64
 max memory size (kbytes, -m) unlimited
 open files  (-n) 1024
 pipe size(512 bytes, -p) 8
 POSIX message queues (bytes, -q) 819200
 real-time priority  (-r) 0
 stack size  (kbytes, -s) 10240
 cpu time   (seconds, -t) unlimited
 max user processes  (-u) 515041
 virtual memory  (kbytes, -v) unlimited
 file locks  (-x) unlimited


 Also attaching complete error file


 On Mon, May 11, 2015 at 3:35 PM, Anishek Agarwal anis...@gmail.com
 wrote:

 The memory Cassandra is trying to allocate is pretty small. Are you sure
 there is no hardware failure on the machine? What is the free RAM on the
 box?

 On Mon, May 11, 2015 at 3:28 PM, Rahul Bhardwaj 
 rahul.bhard...@indiamart.com wrote:

 Hi All,

 We have a cluster of 3 nodes with 64 GB RAM each. The cluster was running in
 a healthy state; suddenly one machine's Cassandra daemon stopped working and
 shut down.

 On restarting, after 2 minutes it stops again, after returning the error
 below in cassandra.log:

 Java HotSpot(TM) 64-Bit Server VM warning: INFO:
 os::commit_memory(0x7fd064dc6000, 12288, 0) failed; error='Cannot
 allocate memory' (errno=12)
 #
 # There is insufficient memory for the Java Runtime Environment to
 continue.
 # Native memory allocation (malloc) failed to allocate 12288 bytes for
 committing reserved memory.
 # An error report file with more information is saved as:
 # /tmp/hs_err_pid23215.log
 INFO  09:50:41 Loading settings from
 file:/etc/cassandra/default.conf/cassandra.yaml
 INFO  09:50:41 Node configuration:[authenticator=AllowAllAuthenticator;
 authorizer=AllowAllAuthorizer; auto_snapshot=true;
 batch_size_warn_threshold_in_kb=5; batchlog_replay_throttle_in_kb=1024;
 cas_contention_timeout_in_ms=1000; client_encryption_options=REDACTED;
 cluster_name=Test Cluster; column_index_size_in_kb=64;
 commit_failure_policy=stop;
 commitlog_directory=/var/lib/cassandra/commitlog;
 commitlog_segment_size_in_mb=64; commitlog_sync=periodic;
 commitlog_sync_period_in_ms=1; compaction_throughput_mb_per_sec=16;
 concurrent_compactors=4; concurrent_counter_writes=32; concurrent_reads=32;
 concurrent_writes=32; counter_cache_save_period=7200;
 counter_cache_size_in_mb=null; counter_write_request_timeout_in_ms=5000;
 cross_node_timeout=false; data_file_directories=[/var/lib/cassandra/data];
 disk_failure_policy=stop; dynamic_snitch_badness_threshold=0.1;
 dynamic_snitch_reset_interval_in_ms=60;
 dynamic_snitch_update_interval_in_ms=100;
 endpoint_snitch=GossipingPropertyFileSnitch; hinted_handoff_enabled=true;
 hinted_handoff_throttle_in_kb=1024; incremental_backups=false;
 index_summary_capacity_in_mb=null;
 index_summary_resize_interval_in_minutes=60; inter_dc_tcp_nodelay=false;
 internode_compression=all; key_cache_save_period=14400;
 key_cache_size_in_mb=null; listen_address=null;
 max_hint_window_in_ms=1080; max_hints_delivery_threads=2;
 memtable_allocation_type=heap_buffers; native_transport_port=9042;
 num_tokens=256; partitioner=org.apache.cassandra.dht.Murmur3Partitioner;
 permissions_validity_in_ms=2000; range_request_timeout_in_ms=100;
 read_request_timeout_in_ms=9;
 request_scheduler=org.apache.cassandra.scheduler.NoScheduler;
 request_timeout_in_ms=9; row_cache_save_period=0;
 row_cache_size_in_mb=0; rpc_address=null; rpc_keepalive=true;
 rpc_port=9160; rpc_server_type=sync;
 saved_caches_directory=/var/lib/cassandra/saved_caches;
 seed_provider=[{class_name=org.apache.cassandra.locator.SimpleSeedProvider,
 parameters=[{seeds=206.191.151.199}]}];
 server_encryption_options=REDACTED; snapshot_before_compaction=false;
 ssl_storage_port=7001; sstable_preemptive_open_interval_in_mb=50;
 start_native_transport=true; start_rpc=true; storage_port=7000;
 thrift_framed_transport_size_in_mb=15; tombstone_failure_threshold

Reads failing at around 4000 QPS

2015-05-12 Thread Anishek Agarwal
Hello everyone,

I have a 3-node cluster with Cassandra 2.0.14 on CentOS, in the same data
center, with RF=3, and I am using CL=LOCAL_QUORUM by default for the read and
write operations. I have given about 5 GB of heap space to Cassandra.
I have 40-core machines with 3 separate SATA disks, with the commitlog on one
and the data directories on the other two.

I am doing reads + writes at the same time at about 4000 QPS.

I am getting read failures on the client where one of the replicas did not
respond. When I look at the Cassandra logs I see a lot of failures, as
attached (read_failures.txt).

Am I overloading the system too much? 4000 QPS doesn't seem too much at
first glance.

Please let me know if any other details are required.

DataModel :

partition_key, clustering_key, col1
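For reference, a minimal hedged sketch of a table with that shape; every name
and type below is a placeholder rather than the actual schema used in the test:

CREATE TABLE ks1.qps_test (
    partition_key bigint,
    clustering_key timestamp,
    col1 text,
    PRIMARY KEY (partition_key, clustering_key)
);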


Regards,
Anishek
INFO [ScheduledTasks:1] 2015-05-12 15:30:01,135 MessagingService.java (line 875) 1482 READ messages dropped in last 5000ms
INFO [ScheduledTasks:1] 2015-05-12 15:30:01,136 StatusLogger.java (line 55)
Pool Name                  Active   Pending   Completed   Blocked   All Time Blocked
ReadStage                      32      4676     2438994         0                  0
RequestResponseStage            0         0     4624954         0                  0
ReadRepairStage                 0         1      162868         0                  0
MutationStage                   0         0     3794927         0                  0
ReplicateOnWriteStage           0         0           0         0                  0
GossipStage                     0         0        5339         0                  0
CacheCleanupExecutor            0         0           0         0                  0
MigrationStage                  0         0           0         0                  0
MemoryMeter                     0         0          38         0                  0
ValidationExecutor              0         0           0         0                  0
FlushWriter                     0         0          31         0                  9
InternalResponseStage           0         0           0         0                  0
AntiEntropyStage                0         0           0         0                  0
MemtablePostFlusher             0         0          64         0                  0
MiscStage                       0         0           0         0                  0
PendingRangeCalculator          0         0           3         0                  0
commitlog_archiver              0         0           0         0                  0
CompactionExecutor              0         0        1149         0                  0
HintedHandoff                   0         1           4         0                  0
CompactionManager               0         0
Commitlog                     n/a         0
MessagingService              n/a       0/0
Cache Type                   Size    Capacity   KeysToSave
KeyCache                 59330812   104857600          all

text partition key Bloom filters fp is 1 always, why?

2015-05-13 Thread Anishek Agarwal
Hello,

I have a text partition key for one of the CFs. cfstats on that table seems to
show that the bloom filter false positive ratio is always 1. Also, the bloom
filter is using very little space.

Do bloom filters not work well with text partition keys? I assume this because
there is no way to detect the length of the text, and hence it would have a
very high false positive rate.

The text partition key is composed as long + "_" + epoch_time_in_hours. Would
it be better to have a composite partition key of (long, epoch_time_in_hours)
rather than combining them as a text key?
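A hedged sketch of the composite-key alternative being described; the table and
column names are illustrative, with (id, epoch_hour) forming the composite
partition key in place of the single concatenated text key:

CREATE TABLE events_by_hour (
    id bigint,
    epoch_hour bigint,
    payload text,
    PRIMARY KEY ((id, epoch_hour))
);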


Thanks
anishek


SST Tables Per read in cfhistorgrams

2015-05-17 Thread Anishek Agarwal
Hello,

I am seeing that, even though the bloom filter fp ratio is set to 0.1, the
actual ratio is at about 0.55, and looking at the histograms of the table I see
that there are reads going to 3+ SSTables, even though the way I am querying
it should look at the most recent row only, since I have time as part of my
partition key. I have a composite partition key of ((long, timestamp)).

Question: would the number of SSTables per read also include those where the
bloom filter gave a false positive, or is it just the number of SSTables
actually read?

Thanks
Anishek


Binary Protocol Version and CQL version supported in 2.0.14

2015-04-13 Thread Anishek Agarwal
Hello,

I was trying to find out which protocol versions are supported in Cassandra
2.0.14, and after reading multiple links I am very confused.

Please correct me if my understanding is wrong:

   - The binary protocol version and the CQL spec version are different things?
   - Cassandra 2.0.x supports CQL 3?
   - Is there a different binary protocol version between 2.0.x and 2.1.x?

Is there some link which states which version of Cassandra supports which
binary protocol version and CQL spec version? (Additionally showing which
drivers support what would be great too.)

The link
http://www.datastax.com/dev/blog/java-driver-2-1-2-native-protocol-v3 shows
some info, but I am not sure what the supported protocol versions are
referring to (binary or CQL spec).

Thanks
Anishek


Re: Uderstanding Read after update

2015-04-12 Thread Anishek Agarwal
Thanks, Tyler, for the validations.

I have a follow-up question.

 One SSTable doesn't have precedence over another. Instead, when the same
 cell exists in both sstables, the one with the higher write timestamp wins.

If my table has 5 non-partition-key columns and I update only 1 of them, then
the new SSTable should have only that entry, which means that if I query
everything for that partition key, Cassandra has to match timestamps per
column across SSTables to get me the data?
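For concreteness, a hedged sketch of the scenario being asked about; the table,
column names, and values are made up:

CREATE TABLE t_merge_demo (pk bigint PRIMARY KEY, c1 text, c2 text, c3 text, c4 text, c5 text);

INSERT INTO t_merge_demo (pk, c1, c2, c3, c4, c5) VALUES (1, 'a', 'b', 'c', 'd', 'e');
-- ... the memtable is flushed, so the full row now sits in one SSTable ...
UPDATE t_merge_demo SET c1 = 'a2' WHERE pk = 1;  -- after the next flush, only this cell is in the newer SSTable
SELECT * FROM t_merge_demo WHERE pk = 1;         -- cells are merged across SSTables, highest write timestamp per cell winning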


On Fri, Apr 10, 2015 at 10:52 PM, Tyler Hobbs ty...@datastax.com wrote:



 SST Table level bloom filters have details as to what partition keys are
 in that table. So to clear up my understanding, if I insert and then have an
 update to the same row after some time (assuming both go to different SST
 Tables), then during read cassandra will read data from both SST Tables and
 merge them in order of time series with Data in Second SST table for the
 row taking precedence over the First SST Table and return the result ?


 That's approximately correct.  The only part that's incorrect is how
 merging works.  One SSTable doesn't have precedence over another.  Instead,
 when the same cell exists in both sstables, the one with the higher write
 timestamp wins.


 Does it mark the old column as tombstone in the previous SST Table or
 wait for compaction to remove the old data ?


 It just waits for compaction to remove the old data, there's no tombstone.


 When the data is in the memtable, it also keeps track of the unique keys, so
 when it writes to disk it can use that to derive the right size of bloom
 filter for that SSTable?


 That's correct, it knows the number of keys before the bloom filter is
 created.

 --
 Tyler Hobbs
 DataStax http://datastax.com/



Re: PHP Cassandra Driver for 2.0.13

2015-04-12 Thread Anishek Agarwal
Hey Alex,

We are planning on using Cassandra 2.0.13, and it looks like it will take us a
month to go to production. Since the team that needs PHP is only going to read,
and assuming there isn't too much integration testing or other work we need to
do around the PHP driver, if we get a production PHP driver in 3 weeks I think
we should be fine, though I still have to discuss this with the other team;
they might not be willing to wait that long.

thanks

On Sat, Apr 11, 2015 at 12:52 AM, Alex Popescu al...@datastax.com wrote:

 What Cassandra version are you using? How soon will you need a production
 ready PHP driver?

 On Fri, Apr 10, 2015 at 5:47 AM, Anishek Agarwal anis...@gmail.com
 wrote:

 Hello,

 As part of using this for our project, one of our teams needs a PHP driver
 for Cassandra. The DataStax page says it is in ALPHA; is there some release
 candidate that people have used, or any way to get this working with PHP?

 Thanks
 Anishek




 --
 Bests,

 Alex Popescu | @al3xandru
 Sen. Product Manager @ DataStax




Re: PHP Cassandra Driver for 2.0.13

2015-04-12 Thread Anishek Agarwal
The PHP team is very stringent about response times. I will see if we can do a
Node.js web service or some form of inter-process communication setup between
PHP and Python to achieve this. Thanks for the idea.

On Fri, Apr 10, 2015 at 7:13 PM, Michael Dykman mdyk...@gmail.com wrote:

 Somewhat over a year ago, I set out to address the exact same issue for
 our high-traffic PHP site.  After several failed attempts, I tried to wrap
 the C++ driver (as it was then) in extern "C" wrappers before I gave up
 when I realized the driver was pre-alpha. The current implementation
 provides C bindings out of the box, but its relative immaturity still makes
 it look like too much of a risk.

 Ultimately, we set up a web service (json in/json out) written in Java
 which uses the datastax Java driver to accommodate our PHP's cassandra
 needs.  An arbitrary number of parameterized queries can be passed to the
 service which runs those queries in parallel and the result is both
 reliable and very fast.  I don't think it would be easy (or even possible)
 for a PHP implementation to take advantage of the async interface which is
 where most of the performance gain is to be had.



 On Fri, Apr 10, 2015 at 8:47 AM, Anishek Agarwal anis...@gmail.com
 wrote:

 Hello,

 As part of using this for our project, one of our teams needs a PHP driver
 for Cassandra. The DataStax page says it is in ALPHA; is there some release
 candidate that people have used, or any way to get this working with PHP?

 Thanks
 Anishek




 --
  - michael dykman
  - mdyk...@gmail.com

  May the Source be with you.



Re: Heap memory usage while writing

2015-04-12 Thread Anishek Agarwal
I do understand how MaxTenuringThreshold works; thanks for your evaluation,
though.

I don't think you saw my complete post with the values I have used for the
heap size and *memtable_total_space_in_mb=2048*, which is two times smaller
than the young generation space I am using. Additionally,
*memtable_flush_queue_size=1*, so there are not many memtables in memory.
This, coupled with the fact that I am writing out to Cassandra with 20
threads, means it should pretty much just collect the objects with ParNewGC,
*which is what it is doing now*.

There were only 2 CMS collections in 15 minutes when running at full capacity;
what I am now concerned about is that the CMS remark phase is about 70 ms, and
that is something I am looking to bring down. There still seem to be valuable
pointers in *CASSANDRA-8150*, which I am going to try.



On Fri, Apr 10, 2015 at 7:26 PM, ssiv...@gmail.com ssiv...@gmail.com
wrote:


 MaxTenuringThreshold is low as i think most of the objects should be
 ephemeral with only writes.

 You don't understand how *MaxTenuringThreshold* works. If you keep it low,
 then GC will move objects that are still alive to the old gen space. Yes,
 they are ephemeral, but C* will keep them until they are flushed to disk.
 So, again, you should balance *heap space*, *memtable_total_space_in_mb*,
 *memtable_cleanup_threshold* and your *disk throughput* to get rid of
 memtables as soon as possible. If *memtable_total_space_in_mb* is large and
 the young gen is large too, then you have to increase MaxTenuringThreshold
 to keep CMS from moving data to the old gen.
 If you are sure that the young gen is not filled too fast, you can increase
 *CMSWaitDuration* to avoid useless CMS runs.



 On 04/10/2015 03:42 PM, Anishek Agarwal wrote:

 Sorry, I forgot to update, but I am not using CMSIncrementalMode anymore
 as it overrides UseCMSInitiatingOccupancyOnly.

 @Graham: thanks for CMSParallelInitialMarkEnabled and
 CMSEdenChunksRecordAlways; I haven't used them, I will try them. My initial
 mark is only around 6 ms though.

 With my current config (incorporating the changes above), I have been able
 to reduce the number of CMS runs significantly, and mostly ParNewGC is
 running, but when CMS triggers it takes a lot of time for remark, hence I
 started using -XX:+CMSParallelRemarkEnabled, which gave some improvement.
 This is still around 70 ms.

 MaxTenuringThreshold is low, as I think most of the objects should be
 ephemeral with only writes.

 @Sebastian: I started from that issue :), though I haven't tried the GC
 affinity ones yet. Thanks for the link!

  Thanks
 anishek


 On Fri, Apr 10, 2015 at 5:49 PM, Sebastian Estevez 
 sebastian.este...@datastax.com wrote:

 Did you check out Cassandra-8150?
  On Apr 10, 2015 7:04 AM, Anishek Agarwal anis...@gmail.com wrote:

 Hey,

  Any reason you think the MaxTenuringThreshold should be increased? I am
 pumping data at the full capacity that a single node seems to take, so all
 the data becomes stale soon enough (when it is flushed); additionally, the
 whole memtable can be in the young generation only. There seems to be enough
 additional space to even hold the bloom filters for the respective SSTables,
 I would guess.

  I will try with CMSWaitDuration; that should help in reducing the CMS
 initial mark phase, I think.

  Though I am not sure what is getting moved to the old generation
 continuously to fill it?

  Thanks for the pointers.

 On Fri, Apr 10, 2015 at 12:12 PM, ssiv...@gmail.com ssiv...@gmail.com
 wrote:

  Hi,

 You should increase *MaxTenuringThreshold* and *CMSWaitDuration* to keep
 your data in the young generation longer (until the data is flushed to
 disk).
 Depending on your load, combine the values of the following parameters:
 *HEAP_NEWSIZE*, *memtable_total_space_in_mb*, *memtable_cleanup_threshold*
 and your *disk throughput*.
 Ideally, only ParNewGC will run to collect ephemeral objects, and it will
 cause only very short pauses.


 On 04/09/2015 09:30 AM, Anishek Agarwal wrote:

 Hello,

  We have only on CF as

  CREATE TABLE t1(id bigint, ts timestamp, definition text, primary key
 (id, ts))
 with clustering order by (ts desc) and gc_grace_seconds=0
 and compaction = {'class': 'DateTieredCompactionStrategy',
 'timestamp_resolution':'SECONDS', 'base_time_seconds':'20',
 'max_sstable_age_days':'30'}
 and compression={'sstable_compression' : ''};

  on a single Node using the following in

  cassandra.yaml:
  memtable_total_space_in_mb: 2048
  commitlog_total_space_in_mb: 4096
  memtable_flush_writers: 2
  memtable_flush_queue_size: 1

  cassandra-env.sh 
  MAX_HEAP_SIZE=8G
 HEAP_NEWSIZE=5120M
  JVM_OPTS=$JVM_OPTS -XX:+UseParNewGC
 JVM_OPTS=$JVM_OPTS -XX:+UseConcMarkSweepGC
 JVM_OPTS=$JVM_OPTS -XX:+CMSParallelRemarkEnabled
 JVM_OPTS=$JVM_OPTS -XX:SurvivorRatio=6
 JVM_OPTS=$JVM_OPTS -XX:MaxTenuringThreshold=1
 JVM_OPTS=$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=70
 JVM_OPTS=$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly
 JVM_OPTS=$JVM_OPTS -XX:+UseTLAB

Heap memory usage while writing

2015-04-09 Thread Anishek Agarwal
Hello,

We have only one CF as

CREATE TABLE t1(id bigint, ts timestamp, definition text, primary key (id,
ts))
with clustering order by (ts desc) and gc_grace_seconds=0
and compaction = {'class': 'DateTieredCompactionStrategy',
'timestamp_resolution':'SECONDS', 'base_time_seconds':'20',
'max_sstable_age_days':'30'}
and compression={'sstable_compression' : ''};

on a single Node using the following in

cassandra.yaml:
memtable_total_space_in_mb: 2048
commitlog_total_space_in_mb: 4096
memtable_flush_writers: 2
memtable_flush_queue_size: 1

cassandra-env.sh 
MAX_HEAP_SIZE=8G
HEAP_NEWSIZE=5120M
JVM_OPTS=$JVM_OPTS -XX:+UseParNewGC
JVM_OPTS=$JVM_OPTS -XX:+UseConcMarkSweepGC
JVM_OPTS=$JVM_OPTS -XX:+CMSParallelRemarkEnabled
JVM_OPTS=$JVM_OPTS -XX:SurvivorRatio=6
JVM_OPTS=$JVM_OPTS -XX:MaxTenuringThreshold=1
JVM_OPTS=$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=70
JVM_OPTS=$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly
JVM_OPTS=$JVM_OPTS -XX:+UseTLAB
JVM_OPTS=$JVM_OPTS -XX:MaxPermSize=256m
JVM_OPTS=$JVM_OPTS -XX:+AggressiveOpts
JVM_OPTS=$JVM_OPTS -XX:+UseCompressedOops
JVM_OPTS=$JVM_OPTS -XX:+CMSIncrementalMode
JVM_OPTS=$JVM_OPTS -XX:+CMSIncrementalPacing
JVM_OPTS=$JVM_OPTS -XX:+PrintGCDetails
JVM_OPTS=$JVM_OPTS -XX:+PrintGCTimeStamps -verbose:gc
JVM_OPTS=$JVM_OPTS
-Xloggc:/home/anishek/apache-cassandra-2.0.13/logs/gc.log
JVM_OPTS=$JVM_OPTS -XX:+PrintHeapAtGC
JVM_OPTS=$JVM_OPTS -XX:+PrintTenuringDistribution


I am writing via 20 threads continuously to this table above.
I see that some data keeps moving from the young generation to the older
generation continuously.

I am wondering why this is happening. Given I am writing constantly and my
young generation is more than twice the max memtable space used, I would
think only the young generation space would be used and nothing would ever
be promoted to the old generation.

** system.log shows no compactions happening.
** There are no read operations.
** Cassandra version 2.0.13 on CentOS with 16 cores and 16 GB RAM.

Thanks
Anishek


Re: log all the query statement

2015-04-06 Thread Anishek Agarwal
Hey Peter,

This is from the perspective of 2.0.13, but there should be something
similar in your version. Can you enable debug logging for Cassandra and see
if the log files have additional info? Depending on how early or late in your
test you hit the error, you might also want to raise maxBackupIndex or
maxFileSize to make sure you keep enough log files around.
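
A rough sketch of the edits meant above, for the stock 2.0.x
conf/log4j-server.properties (property names should be verified against your
version; 2.1+ uses logback instead of log4j):

# turn on DEBUG and keep more/larger rolled files so the window around the
# failure is not rotated away
sed -i 's/^log4j.rootLogger=INFO/log4j.rootLogger=DEBUG/' conf/log4j-server.properties
sed -i 's/^log4j.appender.R.maxFileSize=.*/log4j.appender.R.maxFileSize=50MB/' conf/log4j-server.properties
sed -i 's/^log4j.appender.R.maxBackupIndex=.*/log4j.appender.R.maxBackupIndex=100/' conf/log4j-server.properties
# restart the node (or wait for log4j to pick up the change) and reproduce the failure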

anishek

On Thu, Apr 2, 2015 at 11:53 AM, 鄢来琼 laiqiong@gtafe.com wrote:

  Hi all,



  Cassandra 2.1.2 is used in my project, but some nodes go down after
 executing certain query statements.

 Could I configure Cassandra to log every executed statement?

 Hope the log file can be used to identify the problem.

 Thanks.



 Peter





Re: Throttle Heavy Read / Write Loads

2015-06-04 Thread Anishek Agarwal
Maybe just increase the read and write timeouts in Cassandra, currently at
5 sec I think. The DataStax Java client driver provides the ability to cap
how many requests may be in flight per connection; you can lower that to
limit excessive requests, along with limiting the number of connections a
client can open.

Just out of curiosity, how long are the GC pauses for you, both ParNew and
CMS, and at what intervals are you seeing GC happening? I just recently
spent time tuning this and it would be good to know if it is working well.

thanks
anishek

On Fri, Jun 5, 2015 at 12:03 AM, Anuj Wadehra anujw_2...@yahoo.co.in
wrote:

 We are using Cassandra 2.0.14 with Hector as client ( will be gradually
 moving to CQL Driver ).

 Often we see that heavy read and write loads lead to Cassandra timeouts
 and unpredictable results due to gc pauses and request timeouts. We need to
 know the best way to throttle read and write load on Cassandra such that
 even if heavy operations are slower they complete gracefully. This will
 also shield us against misbehaving clients.

 I was thinking of limiting rpc connections via rpc_max_threads property
 and implementing connection pool at client side.

 I would appreciate if you could please share your suggestions on the above
 mentioned approach or share any alternatives to the approach.

 Thanks
 Anuj Wadehra




DTCS - nodetool repair - TTL

2015-06-24 Thread Anishek Agarwal
Hello all,

We are running C* version 2.0.15. We have 5 nodes with RF=3. We are using
DTCS, and on all inserts we have a TTL of 30 days. We have no deletes. We
just have one CF. When I run nodetool repair on a node I notice a lot of
extra SSTables created; this, I think, is due to the fact that it is trying
to reconcile the correct values across different nodes. What I am trying to
figure out now is how this will affect performance after the TTL is reached
for rows. As far as I understood from the Spotify DTCS posts
https://labs.spotify.com/tag/dtcs/ it looks like DTCS will drop a whole
SSTable once its TTL is reached, as it compacts data inserted around the
same time into the same SSTable. Now, when repair happens, we get new
SSTables which sit earlier in the timeline and hence will keep tombstones
alive for some time.

For example, if the machine is up for 2 weeks and I run repair now for the
first time, the new SSTables might contain data from anywhere in the
previous weeks. So even though the SSTables created during week 1 will get
dropped at the start of week 5, because of repair there will be additional
SSTables which will hold tombstones until they reach their eventual drop
state a few weeks later.

Am I thinking about this correctly?

This means that we might still have a lot of tombstones lying around, as
compaction is less frequent for older tables?

thanks
anishek


Re: handling down node cassandra 2.0.15

2015-11-16 Thread Anishek Agarwal
nope its not

On Mon, Nov 16, 2015 at 5:48 PM, sai krishnam raju potturi <
pskraj...@gmail.com> wrote:

> Is that a seed node?
>
> On Mon, Nov 16, 2015, 05:21 Anishek Agarwal <anis...@gmail.com> wrote:
>
>> Hello,
>>
>> We are having a 3 node cluster and one of the node went down due to a
>> hardware memory failure looks like. We followed the steps below after the
>> node was down for more than the default value of *max_hint_window_in_ms*
>>
>> I tried to restart cassandra by following the steps @
>>
>>
>>1.
>>
>> http://docs.datastax.com/en/cassandra/1.2/cassandra/operations/ops_replace_node_t.html
>>2.
>>
>> http://blog.alteroot.org/articles/2014-03-12/replace-a-dead-node-in-cassandra.html
>>
>> *except the "clear data" part as it was not specified in second blog
>> above.*
>>
>> i was trying to restart the same node that went down, however I did not
>> get the messages in log files as stated in 2 against "StorageService"
>>
>> instead it just tried to replay and then stopped with the error message
>> as below:
>>
>> *ERROR [main] 2015-11-16 15:27:22,944 CassandraDaemon.java (line 584)
>> Exception encountered during startup*
>> *java.lang.RuntimeException: Cannot replace address with a node that is
>> already bootstrapped*
>>
>> Can someone please help me if there is something i am doing wrong here.
>>
>> Thanks for the help in advance.
>>
>> Regards,
>> Anishek
>>
>


Re: handling down node cassandra 2.0.15

2015-11-16 Thread Anishek Agarwal
Hey Josh

I did set the replace address which was same as the address of the machine
which went down so it was in place.

anishek

On Mon, Nov 16, 2015 at 10:33 PM, Josh Smith <josh.sm...@careerbuilder.com>
wrote:

> Did you set the JVM_OPTS to replace the address? That is usually the error I
> get when I forget to set replace_address in cassandra-env.sh.
>
>
>
> JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=address_of_dead_node
>
>
>
>
>
> *From:* Anishek Agarwal [mailto:anis...@gmail.com]
> *Sent:* Monday, November 16, 2015 9:25 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: handling down node cassandra 2.0.15
>
>
>
> nope its not
>
>
>
> On Mon, Nov 16, 2015 at 5:48 PM, sai krishnam raju potturi <
> pskraj...@gmail.com> wrote:
>
> Is that a seed node?
>
>
>
> On Mon, Nov 16, 2015, 05:21 Anishek Agarwal <anis...@gmail.com> wrote:
>
> Hello,
>
>
>
> We are having a 3 node cluster and one of the node went down due to a
> hardware memory failure looks like. We followed the steps below after the
> node was down for more than the default value of *max_hint_window_in_ms*
>
>
>
> I tried to restart cassandra by following the steps @
>
>
>
>1.
>
> http://docs.datastax.com/en/cassandra/1.2/cassandra/operations/ops_replace_node_t.html
>2.
>
> http://blog.alteroot.org/articles/2014-03-12/replace-a-dead-node-in-cassandra.html
>
> *except the "clear data" part as it was not specified in second blog
> above.*
>
>
>
> i was trying to restart the same node that went down, however I did not
> get the messages in log files as stated in 2 against "StorageService"
>
>
>
> instead it just tried to replay and then stopped with the error message as
> below:
>
>
>
> *ERROR [main] 2015-11-16 15:27:22,944 CassandraDaemon.java (line 584)
> Exception encountered during startup*
>
> *java.lang.RuntimeException: Cannot replace address with a node that is
> already bootstrapped*
>
>
>
> Can someone please help me if there is something i am doing wrong here.
>
>
>
> Thanks for the help in advance.
>
>
>
> Regards,
>
> Anishek
>
>
>


Re: handling down node cassandra 2.0.15

2015-11-16 Thread Anishek Agarwal
hey Anuj,

OK, I will try that next time. So you are saying that since I am replacing the
machine in place (trying to get the same machine back into the cluster), and it
already has some data, I don't clean the commitlog/data directories, I set
auto_bootstrap = false, and then I restart the node, followed by a repair on
this machine, right?

thanks
anishek

On Mon, Nov 16, 2015 at 11:40 PM, Anuj Wadehra <anujw_2...@yahoo.co.in>
wrote:

> Hi Abhishek,
>
> In my opinion, you already have data and bootstrapping is not needed here.
> You can set auto_bootstrap to false in Cassandra.yaml and once the
> cassandra is rebooted, you should run repair to fix the inconsistent data.
>
>
> Thanks
> Anuj
>
>
>
> On Monday, 16 November 2015 10:34 PM, Josh Smith <
> josh.sm...@careerbuilder.com> wrote:
>
>
> Did you set the JVM_OPTS to replace the address? That is usually the error I
> get when I forget to set replace_address in cassandra-env.sh.
>
> JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=address_of_dead_node
>
>
> *From:* Anishek Agarwal [mailto:anis...@gmail.com]
> *Sent:* Monday, November 16, 2015 9:25 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: handling down node cassandra 2.0.15
>
> nope its not
>
> On Mon, Nov 16, 2015 at 5:48 PM, sai krishnam raju potturi <
> pskraj...@gmail.com> wrote:
>
> Is that a seed node?
>
> On Mon, Nov 16, 2015, 05:21 Anishek Agarwal <anis...@gmail.com> wrote:
>
> Hello,
>
> We are having a 3 node cluster and one of the node went down due to a
> hardware memory failure looks like. We followed the steps below after the
> node was down for more than the default value of *max_hint_window_in_ms*
>
> I tried to restart cassandra by following the steps @
>
>
>1.
>
> http://docs.datastax.com/en/cassandra/1.2/cassandra/operations/ops_replace_node_t.html
>2.
>
> http://blog.alteroot.org/articles/2014-03-12/replace-a-dead-node-in-cassandra.html
>
> *except the "clear data" part as it was not specified in second blog
> above.*
>
> i was trying to restart the same node that went down, however I did not
> get the messages in log files as stated in 2 against "StorageService"
>
> instead it just tried to replay and then stopped with the error message as
> below:
>
> *ERROR [main] 2015-11-16 15:27:22,944 CassandraDaemon.java (line 584)
> Exception encountered during startup*
> *java.lang.RuntimeException: Cannot replace address with a node that is
> already bootstrapped*
>
> Can someone please help me if there is something i am doing wrong here.
>
> Thanks for the help in advance.
>
> Regards,
> Anishek
>
>
>
>
>


handling down node cassandra 2.0.15

2015-11-16 Thread Anishek Agarwal
Hello,

We have a 3 node cluster and one of the nodes went down, due to what looks
like a hardware memory failure. We followed the steps below after the node
had been down for more than the default value of *max_hint_window_in_ms*.

I tried to restart cassandra by following the steps @


   1.
   
http://docs.datastax.com/en/cassandra/1.2/cassandra/operations/ops_replace_node_t.html
   2.
   
http://blog.alteroot.org/articles/2014-03-12/replace-a-dead-node-in-cassandra.html

*except the "clear data" part as it was not specified in second blog above.*

i was trying to restart the same node that went down, however I did not get
the messages in log files as stated in 2 against "StorageService"

instead it just tried to replay and then stopped with the error message as
below:

*ERROR [main] 2015-11-16 15:27:22,944 CassandraDaemon.java (line 584)
Exception encountered during startup*
*java.lang.RuntimeException: Cannot replace address with a node that is
already bootstrapped*

Can someone please help me if there is something i am doing wrong here.

Thanks for the help in advance.

Regards,
Anishek


Re: terrible read/write latency fluctuation

2015-10-30 Thread Anishek Agarwal
If it is some sort of time series, DTCS might turn out to be better for
compaction. Also, some disk monitoring might help to understand whether the
disk is the bottleneck.

On Sun, Oct 25, 2015 at 3:47 PM, 曹志富  wrote:

> I will try to trace a read that takes > 20 msec.
>
> Just HDD. No deletes, just a 60-day TTL. Value size is small; max length is 140.
>
>
> My data is like a time series. 90% of reads are for data with timestamp < 7
> days. Data is almost insert-only, with a little updating.
>


Re: compaction with LCS

2015-10-11 Thread Anishek Agarwal
Has anyone seen similar behavior with LCS? Please do let me know; it would
be good to know that this can happen.


On Fri, Oct 9, 2015 at 5:19 PM, Anishek Agarwal <anis...@gmail.com> wrote:

> Looks like some of the nodes have higher sstables on L0 and compaction is
> running there, so only few nodes run compaction at a time and the
> preference is given to lower level nodes for compaction before going to
> higher levels ? so is compaction cluster aware then ?
>
>
> On Fri, Oct 9, 2015 at 5:17 PM, Anishek Agarwal <anis...@gmail.com> wrote:
>
>> hello,
>>
>> on doing cfstats for the column family i see
>>
>> SSTables in each level: [1, 10, 109/100, 1, 0, 0, 0, 0, 0]
>>
>> i thought compaction would trigger since the 3rd level tables are move
>> than expected number,
>>
>> but on doing compactionstats its shows "n/a" -- any reason why its not
>> triggering, should i be worried ?
>>
>> we have 5 node cluster running 2.0.15 cassandra version,
>>
>> thanks
>> anishek
>>
>
>


Re: compaction with LCS

2015-10-09 Thread Anishek Agarwal
Looks like some of the nodes have more SSTables in L0 and compaction is
running there, so only a few nodes run compaction at a time, and preference
is given to the lower levels before going on to the higher levels? So is
compaction cluster-aware then?


On Fri, Oct 9, 2015 at 5:17 PM, Anishek Agarwal <anis...@gmail.com> wrote:

> hello,
>
> on doing cfstats for the column family i see
>
> SSTables in each level: [1, 10, 109/100, 1, 0, 0, 0, 0, 0]
>
> i thought compaction would trigger since the 3rd level tables are move
> than expected number,
>
> but on doing compactionstats its shows "n/a" -- any reason why its not
> triggering, should i be worried ?
>
> we have 5 node cluster running 2.0.15 cassandra version,
>
> thanks
> anishek
>


compaction with LCS

2015-10-09 Thread Anishek Agarwal
hello,

on doing cfstats for the column family i see

SSTables in each level: [1, 10, 109/100, 1, 0, 0, 0, 0, 0]

I thought compaction would trigger since the third-level tables are more than
the expected number,

but on doing compactionstats it shows "n/a" -- any reason why it is not
triggering? Should I be worried?

we have 5 node cluster running 2.0.15 cassandra version,

thanks
anishek
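
A few per-node checks that help narrow this down (compaction state is local to
each node, so run them on the node reporting 109/100 in L2; the throughput
value below is only an illustration):

nodetool compactionstats            # pending task count and any compactions currently running
nodetool tpstats                    # a backed-up CompactionExecutor pool points at starved compaction
nodetool setcompactionthroughput 0  # temporarily remove the throughput cap if compaction is IO-limited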


DTCS dropping of SST Tables

2015-07-07 Thread Anishek Agarwal
Hey all,

We are using DTCS and we have a TTL of 30 days for all inserts; we do no
deletes or updates.
When an SSTable is dropped by DTCS, what kind of logging do we see in the C*
logs?

Any help would be useful. The reason I ask is that my DB size is not hovering
around a steady value, it keeps increasing, and there has been no significant
change in the traffic that creates data in C*.

thanks
anishek


Strategy tools for taking snapshots to load in another cluster instance

2015-11-18 Thread Anishek Agarwal
Hello

We have a 5 node prod cluster and a 3 node test cluster. Is there a way I can
take a snapshot of a table in prod and load it into the test cluster? The
Cassandra versions are the same.

Even if there is just a tool that can help with this, that would be great.

If not, how do people handle scenarios where prod data is required in
staging/test clusters for testing, to make sure things are correct? Does the
cluster size have to be the same to allow copying of the relevant snapshot
data, etc.?


thanks
anishek
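
A minimal sketch of the snapshot-and-sstableloader flow discussed in the
replies below; the host names, keyspace/table names and paths are assumptions,
and the destination cluster must already have the same schema:

# on a prod node: take a named snapshot of the keyspace
nodetool snapshot -t for_test mykeyspace

# copy the snapshot into a keyspace/table directory layout on a staging box
# (snapshots live under <data_dir>/mykeyspace/mytable/snapshots/for_test/);
# repeat for enough prod nodes to cover every token range
mkdir -p /tmp/load/mykeyspace/mytable
rsync -av prod-node1:/var/lib/cassandra/data/mykeyspace/mytable/snapshots/for_test/ \
      /tmp/load/mykeyspace/mytable/

# stream into the test cluster; sstableloader redistributes rows by token, so
# the destination does not have to match the source cluster's size
sstableloader -d test-node1,test-node2 /tmp/load/mykeyspace/mytable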


Re: handling down node cassandra 2.0.15

2015-11-18 Thread Anishek Agarwal
@Rob: interesting, something I will try next time. For step 3 you mentioned
-- do I just remove the -Dcassandra.join_ring=false option and restart the
cassandra service?

@Anuj, gc_grace_seconds dictates how long hinted handoffs are stored, right?
That might matter where we explicitly delete values from the table; we just
have TTLs, and DTCS should delete data older than 1 month. In this case do I
need to wipe the node and then start copying the keyspace again? Or can I
run a repair once it joins the ring with auto_bootstrap=false?



On Wed, Nov 18, 2015 at 1:20 AM, Robert Coli  wrote:

> On Tue, Nov 17, 2015 at 4:33 AM, Anuj Wadehra 
> wrote:
>
>> Only if gc_grace_seconds havent passed since the failure. If your machine
>> is down for more than gc_grace_seconds you need to delete the data
>> directory and go with auto bootstrap = true .
>>
>
> Since CASSANDRA-6961 you can :
>
> 1) bring up the node with join_ring=false
> 2) repair it
> 3) join it to the cluster
>
> https://issues.apache.org/jira/browse/CASSANDRA-6961
>
> This prevents you from decreasing your unique replica count, which is
> usually a good thing!
>
> =Rob
>


Re: Strategy tools for taking snapshots to load in another cluster instance

2015-11-24 Thread Anishek Agarwal
Peer,

That talks about having a similarly sized cluster; I was wondering if there
is a way to move from a larger to a smaller cluster. I will try a few things
as soon as I get time and update here.

On Thu, Nov 19, 2015 at 5:48 PM, Peer, Oded <oded.p...@rsa.com> wrote:

> Have you read the DataStax documentation?
>
>
> http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_snapshot_restore_new_cluster.html
>
>
>
>
>
> *From:* Romain Hardouin [mailto:romainh...@yahoo.fr]
> *Sent:* Wednesday, November 18, 2015 3:59 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Strategy tools for taking snapshots to load in another
> cluster instance
>
>
>
> You can take a snapshot via nodetool then load sstables on your test
> cluster with sstableloader:
> docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsBulkloader_t.html
>
>
>
> Sent from Yahoo Mail on Android
> <https://overview.mail.yahoo.com/mobile/?.src=Android>
> --
>
> *From*:"Anishek Agarwal" <anis...@gmail.com>
> *Date*:Wed, Nov 18, 2015 at 11:24
> *Subject*:Strategy tools for taking snapshots to load in another cluster
> instance
>
> Hello
>
>
>
> We have 5 node prod cluster and 3 node test cluster. Is there a way i can
> take snapshot of a table in prod and load it test cluster. The cassandra
> versions are same.
>
>
>
> Even if there is a tool that can help with this it will be great.
>
>
>
> If not, how do people handle scenarios where data in prod is required in
> staging/test clusters for testing to make sure things are correct ? Does
> the cluster size have to be same to allow copying of relevant snapshot data
> etc?
>
>
>
>
>
> thanks
>
> anishek
>
>
>


Re: High Bloom filter false ratio

2016-02-23 Thread Anishek Agarwal
Looks like sstablemetadata is available in 2.2; we are on 2.0.x. Do you know
of anything that will work on 2.0.x?

On Tue, Feb 23, 2016 at 1:48 PM, Anishek Agarwal <anis...@gmail.com> wrote:

> Thanks Jeff, Awesome will look at the tools and JMX endpoint.
>
> our settings are below originated from the jira you posted above as the
> base. we are running on 48 core machines with 2 SSD disks of 800 GB each .
>
> MAX_HEAP_SIZE="6G"
>
> HEAP_NEWSIZE="4G"
>
> JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
>
> JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
>
> JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
>
> JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=6"
>
> JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=4"
>
> JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=70"
>
> JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
>
> JVM_OPTS="$JVM_OPTS -XX:+UseTLAB"
>
> JVM_OPTS="$JVM_OPTS -XX:MaxPermSize=256m"
>
> JVM_OPTS="$JVM_OPTS -XX:+AggressiveOpts"
>
> JVM_OPTS="$JVM_OPTS -XX:+UseCompressedOops"
>
> JVM_OPTS="$JVM_OPTS -XX:+CMSScavengeBeforeRemark"
>
> JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=48"
>
> JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=48"
>
> JVM_OPTS="$JVM_OPTS -XX:-ExplicitGCInvokesConcurrent"
>
> JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions"
>
> JVM_OPTS="$JVM_OPTS -XX:+UseGCTaskAffinity"
>
> JVM_OPTS="$JVM_OPTS -XX:+BindGCTaskThreadsToCPUs"
>
> # earlier value 131072
>
> JVM_OPTS="$JVM_OPTS -XX:ParGCCardsPerStrideChunk=32678"
>
> JVM_OPTS="$JVM_OPTS -XX:CMSScheduleRemarkEdenSizeThreshold=104857600"
>
> JVM_OPTS="$JVM_OPTS -XX:CMSRescanMultiple=32678"
>
> JVM_OPTS="$JVM_OPTS -XX:CMSConcMarkMultiple=32678"
>
>
> On Tue, Feb 23, 2016 at 1:06 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
> wrote:
>
>> There exists a JMX endpoint called forceUserDefinedCompaction that takes
>> a comma separated list of sstables to compact together.
>>
>> There also exists a tool called sstablemetadata (may be in a
>> ‘cassandra-tools’ package separate from whatever package you used to
>> install cassandra, or in the tools/ directory of your binary package).
>> Using sstablemetadata, you can look at the maxTimestamp for each table, and
>> the ‘Estimated droppable tombstones’. Using those two fields, you could,
>> very easily, write a script that gives you a list of sstables that you
>> could feed to forceUserDefinedCompaction to join together to eliminate
>> leftover waste.
>>
>> Your long ParNew times may be fixable by increasing the new gen size of
>> your heap – the general guidance in cassandra-env.sh is out of date, you
>> may want to reference CASSANDRA-8150 for “newer” advice (
>> http://issues.apache.org/jira/browse/CASSANDRA-8150 )
>>
>> - Jeff
>>
>> From: Anishek Agarwal
>> Reply-To: "user@cassandra.apache.org"
>> Date: Monday, February 22, 2016 at 8:33 PM
>>
>> To: "user@cassandra.apache.org"
>> Subject: Re: High Bloom filter false ratio
>>
>> Hey Jeff,
>>
>> Thanks for the clarification, I did not explain my self clearly, the 
>> max_stable_age_days
>> is set to 30 days and the ttl on every insert is set to 30 days also
>> by default. gc_grace_seconds is 0, so i would think the sstable as a whole
>> would be deleted.
>>
>> Because of the problems mentioned by at 1) above it looks like, there
>> might be cases where the table just lies around since no compaction is
>> happening on it and even though everything is expired it would still not be
>> deleted?
>>
>> for 3) the average read is pretty good, though the throughput doesn't
>> seem to be that great, when no repair is running we get GCIns > 200ms every
>> couple of hours once, otherwise its every 10-20 mins
>>
>> INFO [ScheduledTasks:1] 2016-02-23 05:15:03,070 GCInspector.java (line
>> 116) GC for ParNew: 205 ms for 1 collections, 1712439128 used; max is
>> 7784628224
>>
>>  INFO [ScheduledTasks:1] 2016-02-23 08:30:47,709 GCInspector.java (line
>> 116) GC for ParNew: 242 ms for 1 collections, 1819126928 used; max is
>> 7784628224
>>
>>  INFO [ScheduledTasks:1] 2016-02-23 09:09:55,085 GCInspector.java (line
>> 116) GC for ParNew: 374 ms for 1 collections, 1829660304 used; max is
>> 7784628224
>>
>>  INFO [ScheduledTasks:1] 2016-02-23 09:11:21,245 GCInspector.java (line
>&

Re: High Bloom filter false ratio

2016-02-23 Thread Anishek Agarwal
Thanks Jeff, Awesome will look at the tools and JMX endpoint.

our settings are below originated from the jira you posted above as the
base. we are running on 48 core machines with 2 SSD disks of 800 GB each .

MAX_HEAP_SIZE="6G"

HEAP_NEWSIZE="4G"

JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"

JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"

JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"

JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=6"

JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=4"

JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=70"

JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"

JVM_OPTS="$JVM_OPTS -XX:+UseTLAB"

JVM_OPTS="$JVM_OPTS -XX:MaxPermSize=256m"

JVM_OPTS="$JVM_OPTS -XX:+AggressiveOpts"

JVM_OPTS="$JVM_OPTS -XX:+UseCompressedOops"

JVM_OPTS="$JVM_OPTS -XX:+CMSScavengeBeforeRemark"

JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=48"

JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=48"

JVM_OPTS="$JVM_OPTS -XX:-ExplicitGCInvokesConcurrent"

JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions"

JVM_OPTS="$JVM_OPTS -XX:+UseGCTaskAffinity"

JVM_OPTS="$JVM_OPTS -XX:+BindGCTaskThreadsToCPUs"

# earlier value 131072

JVM_OPTS="$JVM_OPTS -XX:ParGCCardsPerStrideChunk=32678"

JVM_OPTS="$JVM_OPTS -XX:CMSScheduleRemarkEdenSizeThreshold=104857600"

JVM_OPTS="$JVM_OPTS -XX:CMSRescanMultiple=32678"

JVM_OPTS="$JVM_OPTS -XX:CMSConcMarkMultiple=32678"


On Tue, Feb 23, 2016 at 1:06 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
wrote:

> There exists a JMX endpoint called forceUserDefinedCompaction that takes a
> comma separated list of sstables to compact together.
>
> There also exists a tool called sstablemetadata (may be in a
> ‘cassandra-tools’ package separate from whatever package you used to
> install cassandra, or in the tools/ directory of your binary package).
> Using sstablemetadata, you can look at the maxTimestamp for each table, and
> the ‘Estimated droppable tombstones’. Using those two fields, you could,
> very easily, write a script that gives you a list of sstables that you
> could feed to forceUserDefinedCompaction to join together to eliminate
> leftover waste.
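
A rough sketch of the kind of script described above, assuming the
sstablemetadata tool from a 2.1/2.2 tools package, a 30-day TTL and an
illustrative data path; the output field names should be checked against the
tool version actually installed:

DATA_DIR=/var/lib/cassandra/data/mykeyspace/mytable          # assumption
CUTOFF=$(( ( $(date +%s) - 30*24*3600 ) * 1000000 ))         # 30-day TTL, microsecond write timestamps
for f in "$DATA_DIR"/*-Data.db; do
  meta=$(sstablemetadata "$f")
  max_ts=$(echo "$meta" | awk '/Maximum timestamp/ {print $3}')
  drop=$(echo "$meta" | awk '/Estimated droppable tombstones/ {print $4}')
  # an sstable whose newest cell is already past the TTL is a candidate to feed
  # to the forceUserDefinedCompaction JMX operation
  if [ -n "$max_ts" ] && [ "$max_ts" -lt "$CUTOFF" ]; then
    echo "candidate: $(basename "$f") max_ts=$max_ts droppable=$drop"
  fi
done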
>
> Your long ParNew times may be fixable by increasing the new gen size of
> your heap – the general guidance in cassandra-env.sh is out of date, you
> may want to reference CASSANDRA-8150 for “newer” advice (
> http://issues.apache.org/jira/browse/CASSANDRA-8150 )
>
> - Jeff
>
> From: Anishek Agarwal
> Reply-To: "user@cassandra.apache.org"
> Date: Monday, February 22, 2016 at 8:33 PM
>
> To: "user@cassandra.apache.org"
> Subject: Re: High Bloom filter false ratio
>
> Hey Jeff,
>
> Thanks for the clarification, I did not explain my self clearly, the 
> max_stable_age_days
> is set to 30 days and the ttl on every insert is set to 30 days also
> by default. gc_grace_seconds is 0, so i would think the sstable as a whole
> would be deleted.
>
> Because of the problems mentioned by at 1) above it looks like, there
> might be cases where the table just lies around since no compaction is
> happening on it and even though everything is expired it would still not be
> deleted?
>
> for 3) the average read is pretty good, though the throughput doesn't seem
> to be that great, when no repair is running we get GCIns > 200ms every
> couple of hours once, otherwise its every 10-20 mins
>
> INFO [ScheduledTasks:1] 2016-02-23 05:15:03,070 GCInspector.java (line
> 116) GC for ParNew: 205 ms for 1 collections, 1712439128 used; max is
> 7784628224
>
>  INFO [ScheduledTasks:1] 2016-02-23 08:30:47,709 GCInspector.java (line
> 116) GC for ParNew: 242 ms for 1 collections, 1819126928 used; max is
> 7784628224
>
>  INFO [ScheduledTasks:1] 2016-02-23 09:09:55,085 GCInspector.java (line
> 116) GC for ParNew: 374 ms for 1 collections, 1829660304 used; max is
> 7784628224
>
>  INFO [ScheduledTasks:1] 2016-02-23 09:11:21,245 GCInspector.java (line
> 116) GC for ParNew: 419 ms for 1 collections, 2309875224 used; max is
> 7784628224
>
>  INFO [ScheduledTasks:1] 2016-02-23 09:35:50,717 GCInspector.java (line
> 116) GC for ParNew: 231 ms for 1 collections, 2515325328 used; max is
> 7784628224
>
>  INFO [ScheduledTasks:1] 2016-02-23 09:38:47,194 GCInspector.java (line
> 116) GC for ParNew: 252 ms for 1 collections, 1724241952 used; max is
> 7784628224
>
>
> our reading patterns are dependent on BF to work efficiently as we do a
> lot of reads for keys that may not exists because its time series and
> we segregate data based on hourly boundary from epoch.
>
>
> hey Christoper,
>
> yes eve

Re: Cassandra nodes reduce disks per node

2016-02-25 Thread Anishek Agarwal
perational point of view (very long operation + repair needed)
>>
>> Hope this long email will be useful, maybe should I blog about this. Let
>> me know if the process above makes sense or if some things might be
>> improved.
>>
>> C*heers,
>> -
>> Alain Rodriguez
>> France
>>
>> The Last Pickle
>> http://www.thelastpickle.com
>>
>> 2016-02-19 7:19 GMT+01:00 Branton Davis <branton.da...@spanning.com>:
>>
>>> Jan, thanks!  That makes perfect sense to run a second time before
>>> stopping cassandra.  I'll add that in when I do the production cluster.
>>>
>>> On Fri, Feb 19, 2016 at 12:16 AM, Jan Kesten <j.kes...@enercast.de>
>>> wrote:
>>>
>>>> Hi Branton,
>>>>
>>>> two cents from me - I didnt look through the script, but for the rsyncs
>>>> I do pretty much the same when moving them. Since they are immutable I do a
>>>> first sync while everything is up and running to the new location which
>>>> runs really long. Meanwhile new ones are created and I sync them again
>>>> online, much less files to copy now. After that I shutdown the node and my
>>>> last rsync now has to copy only a few files which is quite fast and so the
>>>> downtime for that node is within minutes.
>>>>
>>>> Jan
>>>>
>>>>
>>>>
>>>> Von meinem iPhone gesendet
>>>>
>>>> Am 18.02.2016 um 22:12 schrieb Branton Davis <
>>>> branton.da...@spanning.com>:
>>>>
>>>> Alain, thanks for sharing!  I'm confused why you do so many repetitive
>>>> rsyncs.  Just being cautious or is there another reason?  Also, why do you
>>>> have --delete-before when you're copying data to a temp (assumed empty)
>>>> directory?
>>>>
>>>> On Thu, Feb 18, 2016 at 4:12 AM, Alain RODRIGUEZ <arodr...@gmail.com>
>>>> wrote:
>>>>
>>>>> I did the process a few weeks ago and ended up writing a runbook and a
>>>>> script. I have anonymised and share it fwiw.
>>>>>
>>>>> https://github.com/arodrime/cassandra-tools/tree/master/remove_disk
>>>>>
>>>>> It is basic bash. I tried to have the shortest down time possible,
>>>>> making this a bit more complex, but it allows you to do a lot in parallel
>>>>> and just do a fast operation sequentially, reducing overall operation 
>>>>> time.
>>>>>
>>>>> This worked fine for me, yet I might have make some errors while
>>>>> making it configurable though variables. Be sure to be around if you 
>>>>> decide
>>>>> to run this. Also I automated this more by using knife (Chef), I hate to
>>>>> repeat ops, this is something you might want to consider.
>>>>>
>>>>> Hope this is useful,
>>>>>
>>>>> C*heers,
>>>>> -
>>>>> Alain Rodriguez
>>>>> France
>>>>>
>>>>> The Last Pickle
>>>>> http://www.thelastpickle.com
>>>>>
>>>>> 2016-02-18 8:28 GMT+01:00 Anishek Agarwal <anis...@gmail.com>:
>>>>>
>>>>>> Hey Branton,
>>>>>>
>>>>>> Please do let us know if you face any problems  doing this.
>>>>>>
>>>>>> Thanks
>>>>>> anishek
>>>>>>
>>>>>> On Thu, Feb 18, 2016 at 3:33 AM, Branton Davis <
>>>>>> branton.da...@spanning.com> wrote:
>>>>>>
>>>>>>> We're about to do the same thing.  It shouldn't be necessary to shut
>>>>>>> down the entire cluster, right?
>>>>>>>
>>>>>>> On Wed, Feb 17, 2016 at 12:45 PM, Robert Coli <rc...@eventbrite.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Feb 16, 2016 at 11:29 PM, Anishek Agarwal <
>>>>>>>> anis...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> To accomplish this can I just copy the data from disk1 to disk2
>>>>>>>>> with in the relevant cassandra home location folders, change the
>>>>>>>>> cassanda.yaml configuration and restart the node. before starting i 
>>>>>>>>> will
>>>>>>>>> shutdown the cluster.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Yes.
>>>>>>>>
>>>>>>>> =Rob
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>


Re: High Bloom filter false ratio

2016-02-22 Thread Anishek Agarwal
Hey Jeff,

Thanks for the clarification; I did not explain myself clearly. The
max_sstable_age_days is set to 30 days and the TTL on every insert is set to
30 days as well by default. gc_grace_seconds is 0, so I would think the
SSTable as a whole would be deleted.

Because of the problems mentioned in 1) above, it looks like there might be
cases where an SSTable just lies around: since no compaction is happening on
it, even though everything in it has expired it would still not be deleted?

For 3), the average read is pretty good, though the throughput doesn't seem
to be that great. When no repair is running we get a GCInspector pause >
200 ms once every couple of hours; otherwise it's every 10-20 mins.

INFO [ScheduledTasks:1] 2016-02-23 05:15:03,070 GCInspector.java (line 116)
GC for ParNew: 205 ms for 1 collections, 1712439128 used; max is 7784628224

 INFO [ScheduledTasks:1] 2016-02-23 08:30:47,709 GCInspector.java (line
116) GC for ParNew: 242 ms for 1 collections, 1819126928 used; max is
7784628224

 INFO [ScheduledTasks:1] 2016-02-23 09:09:55,085 GCInspector.java (line
116) GC for ParNew: 374 ms for 1 collections, 1829660304 used; max is
7784628224

 INFO [ScheduledTasks:1] 2016-02-23 09:11:21,245 GCInspector.java (line
116) GC for ParNew: 419 ms for 1 collections, 2309875224 used; max is
7784628224

 INFO [ScheduledTasks:1] 2016-02-23 09:35:50,717 GCInspector.java (line
116) GC for ParNew: 231 ms for 1 collections, 2515325328 used; max is
7784628224

 INFO [ScheduledTasks:1] 2016-02-23 09:38:47,194 GCInspector.java (line
116) GC for ParNew: 252 ms for 1 collections, 1724241952 used; max is
7784628224


Our read patterns depend on the bloom filters working efficiently, as we do a
lot of reads for keys that may not exist, because it is time series data and
we segregate it on hourly boundaries from epoch.


Hey Christopher,

Yes, every row in the SSTable that should have been deleted has "d" in that
column. Also, the key for one of the rows is:

"key": "00080cdd5edd080006251000"



How do I convert that back into a readable format to recover the (long,long)
composite partition key?

It looks like I have to force a major compaction to delete a lot of data? Are
there any other solutions?

thanks
anishek



On Mon, Feb 22, 2016 at 11:21 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
wrote:

> 1) getFullyExpiredSSTables in 2.0 isn’t as thorough as many expect, so
> it’s very likely that some sstables stick around longer than you expect.
>
> 2) max_sstable_age_days tells cassandra when to stop compacting that file,
> not when to delete it.
>
> 3) You can change the window size using both the base_time_seconds
> parameter and max_sstable_age_days parameter (use the former to set the
> size of the first window, and the latter to determine how long before you
> stop compacting that window). It’s somewhat non-intuitive.
>
> Your read latencies actually look pretty reasonable, are you sure you’re
> not simply hitting GC pauses that cause your queries to run longer than you
> expect? Do you have graphs of GC time (first derivative of total gc time is
> common for tools like graphite), or do you see ‘gcinspector’ in your logs
> indicating pauses > 200ms?
>
> From: Anishek Agarwal
> Reply-To: "user@cassandra.apache.org"
> Date: Sunday, February 21, 2016 at 11:13 PM
> To: "user@cassandra.apache.org"
> Subject: Re: High Bloom filter false ratio
>
> Hey guys,
>
> Just did some more digging ... looks like DTCS is not removing old data
> completely, I used sstable2json for one such table and saw old data there.
> we have a value of 30 for  max_stable_age_days for the table.
>
> One of the columns showed data as :["2015-12-10 11\\:03+0530:",
> "56690ea2", 1449725602552000, "d"] what is the meaning of "d" in the last
> IS_MARKED_FOR_DELETE column ?
>
> I see data from 10 dec 2015 still there, looks like there are a few issues
> with DTCS, Operationally what choices do i have to rectify this, We are on
> version 2.0.15.
>
> thanks
> anishek
>
>
>
>
> On Mon, Feb 22, 2016 at 10:23 AM, Anishek Agarwal <anis...@gmail.com>
> wrote:
>
>> We are using DTCS have a 30 day window for them before they are cleaned
>> up. I don't think with DTCS we can do anything about table sizing. Please
>> do let me know if there are other ideas.
>>
>> On Sat, Feb 20, 2016 at 12:51 AM, Jaydeep Chovatia <
>> chovatia.jayd...@gmail.com> wrote:
>>
>>> To me following three looks on higher side:
>>> SSTable count: 1289
>>>
>>> In order to reduce SSTable count see if you are compacting of not (If
>>> using STCS). Is it possible to change this to LCS?
>>>
>>>
>>> Number of keys (estimate): 345137664 (345M

Ops Centre Read Requests / TBL: Local Read Requests

2016-02-15 Thread Anishek Agarwal
Hello,

I have installed OpsCenter 5.2.3 along with agents on the three Cassandra
nodes in my test cluster, version 2.0.15. The cluster has two tables in one
keyspace, and I have a program that is reading values only from one of the
tables (table1) within that keyspace.

I am looking at two graphs

   - Read Requests across Cluster  -- (1)
   - TBL: Local Read across Cluster for table1  -- (2)

I find that (2) shows higher numbers than (1), almost twice as much. Is there
something I am measuring wrong? I would think (1) would always be higher
than (2).

table1 has

   - a composite partition key (long,long)
   - a single clustering key (text)


thanks
Anishek


Re: Ops Centre Read Requests / TBL: Local Read Requests

2016-02-15 Thread Anishek Agarwal
Looks like (1) is analogous to client read requests, so if I do a request at
LOCAL_QUORUM consistency level then (2) would be higher, since the
coordinator sends two replica requests out for every single read request it
receives. Is there any other possible explanation for the above behaviour?


On Mon, Feb 15, 2016 at 4:21 PM, Anishek Agarwal <anis...@gmail.com> wrote:

> Hello,
>
> I have installed Ops center 5.2.3 along with agents on three cassandra
> nodes in my test cluster version 2.0.15. This has two tables in one
> keyspace. I have a program that is reading values only from one of the
> tables(table1) with in a keyspace.
>
> I am looking at two graphs
>
>- Read Requests across Cluster  -- (1)
>- TBL: Local Read across Cluster for table1  -- (2)
>
> I find that the (2) is having higher numbers than (1) almost twice as
> much, is there something i am measuring wrong? i would think (1) would
> always be higher than (2) .
>
> table1 has
>
>- has a composite partition key (long,long)
>- has a single clustering key (text)
>
>
> thanks
> Anishek
>


Cassandra nodes reduce disks per node

2016-02-17 Thread Anishek Agarwal
Hello,

We started with two 800 GB SSDs on each Cassandra node based on our initial
estimations of read/write rate. As we started onboarding additional traffic
we found that CPU is becoming a bottleneck and we are not able to run the
low-priority (nice) jobs like compaction very well. We have started expanding
the cluster, and this will lead to less data per node. It looks like once we
expand the cluster, the current 2 x 800 GB SSDs will be too much and it might
be better to have just one SSD.

To accomplish this, can I just copy the data from disk1 to disk2 within the
relevant Cassandra data directories, change the cassandra.yaml configuration
and restart the node? Before starting I will shut down the cluster.

Thanks
anishek
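
A minimal sketch of the per-node move, assuming the two data directories are
/data1/cassandra/data and /data2/cassandra/data (paths and service commands
are assumptions); since SSTables are immutable, most of the copying can happen
while the node is still serving traffic, and it can be done one node at a time:

# pre-copy while the node is up (sstable generation numbers are unique per
# table on a node, so the two directories merge without filename collisions)
rsync -avh /data1/cassandra/data/ /data2/cassandra/data/
# flush, stop the node, then catch up on anything written since the first pass
nodetool drain
sudo service cassandra stop
rsync -avh /data1/cassandra/data/ /data2/cassandra/data/
# edit cassandra.yaml: drop /data1/cassandra/data from data_file_directories
sudo service cassandra start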


Re: Cassandra nodes reduce disks per node

2016-02-17 Thread Anishek Agarwal
Additional note: we are using Cassandra 2.0.15, have 5 nodes in the cluster,
and are going to expand to 8 nodes.

On Wed, Feb 17, 2016 at 12:59 PM, Anishek Agarwal <anis...@gmail.com> wrote:

> Hello,
>
> We started with two 800GB SSD on each cassandra node based on our initial
> estimations of read/write rate. As we started on boarding additional
> traffic we find that CPU is becoming a bottleneck and we are not able to
> run the NICE jobs like compaction very well. We have started expanding the
> cluster and this would lead to less data per node. It looks like at this
> point once we expand the cluster, the current 2 X 800 GB SSD will be too
> much and it might be better to have just one SSD.
>
> To accomplish this can I just copy the data from disk1 to disk2 with in
> the relevant cassandra home location folders, change the cassanda.yaml
> configuration and restart the node. before starting i will shutdown the
> cluster.
>
> Thanks
> anishek
>


Re: Cassandra nodes reduce disks per node

2016-02-17 Thread Anishek Agarwal
Hey Branton,

Please do let us know if you face any problems  doing this.

Thanks
anishek

On Thu, Feb 18, 2016 at 3:33 AM, Branton Davis <branton.da...@spanning.com>
wrote:

> We're about to do the same thing.  It shouldn't be necessary to shut down
> the entire cluster, right?
>
> On Wed, Feb 17, 2016 at 12:45 PM, Robert Coli <rc...@eventbrite.com>
> wrote:
>
>>
>>
>> On Tue, Feb 16, 2016 at 11:29 PM, Anishek Agarwal <anis...@gmail.com>
>> wrote:
>>>
>>> To accomplish this can I just copy the data from disk1 to disk2 with in
>>> the relevant cassandra home location folders, change the cassanda.yaml
>>> configuration and restart the node. before starting i will shutdown the
>>> cluster.
>>>
>>
>> Yes.
>>
>> =Rob
>>
>>
>
>


High Bloom filter false ratio

2016-02-17 Thread Anishek Agarwal
Hello,

We have a table with a composite partition key of humongous cardinality;
it is a combination of (long,long). On the table we have
bloom_filter_fp_chance=0.01.

On doing "nodetool cfstats" on the 5 nodes we have in the cluster, we are
seeing "Bloom filter false ratio:" in the range of 0.7-0.9.

I thought that over time the bloom filter would adjust to the key space
cardinality. We have been running the cluster for a long time now, but have
added significant traffic since Jan this year, which does not lead to writes
in the DB but does lead to a high number of reads checking whether any values
exist.

Are there any settings that can be changed to get a better ratio?

Thanks
Anishek
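
For reference, a sketch of the knob discussed in the replies below; the
keyspace name and the new value are assumptions, and a changed
bloom_filter_fp_chance only applies to SSTables written afterwards, so
existing SSTables keep their old (and larger-error) filters until they are
rewritten by compaction, scrub or upgradesstables:

echo "ALTER TABLE mykeyspace.user_stay_points WITH bloom_filter_fp_chance = 0.001;" | cqlsh prod-node1
# rewrite existing sstables so they pick up the new filter sizing
# (heavyweight -- schedule it off-peak, one node at a time)
nodetool scrub mykeyspace user_stay_points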


Re: High Bloom filter false ratio

2016-02-18 Thread Anishek Agarwal
Hey all,

@Jaydeep here is the cfstats output from one node.

Read Count: 1721134722

Read Latency: 0.04268825050756254 ms.

Write Count: 56743880

Write Latency: 0.014650376727851532 ms.

Pending Tasks: 0

Table: user_stay_points

SSTable count: 1289

Space used (live), bytes: 122141272262

Space used (total), bytes: 224227850870

Off heap memory used (total), bytes: 653827528

SSTable Compression Ratio: 0.4959736121441446

Number of keys (estimate): 345137664

Memtable cell count: 339034

Memtable data size, bytes: 106558314

Memtable switch count: 3266

Local read count: 1721134803

Local read latency: 0.048 ms

Local write count: 56743898

Local write latency: 0.018 ms

Pending tasks: 0

Bloom filter false positives: 40664437

Bloom filter false ratio: 0.69058

Bloom filter space used, bytes: 493777336

Bloom filter off heap memory used, bytes: 493767024

Index summary off heap memory used, bytes: 91677192

Compression metadata off heap memory used, bytes: 68383312

Compacted partition minimum bytes: 104

Compacted partition maximum bytes: 1629722

Compacted partition mean bytes: 1773

Average live cells per slice (last five minutes): 0.0

Average tombstones per slice (last five minutes): 0.0


@Tyler Hobbs

we are using cassandra 2.0.15, so
https://issues.apache.org/jira/browse/CASSANDRA-8525 shouldn't occur. The
other problems look like they will be fixed in 3.0; we will most likely try
to slot in an upgrade to a 3.x version towards the second quarter of this year.


@Daemon

Latencies seem to have higher ratios, attached is the graph.


I am mostly trying to look at bloom filters because of the way we do reads:
we read data with non-existent partition keys and it seems to take long to
respond -- for 720 queries it takes 2 seconds, with all 720 queries returning
nothing. The 720 queries are done in batches of 180, with the 180 queries in
each batch running in parallel.


thanks

anishek



On Fri, Feb 19, 2016 at 3:09 AM, Jaydeep Chovatia <
chovatia.jayd...@gmail.com> wrote:

> How many partition keys exists for the table which shows this problem (or
> provide nodetool cfstats for that table)?
>
> On Thu, Feb 18, 2016 at 11:38 AM, daemeon reiydelle <daeme...@gmail.com>
> wrote:
>
>> The bloom filter buckets the values in a small number of buckets. I have
>> been surprised by how many cases I see with large cardinality where a few
>> values populate a given bloom leaf, resulting in high false positives, and
>> a surprising impact on latencies!
>>
>> Are you seeing 2:1 ranges between mean and worse case latencies (allowing
>> for gc times)?
>>
>> Daemeon Reiydelle
>> On Feb 18, 2016 8:57 AM, "Tyler Hobbs" <ty...@datastax.com> wrote:
>>
>>> You can try slightly lowering the bloom_filter_fp_chance on your table.
>>>
>>> Otherwise, it's possible that you're repeatedly querying one or two
>>> partitions that always trigger a bloom filter false positive.  You could
>>> try manually tracing a few queries on this table (for non-existent
>>> partitions) to see if the bloom filter rejects them.
>>>
>>> Depending on your Cassandra version, your false positive ratio could be
>>> inaccurate: https://issues.apache.org/jira/browse/CASSANDRA-8525
>>>
>>> There are also a couple of recent improvements to bloom filters:
>>> * https://issues.apache.org/jira/browse/CASSANDRA-8413
>>> * https://issues.apache.org/jira/browse/CASSANDRA-9167
>>>
>>>
>>> On Thu, Feb 18, 2016 at 1:35 AM, Anishek Agarwal <anis...@gmail.com>
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> We have a table with composite partition key with humungous
>>>> cardinality, its a combination of (long,long). On the table we have
>>>> bloom_filter_fp_chance=0.01.
>>>>
>>>> On doing "nodetool cfstats" on the 5 nodes we have in the cluster we
>>>> are seeing  "Bloom filter false ratio:" in the range of 0.7 -0.9.
>>>>
>>>> I thought over time the bloom filter would adjust to the key space
>>>> cardinality, we have been running the cluster for a long time now but have
>>>> added significant traffic from Jan this year, which would not lead to
>>>> writes in the db but would lead to high reads to see if are any values.
>>>>
>>>> Are there any settings that can be changed to allow better ratio.
>>>>
>>>> Thanks
>>>> Anishek
>>>>
>>>
>>>
>>>
>>> --
>>> Tyler Hobbs
>>> DataStax <http://datastax.com/>
>>>
>>
>


Re: High Bloom filter false ratio

2016-02-21 Thread Anishek Agarwal
Hey guys,

Just did some more digging ... it looks like DTCS is not removing old data
completely. I used sstable2json on one such SSTable and saw old data there.
We have a value of 30 for max_sstable_age_days on the table.

One of the columns showed data as ["2015-12-10 11\\:03+0530:", "56690ea2",
1449725602552000, "d"]. What is the meaning of "d" in the last
IS_MARKED_FOR_DELETE column?

I see data from 10 Dec 2015 still there, so it looks like there are a few
issues with DTCS. Operationally, what choices do I have to rectify this? We
are on version 2.0.15.

thanks
anishek




On Mon, Feb 22, 2016 at 10:23 AM, Anishek Agarwal <anis...@gmail.com> wrote:

> We are using DTCS have a 30 day window for them before they are cleaned
> up. I don't think with DTCS we can do anything about table sizing. Please
> do let me know if there are other ideas.
>
> On Sat, Feb 20, 2016 at 12:51 AM, Jaydeep Chovatia <
> chovatia.jayd...@gmail.com> wrote:
>
>> To me following three looks on higher side:
>> SSTable count: 1289
>>
>> In order to reduce SSTable count see if you are compacting of not (If
>> using STCS). Is it possible to change this to LCS?
>>
>>
>> Number of keys (estimate): 345137664 (345M partition keys)
>>
>> I don't have any suggestion about reducing this unless you partition your
>> data.
>>
>>
>> Bloom filter space used, bytes: 493777336 (400MB is huge)
>>
>> If number of keys are reduced then this will automatically reduce bloom
>> filter size I believe.
>>
>>
>>
>> Jaydeep
>>
>> On Thu, Feb 18, 2016 at 7:52 PM, Anishek Agarwal <anis...@gmail.com>
>> wrote:
>>
>>> Hey all,
>>>
>>> @Jaydeep here is the cfstats output from one node.
>>>
>>> Read Count: 1721134722
>>>
>>> Read Latency: 0.04268825050756254 ms.
>>>
>>> Write Count: 56743880
>>>
>>> Write Latency: 0.014650376727851532 ms.
>>>
>>> Pending Tasks: 0
>>>
>>> Table: user_stay_points
>>>
>>> SSTable count: 1289
>>>
>>> Space used (live), bytes: 122141272262
>>>
>>> Space used (total), bytes: 224227850870
>>>
>>> Off heap memory used (total), bytes: 653827528
>>>
>>> SSTable Compression Ratio: 0.4959736121441446
>>>
>>> Number of keys (estimate): 345137664
>>>
>>> Memtable cell count: 339034
>>>
>>> Memtable data size, bytes: 106558314
>>>
>>> Memtable switch count: 3266
>>>
>>> Local read count: 1721134803
>>>
>>> Local read latency: 0.048 ms
>>>
>>> Local write count: 56743898
>>>
>>> Local write latency: 0.018 ms
>>>
>>> Pending tasks: 0
>>>
>>> Bloom filter false positives: 40664437
>>>
>>> Bloom filter false ratio: 0.69058
>>>
>>> Bloom filter space used, bytes: 493777336
>>>
>>> Bloom filter off heap memory used, bytes: 493767024
>>>
>>> Index summary off heap memory used, bytes: 91677192
>>>
>>> Compression metadata off heap memory used, bytes: 68383312
>>>
>>> Compacted partition minimum bytes: 104
>>>
>>> Compacted partition maximum bytes: 1629722
>>>
>>> Compacted partition mean bytes: 1773
>>>
>>> Average live cells per slice (last five minutes): 0.0
>>>
>>> Average tombstones per slice (last five minutes): 0.0
>>>
>>>
>>> @Tyler Hobbs
>>>
>>> we are using cassandra 2.0.15 so
>>> https://issues.apache.org/jira/browse/CASSANDRA-8525  shouldnt occur.
>>> Other problems looks like will be fixed in 3.0 .. we will mostly try and
>>> slot in an upgrade to 3.x version towards second quarter of this year.
>>>
>>>
>>> @Daemon
>>>
>>> Latencies seem to have higher ratios, attached is the graph.
>>>
>>>
>>> I am mostly trying to look at Bloom filters, because the way we do
>>> reads, we read data with non existent partition keys and it seems to be
>>> taking long to respond, like for 720 queries it takes 2 seconds, with all
>>> 721 queries not returning anything. the 720 queries are done in
>>> sequence of 180 queries each with 180 of them running in parallel.
>>>
>>>
>>> thanks
>>>
>>> anishek
>>>
>>>
>>>
>>> On Fri, Feb 19, 2016 at 3:09 AM, Jaydeep Chovatia <
>>> chovatia.jayd...@gmail

Re: High Bloom filter false ratio

2016-02-21 Thread Anishek Agarwal
We are using DTCS with a 30-day window before the data is cleaned up.
I don't think with DTCS we can do anything about table sizing. Please do
let me know if there are other ideas.

On Sat, Feb 20, 2016 at 12:51 AM, Jaydeep Chovatia <
chovatia.jayd...@gmail.com> wrote:

> To me following three looks on higher side:
> SSTable count: 1289
>
> In order to reduce SSTable count see if you are compacting of not (If
> using STCS). Is it possible to change this to LCS?
>
>
> Number of keys (estimate): 345137664 (345M partition keys)
>
> I don't have any suggestion about reducing this unless you partition your
> data.
>
>
> Bloom filter space used, bytes: 493777336 (400MB is huge)
>
> If number of keys are reduced then this will automatically reduce bloom
> filter size I believe.
>
>
>
> Jaydeep
>
> On Thu, Feb 18, 2016 at 7:52 PM, Anishek Agarwal <anis...@gmail.com>
> wrote:
>
>> Hey all,
>>
>> @Jaydeep here is the cfstats output from one node.
>>
>> Read Count: 1721134722
>>
>> Read Latency: 0.04268825050756254 ms.
>>
>> Write Count: 56743880
>>
>> Write Latency: 0.014650376727851532 ms.
>>
>> Pending Tasks: 0
>>
>> Table: user_stay_points
>>
>> SSTable count: 1289
>>
>> Space used (live), bytes: 122141272262
>>
>> Space used (total), bytes: 224227850870
>>
>> Off heap memory used (total), bytes: 653827528
>>
>> SSTable Compression Ratio: 0.4959736121441446
>>
>> Number of keys (estimate): 345137664
>>
>> Memtable cell count: 339034
>>
>> Memtable data size, bytes: 106558314
>>
>> Memtable switch count: 3266
>>
>> Local read count: 1721134803
>>
>> Local read latency: 0.048 ms
>>
>> Local write count: 56743898
>>
>> Local write latency: 0.018 ms
>>
>> Pending tasks: 0
>>
>> Bloom filter false positives: 40664437
>>
>> Bloom filter false ratio: 0.69058
>>
>> Bloom filter space used, bytes: 493777336
>>
>> Bloom filter off heap memory used, bytes: 493767024
>>
>> Index summary off heap memory used, bytes: 91677192
>>
>> Compression metadata off heap memory used, bytes: 68383312
>>
>> Compacted partition minimum bytes: 104
>>
>> Compacted partition maximum bytes: 1629722
>>
>> Compacted partition mean bytes: 1773
>>
>> Average live cells per slice (last five minutes): 0.0
>>
>> Average tombstones per slice (last five minutes): 0.0
>>
>>
>> @Tyler Hobbs
>>
>> we are using cassandra 2.0.15 so
>> https://issues.apache.org/jira/browse/CASSANDRA-8525  shouldnt occur.
>> Other problems looks like will be fixed in 3.0 .. we will mostly try and
>> slot in an upgrade to 3.x version towards second quarter of this year.
>>
>>
>> @Daemon
>>
>> Latencies seem to have higher ratios, attached is the graph.
>>
>>
>> I am mostly trying to look at Bloom filters, because the way we do reads,
>> we read data with non existent partition keys and it seems to be taking
>> long to respond, like for 720 queries it takes 2 seconds, with all 721
>> queries not returning anything. the 720 queries are done in sequence of
>> 180 queries each with 180 of them running in parallel.
>>
>>
>> thanks
>>
>> anishek
>>
>>
>>
>> On Fri, Feb 19, 2016 at 3:09 AM, Jaydeep Chovatia <
>> chovatia.jayd...@gmail.com> wrote:
>>
>>> How many partition keys exists for the table which shows this problem
>>> (or provide nodetool cfstats for that table)?
>>>
>>> On Thu, Feb 18, 2016 at 11:38 AM, daemeon reiydelle <daeme...@gmail.com>
>>> wrote:
>>>
>>>> The bloom filter buckets the values in a small number of buckets. I
>>>> have been surprised by how many cases I see with large cardinality where a
>>>> few values populate a given bloom leaf, resulting in high false positives,
>>>> and a surprising impact on latencies!
>>>>
>>>> Are you seeing 2:1 ranges between mean and worse case latencies
>>>> (allowing for gc times)?
>>>>
>>>> Daemeon Reiydelle
>>>> On Feb 18, 2016 8:57 AM, "Tyler Hobbs" <ty...@datastax.com> wrote:
>>>>
>>>>> You can try slightly lowering the bloom_filter_fp_chance on your table.
>>>>>
>>>>> Otherwise, it's possible that you're repeatedly querying one or two
>>>>> partitions that always trigger a bloom filter false 

Multi DC setup for analytics

2016-03-14 Thread Anishek Agarwal
Hello,

We are using Cassandra 2.0.17 and have two logical DCs with different
keyspaces, but both have the same logical name, DC1.

We want to set up another Cassandra cluster for analytics which should get
data from both of the above DCs.

If we set up the new DC with the name DC2 and follow the steps at
https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_add_dc_to_cluster_t.html
will it work?

I would think we would first have to change the names of the existing
clusters to two different names, and then go about adding another DC that
gets data from these?

Also, as soon as we add the node the data starts moving... but that will only
be the real-time changes made to the cluster, right? We still have to do a
rebuild to get the existing data for the new node's token ranges?

Thanks
Anishek
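
For the rebuild part, a rough sketch of the usual add-a-DC sequence from the
linked procedure (keyspace and DC names are assumptions, and the
duplicate-DC-name problem discussed in the replies still has to be sorted out
first):

# 1. bring the new nodes up in DC2 with auto_bootstrap: false in cassandra.yaml
# 2. make each keyspace replicate to DC2 as well
echo "ALTER KEYSPACE mykeyspace WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};" | cqlsh
# 3. on every new DC2 node, stream the existing data for its token ranges from DC1
nodetool rebuild DC1

Only writes arriving after step 2 reach DC2 in real time; everything older
shows up there only once the rebuild has run.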


repairs how do we schedule

2016-03-10 Thread Anishek Agarwal
Hello,

we used to run repair on each node using
https://github.com/BrianGallew/cassandra_range_repair.git. Most of the time
repairs finished in under 12 hrs per node; we had 4 nodes then. Gradually
the repair time kept increasing as traffic increased, and we also added more
nodes in the meantime. We have 7 nodes now, and repair on one node takes
almost 3 days for one CF (we have 2 CFs in there).

Can we schedule multiple repairs at the same time? We don't delete data
explicitly; rows are removed via TTL from one CF (using DTCS), and there is
no delete operation on the other CF (using LCS).
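
For context, as far as we can tell each step the tool runs boils down to a
subrange repair of roughly this form (a sketch; the token values and names
are placeholders computed per split):

nodetool repair -par -st <start_token> -et <end_token> <keyspace> <column_family>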

thanks
anishek


Re: Traffic inconsistent across nodes

2016-04-12 Thread Anishek Agarwal
We have two DCs, one with the above 8 nodes and the other with 3 nodes.



On Tue, Apr 12, 2016 at 8:06 PM, Eric Stevens <migh...@gmail.com> wrote:

> Maybe include nodetool status here?  Are the four nodes serving reads in
> one DC (local to your driver's config) while the others are in another?
>
> On Tue, Apr 12, 2016, 1:01 AM Anishek Agarwal <anis...@gmail.com> wrote:
>
>> hello,
>>
>> we have 8 nodes in one cluster and attached is the traffic patterns
>> across the nodes.
>>
>> its very surprising that only 4 nodes show transmitting (purple) packets.
>>
>> our driver configuration on clients has the following load balancing
>> configuration  :
>>
>> new TokenAwarePolicy(
>> new 
>> DCAwareRoundRobinPolicy(configuration.get(Constants.LOCAL_DATA_CENTRE_NAME, 
>> "WDC")),
>> true)
>>
>>
>> any idea what is that we are missing which is leading to this skewed data
>> read patterns
>>
>> cassandra drivers as below:
>>
>> 
>> com.datastax.cassandra
>> cassandra-driver-core
>> 2.1.6
>> 
>> 
>> com.datastax.cassandra
>> cassandra-driver-mapping
>> 2.1.6
>> 
>>
>> cassandra version is 2.0.17
>>
>> Thanks in advance for the help.
>>
>> Anishek
>>
>>


Re: disk space used vs nodetool status

2016-03-22 Thread Anishek Agarwal
Thanks Carlos,

We didn't take any actions that would create a snapshot, and I couldn't find
the listsnapshots command in 2.0.17, but I did find the snapshot directories,
and they were created more than a couple of months ago, so I might simply
have forgotten about them. It's fine now, I have cleared them.
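
For anyone else hitting this on 2.0, clearing old snapshots is just the
following (optionally with a keyspace name appended to limit the scope):

nodetool clearsnapshot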

anishek

On Tue, Mar 22, 2016 at 3:20 PM, Carlos Alonso <i...@mrcalonso.com> wrote:

> I'd say you have snapshots holding disk space.
>
> Check it with nodetool listsnapshots. A snapshot is automatically taken on
> destructive actions (drop, truncate...) and is basically a hard link to the
> involved SSTables, so it's not considered as data load from Cassandra but
> it is effectively using disk space.
>
> Hope this helps.
>
> Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>
>
> On 22 March 2016 at 07:57, Anishek Agarwal <anis...@gmail.com> wrote:
>
>> Hello,
>>
>> Using cassandra 2.0.17, on one of the 7 nodes I see that the "Load"
>> column from nodetool status shows around 279.34 GB, whereas df -h on the
>> two mounted disks shows a total of about 400 GB. Any reason why this
>> difference could show up, and how do I go about finding the cause?
>>
>> Thanks In Advance.
>> Anishek
>>
>
>


Re: Multi DC setup for analytics

2016-03-21 Thread Anishek Agarwal
Hey Clint,

we have two separate rings which don't talk to each other but both having
the same DC name "DCX".

@Raja,

We had already gone towards the path you suggested.

thanks all
anishek

On Fri, Mar 18, 2016 at 8:01 AM, Reddy Raja <areddyr...@gmail.com> wrote:

> Yes. Here are the steps.
> You will have to change the DC Names first.
> DC1 and DC2 would be independent clusters.
>
> Create a new DC, DC3 and include these two DC's on DC3.
>
> This should work well.
>
>
> On Thu, Mar 17, 2016 at 11:03 PM, Clint Martin <
> clintlmar...@coolfiretechnologies.com> wrote:
>
>> When you say you have two logical DC both with the same name are you
>> saying that you have two clusters of servers both with the same DC name,
>> nether of which currently talk to each other? IE they are two separate
>> rings?
>>
>> Or do you mean that you have two keyspaces in one cluster?
>>
>> Or?
>>
>> Clint
>> On Mar 14, 2016 2:11 AM, "Anishek Agarwal" <anis...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> We are using cassandra 2.0.17 and have two logical DC having different
>>> Keyspaces but both having same logical name DC1.
>>>
>>> we want to setup another cassandra cluster for analytics which should
>>> get data from both the above DC.
>>>
>>> if we setup the new DC with name DC2 and follow the steps
>>> https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_add_dc_to_cluster_t.html
>>> will it work ?
>>>
>>> I would think we would have to first change the names of existing
>>> clusters to have to different names and then go with adding another dc
>>> getting data from these?
>>>
>>> Also as soon as we add the node the data starts moving... this will all
>>> be only real time changes done to the cluster right ? we still have to do
>>> the rebuild to get the data for tokens for node in new cluster ?
>>>
>>> Thanks
>>> Anishek
>>>
>>
>
>
> --
> "In this world, you either have an excuse or a story. I preferred to have
> a story"
>


Re: Lot of GC on two nodes out of 7

2016-03-03 Thread Anishek Agarwal
Hello,

Bryan, most of the partition sizes are under 45 KB

I have tried concurrent_compactors: 8 for one of the nodes, still no
improvement.
I have tried MAX_HEAP_SIZE: 8G, no improvement.

I will try a HEAP_NEWSIZE of 2G, though I am sure CMS will take longer then.

Also, it doesn't look like I mentioned what type of GC was causing the
problems: on both nodes it's ParNew that is taking long on each run, and too
many runs are happening in succession.
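
For reference, the jstat output asked for earlier can be captured with
something like this on an affected node, where <pid> is the Cassandra
process id; it samples the GC cause once a second for a minute:

jstat -gccause <pid> 1000 60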

anishek


On Fri, Mar 4, 2016 at 5:36 AM, Bryan Cheng <br...@blockcypher.com> wrote:

> Hi Anishek,
>
> In addition to the good advice others have given, do you notice any
> abnormally large partitions? What does cfhistograms report for 99%
> partition size? A few huge partitions will cause very disproportionate load
> on your cluster, including high GC.
>
> --Bryan
>
> On Wed, Mar 2, 2016 at 9:28 AM, Amit Singh F <amit.f.si...@ericsson.com>
> wrote:
>
>> Hi Anishek,
>>
>>
>>
>> We too faced a similar problem in 2.0.14, and after doing some research we
>> configured a few parameters in cassandra.yaml and were able to overcome the
>> GC pauses. Those are:
>>
>>
>>
>> · memtable_flush_writers : increased from 1 to 3, since from the tpstats
>> output we can see mutations dropped, which means writes are getting
>> blocked; increasing the number will take care of those.
>>
>> · memtable_total_space_in_mb : default is 1/4 of the heap size; it can be
>> lowered because large long-lived objects will create pressure on the heap,
>> so it is better to reduce it somewhat.
>>
>> · Concurrent_compactors : Alain rightly pointed this out, i.e.
>> reduce it to 8. You need to try this.
>>
>>
>>
>> Also please check whether you have mutations dropped on other nodes or not.
>>
>>
>>
>> Hope this helps in your cluster too.
>>
>>
>>
>> Regards
>>
>> Amit Singh
>>
>> *From:* Jonathan Haddad [mailto:j...@jonhaddad.com]
>> *Sent:* Wednesday, March 02, 2016 9:33 PM
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: Lot of GC on two nodes out of 7
>>
>>
>>
>> Can you post a gist of the output of jstat -gccause (60 seconds worth)?
>> I think it's cool you're willing to experiment with alternative JVM
>> settings but I've never seen anyone use max tenuring threshold of 50 either
>> and I can't imagine it's helpful.  Keep in mind if your objects are
>> actually reaching that threshold it means they've been copied 50x (really
>> really slow) and also you're going to end up spilling your eden objects
>> directly into your old gen if your survivor is full.  Considering the small
>> amount of memory you're using for heap I'm really not surprised you're
>> running into problems.
>>
>>
>>
>> I recommend G1GC + 12GB heap and just let it optimize itself for almost
>> all cases with the latest JVM versions.
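>>
>> A minimal sketch of that change in cassandra-env.sh, assuming the
>> CMS/ParNew options above are removed and noting that HEAP_NEWSIZE / -Xmn
>> should not be set together with G1:
>>
>> MAX_HEAP_SIZE="12G"
>> JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"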
>>
>>
>>
>> On Wed, Mar 2, 2016 at 6:08 AM Alain RODRIGUEZ <arodr...@gmail.com>
>> wrote:
>>
>> It looks like you are doing a good work with this cluster and know a lot
>> about JVM, that's good :-).
>>
>>
>>
>> our machine configurations are : 2 X 800 GB SSD , 48 cores, 64 GB RAM
>>
>>
>>
>> That's good hardware too.
>>
>>
>>
>> With 64 GB of ram I would probably directly give a try to
>> `MAX_HEAP_SIZE=8G` on one of the 2 bad nodes probably.
>>
>>
>>
>> Also I would also probably try lowering `HEAP_NEWSIZE=2G.` and using
>> `-XX:MaxTenuringThreshold=15`, still on the canary node to observe the
>> effects. But that's just an idea of something I would try to see the
>> impacts, I don't think it will solve your current issues or even make it
>> worse for this node.
>>
>>
>>
>> Using G1GC would allow you to use a bigger Heap size. Using C*2.1 would
>> allow you to store the memtables off-heap. Those are 2 improvements
>> reducing the heap pressure that you might be interested in.
>>
>>
>>
>> I have spent time reading about all other options before including them
>> and a similar configuration on our other prod cluster is showing good GC
>> graphs via gcviewer.
>>
>>
>>
>> So, let's look for an other reason.
>>
>>
>>
>> there are MUTATION and READ messages dropped in high number on nodes in
>> question and on other 5 nodes it varies between 1-3.
>>
>>
>>
>> - Is Memory, CPU or disk a bottleneck? Is one of those running at the
>> limits?
>>
>>
>>
>> concurren

Re: Lot of GC on two nodes out of 7

2016-03-02 Thread Anishek Agarwal
Hey Jeff,

one of the nodes with high GC has 1400 SSTables; all other nodes have
about 500-900 SSTables. The other node with high GC has 636 SSTables.

the average row size for compacted partitions is about 1640 bytes on all
nodes. We have replication factor 3, but the problem is only on two nodes.
The only other thing that stands out in cfstats is that the read and write
latencies on the nodes with high GC are 5-7 times higher than on the other
5 nodes, but I think that's expected.

thanks
anishek




On Wed, Mar 2, 2016 at 1:09 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
wrote:

> Compaction falling behind will likely cause additional work on reads (more
> sstables to merge), but I’d be surprised if it manifested in super long GC.
> When you say twice as many sstables, how many is that?.
>
> In cfstats, does anything stand out? Is max row size on those nodes larger
> than on other nodes?
>
> What you don’t show in your JVM options is the new gen size – if you do
> have unusually large partitions on those two nodes (especially likely if
> you have rf=2 – if you have rf=3, then there’s probably a third node
> misbehaving you haven’t found yet), then raising new gen size can help
> handle the garbage created by reading large partitions without having to
> tolerate the promotion. Estimates for the amount of garbage vary, but it
> could be “gigabytes” of garbage on a very wide partition (see
> https://issues.apache.org/jira/browse/CASSANDRA-9754 for work in progress
> to help mitigate that type of pain).
>
> - Jeff
>
> From: Anishek Agarwal
> Reply-To: "user@cassandra.apache.org"
> Date: Tuesday, March 1, 2016 at 11:12 PM
> To: "user@cassandra.apache.org"
> Subject: Lot of GC on two nodes out of 7
>
> Hello,
>
> we have a cassandra cluster of 7 nodes, all of them have the same JVM GC
> configurations, all our writes /  reads use the TokenAware Policy wrapping
> a DCAware policy. All nodes are part of same Datacenter.
>
> We are seeing that two nodes are having high GC collection times. Then
> mostly seem to spend time in GC like about 300-600 ms. This also seems to
> result in higher CPU utilisation on these machines. Other  5 nodes don't
> have this problem.
>
> There is no additional repair activity going on the cluster, we are not
> sure why this is happening.
> we checked cfhistograms on the two CF we have in the cluster and number of
> reads seems to be almost same.
>
> we also used cfstats to see the number of ssttables on each node and one
> of the nodes with the above problem has twice the number of ssttables than
> other nodes. This still doesnot explain why two nodes have high GC
> Overheads. our GC config is as below:
>
> JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
>
> JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
>
> JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
>
> JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
>
> JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=50"
>
> JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=70"
>
> JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
>
> JVM_OPTS="$JVM_OPTS -XX:+UseTLAB"
>
> JVM_OPTS="$JVM_OPTS -XX:MaxPermSize=256m"
>
> JVM_OPTS="$JVM_OPTS -XX:+AggressiveOpts"
>
> JVM_OPTS="$JVM_OPTS -XX:+UseCompressedOops"
>
> JVM_OPTS="$JVM_OPTS -XX:+CMSScavengeBeforeRemark"
>
> JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=48"
>
> JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=48"
>
> JVM_OPTS="$JVM_OPTS -XX:-ExplicitGCInvokesConcurrent"
>
> JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions"
>
> JVM_OPTS="$JVM_OPTS -XX:+UseGCTaskAffinity"
>
> JVM_OPTS="$JVM_OPTS -XX:+BindGCTaskThreadsToCPUs"
>
> # earlier value 131072 = 32768 * 4
>
> JVM_OPTS="$JVM_OPTS -XX:ParGCCardsPerStrideChunk=131072"
>
> JVM_OPTS="$JVM_OPTS -XX:CMSScheduleRemarkEdenSizeThreshold=104857600"
>
> JVM_OPTS="$JVM_OPTS -XX:CMSRescanMultiple=32768"
>
> JVM_OPTS="$JVM_OPTS -XX:CMSConcMarkMultiple=32768"
>
> #new
>
> JVM_OPTS="$JVM_OPTS -XX:+CMSConcurrentMTEnabled"
>
> We are using cassandra 2.0.17. If anyone has any suggestion as to how what
> else we can look for to understand why this is happening please do reply.
>
>
>
> Thanks
> anishek
>
>
>


Lot of GC on two nodes out of 7

2016-03-01 Thread Anishek Agarwal
Hello,

we have a cassandra cluster of 7 nodes, all of them have the same JVM GC
configurations, all our writes /  reads use the TokenAware Policy wrapping
a DCAware policy. All nodes are part of same Datacenter.

We are seeing that two nodes are having high GC collection times. They
mostly seem to spend about 300-600 ms in GC per collection. This also seems
to result in higher CPU utilisation on these machines. The other 5 nodes
don't have this problem.

There is no additional repair activity going on the cluster, we are not
sure why this is happening.
we checked cfhistograms on the two CFs we have in the cluster, and the
number of reads seems to be almost the same.

we also used cfstats to see the number of SSTables on each node, and one of
the nodes with the above problem has twice as many SSTables as the other
nodes. This still does not explain why two nodes have high GC
overheads. Our GC config is as below:

JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"

JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"

JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"

JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"

JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=50"

JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=70"

JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"

JVM_OPTS="$JVM_OPTS -XX:+UseTLAB"

JVM_OPTS="$JVM_OPTS -XX:MaxPermSize=256m"

JVM_OPTS="$JVM_OPTS -XX:+AggressiveOpts"

JVM_OPTS="$JVM_OPTS -XX:+UseCompressedOops"

JVM_OPTS="$JVM_OPTS -XX:+CMSScavengeBeforeRemark"

JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=48"

JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=48"

JVM_OPTS="$JVM_OPTS -XX:-ExplicitGCInvokesConcurrent"

JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions"

JVM_OPTS="$JVM_OPTS -XX:+UseGCTaskAffinity"

JVM_OPTS="$JVM_OPTS -XX:+BindGCTaskThreadsToCPUs"

# earlier value 131072 = 32768 * 4

JVM_OPTS="$JVM_OPTS -XX:ParGCCardsPerStrideChunk=131072"

JVM_OPTS="$JVM_OPTS -XX:CMSScheduleRemarkEdenSizeThreshold=104857600"

JVM_OPTS="$JVM_OPTS -XX:CMSRescanMultiple=32768"

JVM_OPTS="$JVM_OPTS -XX:CMSConcMarkMultiple=32768"

#new

JVM_OPTS="$JVM_OPTS -XX:+CMSConcurrentMTEnabled"

We are using cassandra 2.0.17. If anyone has any suggestion as to how what
else we can look for to understand why this is happening please do reply.



Thanks
anishek


Re: Lot of GC on two nodes out of 7

2016-03-06 Thread Anishek Agarwal
@Jeff I was just trying to follow some more of the advice given above; I
personally still think a larger new gen heap size would be better.

@Jonathan I will post the whole logs. I have restarted the nodes with
additional changes; most probably tomorrow or the day after I will put out
the GC logs.

The problem still exists on two nodes: too much time is spent in GC.
Additionally, I tried to print the state of the cluster via my application
to see what is happening, and I see that the node with high GC has a lot of
"inflight queries" (almost 1100) while the other nodes are all at 0.

The cfhistograms for all nodes show approximately the same number of reads,
so I am thinking the above phenomenon is happening because the node is
spending its time in GC.

Also, looking at the load balancing policy on the client, it is new
TokenAwarePolicy(new DCAwareRoundRobinPolicy()).

if you have any other ideas please keep posting them.

thanks
anishek

On Sat, Mar 5, 2016 at 12:54 AM, Jonathan Haddad <j...@jonhaddad.com> wrote:

> Without looking at your GC logs (you never posted a gist), my assumption
> would be you're doing a lot of copying between survivor generations, and
> they're taking a long time.  You're probably also copying a lot of data to
> your old gen as a result of having full-ish survivor spaces to begin with.
>
> On Thu, Mar 3, 2016 at 10:26 PM Jeff Jirsa <jeff.ji...@crowdstrike.com>
> wrote:
>
>> I’d personally would have gone the other way – if you’re seeing parnew,
>> increasing new gen instead of decreasing it should help drop (faster)
>> rather than promoting to sv/oldgen (slower) ?
>>
>>
>>
>> From: Anishek Agarwal
>> Reply-To: "user@cassandra.apache.org"
>> Date: Thursday, March 3, 2016 at 8:55 PM
>>
>> To: "user@cassandra.apache.org"
>> Subject: Re: Lot of GC on two nodes out of 7
>>
>> Hello,
>>
>> Bryan, most of the partition sizes are under 45 KB
>>
>> I have tried with concurrent_compactors : 8 for one of the nodes still no
>> improvement,
>> I have tried max_heap_Size : 8G, no improvement.
>>
>> I will try the newHeapsize of 2G though i am sure CMS will be a longer
>> then.
>>
>> Also doesn't look like i mentioned what type of GC was causing the
>> problems. On both the nodes its the ParNewGC thats taking long for each run
>> and too many runs are happening in succession.
>>
>> anishek
>>
>>
>> On Fri, Mar 4, 2016 at 5:36 AM, Bryan Cheng <br...@blockcypher.com>
>> wrote:
>>
>>> Hi Anishek,
>>>
>>> In addition to the good advice others have given, do you notice any
>>> abnormally large partitions? What does cfhistograms report for 99%
>>> partition size? A few huge partitions will cause very disproportionate load
>>> on your cluster, including high GC.
>>>
>>> --Bryan
>>>
>>> On Wed, Mar 2, 2016 at 9:28 AM, Amit Singh F <amit.f.si...@ericsson.com>
>>> wrote:
>>>
>>>> Hi Anishek,
>>>>
>>>>
>>>>
>>>> We too faced similar problem in 2.0.14 and after doing some research we
>>>> config few parameters in Cassandra.yaml and was able to overcome GC pauses
>>>> . Those are :
>>>>
>>>>
>>>>
>>>> · memtable_flush_writers : increased from 1 to 3 as from
>>>> tpstats output  we can see mutations dropped so it means writes are getting
>>>> blocked, so increasing number will have those catered.
>>>>
>>>> · memtable_total_space_in_mb : Default (1/4 of heap size), can
>>>> lowered because larger long lived objects will create pressure on HEAP, so
>>>> its better to reduce some amount of size.
>>>>
>>>> · Concurrent_compactors : Alain righlty pointed out this i.e
>>>> reduce it to 8. You need to try this.
>>>>
>>>>
>>>>
>>>> Also please check whether you have mutations drop in other nodes or not.
>>>>
>>>>
>>>>
>>>> Hope this helps in your cluster too.
>>>>
>>>>
>>>>
>>>> Regards
>>>>
>>>> Amit Singh
>>>>
>>>> *From:* Jonathan Haddad [mailto:j...@jonhaddad.com]
>>>> *Sent:* Wednesday, March 02, 2016 9:33 PM
>>>> *To:* user@cassandra.apache.org
>>>> *Subject:* Re: Lot of GC on two nodes out of 7
>>>>
>>>>
>>>>
>>>> Can you post a gist of the output of jstat -gccause (60 seconds
>>>> worth)?  I think it's cool you're will

Re: Lot of GC on two nodes out of 7

2016-03-02 Thread Anishek Agarwal
t; might want to keep this as it or even reduce it if you have less than 16 GB
> of native memory. Go with 8 GB if you have a lot of memory.
> `-XX:MaxTenuringThreshold=50` is the highest value I have seen in use so
> far. I had luck with values between 4 <--> 16 in the past. I would give  a
> try with 15.
> `-XX:CMSInitiatingOccupancyFraction=70`--> Why not using default - 75 ?
> Using default and then tune from there to improve things is generally a
> good idea.
>
> You also use a bunch of option I don't know about, if you are uncertain
> about them, you could try a default conf without the options you added and
> just the using the changes above from default
> https://github.com/apache/cassandra/blob/cassandra-2.0/conf/cassandra-env.sh.
> Or you might find more useful information on a nice reference about this
> topic which is Al Tobey's blog post about tuning 2.1. Go to the 'Java
> Virtual Machine' part:
> https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html
>
> FWIW, I also saw improvement in the past by upgrading to 2.1, Java 8 and
> G1GC. G1GC is supposed to be easier to configure too.
>
> the average row size for compacted partitions is about 1640 bytes on all
>> nodes. We have replication factor 3 but the problem is only on two nodes.
>>
>
> I think Jeff is trying to spot a wide row messing with your system, so
> looking at the max row size on those nodes compared to other is more
> relevant than average size for this check.
>
> the only other thing that stands out in cfstats is the read time and write
>> time on the nodes with high GC is 5-7 times higher than other 5 nodes, but
>> i think thats expected.
>
>
> I would probably look at this the reverse way: I imagine that extra GC  is
> a consequence of something going wrong on those nodes as JVM / GC are
> configured the same way cluster-wide. GC / JVM issues are often due to
> Cassandra / system / hardware issues, inducing extra pressure on the JVM. I
> would try to tune JVM / GC only once the system is healthy. So I often saw
> high GC being a consequence rather than the root cause of an issue.
>
> To explore this possibility:
>
> Does this command show some dropped or blocked tasks? This would add
> pressure to heap.
> nodetool tpstats
>
> Do you have errors in logs? Always good to know when facing an issue.
> grep -i "ERROR" /var/log/cassandra/system.log
>
> How are compactions tuned (throughput + concurrent compactors)? This
> tuning might explain compactions not keeping up or a high GC pressure.
>
> What are your disks / CPU? To help us giving you good arbitrary values to
> try.
>
> Is there some iowait ? Could point to a bottleneck or bad hardware.
> iostat -mx 5 100
>
> ...
>
> Hope one of those will point you to an issue, but there are many more
> thing you could check.
>
> Let us know how it goes,
>
> C*heers,
> ---
> Alain Rodriguez - al...@thelastpickle.com
> France
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
>
>
> 2016-03-02 10:33 GMT+01:00 Anishek Agarwal <anis...@gmail.com>:
>
>> also MAX_HEAP_SIZE=6G and HEAP_NEWSIZE=4G.
>>
>> On Wed, Mar 2, 2016 at 1:40 PM, Anishek Agarwal <anis...@gmail.com>
>> wrote:
>>
>>> Hey Jeff,
>>>
>>> one of the nodes with high GC has 1400 SST tables, all other nodes have
>>> about 500-900 SST tables. the other node with high GC has 636 SST tables.
>>>
>>> the average row size for compacted partitions is about 1640 bytes on all
>>> nodes. We have replication factor 3 but the problem is only on two nodes.
>>> the only other thing that stands out in cfstats is the read time and
>>> write time on the nodes with high GC is 5-7 times higher than other 5
>>> nodes, but i think thats expected.
>>>
>>> thanks
>>> anishek
>>>
>>>
>>>
>>>
>>> On Wed, Mar 2, 2016 at 1:09 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
>>> wrote:
>>>
>>>> Compaction falling behind will likely cause additional work on reads
>>>> (more sstables to merge), but I’d be surprised if it manifested in super
>>>> long GC. When you say twice as many sstables, how many is that?.
>>>>
>>>> In cfstats, does anything stand out? Is max row size on those nodes
>>>> larger than on other nodes?
>>>>
>>>> What you don’t show in your JVM options is the new gen size – if you do
>>>> have unusually large partitions on those two nodes (especially likely if
>>>> you have rf=2 – if

Re: Lot of GC on two nodes out of 7

2016-03-02 Thread Anishek Agarwal
also MAX_HEAP_SIZE=6G and HEAP_NEWSIZE=4G.

On Wed, Mar 2, 2016 at 1:40 PM, Anishek Agarwal <anis...@gmail.com> wrote:

> Hey Jeff,
>
> one of the nodes with high GC has 1400 SST tables, all other nodes have
> about 500-900 SST tables. the other node with high GC has 636 SST tables.
>
> the average row size for compacted partitions is about 1640 bytes on all
> nodes. We have replication factor 3 but the problem is only on two nodes.
> the only other thing that stands out in cfstats is the read time and write
> time on the nodes with high GC is 5-7 times higher than other 5 nodes, but
> i think thats expected.
>
> thanks
> anishek
>
>
>
>
> On Wed, Mar 2, 2016 at 1:09 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
> wrote:
>
>> Compaction falling behind will likely cause additional work on reads
>> (more sstables to merge), but I’d be surprised if it manifested in super
>> long GC. When you say twice as many sstables, how many is that?.
>>
>> In cfstats, does anything stand out? Is max row size on those nodes
>> larger than on other nodes?
>>
>> What you don’t show in your JVM options is the new gen size – if you do
>> have unusually large partitions on those two nodes (especially likely if
>> you have rf=2 – if you have rf=3, then there’s probably a third node
>> misbehaving you haven’t found yet), then raising new gen size can help
>> handle the garbage created by reading large partitions without having to
>> tolerate the promotion. Estimates for the amount of garbage vary, but it
>> could be “gigabytes” of garbage on a very wide partition (see
>> https://issues.apache.org/jira/browse/CASSANDRA-9754 for work in
>> progress to help mitigate that type of pain).
>>
>> - Jeff
>>
>> From: Anishek Agarwal
>> Reply-To: "user@cassandra.apache.org"
>> Date: Tuesday, March 1, 2016 at 11:12 PM
>> To: "user@cassandra.apache.org"
>> Subject: Lot of GC on two nodes out of 7
>>
>> Hello,
>>
>> we have a cassandra cluster of 7 nodes, all of them have the same JVM GC
>> configurations, all our writes /  reads use the TokenAware Policy wrapping
>> a DCAware policy. All nodes are part of same Datacenter.
>>
>> We are seeing that two nodes are having high GC collection times. Then
>> mostly seem to spend time in GC like about 300-600 ms. This also seems to
>> result in higher CPU utilisation on these machines. Other  5 nodes don't
>> have this problem.
>>
>> There is no additional repair activity going on the cluster, we are not
>> sure why this is happening.
>> we checked cfhistograms on the two CF we have in the cluster and number
>> of reads seems to be almost same.
>>
>> we also used cfstats to see the number of ssttables on each node and one
>> of the nodes with the above problem has twice the number of ssttables than
>> other nodes. This still doesnot explain why two nodes have high GC
>> Overheads. our GC config is as below:
>>
>> JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
>>
>> JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
>>
>> JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
>>
>> JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
>>
>> JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=50"
>>
>> JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=70"
>>
>> JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
>>
>> JVM_OPTS="$JVM_OPTS -XX:+UseTLAB"
>>
>> JVM_OPTS="$JVM_OPTS -XX:MaxPermSize=256m"
>>
>> JVM_OPTS="$JVM_OPTS -XX:+AggressiveOpts"
>>
>> JVM_OPTS="$JVM_OPTS -XX:+UseCompressedOops"
>>
>> JVM_OPTS="$JVM_OPTS -XX:+CMSScavengeBeforeRemark"
>>
>> JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=48"
>>
>> JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=48"
>>
>> JVM_OPTS="$JVM_OPTS -XX:-ExplicitGCInvokesConcurrent"
>>
>> JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions"
>>
>> JVM_OPTS="$JVM_OPTS -XX:+UseGCTaskAffinity"
>>
>> JVM_OPTS="$JVM_OPTS -XX:+BindGCTaskThreadsToCPUs"
>>
>> # earlier value 131072 = 32768 * 4
>>
>> JVM_OPTS="$JVM_OPTS -XX:ParGCCardsPerStrideChunk=131072"
>>
>> JVM_OPTS="$JVM_OPTS -XX:CMSScheduleRemarkEdenSizeThreshold=104857600"
>>
>> JVM_OPTS="$JVM_OPTS -XX:CMSRescanMultiple=32768"
>>
>> JVM_OPTS="$JVM_OPTS -XX:CMSConcMarkMultiple=32768"
>>
>> #new
>>
>> JVM_OPTS="$JVM_OPTS -XX:+CMSConcurrentMTEnabled"
>>
>> We are using cassandra 2.0.17. If anyone has any suggestion as to how
>> what else we can look for to understand why this is happening please do
>> reply.
>>
>>
>>
>> Thanks
>> anishek
>>
>>
>>
>


Re: Multi DC setup for analytics

2016-03-30 Thread Anishek Agarwal
Hey Guys,

We made the necessary changes and were trying to get this back on track,
but hit another wall.

We have two clusters in different DCs (DC1 and DC2) with cluster names
CLUSTER_1 and CLUSTER_2.

We want to have a common analytics cluster in DC3 with cluster name
CLUSTER_3. It looks like this can't be done, so do we have to set up two
different analytics clusters? Can't we just get data from CLUSTER_1/2 into
the same cluster CLUSTER_3?

thanks
anishek

On Mon, Mar 21, 2016 at 3:31 PM, Anishek Agarwal <anis...@gmail.com> wrote:

> Hey Clint,
>
> we have two separate rings which don't talk to each other but both having
> the same DC name "DCX".
>
> @Raja,
>
> We had already gone towards the path you suggested.
>
> thanks all
> anishek
>
> On Fri, Mar 18, 2016 at 8:01 AM, Reddy Raja <areddyr...@gmail.com> wrote:
>
>> Yes. Here are the steps.
>> You will have to change the DC Names first.
>> DC1 and DC2 would be independent clusters.
>>
>> Create a new DC, DC3 and include these two DC's on DC3.
>>
>> This should work well.
>>
>>
>> On Thu, Mar 17, 2016 at 11:03 PM, Clint Martin <
>> clintlmar...@coolfiretechnologies.com> wrote:
>>
>>> When you say you have two logical DC both with the same name are you
>>> saying that you have two clusters of servers both with the same DC name,
>>> nether of which currently talk to each other? IE they are two separate
>>> rings?
>>>
>>> Or do you mean that you have two keyspaces in one cluster?
>>>
>>> Or?
>>>
>>> Clint
>>> On Mar 14, 2016 2:11 AM, "Anishek Agarwal" <anis...@gmail.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> We are using cassandra 2.0.17 and have two logical DC having different
>>>> Keyspaces but both having same logical name DC1.
>>>>
>>>> we want to setup another cassandra cluster for analytics which should
>>>> get data from both the above DC.
>>>>
>>>> if we setup the new DC with name DC2 and follow the steps
>>>> https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_add_dc_to_cluster_t.html
>>>> will it work ?
>>>>
>>>> I would think we would have to first change the names of existing
>>>> clusters to have to different names and then go with adding another dc
>>>> getting data from these?
>>>>
>>>> Also as soon as we add the node the data starts moving... this will all
>>>> be only real time changes done to the cluster right ? we still have to do
>>>> the rebuild to get the data for tokens for node in new cluster ?
>>>>
>>>> Thanks
>>>> Anishek
>>>>
>>>
>>
>>
>> --
>> "In this world, you either have an excuse or a story. I preferred to have
>> a story"
>>
>
>


Re: Acceptable repair time

2016-03-30 Thread Anishek Agarwal
we have about 380 GB of data, which with RF = 3 comes to roughly 1200 GB on
disk. Since we are on 2.0.17 there is no incremental repair :(

On Tue, Mar 29, 2016 at 6:05 PM, Kai Wang <dep...@gmail.com> wrote:

> IIRC when we switched to LCS and ran the first full repair with
> 250GB/RF=3, it took at least 12 hours for the repair to finish, then
> another 3+ days for all the compaction to catch up. I called it "the big
> bang of LCS".
>
> Since then we've been running nightly incremental repair.
>
> For me as long as it's reliable (no streaming error, better progress
> reporting etc), I actually don't mind it it takes more than a few hours to
> do a full repair. But I am not sure about 4 days... I guess it depends on
> the size of the cluster and data...
>
> On Tue, Mar 29, 2016 at 6:04 AM, Anishek Agarwal <anis...@gmail.com>
> wrote:
>
>> I would really like to know the answer for above because on some nodes
>> repair takes almost 4 days for us :(.
>>
>> On Tue, Mar 29, 2016 at 8:34 AM, Jack Krupansky <jack.krupan...@gmail.com
>> > wrote:
>>
>>> Someone recently asked me for advice when their repair time was 2-3
>>> days. I thought that was outrageous, but not unheard of. Personally, to me,
>>> 2-3 hours would be about the limit of what I could tolerate, and my
>>> personal goal would be that a full repair of a node should take no longer
>>> than an hour, maybe 90 minutes tops. But... achieving those more
>>> abbreviated repair times would strongly suggest that the amount of data on
>>> each node be kept down to a tiny fraction of a typical spinning disk drive,
>>> or even a fraction of a larger SSD drive.
>>>
>>> So, my question here is what people consider acceptable full repair
>>> times for nodes and what the resulting node data size is.
>>>
>>> What impact vnodes has on these numbers is a bonus question.
>>>
>>> Thanks!
>>>
>>> -- Jack Krupansky
>>>
>>
>>
>


Re: Acceptable repair time

2016-03-29 Thread Anishek Agarwal
I would really like to know the answer to the above, because on some nodes
repair takes almost 4 days for us :(

On Tue, Mar 29, 2016 at 8:34 AM, Jack Krupansky 
wrote:

> Someone recently asked me for advice when their repair time was 2-3 days.
> I thought that was outrageous, but not unheard of. Personally, to me, 2-3
> hours would be about the limit of what I could tolerate, and my personal
> goal would be that a full repair of a node should take no longer than an
> hour, maybe 90 minutes tops. But... achieving those more abbreviated repair
> times would strongly suggest that the amount of data on each node be kept
> down to a tiny fraction of a typical spinning disk drive, or even a
> fraction of a larger SSD drive.
>
> So, my question here is what people consider acceptable full repair times
> for nodes and what the resulting node data size is.
>
> What impact vnodes has on these numbers is a bonus question.
>
> Thanks!
>
> -- Jack Krupansky
>


Re: Multi DC setup for analytics

2016-03-31 Thread Anishek Agarwal
Hey Bryan,

Thanks for the info, we inferred as much. The only other thing we were
trying was to start two separate instances in the analytics cluster on the
same set of machines, each talking to its respective DC, but within 2 mins
we dropped that since we would have to change ports on at least one of the
existing DCs so that they are on the same port when they join the analytics
cluster.

For now we are just getting another set of machines for this.


I had known about the pattern of using a separate analytics cluster for
cassandra, but thought we could join them across two clusters. My bad; now
that I think of it, it would have been better to have just one DC for
realtime prod requests instead of two.

Are there ways of merging existing clusters into one cluster in cassandra?


On Fri, Apr 1, 2016 at 5:05 AM, Bryan Cheng <br...@blockcypher.com> wrote:

> I'm jumping into this thread late, so sorry if this has been covered
> before. But am I correct in reading that you have two different Cassandra
> rings, not talking to each other at all, and you want to have a shared DC
> with a third Cassandra ring?
>
> I'm not sure what you want to do is possible.
>
> If I had the luxury of starting from scratch, the design I would do is:
> All three DC's in one cluster, with 3 datacenters. DC3 is the analytics DC.
> DC1's keyspaces are replicated to DC1 and DC3 only.
> DC2's keyspaces are replicated to DC2 and DC3 only.
>
> Then you have DC3 with all data from both DC1 and DC2 to run analytics on,
> and no cross-talk between DC1 and DC2.
>
> If you cannot rebuild your existing clusters, you may want to consider
> using something like Spark to ETL your data out of DC1 and DC2 into a new
> cluster at DC3. At that point you're running a data warehouse and lose some
> of the advantages of seemless cluster membership.
>
> On Wed, Mar 30, 2016 at 5:43 AM, Anishek Agarwal <anis...@gmail.com>
> wrote:
>
>> Hey Guys,
>>
>> We did the necessary changes and were trying to get this back on track,
>> but hit another wall,
>>
>> we have two Clusters in Different DC ( DC1 and DC2) with cluster names (
>> CLUSTER_1, CLUSTER_2)
>>
>> we want to have a common analytics cluster in DC3 with cluster name
>> (CLUSTER_3). -- looks like this can't be done, so we have to setup two
>> different analytics cluster ? can't we just get data from CLUSTER_1/2 to
>> same cluster CLUSTER_3 ?
>>
>> thanks
>> anishek
>>
>> On Mon, Mar 21, 2016 at 3:31 PM, Anishek Agarwal <anis...@gmail.com>
>> wrote:
>>
>>> Hey Clint,
>>>
>>> we have two separate rings which don't talk to each other but both
>>> having the same DC name "DCX".
>>>
>>> @Raja,
>>>
>>> We had already gone towards the path you suggested.
>>>
>>> thanks all
>>> anishek
>>>
>>> On Fri, Mar 18, 2016 at 8:01 AM, Reddy Raja <areddyr...@gmail.com>
>>> wrote:
>>>
>>>> Yes. Here are the steps.
>>>> You will have to change the DC Names first.
>>>> DC1 and DC2 would be independent clusters.
>>>>
>>>> Create a new DC, DC3 and include these two DC's on DC3.
>>>>
>>>> This should work well.
>>>>
>>>>
>>>> On Thu, Mar 17, 2016 at 11:03 PM, Clint Martin <
>>>> clintlmar...@coolfiretechnologies.com> wrote:
>>>>
>>>>> When you say you have two logical DC both with the same name are you
>>>>> saying that you have two clusters of servers both with the same DC name,
>>>>> nether of which currently talk to each other? IE they are two separate
>>>>> rings?
>>>>>
>>>>> Or do you mean that you have two keyspaces in one cluster?
>>>>>
>>>>> Or?
>>>>>
>>>>> Clint
>>>>> On Mar 14, 2016 2:11 AM, "Anishek Agarwal" <anis...@gmail.com> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> We are using cassandra 2.0.17 and have two logical DC having
>>>>>> different Keyspaces but both having same logical name DC1.
>>>>>>
>>>>>> we want to setup another cassandra cluster for analytics which should
>>>>>> get data from both the above DC.
>>>>>>
>>>>>> if we setup the new DC with name DC2 and follow the steps
>>>>>> https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_add_dc_to_cluster_t.html
>>>>>> will it work ?
>>>>>>
>>>>>> I would think we would have to first change the names of existing
>>>>>> clusters to have to different names and then go with adding another dc
>>>>>> getting data from these?
>>>>>>
>>>>>> Also as soon as we add the node the data starts moving... this will
>>>>>> all be only real time changes done to the cluster right ? we still have 
>>>>>> to
>>>>>> do the rebuild to get the data for tokens for node in new cluster ?
>>>>>>
>>>>>> Thanks
>>>>>> Anishek
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> "In this world, you either have an excuse or a story. I preferred to
>>>> have a story"
>>>>
>>>
>>>
>>
>


Re: Traffic inconsistent across nodes

2016-04-13 Thread Anishek Agarwal
here is the output:  every node in a single DC is in the same rack.

Datacenter: WDC5



Status=Up/Down

|/ State=Normal/Leaving/Joining/Moving

--  Address Load   Tokens  Owns (effective)  Host ID
Rack

UN  10.125.138.33   299.22 GB  256 64.2%
8aaa6015-d444-4551-a3c5-3257536df476  RAC1

UN  10.125.138.125  329.38 GB  256 70.3%
70be44a2-de17-41f1-9d3a-6a0be600eedf  RAC1

UN  10.125.138.129  305.11 GB  256 65.5%
0fbc7f44-7062-4996-9eba-2a05ae1a7032  RAC1

Datacenter: WDC

===

Status=Up/Down

|/ State=Normal/Leaving/Joining/Moving

--  Address Load   Tokens  Owns (effective)  Host ID
Rack

UN  10.124.114.105  151.09 GB  256 38.0%
c432357d-bf81-4eef-98e1-664c178a3c23  RAC1

UN  10.124.114.110  150.15 GB  256 36.9%
6f92d32e-1c64-4145-83d7-265c331ea408  RAC1

UN  10.124.114.108  170.1 GB   256 41.3%
040ae7e5-3f1e-4874-8738-45edbf576e12  RAC1

UN  10.124.114.98   165.34 GB  256 37.6%
cdc69c7d-b9d6-4abd-9388-1cdcd35d946c  RAC1

UN  10.124.114.113  145.22 GB  256 35.7%
1557af04-e658-4751-b984-8e0cdc41376e  RAC1

UN  10.125.138.59   162.65 GB  256 38.6%
9ba1b7b6-5655-456e-b1a1-6f429750fc96  RAC1

UN  10.124.114.97   164.03 GB  256 36.9%
c918e497-498e-44c3-ab01-ab5cb4d48b09  RAC1

UN  10.124.114.118  139.62 GB  256 35.1%
2bb0c265-a5d4-4cd4-8f50-13b5a9a891c9  RAC1

On Thu, Apr 14, 2016 at 4:48 AM, Eric Stevens <migh...@gmail.com> wrote:

> The output of nodetool status would really help answer some questions.  I
> take it the 8 hosts in your graph are in the same DC.  Are the four serving
> writes in the same logical or physical rack (as Cassandra sees it), while
> the others are not?
>
> On Tue, Apr 12, 2016 at 10:48 PM Anishek Agarwal <anis...@gmail.com>
> wrote:
>
>> We have two DC one with the above 8 nodes and other with 3 nodes.
>>
>>
>>
>> On Tue, Apr 12, 2016 at 8:06 PM, Eric Stevens <migh...@gmail.com> wrote:
>>
>>> Maybe include nodetool status here?  Are the four nodes serving reads in
>>> one DC (local to your driver's config) while the others are in another?
>>>
>>> On Tue, Apr 12, 2016, 1:01 AM Anishek Agarwal <anis...@gmail.com> wrote:
>>>
>>>> hello,
>>>>
>>>> we have 8 nodes in one cluster and attached is the traffic patterns
>>>> across the nodes.
>>>>
>>>> its very surprising that only 4 nodes show transmitting (purple)
>>>> packets.
>>>>
>>>> our driver configuration on clients has the following load balancing
>>>> configuration  :
>>>>
>>>> new TokenAwarePolicy(
>>>> new 
>>>> DCAwareRoundRobinPolicy(configuration.get(Constants.LOCAL_DATA_CENTRE_NAME,
>>>>  "WDC")),
>>>> true)
>>>>
>>>>
>>>> any idea what is that we are missing which is leading to this skewed
>>>> data read patterns
>>>>
>>>> cassandra drivers as below:
>>>>
>>>> 
>>>> com.datastax.cassandra
>>>> cassandra-driver-core
>>>> 2.1.6
>>>> 
>>>> 
>>>> com.datastax.cassandra
>>>> cassandra-driver-mapping
>>>> 2.1.6
>>>> 
>>>>
>>>> cassandra version is 2.0.17
>>>>
>>>> Thanks in advance for the help.
>>>>
>>>> Anishek
>>>>
>>>>
>>


Re: Traffic inconsistent across nodes

2016-04-18 Thread Anishek Agarwal
Looks like it was some problem with our monitoring framework. Thanks for your help!

On Mon, Apr 18, 2016 at 2:46 PM, Anishek Agarwal <anis...@gmail.com> wrote:

> OS used : Cent OS 6 on all nodes except *10*.125.138.59 ( which runs Cent
> OS 7)
> All of them are running Cassandra 2.0.17
>
> output of the test :
>
> host ip: 10.124.114.113
>
> host DC : WDC
>
> distance of host: LOCAL
>
> host is up: true
>
> cassandra version : 2.0.17
>
> host ip: 10.124.114.108
>
> host DC : WDC
>
> distance of host: LOCAL
>
> host is up: true
>
> cassandra version : 2.0.17
>
> host ip: 10.124.114.110
>
> host DC : WDC
>
> distance of host: LOCAL
>
> host is up: true
>
> cassandra version : 2.0.17
>
> host ip: 10.124.114.118
>
> host DC : WDC
>
> distance of host: LOCAL
>
> host is up: true
>
> cassandra version : 2.0.17
>
> host ip: 10.125.138.59
>
> host DC : WDC
>
> distance of host: LOCAL
>
> host is up: true
>
> cassandra version : 2.0.17
>
> host ip: 10.124.114.97
>
> host DC : WDC
>
> distance of host: LOCAL
>
> host is up: true
>
> cassandra version : 2.0.17
>
> host ip: 10.124.114.105
>
> host DC : WDC
>
> distance of host: LOCAL
>
> host is up: true
>
> cassandra version : 2.0.17
>
> host ip: 10.124.114.98
>
> host DC : WDC
>
> distance of host: LOCAL
>
> host is up: true
>
> cassandra version : 2.0.17
>
>
> On Fri, Apr 15, 2016 at 6:47 PM, Eric Stevens <migh...@gmail.com> wrote:
>
>> Thanks for that, that helps a lot.  The next thing to check might be
>> whether or not your application actually has access to the other nodes.
>> With that topology, and assuming all the nodes you included in your
>> original graph are in the 'WDC' data center, I'd be inclined to look for a
>> network issue of some kind.
>>
>> Also, it probably doesn't matter, but what OS / Distribution are you
>> running the servers and clients on?
>>
>> Check with netcat or something that you can reach all the configured
>> ports from your application server, but also the driver itself offers some
>> introspection into its view of individual connection health.  This is a
>> little bit ugly, but this is how we include information about connection
>> status in an API for health monitoring from a Scala application using the
>> Java driver; hopefully you can use it to see how to access information
>> about the driver's view of host health from the application's perspective.
>> Most importantly I'd suggest looking for host.isUp status and
>> LoadBalancingPolicy.distance(host) to see that it considers all the hosts
>> in your target datacenter to be LOCAL.
>>
>> "hosts" -> {
>>   val hosts: Map[String, Map[String, mutable.Set[Host]]] =
>> connection.getMetadata
>>   .getAllHosts.asScala
>>   .groupBy(_.getDatacenter)
>>   .mapValues(_.groupBy(_.getRack))
>>   val lbp: LoadBalancingPolicy = 
>> connection.getConfiguration.getPolicies.getLoadBalancingPolicy
>>   JsObject(hosts.map { case (dc: String, rackAndHosts) =>
>> dc -> JsObject(rackAndHosts.map { case (rack: String, hosts: 
>> mutable.Set[Host]) =>
>>   rack -> JsArray(hosts.map { host =>
>> Json.obj(
>>   "address"  -> host.getAddress.toString,
>>   "socketAddress"-> host.getSocketAddress.toString,
>>   "cassandraVersion" -> host.getCassandraVersion.toString,
>>   "isUp" -> host.isUp,
>>   "hostDistance" -> lbp.distance(host).toString
>> )
>>   }.toSeq)
>> }.toSeq)
>>   }.toSeq)
>> },
>>
>>
>> On Wed, Apr 13, 2016 at 10:50 PM Anishek Agarwal <anis...@gmail.com>
>> wrote:
>>
>>> here is the output:  every node in a single DC is in the same rack.
>>>
>>> Datacenter: WDC5
>>>
>>> 
>>>
>>> Status=Up/Down
>>>
>>> |/ State=Normal/Leaving/Joining/Moving
>>>
>>> --  Address Load   Tokens  Owns (effective)  Host ID
>>>   Rack
>>>
>>> UN  10.125.138.33   299.22 GB  256 64.2%
>>> 8aaa6015-d444-4551-a3c5-3257536df476  RAC1
>>>
>>> UN  10.125.138.125  329.38 GB  256 70.3%
>>> 70be44a2-de17-41f1-9d3a-6a0be600eedf  RAC1
>>>
>>> UN  10.125.138.129  305.11 GB  256 65.5%
>>> 0fbc7f44-7062-4996-9

Re: Traffic inconsistent across nodes

2016-04-18 Thread Anishek Agarwal
OS used: CentOS 6 on all nodes except 10.125.138.59 (which runs CentOS 7)
All of them are running Cassandra 2.0.17

output of the test :

host ip: 10.124.114.113

host DC : WDC

distance of host: LOCAL

host is up: true

cassandra version : 2.0.17

host ip: 10.124.114.108

host DC : WDC

distance of host: LOCAL

host is up: true

cassandra version : 2.0.17

host ip: 10.124.114.110

host DC : WDC

distance of host: LOCAL

host is up: true

cassandra version : 2.0.17

host ip: 10.124.114.118

host DC : WDC

distance of host: LOCAL

host is up: true

cassandra version : 2.0.17

host ip: 10.125.138.59

host DC : WDC

distance of host: LOCAL

host is up: true

cassandra version : 2.0.17

host ip: 10.124.114.97

host DC : WDC

distance of host: LOCAL

host is up: true

cassandra version : 2.0.17

host ip: 10.124.114.105

host DC : WDC

distance of host: LOCAL

host is up: true

cassandra version : 2.0.17

host ip: 10.124.114.98

host DC : WDC

distance of host: LOCAL

host is up: true

cassandra version : 2.0.17


On Fri, Apr 15, 2016 at 6:47 PM, Eric Stevens <migh...@gmail.com> wrote:

> Thanks for that, that helps a lot.  The next thing to check might be
> whether or not your application actually has access to the other nodes.
> With that topology, and assuming all the nodes you included in your
> original graph are in the 'WDC' data center, I'd be inclined to look for a
> network issue of some kind.
>
> Also, it probably doesn't matter, but what OS / Distribution are you
> running the servers and clients on?
>
> Check with netcat or something that you can reach all the configured ports
> from your application server, but also the driver itself offers some
> introspection into its view of individual connection health.  This is a
> little bit ugly, but this is how we include information about connection
> status in an API for health monitoring from a Scala application using the
> Java driver; hopefully you can use it to see how to access information
> about the driver's view of host health from the application's perspective.
> Most importantly I'd suggest looking for host.isUp status and
> LoadBalancingPolicy.distance(host) to see that it considers all the hosts
> in your target datacenter to be LOCAL.
>
> "hosts" -> {
>   val hosts: Map[String, Map[String, mutable.Set[Host]]] =
> connection.getMetadata
>   .getAllHosts.asScala
>   .groupBy(_.getDatacenter)
>   .mapValues(_.groupBy(_.getRack))
>   val lbp: LoadBalancingPolicy = 
> connection.getConfiguration.getPolicies.getLoadBalancingPolicy
>   JsObject(hosts.map { case (dc: String, rackAndHosts) =>
> dc -> JsObject(rackAndHosts.map { case (rack: String, hosts: 
> mutable.Set[Host]) =>
>   rack -> JsArray(hosts.map { host =>
> Json.obj(
>   "address"  -> host.getAddress.toString,
>   "socketAddress"-> host.getSocketAddress.toString,
>   "cassandraVersion" -> host.getCassandraVersion.toString,
>   "isUp"         -> host.isUp,
>   "hostDistance" -> lbp.distance(host).toString
> )
>   }.toSeq)
> }.toSeq)
>   }.toSeq)
> },
>
>
> On Wed, Apr 13, 2016 at 10:50 PM Anishek Agarwal <anis...@gmail.com>
> wrote:
>
>> here is the output:  every node in a single DC is in the same rack.
>>
>> Datacenter: WDC5
>>
>> 
>>
>> Status=Up/Down
>>
>> |/ State=Normal/Leaving/Joining/Moving
>>
>> --  Address Load   Tokens  Owns (effective)  Host ID
>>   Rack
>>
>> UN  10.125.138.33   299.22 GB  256 64.2%
>> 8aaa6015-d444-4551-a3c5-3257536df476  RAC1
>>
>> UN  10.125.138.125  329.38 GB  256 70.3%
>> 70be44a2-de17-41f1-9d3a-6a0be600eedf  RAC1
>>
>> UN  10.125.138.129  305.11 GB  256 65.5%
>> 0fbc7f44-7062-4996-9eba-2a05ae1a7032  RAC1
>>
>> Datacenter: WDC
>>
>> ===
>>
>> Status=Up/Down
>>
>> |/ State=Normal/Leaving/Joining/Moving
>>
>> --  Address Load   Tokens  Owns (effective)  Host ID
>>   Rack
>>
>> UN  10.124.114.105  151.09 GB  256 38.0%
>> c432357d-bf81-4eef-98e1-664c178a3c23  RAC1
>>
>> UN  10.124.114.110  150.15 GB  256 36.9%
>> 6f92d32e-1c64-4145-83d7-265c331ea408  RAC1
>>
>> UN  10.124.114.108  170.1 GB   256 41.3%
>> 040ae7e5-3f1e-4874-8738-45edbf576e12  RAC1
>>
>> UN  10.124.114.98   165.34 GB  256 37.6%
>> cdc69c7d-b9d6-4abd-9388-1cdcd35d946c  RAC1
>>
>> UN  10.124.114.113  145.22 GB  256 35.7%
>> 1557af04-e658-4751

Re: nodetool repair with -pr and -dc

2016-08-10 Thread Anishek Agarwal
OK, thanks. So if we want to use the -pr option (which I suppose we should,
to prevent duplicate checks) in 2.0, then if we run the repair on all nodes
in a single DC that should be sufficient, and we should not need to run it
on all nodes across DCs?



On Wed, Aug 10, 2016 at 5:01 PM, Paulo Motta <pauloricard...@gmail.com>
wrote:

> On 2.0 repair -pr option is not supported together with -local, -hosts or
> -dc, since it assumes you need to repair all nodes in all DCs and it will
> throw an error if you try to run it with nodetool, so perhaps there's
> something wrong with range_repair options parsing.
>
> On 2.1 it was added support to simultaneous -pr and -local options on
> CASSANDRA-7450, so if you need that you can either upgade to 2.1 or
> backport that to 2.0.
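>
> For example, on 2.1 a primary-range repair restricted to the local DC
> should look roughly like this (the keyspace name is a placeholder):
>
> nodetool repair -pr -local my_keyspace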
>
>
> 2016-08-10 5:20 GMT-03:00 Anishek Agarwal <anis...@gmail.com>:
>
>> Hello,
>>
>> We have 2.0.17 cassandra cluster(*DC1*) with a cross dc setup with a
>> smaller cluster(*DC2*).  After reading various blogs about
>> scheduling/running repairs looks like its good to run it with the following
>>
>>
>> -pr for primary range only
>> -st -et for sub ranges
>> -par for parallel
>> -dc to make sure we can schedule repairs independently on each Data
>> centre we have.
>>
>> i have configured the above using the repair utility @
>> https://github.com/BrianGallew/cassandra_range_repair.git
>>
>> which leads to the following command :
>>
>> ./src/range_repair.py -k [keyspace] -c [columnfamily name] -v -H
>> localhost -p -D* DC1*
>>
>> but looks like the merkle tree is being calculated on nodes which are
>> part of other *DC2.*
>>
>> why does this happen? i thought it should only look at the nodes in local
>> cluster. however on nodetool the* -pr* option cannot be used with
>> *-local* according to docs @https://docs.datastax.com/en/
>> cassandra/2.0/cassandra/tools/toolsRepair.html
>>
>> so i am may be missing something, can someone help explain this please.
>>
>> thanks
>> anishek
>>
>
>


nodetool repair with -pr and -dc

2016-08-10 Thread Anishek Agarwal
Hello,

We have a 2.0.17 cassandra cluster (*DC1*) with a cross-DC setup with a
smaller cluster (*DC2*). After reading various blogs about
scheduling/running repairs, it looks like it's good to run it with the
following:


-pr for primary range only
-st -et for sub ranges
-par for parallel
-dc to make sure we can schedule repairs independently on each Data centre
we have.

i have configured the above using the repair utility @
https://github.com/BrianGallew/cassandra_range_repair.git

which leads to the following command :

./src/range_repair.py -k [keyspace] -c [columnfamily name] -v -H localhost
-p -D *DC1*

but it looks like the merkle tree is being calculated on nodes which are
part of the other DC, *DC2*.

Why does this happen? I thought it should only look at the nodes in the
local cluster. However, with nodetool the *-pr* option cannot be used with
*-local*, according to the docs at
https://docs.datastax.com/en/cassandra/2.0/cassandra/tools/toolsRepair.html

So I may be missing something; can someone help explain this, please?

thanks
anishek