Re: 10000+ CF support from Cassandra

2015-05-28 Thread Jack Krupansky
How big is each of the tables - are they all fairly small or fairly large?
Small as in no more than thousands of rows or large as in tens of millions
or hundreds of millions of rows?

Small tables are not ideal for a Cassandra cluster since the rows would
be spread out across the nodes, even though it might make more sense for
each small table to be on a single node.

You might want to consider a model where you have an application layer that
maps logical tenant tables into partition keys within a single large
Cassandra table, or at least a relatively small number of Cassandra tables.
It will depend on the typical size of your tenant tables - very small ones
would make sense within a single partition, while larger ones should have
separate partitions for a tenant's data. The key here is that tables are
expensive, but partitions are cheap and scale very well with Cassandra.
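
As a purely illustrative sketch (the table and column names below are invented,
not from this thread), such a model might look like this, with the tenant id as
the leading partition key component:

CREATE TABLE shared_data (
    tenant_id   text,
    entity_type text,   -- which logical tenant table this row belongs to
    entity_id   uuid,   -- row identifier within that logical table
    payload     text,   -- serialized row contents
    PRIMARY KEY ((tenant_id, entity_type), entity_id)
);

-- Very small tenant tables collapse into a single partition per
-- (tenant, type); larger ones can add a bucket column to the partition key.
SELECT payload FROM shared_data
 WHERE tenant_id = 'acme' AND entity_type = 'orders';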

Finally, you said 10 clusters, but did you mean 10 nodes? You might want
to consider a model where you do indeed have multiple clusters, where each
handles a fraction of the tenants, since there is no need for separate
tenants to be on the same cluster.


-- Jack Krupansky

On Tue, May 26, 2015 at 11:32 PM, Arun Chaitanya chaitan64a...@gmail.com
wrote:

 Good Day Everyone,

 I am very happy with the (almost) linear scalability offered by C*. We had
 a lot of problems with RDBMS.

 But, I heard that C* has a limit on number of column families that can be
 created in a single cluster.
 The reason being each CF stores 1-2 MB on the JVM heap.

 In our use case, we have about 10000+ CF and we want to support
 multi-tenancy.
 (i.e. 10000 * no of tenants)

 We are new to C* and being from RDBMS background, I would like to
 understand how to tackle this scenario from your advice.

 Our plan is to use Off-Heap memtable approach.
 http://www.datastax.com/dev/blog/off-heap-memtables-in-Cassandra-2-1

 Each node in the cluster has following configuration
 16 GB machine (8GB Cassandra JVM + 2GB System + 6GB Off-Heap)
 IMO, this should be able to support 1000 CF with no (or very little) impact on
 performance and startup time.

 We tackle multi-tenancy using different keyspaces.(Solution I found on the
 web)

 Using this approach we can have 10 clusters doing the job. (We actually
 are worried about the cost)

 Can you please help us evaluate this strategy? I want to hear the
 community's opinion on this.

 My major concerns being,

 1. Is Off-Heap strategy safe and my assumption of 16 GB supporting 1000 CF
 right?

 2. Can we use multiple keyspaces to solve multi-tenancy? IMO, the number
 of column families increase even when we use multiple keyspace.

 3. I understand the complexity using multi-cluster for single application.
 The code base will get tightly coupled with infrastructure. Is this the
 right approach?

 Any suggestion is appreciated.

 Thanks,
 Arun



Re: Cassandra 1.2.x EOL date

2015-05-28 Thread Robert Coli
On Wed, May 27, 2015 at 5:10 PM, Jason Unovitch jason.unovi...@gmail.com
wrote:

 Simple and quick question, can anyone point me to where the Cassandra
 1.2.x series EOL date was announced?  I see archived mailing list
 threads for 1.2.19 mentioning it was going to be the last release and
 I see CVE-2015-0225 mention it is EOL.   I didn't see it say when the
 official EOL date was.


When version n+2 is released, version n is EOL. Release dates for new
versions are not generally known in advance.

2.1.5 has been released, so 1.2.19 (2.1, 2.0, 1.2: 1.2 is current minus 2) is
EOL for the 1.2 branch as of its release date.

In some very rare cases, it is theoretically possible that the most
recent version of an EOL branch might get a patch.

=Rob


Re: Cassandra seems to replace existing node without specifying replace_address

2015-05-28 Thread Robert Coli
On Thu, May 28, 2015 at 2:00 AM, Thomas Whiteway 
thomas.white...@metaswitch.com wrote:

  Sorry, I should have been clearer.  In this case we’ve decommissioned
 the node and deleted the data, commitlog, and saved caches directories so
 we’re not hitting CASSANDRA-8801.  We also hit the “A node with address
 address already exists, cancelling join” error when performing the same
 steps on 2.1.0, just not in 2.1.4.


Is auto_bootstrap set to true or false? Are you using vnodes, or is
initial_token populated?

=Rob


Re: Spark SQL JDBC Server + DSE

2015-05-28 Thread Brian O'Neill
Mohammed,

This doesn't really answer your question, but I'm working on a new REST
server that allows people to submit SQL queries over REST, which get
executed via Spark SQL.  Based on what I started here:
http://brianoneill.blogspot.com/2015/05/spark-sql-against-cassandra-example.html

I assume you need JDBC connectivity specifically?

-brian

---
Brian O'Neill 
Chief Technology Officer
Health Market Science, a LexisNexis Company
215.588.6024 Mobile • @boneill42 http://www.twitter.com/boneill42




From:  Mohammed Guller moham...@glassbeam.com
Reply-To:  user@cassandra.apache.org
Date:  Thursday, May 28, 2015 at 8:26 PM
To:  user@cassandra.apache.org user@cassandra.apache.org
Subject:  RE: Spark SQL JDBC Server + DSE

Anybody out there using DSE + Spark SQL JDBC server?
 

Mohammed
 

From: Mohammed Guller [mailto:moham...@glassbeam.com]
Sent: Tuesday, May 26, 2015 6:17 PM
To: user@cassandra.apache.org
Subject: Spark SQL JDBC Server + DSE
 
Hi -
As I understand, the Spark SQL Thrift/JDBC server cannot be used with the
open source C*. Only DSE supports  the Spark SQL JDBC server.
 
We would like to find out how many organizations are using this
combination. If you do use DSE + Spark SQL JDBC server, it would be great if
you could share your experience. For example, what kind of issues have you
run into? How is the performance? What reporting tools are you using?
 
Thank  you!
 
Mohammed 
 




what this error mean

2015-05-28 Thread 曹志富
I have a 25-node C* cluster with C* 2.1.3. These days one node has hit split
brain many times.

Checking the log I found this:

INFO  [MemtableFlushWriter:118] 2015-05-29 08:07:39,176
Memtable.java:378 - Completed flushing
/home/ant/apache-cassandra-2.1.3/bin/../data/data/system/sstable_activity-5a1ff2
67ace03f128563cfae6103c65e/system-sstable_activity-ka-4371-Data.db (8187
bytes) for commitlog position ReplayPosition(segmentId=1432775133526,
position=16684949)
ERROR [IndexSummaryManager:1] 2015-05-29 08:10:30,209
CassandraDaemon.java:167 - Exception in thread
Thread[IndexSummaryManager:1,1,main]
java.lang.AssertionError: null
at
org.apache.cassandra.io.sstable.SSTableReader.cloneWithNewSummarySamplingLevel(SSTableReader.java:921)
~[apache-cassandra-2.1.3.jar:2.1.3]
at
org.apache.cassandra.io.sstable.IndexSummaryManager.adjustSamplingLevels(IndexSummaryManager.java:410)
~[apache-cassandra-2.1.3.jar:2.1.3]
at
org.apache.cassandra.io.sstable.IndexSummaryManager.redistributeSummaries(IndexSummaryManager.java:288)
~[apache-cassandra-2.1.3.jar:2.1.3]
at
org.apache.cassandra.io.sstable.IndexSummaryManager.redistributeSummaries(IndexSummaryManager.java:238)
~[apache-cassandra-2.1.3.jar:2.1.3]
at
org.apache.cassandra.io.sstable.IndexSummaryManager$1.runMayThrow(IndexSummaryManager.java:139)
~[apache-cassandra-2.1.3.jar:2.1.3]
at
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
~[apache-cassandra-2.1.3.jar:2.1.3]
at
org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:82)
~[apache-cassandra-2.1.3.jar:2.1.3]
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
[na:1.7.0_71]
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
[na:1.7.0_71]
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
[na:1.7.0_71]
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
[na:1.7.0_71]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
[na:1.7.0_71]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[na:1.7.0_71]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]

I want to know why this happens and how to fix it.

Thanks all
--
Ranger Tsao


RE: Spark SQL JDBC Server + DSE

2015-05-28 Thread Mohammed Guller
Anybody out there using DSE + Spark SQL JDBC server?

Mohammed

From: Mohammed Guller [mailto:moham...@glassbeam.com]
Sent: Tuesday, May 26, 2015 6:17 PM
To: user@cassandra.apache.org
Subject: Spark SQL JDBC Server + DSE

Hi -
As I understand, the Spark SQL Thrift/JDBC server cannot be used with the open 
source C*. Only DSE supports  the Spark SQL JDBC server.

We would like to find out how many organizations are using this 
combination. If you do use DSE + Spark SQL JDBC server, it would be great if 
you could share your experience. For example, what kind of issues have you run 
into? How is the performance? What reporting tools are you using?

Thank  you!

Mohammed



Re: 10000+ CF support from Cassandra

2015-05-28 Thread Arun Chaitanya
Hello Jack,

 Column families? As opposed to tables? Are you using Thrift instead of
CQL3? You should be focusing on the latter, not the former.
We have an ORM developed in our company, which maps each DTO to a column
family. So, we have many column families. We are using CQL3.

 But either way, the general guidance is that there is no absolute limit
of tables per se, but low hundreds is the recommended limit, regardless
of how many keyspaces they may be divided
 between. More than that is an anti-pattern for Cassandra - maybe you can
make it work for your application, but it isn't recommended.
You want to say that most cassandra users don't have more than 2-300 column
families? Is this achieved through careful data modelling?

 A successful Cassandra deployment is critically dependent on careful data
modeling - who is responsible for modeling each of these tables, you and a
single, tightly-knit team with very common interests  and very specific
goals and SLAs or many different developers with different managers with
different goals such as SLAs?
The latter.

 When you say multi-tenant, are you simply saying that each of your
organization's customers has their data segregated, or does each customer
have direct access to the cluster?
Each organization's data is in the same cluster. No, customers don't have
direct access to the cluster.

Thanks,
Arun

On Wed, May 27, 2015 at 7:17 PM, Jack Krupansky jack.krupan...@gmail.com
wrote:

 Scalability of Cassandra refers primarily to number of rows and number of
 nodes - to add more data, add more nodes.

 Column families? As opposed to tables? Are you using Thrift instead of
 CQL3? You should be focusing on the latter, not the former.

 But either way, the general guidance is that there is no absolute limit of
 tables per se, but low hundreds is the recommended limit, regardless of
 how many keyspaces they may be divided between. More than that is
 an anti-pattern for Cassandra - maybe you can make it work for your
 application, but it isn't recommended.

 A successful Cassandra deployment is critically dependent on careful data
 modeling - who is responsible for modeling each of these tables, you and a
 single, tightly-knit team with very common interests and very specific
 goals and SLAs or many different developers with different managers with
 different goals such as SLAs?

 When you say multi-tenant, are you simply saying that each of your
 organization's customers has their data segregated, or does each customer
 have direct access to the cluster?





 -- Jack Krupansky

 On Tue, May 26, 2015 at 11:32 PM, Arun Chaitanya chaitan64a...@gmail.com
 wrote:

 Good Day Everyone,

 I am very happy with the (almost) linear scalability offered by C*. We
 had a lot of problems with RDBMS.

 But, I heard that C* has a limit on number of column families that can be
 created in a single cluster.
 The reason being each CF stores 1-2 MB on the JVM heap.

 In our use case, we have about 10000+ CF and we want to support
 multi-tenancy.
 (i.e. 10000 * no of tenants)

 We are new to C* and being from RDBMS background, I would like to
 understand how to tackle this scenario from your advice.

 Our plan is to use Off-Heap memtable approach.
 http://www.datastax.com/dev/blog/off-heap-memtables-in-Cassandra-2-1

 Each node in the cluster has following configuration
 16 GB machine (8GB Cassandra JVM + 2GB System + 6GB Off-Heap)
 IMO, this should be able to support 1000 CF with no (or very little) impact on
 performance and startup time.

 We tackle multi-tenancy using different keyspaces.(Solution I found on
 the web)

 Using this approach we can have 10 clusters doing the job. (We actually
 are worried about the cost)

 Can you please help us evaluate this strategy? I want to hear the
 community's opinion on this.

 My major concerns being,

 1. Is Off-Heap strategy safe and my assumption of 16 GB supporting 1000
 CF right?

 2. Can we use multiple keyspaces to solve multi-tenancy? IMO, the number
 of column families increase even when we use multiple keyspace.

 3. I understand the complexity using multi-cluster for single
 application. The code base will get tightly coupled with infrastructure. Is
 this the right approach?

 Any suggestion is appreciated.

 Thanks,
 Arun





Re: Start with single node, move to 3-node cluster

2015-05-28 Thread Jason Wee
hmm.. I suppose you start with rf = 1 and then, when the 3N hardware arrives, just add
them into the cluster and later decommission this one node?
http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_remove_node_t.html

hth

jason

On Tue, May 26, 2015 at 10:02 PM, Matthew Johnson matt.john...@algomi.com
wrote:

 Hi Jason,



 When the 3N cluster is up and running, I need to get the data from SN into
 the 3N cluster and then give the SN server back. So I need to keep the
 data, but on completely new servers – just trying to work out what the best
 way of doing that is. The volume of data that needs migrating won’t be
 huge, probably about 30G, but it is data that I definitely need to keep
 (for historical analysis, audit etc).



 Thanks!

 Matthew







 *From:* Jason Wee [mailto:peich...@gmail.com]
 *Sent:* 26 May 2015 14:38
 *To:* user@cassandra.apache.org
 *Subject:* Re: Start with single node, move to 3-node cluster



 will you add this lent node into the 3N to form a cluster? but really,
 if you are just getting started, you could use this one node for your learning by
 installing multiple instances for experiments or development purposes only.
 imho, in the long run, this proves to be very valuable, at least for me.



 with this single node, you can easily simulate things like a c* upgrade. for
 instance, c* right now is at 2.1.5; when 2.2 goes stable, you can test
 using your multiple instances on this single node to simulate your
 production environment safely.



 hth



 jason



 On Tue, May 26, 2015 at 9:24 PM, Matthew Johnson matt.john...@algomi.com
 wrote:

 Hi gurus,



 We have ordered some hardware for a 3-node cluster, but its ETA is 6 to 8
 weeks. In the meantime, I have been lent a single server that I can use. I
 am wondering what the best way is to set up my single node (SN), so I can
 then move to the 3-node cluster (3N) when the hardware arrives.



 Do I:



 1.   Create my keyspaces on SN with RF=1, and when 3N is up and
 running migrate all the data manually (either through Spark, dump-and-load,
 or write a small script)?

 2.   Create my keyspaces on SN with RF=3, bootstrap the 3N nodes into
 a 4-node cluster when they’re ready, then remove SN from the cluster?

 3.   Use SN as normal, and when 3N hardware arrives, physically move
 the data folder and commit log folder onto one of the nodes in 3N and start
 it up as a seed?

 4.   Any other recommended solutions?



 I’m not even sure what the impact would be of running a single node with
 RF=3 – would this even work?



 Any ideas would be much appreciated.



 Thanks!

 Matthew







RE: Cassandra seems to replace existing node without specifying replace_address

2015-05-28 Thread Thomas Whiteway
Sorry, I should have been clearer.  In this case we’ve decommissioned the node 
and deleted the data, commitlog, and saved caches directories so we’re not 
hitting CASSANDRA-8801.  We also hit the “A node with address address already 
exists, cancelling join” error when performing the same steps on 2.1.0, just 
not in 2.1.4.

Thomas

From: Robert Coli [mailto:rc...@eventbrite.com]
Sent: 27 May 2015 20:41
To: user@cassandra.apache.org
Subject: Re: Cassandra seems to replace existing node without specifying 
replace_address

On Wed, May 27, 2015 at 5:48 AM, Thomas Whiteway 
thomas.white...@metaswitch.com wrote:
I’ve been investigating using replace_address to replace a node that hasn’t 
left the cluster cleanly and after upgrading from 2.1.0 to 2.1.4 it seems that 
adding a new node will automatically replace an existing node with the same IP 
address even if replace_address isn’t used.  Does anyone know whether this is 
an expected change (as far as I can tell it doesn’t seem to be)?

This is a longstanding known issue (Cassandra has had this behavior since the 
inception of decom), with a fix recently (May 19, 2015) merged to trunk.

https://issues.apache.org/jira/browse/CASSANDRA-8801

The basic problem is that the node does not forget its own cluster membership 
information, and so joins the cluster using its stored tokens. In my opinion, 
decommission should wipe all stored node state, but 8801 creates a workaround 
that addresses this, the worst case.

=Rob



Re: 10000+ CF support from Cassandra

2015-05-28 Thread Graham Sanderson
Depending on your use case and data types (for example if you can have a 
minimally nested JSON representation of the objects), then you could go with a 
common map<string,string> representation where keys are top-level object fields 
and values are valid JSON literals as strings; e.g. unquoted primitives, quoted 
strings, unquoted arrays or other objects.

Each top level field is then independently updatable - which may be beneficial 
(and allows you to trivially keep historical versions of objects if that is a 
requirement)

If you are updating the object in its entirety on save then simply store the 
entire object in a single cql field, and denormalize any search fields you may 
need (which you kinda want to do anyway)

Sent from my iPhone

 On May 28, 2015, at 1:49 AM, Arun Chaitanya chaitan64a...@gmail.com wrote:
 
 Hello Jack,
 
  Column families? As opposed to tables? Are you using Thrift instead of 
  CQL3? You should be focusing on the latter, not the former.
 We have an ORM developed in our company, which maps each DTO to a column 
 family. So, we have many column families. We are using CQL3.
 
  But either way, the general guidance is that there is no absolute limit of 
  tables per se, but low hundreds is the recommended limit, regardless of 
  how many keyspaces they may be divided 
  between. More than that is an anti-pattern for Cassandra - maybe you can 
  make it work for your application, but it isn't recommended.
 You want to say that most cassandra users don't have more than 2-300 column 
 families? Is this achieved through careful data modelling?
 
  A successful Cassandra deployment is critically dependent on careful data 
  modeling - who is responsible for modeling each of these tables, you and a 
  single, tightly-knit team with very common interests  and very specific 
  goals and SLAs or many different developers with different managers with 
  different goals such as SLAs?
 The latter.
 
  When you say multi-tenant, are you simply saying that each of your 
  organization's customers has their data segregated, or does each customer 
  have direct access to the cluster?
  Each organization's data is in the same cluster. No, customers don't have 
  direct access to the cluster.
 
 Thanks,
 Arun
 
 On Wed, May 27, 2015 at 7:17 PM, Jack Krupansky jack.krupan...@gmail.com 
 wrote:
 Scalability of Cassandra refers primarily to number of rows and number of 
 nodes - to add more data, add more nodes.
 
 Column families? As opposed to tables? Are you using Thrift instead of CQL3? 
 You should be focusing on the latter, not the former.
 
 But either way, the general guidance is that there is no absolute limit of 
 tables per se, but low hundreds is the recommended limit, regardless of 
  how many keyspaces they may be divided between. More than that is 
 an anti-pattern for Cassandra - maybe you can make it work for your 
 application, but it isn't recommended.
 
 A successful Cassandra deployment is critically dependent on careful data 
 modeling - who is responsible for modeling each of these tables, you and a 
 single, tightly-knit team with very common interests and very specific goals 
 and SLAs or many different developers with different managers with different 
 goals such as SLAs?
 
 When you say multi-tenant, are you simply saying that each of your 
 organization's customers has their data segregated, or does each customer 
 have direct access to the cluster?
 
 
 
 
 
 -- Jack Krupansky
 
 On Tue, May 26, 2015 at 11:32 PM, Arun Chaitanya chaitan64a...@gmail.com 
 wrote:
 Good Day Everyone,
 
 I am very happy with the (almost) linear scalability offered by C*. We had 
 a lot of problems with RDBMS.
 
 But, I heard that C* has a limit on number of column families that can be 
 created in a single cluster.
 The reason being each CF stores 1-2 MB on the JVM heap.
 
 In our use case, we have about 10000+ CF and we want to support 
 multi-tenancy.
 (i.e. 10000 * no of tenants)
 
 We are new to C* and being from RDBMS background, I would like to 
 understand how to tackle this scenario from your advice.
 
 Our plan is to use Off-Heap memtable approach.
 http://www.datastax.com/dev/blog/off-heap-memtables-in-Cassandra-2-1
 
 Each node in the cluster has following configuration
 16 GB machine (8GB Cassandra JVM + 2GB System + 6GB Off-Heap)
 IMO, this should be able to support 1000 CF with no (or very little) impact on 
 performance and startup time.
 
 We tackle multi-tenancy using different keyspaces.(Solution I found on the 
 web)
 
 Using this approach we can have 10 clusters doing the job. (We actually are 
 worried about the cost)
 
 Can you please help us evaluate this strategy? I want to hear the 
 community's opinion on this.
 
 My major concerns being, 
 
 1. Is Off-Heap strategy safe and my assumption of 16 GB supporting 1000 CF 
 right?
 
 2. Can we use multiple keyspaces to solve multi-tenancy? IMO, the number of 
 column families increase even when we use multiple keyspace.
 
 3. I 

cassandra.WriteTimeout: code=1100 [Coordinator node timed out waiting for replica nodes' responses]

2015-05-28 Thread Sachin PK
Hi, I'm running Cassandra 2.1.5 (single datacenter, 4 nodes, 16GB VPS each
node). I have given my configuration below. I'm using the python driver on my
clients; when I tried to insert 1049067 items I got an error.

cassandra.WriteTimeout: code=1100 [Coordinator node timed out waiting for
replica nodes' responses] message=Operation timed out - received only 0
responses. info={'received_responses': 0, 'required_responses': 1,
'consistency': 'ONE'}

Also, I'm getting an error when I check the count of a CF from cqlsh:

OperationTimedOut: errors={}, last_host=127.0.0.1

I've installed DataStax OpsCenter for monitoring the nodes; it shows about
1000/sec write requests even when the cluster is idle.

Is there any problem with my cassandra configuration?


cluster_name: 'Test Cluster'

num_tokens: 256


hinted_handoff_enabled: true
.
max_hint_window_in_ms: 10800000 # 3 hours

hinted_handoff_throttle_in_kb: 1024

max_hints_delivery_threads: 2

batchlog_replay_throttle_in_kb: 1024

authenticator: AllowAllAuthenticator

authorizer: AllowAllAuthorizer

permissions_validity_in_ms: 2000

partitioner: org.apache.cassandra.dht.Murmur3Partitioner

data_file_directories:
- /var/lib/cassandra/data

commitlog_directory: /var/lib/cassandra/commitlog

disk_failure_policy: stop

commit_failure_policy: stop

key_cache_size_in_mb:

key_cache_save_period: 14400

row_cache_size_in_mb: 0


row_cache_save_period: 0

counter_cache_size_in_mb:


counter_cache_save_period: 7200


saved_caches_directory: /var/lib/cassandra/saved_caches

commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000

commitlog_segment_size_in_mb: 32

seed_provider:

- class_name: org.apache.cassandra.locator.SimpleSeedProvider
  parameters:

  - seeds: 10.xx.xx.xx,10.xx.xx.xxx

concurrent_reads: 45
concurrent_writes: 64
concurrent_counter_writes: 32

memtable_allocation_type: heap_buffers


memtable_flush_writers: 6

index_summary_capacity_in_mb:

index_summary_resize_interval_in_minutes: 60

trickle_fsync: false
trickle_fsync_interval_in_kb: 10240

storage_port: 7000

ssl_storage_port: 7001

listen_address: 10.xx.x.xxx

start_native_transport: true

native_transport_port: 9042

rpc_address: 0.0.0.0

rpc_port: 9160

broadcast_rpc_address: 10.xx.x.xxx

rpc_keepalive: true

rpc_server_type: sync

thrift_framed_transport_size_in_mb: 15

incremental_backups: false

snapshot_before_compaction: false

auto_snapshot: true

tombstone_warn_threshold: 1000
tombstone_failure_threshold: 100000

column_index_size_in_kb: 64

batch_size_warn_threshold_in_kb: 5

compaction_throughput_mb_per_sec: 16

sstable_preemptive_open_interval_in_mb: 50

read_request_timeout_in_ms: 5000

range_request_timeout_in_ms: 10000

write_request_timeout_in_ms: 2000

counter_write_request_timeout_in_ms: 5000

cas_contention_timeout_in_ms: 1000

truncate_request_timeout_in_ms: 60000

request_timeout_in_ms: 10000

cross_node_timeout: false

endpoint_snitch: SimpleSnitch


dynamic_snitch_update_interval_in_ms: 100

dynamic_snitch_reset_interval_in_ms: 600000

dynamic_snitch_badness_threshold: 0.1

request_scheduler: org.apache.cassandra.scheduler.NoScheduler


server_encryption_options:
internode_encryption: none
keystore: conf/.keystore
keystore_password: cassandra
truststore: conf/.truststore
truststore_password: cassandra


# enable or disable client/server encryption.
client_encryption_options:
enabled: false
keystore: conf/.keystore
keystore_password: cassandra

internode_compression: all

inter_dc_tcp_nodelay: false


Re: cassandra.WriteTimeout: code=1100 [Coordinator node timed out waiting for replica nodes' responses]

2015-05-28 Thread Jean Tremblay
I have experienced similar results: OperationTimedOut after inserting many 
millions of records on a 5-node cluster, using Cassandra 2.1.5.
I rolled back to 2.1.4 using exactly the same configuration as with 2.1.5 
and these timeouts went away… This is not the solution to your problem, but just 
to say that for me 2.1.5 seems to saturate when bulk inserting many millions of 
records.


 On 28 May 2015, at 15:15 , Sachin PK sachinpray...@gmail.com wrote:
 
 Hi, I'm running Cassandra 2.1.5 (single datacenter, 4 nodes, 16GB VPS each node). 
 I have given my configuration below. I'm using the python driver on my clients; 
 when I tried to insert 1049067 items I got an error.
  
 cassandra.WriteTimeout: code=1100 [Coordinator node timed out waiting for 
 replica nodes' responses] message=Operation timed out - received only 0 
 responses. info={'received_responses': 0, 'required_responses': 1, 
 'consistency': 'ONE'}
 
 Also, I'm getting an error when I check the count of a CF from cqlsh:
 
 OperationTimedOut: errors={}, last_host=127.0.0.1
 
 I've installed DataStax OpsCenter for monitoring the nodes; it shows about 
 1000/sec write requests even when the cluster is idle.
 
 Is there any problem with my cassandra configuration?
 
 
 cluster_name: 'Test Cluster'
 
 num_tokens: 256
 
 
 hinted_handoff_enabled: true
 .
 max_hint_window_in_ms: 10800000 # 3 hours
 
 hinted_handoff_throttle_in_kb: 1024
 
 max_hints_delivery_threads: 2
 
 batchlog_replay_throttle_in_kb: 1024
 
 authenticator: AllowAllAuthenticator
 
 authorizer: AllowAllAuthorizer
 
 permissions_validity_in_ms: 2000
 
 partitioner: org.apache.cassandra.dht.Murmur3Partitioner
 
 data_file_directories:
 - /var/lib/cassandra/data
 
 commitlog_directory: /var/lib/cassandra/commitlog
 
 disk_failure_policy: stop
 
 commit_failure_policy: stop
 
 key_cache_size_in_mb:
 
 key_cache_save_period: 14400
 
 row_cache_size_in_mb: 0
 
 
 row_cache_save_period: 0
 
 counter_cache_size_in_mb:
 
 
 counter_cache_save_period: 7200
 
 
 saved_caches_directory: /var/lib/cassandra/saved_caches
 
 commitlog_sync: periodic
 commitlog_sync_period_in_ms: 10000
 
 commitlog_segment_size_in_mb: 32
 
 seed_provider:
 
 - class_name: org.apache.cassandra.locator.SimpleSeedProvider
   parameters:
 
   - seeds: 10.xx.xx.xx,10.xx.xx.xxx
 
 concurrent_reads: 45
 concurrent_writes: 64
 concurrent_counter_writes: 32
 
 memtable_allocation_type: heap_buffers
 
 
 memtable_flush_writers: 6
 
 index_summary_capacity_in_mb:
 
 index_summary_resize_interval_in_minutes: 60
 
 trickle_fsync: false
 trickle_fsync_interval_in_kb: 10240
 
 storage_port: 7000
 
 ssl_storage_port: 7001
 
 listen_address: 10.xx.x.xxx
 
 start_native_transport: true
 
 native_transport_port: 9042
 
 rpc_address: 0.0.0.0
 
 rpc_port: 9160
 
 broadcast_rpc_address: 10.xx.x.xxx
 
 rpc_keepalive: true
 
 rpc_server_type: sync
 
 thrift_framed_transport_size_in_mb: 15
 
 incremental_backups: false
 
 snapshot_before_compaction: false
 
 auto_snapshot: true
 
 tombstone_warn_threshold: 1000
 tombstone_failure_threshold: 100000
 
 column_index_size_in_kb: 64
 
 batch_size_warn_threshold_in_kb: 5
 
 compaction_throughput_mb_per_sec: 16
 
 sstable_preemptive_open_interval_in_mb: 50
 
 read_request_timeout_in_ms: 5000
 
 range_request_timeout_in_ms: 10000
 
 write_request_timeout_in_ms: 2000
 
 counter_write_request_timeout_in_ms: 5000
 
 cas_contention_timeout_in_ms: 1000
 
 truncate_request_timeout_in_ms: 60000
 
 request_timeout_in_ms: 10000
 
 cross_node_timeout: false
 
 endpoint_snitch: SimpleSnitch
 
 
 dynamic_snitch_update_interval_in_ms: 100 
 
 dynamic_snitch_reset_interval_in_ms: 600000
 
 dynamic_snitch_badness_threshold: 0.1
 
 request_scheduler: org.apache.cassandra.scheduler.NoScheduler
 
 
 server_encryption_options:
 internode_encryption: none
 keystore: conf/.keystore
 keystore_password: cassandra
 truststore: conf/.truststore
 truststore_password: cassandra
 
 
 # enable or disable client/server encryption.
 client_encryption_options:
 enabled: false
 keystore: conf/.keystore
 keystore_password: cassandra
 
 internode_compression: all
 
 inter_dc_tcp_nodelay: false
 
 
 
 



Re: 10000+ CF support from Cassandra

2015-05-28 Thread Jonathan Haddad
While Graham's suggestion will let you collapse a bunch of tables into a
single one, it'll likely result in so many other problems it won't be worth
the effort.  I strongly advise against this approach.

First off, different workloads need different tuning.  Compaction
strategies, gc_grace_seconds, garbage collection, etc.  This is very
workload specific and you'll quickly find that fixing one person's problem
will negatively impact someone else.
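
To make that concrete, per-workload tuning in Cassandra is done through
per-table properties; the table names and values below are only an illustrative
sketch, not from the thread:

-- A write-heavy, time-series style table might be tuned one way...
ALTER TABLE metrics
  WITH compaction = {'class': 'DateTieredCompactionStrategy'}
   AND gc_grace_seconds = 86400;

-- ...while a read-heavy, frequently-updated table wants different settings.
ALTER TABLE user_profiles
  WITH compaction = {'class': 'LeveledCompactionStrategy'}
   AND gc_grace_seconds = 864000;

With many tenants multiplexed onto one shared table, there is only one set of
these knobs for everyone.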

Nested JSON using maps will not lead to a good data model, from a
performance perspective, and will limit your flexibility.  As CQL becomes
more expressive you'll miss out on its querying potential as well as the
ability to *easily* query those tables from tools like Spark.  You'll also
hit the limit of the number of elements in a map, which to my knowledge
still exists in current C* versions.

If you're truly dealing with a lot of data, you'll be managing one cluster
that is thousands of nodes.  Managing clusters > 1k nodes is territory that only
a handful of people in the world are familiar with.  Even the guys at
Netflix stick to a couple hundred.

Managing multi tenancy for a hundred clients each with different version
requirements will be a nightmare from a people perspective.  You'll need
everyone to be in sync when you upgrade your cluster.  This is just a mess,
people are in general, pretty bad at this type of thing.  Coordinating a
hundred application upgrades (say, to use a newer driver version) is pretty
much impossible.

off heap in 2.1 isn't fully off heap.  Read
http://www.datastax.com/dev/blog/off-heap-memtables-in-Cassandra-2-1 for
details.

If you hit any performance issues, GC, etc, you will take down your entire
business instead of just a small portion.  Everyone will be impacting
everyone else.  1 app's tombstones will cause compaction problems for
everyone using that table and be a disaster to try to fix.

A side note: you can get away with more than 8GB of memory if you use
G1GC.  In fact, it only really works if you use > 8GB.  Using ParNew + CMS,
tuning the JVM is a different story.  The following 2 pages are a good read
if you're interested in such details.

https://issues.apache.org/jira/browse/CASSANDRA-8150
http://blakeeggleston.com/cassandra-tuning-the-jvm-for-read-heavy-workloads.html

My recommendation: Separate your concerns, put each (or a handful) of
applications on each cluster and maintain multiple clusters.  Put each
application in a different keyspace, model normally.  If you need to move
an app off onto its own cluster, do so by setting up a second DC for that
keyspace, replicate, then shift over.
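
A sketch of that recommendation in CQL (the keyspace and datacenter names here
are invented for illustration):

-- Each application lives in its own keyspace on the shared cluster:
CREATE KEYSPACE app_billing
  WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3};

-- To move an app onto its own cluster later, add the new datacenter to the
-- keyspace's replication, run "nodetool rebuild DC1" on the new DC's nodes to
-- stream the data over, then repoint the app and drop the old DC:
ALTER KEYSPACE app_billing
  WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};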

Jon


On Thu, May 28, 2015 at 3:06 AM Graham Sanderson gra...@vast.com wrote:

 Depending on your use case and data types (for example if you can have a
 minimally nested JSON representation of the objects), then you could go with a
 common map<string,string> representation where keys are top-level object fields
 and values are valid JSON literals as strings; e.g. unquoted primitives, quoted
 strings, unquoted arrays or other objects.

 Each top level field is then independently updatable - which may be
 beneficial (and allows you to trivially keep historical versions of objects
 if that is a requirement)

 If you are updating the object in its entirety on save then simply store
 the entire object in a single cql field, and denormalize any search fields
 you may need (which you kinda want to do anyway)

 Sent from my iPhone

 On May 28, 2015, at 1:49 AM, Arun Chaitanya chaitan64a...@gmail.com
 wrote:

 Hello Jack,

  Column families? As opposed to tables? Are you using Thrift instead of
 CQL3? You should be focusing on the latter, not the former.
 We have an ORM developed in our company, which maps each DTO to a column
 family. So, we have many column families. We are using CQL3.

  But either way, the general guidance is that there is no absolute limit
 of tables per se, but low hundreds is the recommended limit, regardless
 of how many keyspaces they may be divided
  between. More than that is an anti-pattern for Cassandra - maybe you can
 make it work for your application, but it isn't recommended.
 You want to say that most cassandra users don't have more than 2-300
 column families? Is this achieved through careful data modelling?

  A successful Cassandra deployment is critically dependent on careful
 data modeling - who is responsible for modeling each of these tables, you
 and a single, tightly-knit team with very common interests  and very
 specific goals and SLAs or many different developers with different
 managers with different goals such as SLAs?
 The latter.

  When you say multi-tenant, are you simply saying that each of your
 organization's customers has their data segregated, or does each customer
 have direct access to the cluster?
 Each organization's data is in the same cluster. No, customers don't have
 direct access to the cluster.

 Thanks,
 Arun

 On Wed, May 27, 2015 at 7:17 PM, Jack Krupansky