Speeding up schema generation during tests

2016-10-18 Thread Ali Akhtar
Is there a way to speed up the creation of keyspace + tables during
integration tests? I am using an RF of 1, with SimpleStrategy, but it still
takes up to 10-15 seconds.
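
A minimal sketch of one common workaround (an assumption on my part, not
something from this thread), using the Python driver: create the keyspace
and tables once per test run, then TRUNCATE between tests, since DROP/CREATE
must wait for schema agreement across the cluster. All names are
hypothetical.

# Sketch: one-time schema setup plus a cheap per-test reset. Assumes a
# single-node test cluster on localhost and the Python cassandra-driver.
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect()

def create_schema_once():
    session.execute("""
        CREATE KEYSPACE IF NOT EXISTS test_ks
        WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
    """)
    session.execute("""
        CREATE TABLE IF NOT EXISTS test_ks.events (
            id uuid PRIMARY KEY,
            payload text
        )
    """)

def reset_between_tests():
    # TRUNCATE is much cheaper than DROP + CREATE, which waits for
    # schema agreement before returning.
    session.execute("TRUNCATE test_ks.events")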


Kafka(9.0.1) error : org.apache.kafka.common.network.InvalidReceiveException: Invalid receive (size = 1164731757 larger than 104857600)

2016-10-18 Thread Arun Rai
Hello Kafka/Cassandra experts,


*I am getting the below error:*

[2016-10-06 22:17:42,001] WARN Unexpected error from /10.61.48.28; closing
connection (org.apache.kafka.common.network.Selector)

org.apache.kafka.common.network.InvalidReceiveException: Invalid receive
(size = 1969447758 larger than 104857600)

at org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:91)
at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:153)
at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:134)
at org.apache.kafka.common.network.Selector.poll(Selector.java:286)
at kafka.network.Processor.run(SocketServer.scala:413)
at java.lang.Thread.run(Thread.java:745)


Below is my setup.

*Cluster configuration:*



Kafka cluster: 3 VMs, each with 4 vCPU and 8 GB RAM

Cassandra: 4 VMs, each with 8 vCPU and 16 GB RAM



1. Producer: the producer is an rdkafka client, which produces protobuf
messages to a Kafka topic with 12 partitions. Note that the message size
always varies; it is not a fixed-size message. The maximum message size
ranges from 1 MB to 3 MB.

2. Consumer: the consumer is a Java program that reads the messages and
loads them into a Cassandra database.

*3. I get this error only when both producer and consumer are running
together; after 15-20 minutes I see this error in the Kafka log files.*

4. If I run only the producer or only the consumer at a time, I don't get
this error.

5. I do not get this error if the consumer only consumes the messages and
does not insert into Cassandra.



Below are my Kafka server configuration props:





broker.id=1

log.dirs=/data/kafka



host.name=10.61.19.87

port=9092



advertised.host.name=10.61.19.87

advertised.port=9092

listeners=PLAINTEXT://0.0.0.0:9092



delete.topic.enable=true



# Replication configurations

num.replica.fetchers=4

replica.fetch.max.bytes=1048576

replica.fetch.wait.max.ms=500

replica.high.watermark.checkpoint.interval.ms=5000

replica.socket.timeout.ms=3

replica.socket.receive.buffer.bytes=65536

replica.lag.time.max.ms=1



controller.socket.timeout.ms=3

controller.message.queue.size=10



# Log configuration

num.partitions=12

message.max.bytes=100

auto.create.topics.enable=true

log.index.interval.bytes=4096

log.index.size.max.bytes=10485760

log.retention.hours=24

log.flush.interval.ms=1

log.flush.interval.messages=2

log.flush.scheduler.interval.ms=2000

log.roll.hours=24

log.retention.check.interval.ms=30

log.segment.bytes=1073741824



# ZK configuration

zookeeper.connect=10.61.19.84:2181,10.61.19.86:2181

zookeeper.connection.timeout.ms=18000

zookeeper.sync.time.ms=2000



# Socket server configuration

num.io.threads=8

num.network.threads=8

socket.request.max.bytes=104857600

socket.receive.buffer.bytes=1048576

socket.send.buffer.bytes=1048576

queued.max.requests=16

fetch.purgatory.purge.interval.requests=100

producer.purgatory.purge.interval.requests=100
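
For reference, a hedged sketch of the producer-side size limit that pairs
with the broker settings above, assuming the Python confluent-kafka binding
(the thread's producer is rdkafka-based); the topic name and payload are
hypothetical, and this only illustrates the size knobs involved, not a
diagnosis of the error above.

# Sketch: librdkafka's message.max.bytes caps the largest request the
# producer will send; it should stay within the broker-side limits.
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "10.61.19.87:9092",
    "message.max.bytes": 3 * 1024 * 1024,  # largest expected message: 3 MB
})
producer.produce("my_topic", value=b"...serialized protobuf...")
producer.flush()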



Any help will be really appreciated.

Arun Rai


Alter Keyspace using cqlengine python driver

2016-10-18 Thread Sandeep Dommaraju (BLOOMBERG/ 731 LEX)
Hi Guys,

Is there a way to ALTER KEYSPACE using cqlengine python driver?

Schema management for cqlengine supports CREATE/DROP KEYSPACE but does not seem 
to support ALTER.

https://github.com/datastax/python-driver/blob/master/cassandra/cqlengine/management.py
 
https://datastax.github.io/python-driver/api/cassandra/cqlengine/management.html

Am I missing something? Please suggest.
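
A minimal workaround sketch, assuming you can reach the plain
cassandra-driver session that cqlengine sits on: issue ALTER KEYSPACE as a
raw statement. The keyspace name and replication settings here are
hypothetical.

# Sketch: cqlengine's management module has no ALTER KEYSPACE helper, but
# the underlying session will execute any CQL statement.
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect()
session.execute("""
    ALTER KEYSPACE my_ks
    WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3}
""")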

Thanks,
Sandeep

Re: wide rows

2016-10-18 Thread Yabin Meng
With CQL data modeling, everything is called a "row", but really in CQL a
row is just a logical concept. So if you think of "wide partition" instead
of "wide row" (a partition is what is determined by the hash of the
partition key), it will help the understanding a bit: one wide partition
may contain multiple logical CQL rows - each CQL row just represents one
actual storage column of the partition.

Time-series data is usually a good fit for "wide-partition" data modeling,
but please remember not to go too crazy with it.
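
A short sketch of that idea, assuming the Python driver and the sensor_data
table quoted below; the keyspace name is hypothetical. Many logical CQL rows
written under one partition key come back from a single-partition read:

# Sketch: 24 logical CQL rows stored in one wide partition of sensor_data.
import datetime
import uuid

from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("my_ks")  # keyspace assumed
sensor = uuid.uuid4()
insert = session.prepare(
    "INSERT INTO sensor_data (sensor_id, date, value) VALUES (?, ?, ?)")
for hour in range(24):
    session.execute(insert, (sensor, datetime.datetime(2016, 10, 18, hour), 20.5))

# A single-partition read returns all 24 logical rows, ordered by date:
for row in session.execute(
        "SELECT date, value FROM sensor_data WHERE sensor_id = %s", (sensor,)):
    print(row.date, row.value)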

Cheers,

Yabin

On Tue, Oct 18, 2016 at 11:23 AM, DuyHai Doan  wrote:

> // user table: skinny partition
> CREATE TABLE user (
> user_id uuid,
> firstname text,
> lastname text,
> 
> PRIMARY KEY ((user_id))
> );
>
> // sensor_data table: wide partition
> CREATE TABLE sensor_data (
>  sensor_id uuid,
>  date timestamp,
>  value double,
>  PRIMARY KEY ((sensor_id),  date)
> );
>
> On Tue, Oct 18, 2016 at 5:07 PM, S Ahmed  wrote:
>
>> Hi,
>>
>> Can someone clarify how you would model a "wide" row cassandra table?
>> From what I understand, a wide row table is where you keep appending
>> columns to a given row.
>>
>> The other way to model a table would be the "regular" style where each
>> row contains data, so during a SELECT you would get multiple rows, as
>> opposed to a wide row where you would get a single row but a subset of
>> columns.
>>
>> Can someone show a simple data model that compares both styles?
>>
>> Thanks.
>>
>
>


Fwd: WARN [SharedPool-Worker-3] AbstractTracingAwareExecutorService.java

2016-10-18 Thread James Joseph
I have seen the following warning in system.log. As a temporary workaround
I increased commitlog_segment_size_in_mb in cassandra.yaml to 64, but how
can I trace it down? Which application is issuing the large writes, and to
which keyspace and table is it writing?


WARN  [SharedPool-Worker-3] 2016-10-05 03:46:22,363
AbstractTracingAwareExecutorService.java:169 - Uncaught exception on thread
Thread[SharedPool-Worker-3,5,main]: {} java.lang.IllegalArgumentException:
Mutation of 19711728 bytes is too large for the maxiumum size of 16777216
  at org.apache.cassandra.db.commitlog.CommitLog.add(CommitLog.java:221)
~[cassandra-all-2.1.8.689.jar:2.1.8.689]
  at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:383)
~[cassandra-all-2.1.8.689.jar:2.1.8.689]
  at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:363)
~[cassandra-all-2.1.8.689.jar:2.1.8.689]
  at org.apache.cassandra.db.Mutation.apply(Mutation.java:214)
~[cassandra-all-2.1.8.689.jar:2.1.8.689]
  at 
org.apache.cassandra.db.MutationVerbHandler.doVerb(MutationVerbHandler.java:54)
~[cassandra-all-2.1.8.689.jar:2.1.8.689]
  at 
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64)
~[cassandra-all-2.1.8.689.jar:2.1.8.689]
  at java.util.concurrent.Executors$RunnableAdapter.call(Unknown
Source) ~[na:1.8.0_92]
  at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164)
~[cassandra-all-2.1.8.689.jar:2.1.8.689]
  at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105)
[cassandra-all-2.1.8.689.jar:2.1.8.689]
  at java.lang.Thread.run(Unknown Source) [na:1.8.0_92]
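
One client-side tracing sketch, assuming the writing application can be
instrumented and uses the Python driver; the keyspace and table names are
hypothetical. The 16777216-byte limit above is half the default
commitlog_segment_size_in_mb of 32, which is why raising it to 64 hides the
warning:

# Sketch: flag oversized writes, with their target table, on the client
# before Cassandra drops the mutation server-side.
from cassandra.cluster import Cluster

MAX_MUTATION_BYTES = 16 * 1024 * 1024  # the limit from the warning above

session = Cluster(["127.0.0.1"]).connect("my_ks")
insert = session.prepare("INSERT INTO events (id, payload) VALUES (?, ?)")

def guarded_insert(row_id, payload):
    if len(payload) > MAX_MUTATION_BYTES:
        print("oversized write to my_ks.events: %d bytes" % len(payload))
    session.execute(insert, (row_id, payload))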


Thanks
James.


Re: wide rows

2016-10-18 Thread DuyHai Doan
// user table: skinny partition
CREATE TABLE user (
user_id uuid,
firstname text,
lastname text,

PRIMARY KEY ((user_id))
);

// sensor_data table: wide partition
CREATE TABLE sensor_data (
 sensor_id uuid,
 date timestamp,
 value double,
 PRIMARY KEY ((sensor_id),  date)
);

On Tue, Oct 18, 2016 at 5:07 PM, S Ahmed  wrote:

> Hi,
>
> Can someone clarify how you would model a "wide" row cassandra table?
> From what I understand, a wide row table is where you keep appending
> columns to a given row.
>
> The other way to model a table would be the "regular" style where each row
> contains data, so during a SELECT you would get multiple rows, as opposed
> to a wide row where you would get a single row but a subset of columns.
>
> Can someone show a simple data model that compares both styles?
>
> Thanks.
>


RE: wide rows

2016-10-18 Thread S Ahmed
Hi,

Can someone clarify how you would model a "wide" row cassandra table?  From
what I understand, a wide row table is where you keep appending columns to
a given row.

The other way to model a table would be the "regular" style where each row
contains data, so during a SELECT you would get multiple rows, as opposed to
a wide row where you would get a single row but a subset of columns.

Can someone show a simple data model that compares both styles?

Thanks.


Re: Cassandra installation best practices

2016-10-18 Thread kurt Greaves
Mehdi,

Nothing as detailed as Oracle's OFA currently exists. You can probably also
find some useful information here:
https://docs.datastax.com/en/landing_page/doc/landing_page/planning/planningAbout.html



Kurt Greaves
k...@instaclustr.com
www.instaclustr.com

On 18 October 2016 at 07:38, Mehdi Bada  wrote:

> Hi Brooke,
>
> Thank you for your advice. So in the end, no technical standard (provided
> by DataStax or Apache) exists for deploying Cassandra in a production
> environment?
>
> By comparison, for some RDBMS (Oracle, MySQL), standards such as OFA exist
> and are provided by Oracle.
>
> Best regards
> Mehdi
>
> ---
>
> *Mehdi Bada* | Consultant
> Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 499 96 15
> dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
> mehdi.b...@dbi-services.com
> www.dbi-services.com
>
>
>
> --
> *From: *"Brooke Jensen" 
> *To: *"user" 
> *Sent: *Tuesday, October 18, 2016 8:59:14 AM
> *Subject: *Re: Cassandra installation best practices
>
> Hi Mehdi,
> In addition, give some thought to your cluster topology. For maximum fault
> tolerance and availability I would recommend using at least three nodes
> with a replication factor of three. Ideally, you should also use Cassandra
> logical racks. This will reduce the risk of outage and make ongoing
> management of the cluster a lot easier.
>
>
> *Brooke Jensen*
> VP Technical Operations & Customer Services
> www.instaclustr.com | support.instaclustr.com
> 
>
> This email has been sent on behalf of Instaclustr Limited (Australia) and
> Instaclustr Inc (USA). This email and any attachments may contain
> confidential and legally privileged information.  If you are not the
> intended recipient, do not copy or disclose its content, but please reply
> to this email immediately and highlight the error to the sender and then
> immediately delete the message.
>
> On 18 October 2016 at 04:02, Anuj Wadehra  wrote:
>
>> Hi Mehdi,
>>
>> You can refer https://docs.datastax.com/en/landing_page/doc/landing_page/
>> recommendedSettings.html .
>>
>> Thanks
>> Anuj
>>
>> On Mon, 17 Oct, 2016 at 10:20 PM, Mehdi Bada
>>
>>  wrote:
>> Hi all,
>>
>> Do any best practices exist for installing Cassandra in a production
>> environment? Some standard to follow? For instance, the file system type,
>> etc.
>>
>>
>


Re: Adding disk capacity to a running node

2016-10-18 Thread Vladimir Yudovin
On Mon, 17 Oct 2016 15:59:41 -0400, Ben Bromhead b...@instaclustr.com wrote:

For the times that AWS retires an instance, you get plenty of notice and it's
generally pretty rare. We run over 1000 instances on AWS and see one forced
retirement a month, if that. We've never had an instance pulled from under
our feet without warning.


Yes, in the case of a planned event. But in the case of some hardware failure
it can happen. And it needn't be some catastrophe affecting the whole
availability zone - just the failure of a single blade.

Best regards, Vladimir Yudovin,

Winguzone - Hosted Cloud Cassandra on Azure and SoftLayer.
Launch your cluster in minutes.


On Mon, 17 Oct 2016 15:59:41 -0400, Ben Bromhead b...@instaclustr.com wrote:

Yup, as everyone has mentioned, ephemeral disks are fine if you run in
multiple AZs, which is pretty much mandatory for any production deployment in
AWS (and other cloud providers). i2.2xls are generally your best bet for
high-read-throughput applications on AWS.

Also, on AWS ephemeral storage will generally survive a user-initiated
restart. For the times that AWS retires an instance, you get plenty of notice
and it's generally pretty rare. We run over 1000 instances on AWS and see one
forced retirement a month, if that. We've never had an instance pulled from
under our feet without warning.

To add another option for the original question, one thing you can do is
attach a large EBS drive to the instance and bind mount it to the directory
for the table that has the very large SSTables. You will need to copy data
across to the EBS volume, let everything compact, then copy everything back
and detach the EBS volume. Latency may be higher than normal on the node you
are doing this on (especially if you are used to i2.2xl performance).

This is something we often have to do when we encounter pathological
compaction situations associated with bootstrapping, adding new DCs, or STCS
with a dominant table, or when people ignore high disk usage warnings :)

--
Ben Bromhead
CTO | Instaclustr
+1 650 284 9692
Managed Cassandra / Spark on AWS, Azure and Softlayer

On Mon, 17 Oct 2016 at 12:43, Jeff Jirsa jeff.ji...@crowdstrike.com wrote:

Ephemeral is fine, you just need to have enough replicas (in enough AZs and
enough regions) to tolerate instances being terminated.

From: Vladimir Yudovin vla...@winguzone.com
Reply-To: "user@cassandra.apache.org" user@cassandra.apache.org
Date: Monday, October 17, 2016 at 11:48 AM
To: user user@cassandra.apache.org
Subject: Re: Adding disk capacity to a running node

It's extremely unreliable to use ephemeral (local) disks. Even if you don't
stop the instance yourself, it can be restarted on a different server in case
of some hardware failure or an AWS-initiated update. So all node data will be
lost.

Best regards, Vladimir Yudovin,

Winguzone - Hosted Cloud Cassandra on Azure and SoftLayer.
Launch your cluster in minutes.

On Mon, 17 Oct 2016 14:45:00 -0400, Seth Edwards s...@pubnub.com wrote:

These are i2.2xlarge instances, so the disks are currently configured as
dedicated ephemeral disks.

On Mon, Oct 17, 2016 at 11:34 AM, Laing, Michael michael.la...@nytimes.com
wrote:

You could just expand the size of your EBS volume and extend the file system.
No data is lost - assuming you are running Linux.

On Monday, October 17, 2016, Seth Edwards s...@pubnub.com wrote:

We're running 2.0.16. We're migrating to a new data model, but we've had an
unexpected increase in write traffic that has caused us some capacity issues
when we encounter compactions. Our old data model is on STCS. We'd like to
add another EBS volume (we're on AWS) to our JBOD config and hopefully avoid
any situation where we run out of disk space during a large compaction. It
appears that the behavior we are hoping to get is actually undesirable and
was removed in 3.2. It still might be an option for us until we can finish
the migration.

I'm not familiar with LVM, so it may be a bit risky to try at this point.

On Mon, Oct 17, 2016 at 9:42 AM, Yabin Meng yabinm...@gmail.com wrote:

I assume you're talking about a Cassandra JBOD (just a bunch of disks) setup,
because you mention adding the volume to the list of data directories. If
this is the case, you may run into issues, depending on your C* version.
Check this out:
http://www.datastax.com/dev/blog/improving-jbod

Or another approach is to use LVM to manage multiple devices under a single
mount point. If you do so, all that Cassandra sees is simply increased disk
storage space, and there should be no problem.

Hope this helps,

Yabin

On

Re: Cassandra installation best practices

2016-10-18 Thread Mehdi Bada
Hi Brooke, 

Thank you for your advice. So in the end, no technical standard (provided by
DataStax or Apache) exists for deploying Cassandra in a production
environment?

By comparison, for some RDBMS (Oracle, MySQL), standards such as OFA exist
and are provided by Oracle.

Best regards 
Mehdi 

--- 

Mehdi Bada | Consultant 
Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 499 96 15 
dbi services, Rue de la Jeunesse 2, CH-2800 Delémont 
mehdi.b...@dbi-services.com 
www.dbi-services.com 




From: "Brooke Jensen"  
To: "user"  
Sent: Tuesday, October 18, 2016 8:59:14 AM 
Subject: Re: Cassandra installation best practices 

Hi Mehdi, 
In addition, give some thought to your cluster topology. For maximum fault 
tolerance and availability I would recommend using at least three nodes with a 
replication factor of three. Ideally, you should also use Cassandra logical 
racks. This will reduce the risk of outage and make ongoing management of the 
cluster a lot easier. 


Brooke Jensen 
VP Technical Operations & Customer Services 
www.instaclustr.com | support.instaclustr.com 

This email has been sent on behalf of Instaclustr Limited (Australia) and 
Instaclustr Inc (USA). This email and any attachments may contain confidential 
and legally privileged information. If you are not the intended recipient, do 
not copy or disclose its content, but please reply to this email immediately 
and highlight the error to the sender and then immediately delete the message. 

On 18 October 2016 at 04:02, Anuj Wadehra < anujw_2...@yahoo.co.in > wrote: 


Hi Mehdi, 

You can refer 
https://docs.datastax.com/en/landing_page/doc/landing_page/recommendedSettings.html
 . 

Thanks 
Anuj 

On Mon, 17 Oct, 2016 at 10:20 PM, Mehdi Bada
< mehdi.b...@dbi-services.com > wrote:
Hi all,

Do any best practices exist for installing Cassandra in a production
environment? Some standard to follow? For instance, the file system type,
etc.




Re: Cassandra installation best practices

2016-10-18 Thread Brooke Jensen
Hi Mehdi,

In addition, give some thought to your cluster topology. For maximum fault
tolerance and availability I would recommend using at least three nodes
with a replication factor of three. Ideally, you should also use Cassandra
logical racks. This will reduce the risk of outage and make ongoing
management of the cluster a lot easier.


*Brooke Jensen*
VP Technical Operations & Customer Services
www.instaclustr.com | support.instaclustr.com


This email has been sent on behalf of Instaclustr Limited (Australia) and
Instaclustr Inc (USA). This email and any attachments may contain
confidential and legally privileged information.  If you are not the
intended recipient, do not copy or disclose its content, but please reply
to this email immediately and highlight the error to the sender and then
immediately delete the message.

On 18 October 2016 at 04:02, Anuj Wadehra  wrote:

> Hi Mehdi,
>
> You can refer https://docs.datastax.com/en/landing_page/doc/landing_page/
> recommendedSettings.html .
>
> Thanks
> Anuj
>
> On Mon, 17 Oct, 2016 at 10:20 PM, Mehdi Bada
>
>  wrote:
> Hi all,
>
> Do any best practices exist for installing Cassandra in a production
> environment? Some standard to follow? For instance, the file system type,
> etc.
>
>