Re: Cassandra: Key-value or Column?

2017-03-28 Thread Andrey Ilinykh
Yes, Cassandra is a key-value store. You can think of it as wide-row storage:
(row key, column key) -> value
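
A minimal CQL sketch of that mapping (table and column names are made up):

    CREATE TABLE wide_row_example (
        row_key    text,   -- the "row key" (partition key)
        column_key text,   -- the "column key" (clustering column)
        value      blob,
        PRIMARY KEY (row_key, column_key)
    );
    -- each (row_key, column_key) pair addresses exactly one value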

On Tue, Mar 28, 2017 at 10:19 AM, Les Hartzman  wrote:

> I was doing some research on different NoSQL DBs and found this article at
> Datastax, https://academy.datastax.com/planet-cassandra/what-is-nosql
>
> In it it states that Cassandra is a key-value store and not a column
> (wide-column) store. So my question is why? Is the document in error or is
> there some subtlety that I'm missing?
>
> Thanks.
>
> Les
>
>


Re: Splitting Cassandra Cluster between AWS availability zones

2017-03-07 Thread Andrey Ilinykh
I'd recommend three availability zones. In that case, if you lose one AZ you
still have a quorum (assuming a replication factor of 3).
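
A sketch of such a keyspace (keyspace and DC names are made up; this assumes
the EC2 snitch, which maps the region to the datacenter and the availability
zone to the rack):

    CREATE KEYSPACE my_ks
      WITH replication = {'class': 'NetworkTopologyStrategy', 'us-east': 3};

NetworkTopologyStrategy places replicas on distinct racks where it can, so with
three AZs and RF=3 each zone holds one copy and QUORUM (2 of 3) survives the
loss of a zone.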

Andrey

On Tue, Mar 7, 2017 at 9:05 AM, Ney, Richard  wrote:

> We’ve collapsed our 2 DC – 3 node Cassandra clusters into a single 6 node
> Cassandra cluster split between two AWS availability zones.
>
>
>
> Are there any behaviors we need to take into account to ensure the
> Cassandra cluster stability with this configuration?
>
>
>
> *RICHARD NEY*
>
> TECHNICAL DIRECTOR, RESEARCH & DEVELOPMENT
>
> *UNITED STATES*
>
> *richard@aspect.com *
>
> *aspect.com *
>
>
>
>


Re: Left Cassandra mailing list

2016-08-02 Thread Andrey Ilinykh
To remove your address from the list, send a message to:
   

On Mon, Aug 1, 2016 at 11:29 PM, Mohammad Kermani <98kerm...@gmail.com>
wrote:

> How can I leave Cassandra mailing list?
>
> I get some emails every day  and currently I do not have time for it
>
>


Re: who does generate timestamp during the write?

2015-09-04 Thread Andrey Ilinykh
Your application.

On Fri, Sep 4, 2015 at 10:26 AM, ibrahim El-sanosi  wrote:

> Dear folks,
>
> When we hear about the notion of Last-Write-Wins in Cassandra according to
> timestamp, *who does generate this timestamp during the write,
> coordinator or each individual replica in which the write is going to be
> stored?*
>
>
> *Regards,*
>
>
>
> *Ibrahim*
>


Re: who does generate timestamp during the write?

2015-09-04 Thread Andrey Ilinykh
The coordinator doesn't generate the timestamp; it is generated by the client.

On Fri, Sep 4, 2015 at 10:37 AM, ibrahim El-sanosi <ibrahimsaba...@gmail.com
> wrote:

> Ok, why coordinator does generate timesamp, as the write is a part of
> Cassandra process after client submit the request to Cassandra?
>
> On Fri, Sep 4, 2015 at 6:29 PM, Andrey Ilinykh <ailin...@gmail.com> wrote:
>
>> Your application.
>>
>> On Fri, Sep 4, 2015 at 10:26 AM, ibrahim El-sanosi <
>> ibrahimsaba...@gmail.com> wrote:
>>
>>> Dear folks,
>>>
>>> When we hear about the notion of Last-Write-Wins in Cassandra according
>>> to timestamp, *who does generate this timestamp during the write,
>>> coordinator or each individual replica in which the write is going to be
>>> stored?*
>>>
>>>
>>> *Regards,*
>>>
>>>
>>>
>>> *Ibrahim*
>>>
>>
>>
>


Re: who does generate timestamp during the write?

2015-09-04 Thread Andrey Ilinykh
I meant the Thrift-based API. If we are talking about CQL, then timestamps are
generated by the node you are connected to; that node acts as the "client" here.
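
If you want the application to control it explicitly, you can supply a
timestamp with the write; a minimal CQL sketch (table, values and the
microsecond timestamp are made up):

    INSERT INTO users (id, name) VALUES (42, 'ibrahim')
      USING TIMESTAMP 1441390000000000;  -- microseconds since the epoch, chosen by the client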

On Fri, Sep 4, 2015 at 10:49 AM, ibrahim El-sanosi <ibrahimsaba...@gmail.com
> wrote:

> Hi Andrey,
>
> I just came across this articale "
>
> "Each cell in a CQL table has a corresponding timestamp
> which is taken from the clock on *the Cassandra node* *that orchestrates the
> write.* When you are reading from a Cassandra cluster the node that
> coordinates the read will compare the timestamps of the values it fetches.
> Last write(=highest timestamp) wins and will be returned to the client."
>
> What do you think?
>
> "
>
> On Fri, Sep 4, 2015 at 6:41 PM, Andrey Ilinykh <ailin...@gmail.com> wrote:
>
>> Coordinator doesn't generate timestamp, it is generated by client.
>>
>> On Fri, Sep 4, 2015 at 10:37 AM, ibrahim El-sanosi <
>> ibrahimsaba...@gmail.com> wrote:
>>
>>> Ok, why coordinator does generate timesamp, as the write is a part of
>>> Cassandra process after client submit the request to Cassandra?
>>>
>>> On Fri, Sep 4, 2015 at 6:29 PM, Andrey Ilinykh <ailin...@gmail.com>
>>> wrote:
>>>
>>>> Your application.
>>>>
>>>> On Fri, Sep 4, 2015 at 10:26 AM, ibrahim El-sanosi <
>>>> ibrahimsaba...@gmail.com> wrote:
>>>>
>>>>> Dear folks,
>>>>>
>>>>> When we hear about the notion of Last-Write-Wins in Cassandra
>>>>> according to timestamp, *who does generate this timestamp during the
>>>>> write, coordinator or each individual replica in which the write is going
>>>>> to be stored?*
>>>>>
>>>>>
>>>>> *Regards,*
>>>>>
>>>>>
>>>>>
>>>>> *Ibrahim*
>>>>>
>>>>
>>>>
>>>
>>
>


Re: Question about incremental backup

2014-08-23 Thread Andrey Ilinykh
Keep in mind that backing up SSTables is not enough. To have a truly
incremental backup you also have to store the commit logs.
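
For reference, the commit log archiving hooks live in
conf/commitlog_archiving.properties and look roughly like this (paths and
commands here are made up):

    archive_command=/bin/cp %path /backup/commitlog/%name
    restore_command=/bin/cp -f %from %to
    restore_directories=/backup/commitlog
    restore_point_in_time=2014:08:23 12:00:00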

Thank you,
  Andrey


On Sat, Aug 23, 2014 at 11:30 AM, Robert Coli rc...@eventbrite.com wrote:

 On Sat, Aug 23, 2014 at 8:06 AM, Jens Rantil jens.ran...@tink.se wrote:

  I am setting backup and restoration tooling for a Cassandra cluster and
 have a specific question regarding incremental backup.

 Let’s say I’m running incremental backups and take a snapshot. At the
 exact(ish) same time as my snapshot it taken another incremental *.db file
 is hard linked into the backups directory. My question is, how do I know
 which snapshot my incremental file belongs to?


 Tablesnap avoids this race by snapshotting files directly from the data
 directory, and backing it up with a meta-information file that contains a
 list of all SSTables in the data directory at the time it notices a new
 one. You can probably do something similar with the incremental snapshot
 system, but you might want to consider if you need to. :D

 https://github.com/JeremyGrosser/tablesnap

 =Rob






Re: Nodetool Repair questions

2014-08-12 Thread Andrey Ilinykh
1. You don't have to repair if you use QUORUM consistency and you don't
delete data.
2. Performance depends on the amount of data each node has. It's very difficult
to predict; it may take days.

Thank you,
  Andrey


On Tue, Aug 12, 2014 at 2:06 PM, Viswanathan Ramachandran 
vish.ramachand...@gmail.com wrote:

 Some questions on nodetool repair.

 1. This tool repairs inconsistencies across replicas of the row. Since
 latest update always wins, I dont see inconsistencies other than ones
 resulting from the combination of deletes, tombstones, and crashed nodes.
 Technically, if data is never deleted from cassandra, then nodetool repair
 does not need to be run at all. Is this understanding correct? If wrong,
 can anyone provide other ways inconsistencies could occur?

 2. Want to understand the performance of 'nodetool repair' in a Cassandra
 multi data center setup. As we add nodes to the cluster in various data
 centers, does the performance of nodetool repair on each node increase
 linearly, or is it quadratic ? The essence of this question is: If I have a
 keyspace with x number of replicas in each data center, do I have to deal
 with an upper limit on the number of data centers/nodes?


 Thanks

 Vish



Re: Nodetool Repair questions

2014-08-12 Thread Andrey Ilinykh
On Tue, Aug 12, 2014 at 4:46 PM, Viswanathan Ramachandran 
vish.ramachand...@gmail.com wrote:

 Andrey, QUORUM consistency and no deletes makes perfect sense.
 I believe we could modify that to EACH_QUORUM or QUORUM consistency and no
 deletes - isnt that right?


 yes.


Re: too many open files

2014-08-08 Thread Andrey Ilinykh
You may have this problem if your client doesn't reuse connections but opens a
new one every time. So, run netstat and check the number of established
connections. This number should not be big.
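
For example, something like this (assuming the default client ports, 9160 for
Thrift and 9042 for the native protocol):

    netstat -tn | egrep ':(9160|9042) ' | grep -c ESTABLISHED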

Thank you,
  Andrey


On Fri, Aug 8, 2014 at 12:35 PM, Marcelo Elias Del Valle 
marc...@s1mbi0se.com.br wrote:

 Hi,

 I am using Cassandra 2.0.9 running on Debian Wheezy, and I am having too
 many open files exceptions when I try to perform a large number of
 operations in my 10 node cluster.

 I saw the documentation
 http://www.datastax.com/documentation/cassandra/2.0/cassandra/troubleshooting/trblshootTooManyFiles_r.html
 and I have set everything to the recommended settings, but I keep getting
 the errors.

 In the documentation it says: Another, much less likely possibility, is
 a file descriptor leak in Cassandra. Run lsof -n | grep java to check
 that the number of file descriptors opened by Java is reasonable and
 reports the error if the number is greater than a few thousand.

 I guess it's not the case, or else a lot of people would be complaining
 about it, but I am not sure what I could do to solve the problem.

 Any hint about how to solve it?

 My client is written in python and uses Cassandra Python Driver. Here are
 the exceptions I am having in the client:
 [s1log] 2014-08-08 12:16:09,631 - cassandra.pool - WARNING - Error
 attempting to reconnect to 200.200.200.151, scheduling retry in 600.0
 seconds: [Errno 24] Too many open files
 [s1log] 2014-08-08 12:16:09,632 - cassandra.pool - WARNING - Error
 attempting to reconnect to 200.200.200.142, scheduling retry in 600.0
 seconds: [Errno 24] Too many open files
 [s1log] 2014-08-08 12:16:09,633 - cassandra.pool - WARNING - Error
 attempting to reconnect to 200.200.200.143, scheduling retry in 600.0
 seconds: [Errno 24] Too many open files
 [s1log] 2014-08-08 12:16:09,634 - cassandra.pool - WARNING - Error
 attempting to reconnect to 200.200.200.142, scheduling retry in 600.0
 seconds: [Errno 24] Too many open files
 [s1log] 2014-08-08 12:16:09,634 - cassandra.pool - WARNING - Error
 attempting to reconnect to 200.200.200.145, scheduling retry in 600.0
 seconds: [Errno 24] Too many open files
 [s1log] 2014-08-08 12:16:09,635 - cassandra.pool - WARNING - Error
 attempting to reconnect to 200.200.200.144, scheduling retry in 600.0
 seconds: [Errno 24] Too many open files
 [s1log] 2014-08-08 12:16:09,635 - cassandra.pool - WARNING - Error
 attempting to reconnect to 200.200.200.148, scheduling retry in 600.0
 seconds: [Errno 24] Too many open files
 [s1log] 2014-08-08 12:16:09,732 - cassandra.pool - WARNING - Error
 attempting to reconnect to 200.200.200.146, scheduling retry in 600.0
 seconds: [Errno 24] Too many open files
 [s1log] 2014-08-08 12:16:09,733 - cassandra.pool - WARNING - Error
 attempting to reconnect to 200.200.200.77, scheduling retry in 600.0
 seconds: [Errno 24] Too many open files
 [s1log] 2014-08-08 12:16:09,734 - cassandra.pool - WARNING - Error
 attempting to reconnect to 200.200.200.76, scheduling retry in 600.0
 seconds: [Errno 24] Too many open files
 [s1log] 2014-08-08 12:16:09,734 - cassandra.pool - WARNING - Error
 attempting to reconnect to 200.200.200.75, scheduling retry in 600.0
 seconds: [Errno 24] Too many open files
 [s1log] 2014-08-08 12:16:09,735 - cassandra.pool - WARNING - Error
 attempting to reconnect to 200.200.200.142, scheduling retry in 600.0
 seconds: [Errno 24] Too many open files
 [s1log] 2014-08-08 12:16:09,736 - cassandra.pool - WARNING - Error
 attempting to reconnect to 200.200.200.185, scheduling retry in 600.0
 seconds: [Errno 24] Too many open files
 [s1log] 2014-08-08 12:16:09,942 - cassandra.pool - WARNING - Error
 attempting to reconnect to 200.200.200.144, scheduling retry in 512.0
 seconds: Timed out connecting to 200.200.200.144
 [s1log] 2014-08-08 12:16:09,998 - cassandra.pool - WARNING - Error
 attempting to reconnect to 200.200.200.77, scheduling retry in 512.0
 seconds: Timed out connecting to 200.200.200.77


 And here is the exception I am having in the server:

  WARN [Native-Transport-Requests:163] 2014-08-08 14:27:30,499
 BatchStatement.java (line 223) Batch of prepared statements for
 [identification.entity_lookup, identification.entity] is of size 25216,
 exceeding specified threshold of 5120 by 20096.
 ERROR [Native-Transport-Requests:150] 2014-08-08 14:27:31,611
 ErrorMessage.java (line 222) Unexpected exception during request
 java.io.IOException: Connection reset by peer
 at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
 at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
 at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
 at sun.nio.ch.IOUtil.read(IOUtil.java:192)
 at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:375)
 at
 org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64)
 at
 org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
   

Re: EC2 cassandra cluster node address problem

2014-06-25 Thread Andrey Ilinykh
Yes, of course. The private IP is the real IP address of the node, and
Cassandra can listen on that IP only. The elastic IP is external; it belongs to
the AWS firewall and is similar to your home router: you can forward an
external port to a local one, but the application running on your local node
doesn't know anything about it.


On Wed, Jun 25, 2014 at 1:25 PM, Huiliang Zhang zhl...@gmail.com wrote:

 Thanks. In fact, it is Cassandra that returns private ip of nodes to my
 program by:

 client.describe_ring(keyspace)

 Then the program will start communicate with Cassandra through the private
 ips. One way is to translate the ips myself.


 On Tue, Jun 24, 2014 at 10:40 PM, Andrey Ilinykh ailin...@gmail.com
 wrote:

 you can set rpc_address to 0.0.0.0, then it will listen on all
 interfaces. Also you have to modify security group settings to allow
 incoming connection for port 9160. But it is a really bad idea. By this
 way you open your cluster to whole world, ssh tunnel is the best way.


 On Tue, Jun 24, 2014 at 10:01 PM, Huiliang Zhang zhl...@gmail.com
 wrote:

 Thanks. Is there a way to configure Cassandra to use elastic ip instead
 of private ip?


 On Tue, Jun 24, 2014 at 9:29 PM, Andrey Ilinykh ailin...@gmail.com
 wrote:

 Cassandra knows nothing about elastic ip. You have to use ssh tunnel or
 run your client on ec2 instance.

 Thank you,
   Andrey


 On Tue, Jun 24, 2014 at 8:55 PM, Huiliang Zhang zhl...@gmail.com
 wrote:

 Hi,

 I am using Cassandra on EC2 instances. My cassandra always returns
 private ips of the instances to the thrift program. Then the program 
 cannot
 connect to the private ips.

 I already changed the
 rpc_address: elastic ip
 rpc_address: elastic ip

 Then I restarted the cassandra cluster. But the system.peers still
 save the private ips as peer address.

 How to fix this?

 Thanks,
 Huiliang








Re: EC2 cassandra cluster node address problem

2014-06-24 Thread Andrey Ilinykh
Cassandra knows nothing about the elastic IP. You have to use an SSH tunnel or
run your client on an EC2 instance.
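
A minimal sketch of such a tunnel (user, ports and addresses are made up):

    ssh -L 9160:<private-ip-of-node>:9160 ec2-user@<elastic-ip>
    # then point the Thrift client at localhost:9160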

Thank you,
  Andrey


On Tue, Jun 24, 2014 at 8:55 PM, Huiliang Zhang zhl...@gmail.com wrote:

 Hi,

 I am using Cassandra on EC2 instances. My cassandra always returns private
 ips of the instances to the thrift program. Then the program cannot connect
 to the private ips.

 I already changed the
 rpc_address: elastic ip
 rpc_address: elastic ip

 Then I restarted the cassandra cluster. But the system.peers still save
 the private ips as peer address.

 How to fix this?

 Thanks,
 Huiliang




Re: EC2 cassandra cluster node address problem

2014-06-24 Thread Andrey Ilinykh
You can set rpc_address to 0.0.0.0, then it will listen on all interfaces. You
also have to modify the security group settings to allow incoming connections
on port 9160. But it is a really bad idea: that way you open your cluster to
the whole world. An SSH tunnel is the best way.


On Tue, Jun 24, 2014 at 10:01 PM, Huiliang Zhang zhl...@gmail.com wrote:

 Thanks. Is there a way to configure Cassandra to use elastic ip instead of
 private ip?


 On Tue, Jun 24, 2014 at 9:29 PM, Andrey Ilinykh ailin...@gmail.com
 wrote:

 Cassandra knows nothing about elastic ip. You have to use ssh tunnel or
 run your client on ec2 instance.

 Thank you,
   Andrey


 On Tue, Jun 24, 2014 at 8:55 PM, Huiliang Zhang zhl...@gmail.com wrote:

 Hi,

 I am using Cassandra on EC2 instances. My cassandra always returns
 private ips of the instances to the thrift program. Then the program cannot
 connect to the private ips.

 I already changed the
 rpc_address: elastic ip
 rpc_address: elastic ip

 Then I restarted the cassandra cluster. But the system.peers still save
 the private ips as peer address.

 How to fix this?

 Thanks,
 Huiliang






Re: Cass 1.2.11: Replacing a node procedure

2014-02-13 Thread Andrey Ilinykh
Use decommission:
http://www.datastax.com/docs/1.1/cluster_management#replacing-a-dead-node
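
A minimal sketch; run it against the node being retired once the replacement
has finished bootstrapping:

    nodetool -h <old-node-ip> decommission

Decommission streams the retiring node's data to the remaining replicas, so how
long it runs depends mostly on how much data that node holds.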


On Thu, Feb 13, 2014 at 2:28 PM, Oleg Dulin oleg.du...@gmail.com wrote:

 Here is what I am thinking.

 1) Add the new node with token-1 of the old one and let it bootstrap.
 2) Once it bootstrapped, remove the old node from the ring

 Now, it is #2 that I need clarification on.

 Do I use decommission or remove ? How long should I expect those
 processes to run ?

 Regards,
 Oleg




 On 2014-02-13 22:01:10 +, Oleg Dulin said:

  Dear Distinguished Colleagues:

 I have a situation where in the production environment one of the
 machines is overheating and needs to be serviced. Now, the landscape looks
 like this:

 4 machines in primary DC, 4 machiens in DR DC. Replication factor is 2.

 I also have a QA environment with 4 machines in a single DC, RF=2 as well.

 We need to work with the manufaturer to figure out what is wrong with the
 machine. The proposed course of action is the following:

 1) Take the faulty prod machine (lets call it X) out of production.
 2) Take a healthy QA machine (lets call it Y) out of QA
 3) Plug QA machine into the prod cluster and rebuild it.
 4) Plug prod machine into the QA cluster and leave it alone and let the
 manufacturer service it to their liking until they say it is fixed, at
 which point we will just leave it in QA.

 So basically we are talking about replacing a dead node.

 I found this: http://www.datastax.com/documentation/cassandra/1.2/
 webhelp/index.html#cassandra/operations/ops_replace_node_t.html

 I am not using vnodes, just plain vanilla tokens and RandomPartitioner.
 So that procedure doesn't apply. I need some help putting together a
 step-by-step checklist what I would need to do.



 --
 Regards,
 Oleg Dulin
 http://www.olegdulin.com





Re: in AWS is it worth trying to talk to a server in the same zone as your client?

2014-02-12 Thread Andrey Ilinykh
Yes, sure. Reading data from the same zone will reduce latency and save you
some money.


On Wed, Feb 12, 2014 at 10:13 AM, Brian Tarbox tar...@cabotresearch.comwrote:

 We're running a C* cluster with 6 servers spread across the four us-east1
 zones.

 We also spread our clients (hundreds of them) across the four zones.

 Currently we give our clients a connection string listing all six servers
 and let C* do its thing.

 This is all working just fine...and we're paying a fair bit in AWS
 transfer costs.  There is a suspicion that this transfer cost is driven by
 us passing data around between our C* servers and clients.

 Would there be any value to trying to get a client to talk to one of the
 C* servers in its own zone?

 I understand (at least partially!) about coordinator nodes and replication
 and know that no matter which server is the coordinator for an operation
 replication may cause bits to get transferred to/from servers in other
 zones.  Having said that...is there a chance that trying to encourage a
 client to initially contact a server in its own zone would help?

 Thank you,

 Brian Tarbox




Re: in AWS is it worth trying to talk to a server in the same zone as your client?

2014-02-12 Thread Andrey Ilinykh
I think you are mistaken. That is true within the same zone; between zones it
is $0.01/GB.


On Wed, Feb 12, 2014 at 12:17 PM, Russell Bradberry rbradbe...@gmail.comwrote:

 Not when using private IP addresses.  That pricing *ONLY *applies if you
 are using the public interface or EIP/ENI.  If you use the private IP
 addresses there is no cost associated.



 On February 12, 2014 at 3:13:58 PM, William Oberman (
 ober...@civicscience.com //ober...@civicscience.com) wrote:

 Same region, cross zone transfer is $0.01 / GB (see
 http://aws.amazon.com/ec2/pricing/, Data Transfer section).


 On Wed, Feb 12, 2014 at 3:04 PM, Russell Bradberry 
 rbradbe...@gmail.comwrote:

  Cross zone data transfer does not cost any extra money.

  LOCAL_QUORUM = QUORUM if all 6 servers are located in the same logical
 datacenter.

  Ensure your clients are connecting to either the local IP or the AWS
 hostname that is a CNAME to the local ip from within AWS.  If you connect
 to the public IP you will get charged for outbound data transfer.



 On February 12, 2014 at 2:58:07 PM, Yogi Nerella 
 (ynerella...@gmail.com//ynerella...@gmail.com)
 wrote:

  Also, may be you need to check the read consistency to local_quorum,
 otherwise the servers still try to read the data from all other data
 centers.

 I can understand the latency, but I cant understand how it would save
 money?   The amount of data transferred from the AWS server to the client
 should be same no matter where the client is connected?



 On Wed, Feb 12, 2014 at 10:33 AM, Andrey Ilinykh ailin...@gmail.comwrote:

 yes, sure. Taking data from the same zone will reduce latency and save
 you some money.


 On Wed, Feb 12, 2014 at 10:13 AM, Brian Tarbox tar...@cabotresearch.com
  wrote:

 We're running a C* cluster with 6 servers spread across the four
 us-east1 zones.

 We also spread our clients (hundreds of them) across the four zones.

 Currently we give our clients a connection string listing all six
 servers and let C* do its thing.

 This is all working just fine...and we're paying a fair bit in AWS
 transfer costs.  There is a suspicion that this transfer cost is driven by
 us passing data around between our C* servers and clients.

 Would there be any value to trying to get a client to talk to one of
 the C* servers in its own zone?

 I understand (at least partially!) about coordinator nodes and
 replication and know that no matter which server is the coordinator for an
 operation replication may cause bits to get transferred to/from servers in
 other zones.  Having said that...is there a chance that trying to encourage
 a client to initially contact a server in its own zone would help?

 Thank you,

 Brian Tarbox










Re: Clarification on how multi-DC replication works

2014-02-11 Thread Andrey Ilinykh
1. The reply part is missing.
2. It is a little bit confusing. I would not use the term "synchronous".
Everything is asynchronous here. The coordinator writes data to all local nodes
and waits for a response from ANY two of them (in the case of QUORUM). In your
picture it looks like the coordinator first decides which nodes should reply;
that is not correct.


On Tue, Feb 11, 2014 at 9:36 AM, Mullen, Robert
robert.mul...@pearson.comwrote:

 So is that picture incorrect, or just incomplete missing the piece on how
 the nodes reply to the coordinator node.


 On Tue, Feb 11, 2014 at 9:38 AM, sankalp kohli kohlisank...@gmail.comwrote:

 @Mullen,
 I think your diagram does not answer the question on responses.
 @Sameer
 All nodes in DC2 will replay back to the co-ordinator in DC1. So if you
 have replication of DC1:3,DC2:3. A co-ordinator node will get 6 responses
 back if it is not in the replica set.
 Hope that answers your question.


 On Tue, Feb 11, 2014 at 8:16 AM, Mullen, Robert 
 robert.mul...@pearson.com wrote:

 I had the same question a while back and put together this picture to
 help me understand the flow of data for multi region deployments. Hope that
 it helps.


 On Mon, Feb 10, 2014 at 7:52 PM, Sameer Farooqui sam...@blueplastic.com
  wrote:

 Hi,

 I was hoping someone could clarify a point about multi-DC replication.

 Let's say I have 2 data centers configured with replication factor = 3
 in each DC.

 My client app is sitting in DC 1 and is able to intelligently pick a
 coordinator that will also be a replica partner.

 So the client app sends a write with consistency for DC1 = Q and
 consistency for DC2 = Q to a coordinator node in DC1.

 That coordinator in DC1 forwards the write to 2 other nodes in DC1 and
 a coordinator in DC2.

 Is it correct that all 3 nodes in DC2 will respond back to the original
 coordinator in DC1? Or will the DC2 nodes respond back to the DC2
 coordinator?

 Let's say one of the replica nodes in DC2 is down. Who will hold the
 hint for that node? The original coordinator in DC1 or the coordinator in
 DC2?







Re: Clarification on how multi-DC replication works

2014-02-11 Thread Andrey Ilinykh
On Tue, Feb 11, 2014 at 10:14 AM, Mullen, Robert
robert.mul...@pearson.comwrote:

 Thanks for the feedback.

 The picture shows a sample request, which is why the coordinator points to
 two specific nodes.  What I was trying to convey that the coordinator node
 would ensure that 2 of the 3 nodes were written to before reporting success
 to the client.

This is my point: ANY 2 of 3. Your picture shows a specific 2 of 3.




 I found the article here, it says that the non-blocking writes to the 2nd
 data center are asynchronous.  Is this blog post incorrect as well?

 http://www.datastax.com/dev/blog/deploying-cassandra-across-multiple-data-centers


Why is it incorrect? Everything is asynchronous, both local and remote. The
coordinator simply waits for a response from the local nodes. But that doesn't
make it synchronous, because it waits for a response from ANY 2 nodes.


Re: Adding datacenter for move to vnodes

2014-02-06 Thread Andrey Ilinykh
My understanding is that you can't mix vnodes and regular nodes in the same DC.
Is that correct?


On Thu, Feb 6, 2014 at 2:16 PM, Vasileios Vlachos 
vasileiosvlac...@gmail.com wrote:

 Hello,

 My question is why would you need another DC to migrate to Vnodes? How
 about decommissioning each node in turn, changing the cassandra.yaml
 accordingly, delete the data and bring the node back in the cluster and let
 it bootstrap from the others?

 We did that recently with our demo cluster. Is that wrong in any way? The
 only think to take into consideration is the disk space I think. We are not
 using amazon, but I am not sure how would that be different for this
 particular issue.

 Thanks,

 Bill
 On 6 Feb 2014 16:34, Alain RODRIGUEZ arodr...@gmail.com wrote:

 Glad it helps.

 Good luck with this.

 Cheers,

 Alain


 2014-02-06 17:30 GMT+01:00 Katriel Traum katr...@google.com:

 Thank you Alain! That was exactly what I was looking for. I was worried
 I'd have to do a rolling restart to change the snitch.

 Katriel



 On Thu, Feb 6, 2014 at 1:10 PM, Alain RODRIGUEZ arodr...@gmail.comwrote:

 Hi, we did this exact same operation here too, with no issue.

 Contrary to Paulo we did not modify our snitch.

 We simply added a dc_suffix in the property in
 cassandra-rackdc.properties conf file for nodes in the new cluster :

 # Add a suffix to a datacenter name. Used by the Ec2Snitch and
 Ec2MultiRegionSnitch

 # to append a string to the EC2 region name.

 dc_suffix=-xl

 So our new cluster DC is basically : eu-west-xl

 I think this is less risky, at least it is easier to do.

 Hope this help.


 2014-02-02 11:42 GMT+01:00 Paulo Ricardo Motta Gomes 
 paulo.mo...@chaordicsystems.com:

 We had a similar situation and what we did was first migrate the 1.1
 cluster to GossipingPropertyFileSnitch, making sure that for each node we
 specified the correct availability zone as the rack in
 the cassandra-rackdc.properties. In this way,
 the GossipingPropertyFileSnitch is equivalent to the EC2MultiRegionSnitch,
 so the data location does not change and no repair is needed afterwards.
 So, if your nodes are located in the us-east-1e AZ, your 
 cassandra-rackdc.properties
 should look like:

 dc=us-east
 rack=1e

 After this step is complete on all nodes, then you can add a new
 datacenter specifying different dc and rack on the
 cassandra-rackdc.properties of the new DC. Make sure you upgrade your
 initial datacenter to 1.2 before adding a new datacenter with vnodes
 enabled (of course).

 Cheers


 On Sun, Feb 2, 2014 at 6:37 AM, Katriel Traum katr...@google.comwrote:

 Hello list.

 I'm upgrading a 1.1 cassandra cluster to 1.2(.13).
 I've read here and in other places that the best way to migrate to
 vnodes is to add a new DC, with the same amount of nodes, and run rebuild
 on each of them.
 However, I'm faced with the fact that I'm using EC2MultiRegion
 snitch, which automagically creates the DC and RACK.

 Any ideas how I can go about adding a new DC with this kind of setup?
 I need these new machines to be in the same EC2 Region as the current 
 ones,
 so adding to a new Region is not an option.

 TIA,
 Katriel




 --
 *Paulo Motta*

 Chaordic | *Platform*
 *www.chaordic.com.br http://www.chaordic.com.br/*
 +55 48 3232.3200
 +55 83 9690-1314







Re: Question 1: JMX binding, Question 2: Logging

2014-02-04 Thread Andrey Ilinykh
The JMX settings are in conf/cassandra-env.sh.
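
The relevant lines usually look something like this (exact contents vary by
version; if you run two instances on one box, at a minimum give each its own
JMX_PORT):

    JMX_PORT="7199"
    JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.port=$JMX_PORT"
    JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.ssl=false"
    JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.authenticate=false"
    # the hostname the RMI stubs advertise; the IP below is the one from your example
    JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=127.1.246.3"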


On Tue, Feb 4, 2014 at 2:25 PM, Kyle Crumpton (kcrumpto) kcrum...@cisco.com
 wrote:

  Hi all,

  I'm fairly new to Cassandra. I'm deploying it to a PaaS. One thing this
 entails is that it must be able to have more than one instance on a single
 node. I'm running into the problem that JMX binds to 0.0.0.0:7199. My
 question is this: Is there a way to configure this? I have actually found
 the post that said to change the the following

 JVM_OPTS=$JVM_OPTS -Djava.rmi.server.hostname=127.1.246.3 where
 127.1.246.3 is the IP I want to bind to..

 This actually did not change the JMX binding by any means for me. I saw a
 post about a jmx listen address in cassandra.yaml and this also did not
 work.
 Any clarity on whether this is bindable at all? Or if there are plans for
 it?

  Also-

  I have logging turned on. For some reason, though, my Cassandra is not
 actually logging as intended. My log folder is actually empty after each
 (failed) run (due to the port being taken by my other cassandra process).

  Here is an actual copy of my log4j-server.properites file:
 http://fpaste.org/74470/15510941/

  Any idea why this might not be logging?

  Thank you and best regards

  Kyle



Re: no more zookeeper?

2014-01-28 Thread Andrey Ilinykh
Why would Cassandra use ZooKeeper?


On Tue, Jan 28, 2014 at 7:18 AM, S Ahmed sahmed1...@gmail.com wrote:

 Does C* no long use zookeeper?

 I don't see a reference to it in the
 https://github.com/apache/cassandra/blob/trunk/build.xml

 If not, what replaced it?



Re: token for agent

2014-01-22 Thread Andrey Ilinykh
No. There is nothing special about the value 0.


On Wed, Jan 22, 2014 at 1:30 PM, Daniel Curry daniel.cu...@arrayent.comwrote:

   I was wondering how important to have a cluster that has a node with a
 token that begin with a zero for a three node cluster?



 3 NODES
 ---
0 -- Node 1
  56713727820156410577229101238628035242 -- Node 2
 113427455640312821154458202477256070484-- Node 3


 Will this effect the agents for not connecting?

 3 NODES
 ---
 170141183460469231731687303715884105728 -- Node 1
  56713727820156410577229101238628035242  -- Node 2
 113427455640312821154458202477256070484  -- Node 3


 Thank you

 --
 Daniel Curry
 Sr Linux Systems Administrator
 Arrayent, Inc.
 2317 Broadway Street, Suite 20
 Redwood City, CA 94063
 dan...@arrayent.com




Re: Cassandra ring not behaving like a ring

2014-01-15 Thread Andrey Ilinykh
What is the RF? What does nodetool ring show?


On Wed, Jan 15, 2014 at 1:03 PM, Narendra Sharma
narendra.sha...@gmail.comwrote:

 Sorry for the odd subject but something is wrong with our cassandra ring.
 We have a 9 node ring as below.

 N1 - UP/NORMAL
 N2 - UP/NORMAL
 N3 - UP/NORMAL
 N4 - UP/NORMAL
 N5 - UP/NORMAL
 N6 - UP/NORMAL
 N7 - UP/NORMAL
 N8 - UP/NORMAL
 N9 - UP/NORMAL

 Using random partitioner and simple snitch. Cassandra 1.1.6 in AWS.

 I added a new node with token that is exactly in middle of N6 and N7. So
 the ring displayed as following
 N1 - UP/NORMAL
 N2 - UP/NORMAL
 N3 - UP/NORMAL
 N4 - UP/NORMAL
 N5 - UP/NORMAL
 N6 - UP/NORMAL
 N6.5 - UP/JOINING
 N7 - UP/NORMAL
 N8 - UP/NORMAL
 N9 - UP/NORMAL


 I noticed that N6.5 is streaming from N1, N2, N6 and N7. I expect it to
 steam from (worst case) N5, N6, N7, N8. What could potentially cause the
 node to get confused about the ring?

 --
 Narendra Sharma
 Software Engineer
 *http://www.aeris.com http://www.aeris.com*
 *http://narendrasharma.blogspot.com/ http://narendrasharma.blogspot.com/*




Re: Read/Write consistency issue

2014-01-10 Thread Andrey Ilinykh
For a single thread with consistency ALL it should work. I believe you are
doing something different. What exactly are these three numbers?
old=60616 val =19 new =60635


On Fri, Jan 10, 2014 at 1:50 PM, Manoj Khangaonkar khangaon...@gmail.comwrote:

 Hi

 Using Cassandra 2.0.0.
 3 node cluster
 Replication 2.
 Using consistency ALL for both read and writes.

 I have a single thread that reads a value, updates it and writes it back
 to the table. The column type is big int. Updating counts for a timestamp.

 With single thread and consistency ALL , I expect no lost updates. But as
 seem from my application log below,

 10 07:01:58,507 [Thread-10] BeaconCountersCAS2DAO [INFO] 1389366000  H
  old=59614 val =252 new =59866
 10 07:01:58,611 [Thread-10] BeaconCountersCAS2DAO [INFO] 1389366000  H
  old=59866 val =252 new =60118
 10 07:01:59,136 [Thread-10] BeaconCountersCAS2DAO [INFO] 1389366000  H
  old=60118 val =255 new =60373
 10 07:02:00,242 [Thread-10] BeaconCountersCAS2DAO [INFO] 1389366000  H
  old=60373 val =243 new =60616
 10 07:02:00,244 [Thread-10] BeaconCountersCAS2DAO [INFO] 1389366000  H
  old=60616 val =19 new =60635
 10 07:02:00,326 [Thread-10] BeaconCountersCAS2DAO [INFO] 1389366000  H
  old=60616 val =233 new =60849

 See the last 2 lines of above log.
 value 60116 is updated to 60635. but the next operation reads the old
 value 60616 again.

 I am not using counter column type because it does not support TTL and i
 hear there are lot of open issues with counters.

 Is there anything else I can do to further tighten the consistency or is
 this pattern of high volume read - update - write not going to work in C* ?

 regards
 MJ

 --




Re: Restore with archive commitlog

2013-12-13 Thread Andrey Ilinykh
As someone told you, this feature was added by Netflix to work with Priam (a
Cassandra management tool). Priam itself has only used it for several months,
so I doubt anybody uses this feature in production. Anyway, you can ping the
guys working on Priam; that is your best bet.
https://github.com/Netflix/Priam

Let us know if you figure out how to use it.

Thank you,
  Andrey


On Fri, Dec 13, 2013 at 6:31 AM, Bonnet Jonathan. 
jonathan.bon...@externe.bnpparibas.com wrote:

 Hello,

   As i told you i began to explore restore operations, see my config for
 archive commit logs:

 archive_command=/bin/bash /produits/cassandra/scripts/cassandra-archive.sh
 %path %name

 restore_command=/bin/bash /produits/cassandra/scripts/cassandra-restore.sh
 %from %to

 restore_directories=/produits/cassandra/cassandra_data/archived_commit

 restore_point_in_time=2013:12:11 17:00:00

 My 2 scripts

 cassandra-archive.sh:

 bzip2 --best -k $1
 mv $1.bz2 /produits/cassandra/cassandra_data/archived_commit/$2.bz2


 cassandra-restore.sh:
 cp -f $1 $2
 bzip2 -d $2



 For an example, at 2013:12:11 17:30:00 i had truncate a table which belong
 to a keyspace with no replication on one node, after that i made a nodetool
 flush. So when i restore to 2013:12:11 17:00:00 i expect to have my table
 bein fill up again.

 The node restart with this config correctly, i see my archive commit log
 come back to my commitlogdirectory, seems bizarre to me that these ones
 finish by *.out like CommitLog-3-1386927339271.log.out and not just .log.
 Everything is normal ?

 When i query my table now, this one is still empty. Finaly my restore
 doesn't work and i wonder why ?

 Do i have to make a restore on all nodes ? my keyspace have no replication
 but perhaps restore need same operation on all node.

 I miss something, i don't know.

 Thanks for your help.





Re: efficient way to store 8-bit or 16-bit value?

2013-12-11 Thread Andrey Ilinykh
Column metadata is about 20 bytes per column, so there is no big difference
whether the value itself takes 1 byte or 4 (roughly 21 vs. 24 bytes before
compression).

Thank you,
  Andrey


On Wed, Dec 11, 2013 at 2:42 PM, onlinespending onlinespend...@gmail.comwrote:

 What do people recommend I do to store a small binary value in a column?
 I’d rather not simply use a 32-bit int for a single byte value. Can I have
 a one byte blob? Or should I store it as a single character ASCII string? I
 imagine each is going to have the overhead of storing the length (or null
 termination in the case of a string). That overhead may be worse than
 simply using a 32-bit int.

 Also is it possible to partition on a single character or substring of
 characters from a string (or a portion of a blob)? Something like:

 CREATE TABLE test (
 id text,
 value blob,
 PRIMARY KEY (string[0:1])
 )


vnodes on aws

2013-12-05 Thread Andrey Ilinykh
Hello everybody!
We run Cassandra 1.1 on EC2 instances. We use three availability zones, and
the replication factor is 3 as well. NetworkTopologyStrategy guarantees that
each row is replicated in all availability zones, so if we lose one zone,
quorum operations still work. We are thinking about upgrading to 1.2; virtual
nodes are the main reason. My understanding is that vnodes are distributed
randomly, so there is no way to put every row into all availability zones. Am I
right? What would be the best way to deploy vnodes across several data centers
(availability zones)?


Thank you,
  Andrey


Re: Minimum row size / minimum data point size

2013-10-03 Thread Andrey Ilinykh
It may help.
https://docs.google.com/spreadsheet/ccc?key=0Atatq_AL3AJwdElwYVhTRk9KZF9WVmtDTDVhY0xPSmc#gid=0


On Thu, Oct 3, 2013 at 1:31 PM, Robert Važan robert.va...@gmail.com wrote:

 I need to store one trillion data points. The data is highly compressible
 down to 1 byte per data point using simple custom compression combined with
 standard dictionary compression. What's the most space-efficient way to
 store the data in Cassandra? How much per-row overhead is there if I store
 one data point per row?

 The data is particularly hard to group. It's a large number of time series
 with highly variable density. That makes it hard to pack subsets of the
 data into meaningful column families / wide rows. Is there a table layout
 scheme that would allow me to approach the 1B per data point without
 forcing me to implement complex abstraction layer on application level?




Re: Why Solandra stores Solr data in Cassandra ? Isn't solr complete solution ?

2013-09-30 Thread Andrey Ilinykh
 Also, be aware that while Cassandra has knobs to allow you to get
 consistent read results (CL=QUORUM), DSE Search does not. If a node drops
 messages for whatever reason, outtage, mutation, etc. its solr indexes will
 be inconsistent with other nodes in its replication group.

 Will repair fix it?


Re: Commit log and data separation on SSD

2013-09-23 Thread Andrey Ilinykh
Actually, many SSD drives show much better performance for sequential writes
than for random writes, so you may benefit from a separate drive for commit
logs.


On Mon, Sep 23, 2013 at 11:21 AM, Robert Coli rc...@eventbrite.com wrote:

 On Sun, Sep 22, 2013 at 4:02 PM, Shahryar Sedghi shsed...@gmail.comwrote:

 This my first SSD experience. With  normal disks we separate commit log
 from data. We have 2 SSDs dedicated to Cassandra  I was wondering if we
 gain a better performance  if we put commit  log in one and data in
 another, or just use raid 0 to have both SSDs combined.


 The primary win from separation comes when your head does not move while
 appending. As your SSDs have no heads, you do not get this win.

 =Rob



Re: Recomended storage choice for Cassandra on Amazon m1.xlarge instance

2013-09-03 Thread Andrey Ilinykh
You benefit from putting the commit log on a separate drive only if that drive
is an isolated spinning device. EC2 ephemeral storage is a virtual device, so I
don't think it makes sense to put the commit log on a separate drive. I would
build a RAID0 array from the 4 drives and put everything there. But it would be
interesting to compare different configurations.
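
A minimal sketch of building the array (device names and mount point depend on
what your instance actually exposes):

    mdadm --create /dev/md0 --level=0 --raid-devices=4 \
          /dev/xvdb /dev/xvdc /dev/xvdd /dev/xvde
    mkfs.ext4 /dev/md0
    mount /dev/md0 /var/lib/cassandra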

Thank you,
   Andrey


On Mon, Sep 2, 2013 at 7:11 PM, Renat Gilfanov gren...@mail.ru wrote:

 Hello,

 I'd like to ask what is the best options of separating commit log and data
 on Amazon m1.xlarge instance, given 4x420 Gb attached storages and EBS
 volume ?

 As far as I understand, the EBS is not the choice and it's recomended to
 use attached storages instead.
 Is it better to combine 4 ephemeral drives in 2 raid0 (or raid1 ?), and
 store data on the first and commit log on the second? Or may be trying
 other combinations like 1 attached storage for commit log, and 3 others
 grouped in raid0 for data?

 Thank you.





Re: Truncate question

2013-08-29 Thread Andrey Ilinykh
No.

Andrey


On Thu, Aug 29, 2013 at 3:48 PM, S C as...@outlook.com wrote:

 Do we have to run nodetool repair or nodetool cleanup after Truncating
 a Column Family?

 Thanks,
 SC



Re: configuring read write quorums in cassandra

2013-08-28 Thread Andrey Ilinykh
What do you mean by "change the Cassandra read and write quorums"? A quorum is
a quorum, 2 of 3 for example. What do you want to change?

Andrey


On Wed, Aug 28, 2013 at 1:33 PM, Muntasir Raihan Rahman 
muntasir.rai...@gmail.com wrote:

 Hello,

 Are there any tools (e.g like nodetool) that could be used to change
 the cassandra read and write quorums at run-time?

 In general what is a good strategy of changing these system parameters
 while cassandra is running?

 Any pointers could be useful!

 Thanks
 Muntasir.



Re: configuring read write quorums in cassandra

2013-08-28 Thread Andrey Ilinykh
Each query has its own consistency level; it is set by the client, not by
Cassandra itself.
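
A sketch with pycassa (the Thrift client used in another message in this
archive); the keyspace, column family and key are made up:

    from pycassa import ConnectionPool, ColumnFamily
    from pycassa.cassandra.ttypes import ConsistencyLevel

    pool = ConnectionPool('my_keyspace', server_list=['localhost:9160'])
    users = ColumnFamily(pool, 'User')

    # default level for this ColumnFamily object
    users.read_consistency_level = ConsistencyLevel.ONE
    # ...or override it for a single query
    users.get('some_key', read_consistency_level=ConsistencyLevel.TWO)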


On Wed, Aug 28, 2013 at 1:48 PM, Muntasir Raihan Rahman 
muntasir.rai...@gmail.com wrote:

 Sorry, let me clarify.

 Let us say I set read consistency level = ONE, and I am running the
 system for a while. Then I want to change the read consistency level
 to TWO so that reads now wait to hear back from 2 replicas instead of
 1 before responding to the client. I want to change this consistency
 level, and then continue running the system.

 Does that clarify my query?

 Thanks
 Muntasir.

 On Wed, Aug 28, 2013 at 3:39 PM, Andrey Ilinykh ailin...@gmail.com
 wrote:
  What do you mean to change the cassandra read and write quorums?
 Quorum is
  quorum, 2 of 3 for example. What do you want to change?
 
  Andrey
 
 
  On Wed, Aug 28, 2013 at 1:33 PM, Muntasir Raihan Rahman
  muntasir.rai...@gmail.com wrote:
 
  Hello,
 
  Are there any tools (e.g like nodetool) that could be used to change
  the cassandra read and write quorums at run-time?
 
  In general what is a good strategy of changing these system parameters
  while cassandra is running?
 
  Any pointers could be useful!
 
  Thanks
  Muntasir.
 
 



 --
 Best Regards
 Muntasir Raihan Rahman
 Email: muntasir.rai...@gmail.com
 Department of Computer Science,
 University of Illinois Urbana Champaign,
 3111 Siebel Center,
 201 N. Goodwin Avenue,
 Urbana, IL  61801



Re: Setting up a multi-node cluster

2013-08-28 Thread Andrey Ilinykh
To be sure the ports are open, try to connect from one node to another:

telnet <node ip> 7000

Try all the ports.
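
For example (assuming nc is installed; the ports are the ones John listed
below):

    for p in 7000 7001 7199 9160 61620 61621; do
        nc -zv <other-node-ip> $p
    done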

Andrey


On Wed, Aug 28, 2013 at 10:41 PM, Dinesh dinesh.gad...@gmail.com wrote:

 Hi John,

 I had my firewall disabled in both the nodes
 To make sure. I checked it
 # rcSuSEfirewall2 status
 Checking the status of SuSEfirewall2
  unused

 if it's, on it says running

 Please suggest the further steps, where to look and troubleshoot, if you
 have any idea






 On Thu, Aug 29, 2013 at 2:34 AM, John Pyeatt 
 john.pye...@singlewire.comwrote:

 Have you verified that your firewall is configured for the cassandra
 traffic. At the very least you need to make certain the following ports are
 open between nodes: 7000, 7001, 7199, 9160, 61620 and 61621.


 On Wed, Aug 28, 2013 at 12:36 AM, Dinesh dinesh.gad...@gmail.com wrote:

 In my case rpc_address in both the nodes is set to 0.0.0.0 which means
 it listens on all interfaces. it has a larger scope (to listen on all
 localhost, ipv4, hostnames, ipv6 addresses) than providing just the
 hostname/ipv4 addresses

 anyway I initially checked that, but it's the same exception I got in
 this case also




 On Wed, Aug 28, 2013 at 10:40 AM, Naresh Yadav nyadav@gmail.comwrote:


 You would need to configure rpc_address also with hostname/ips on both
 the nodes.

 Naresh

 On Wed, Aug 28, 2013 at 10:15 AM, Dinesh dinesh.gad...@gmail.comwrote:

 Hi,

 I am trying to setup a two node Cassandra cluster

 Able to start the first node, but not seeing the following exception
 while starting the second node

 ERROR 17:31:34,315 Exception encountered during startup
 java.lang.IllegalStateException: Unable to contact any seeds!
 at
 org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:947)
 at
 org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:716)
 at
 org.apache.cassandra.service.StorageService.initServer(StorageService.java:554)
 at
 org.apache.cassandra.service.StorageService.initServer(StorageService.java:451)
 at
 org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:348)
 at
 org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:447)
 at
 org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:490)
 java.lang.IllegalStateException: Unable to contact any seeds!
 at
 org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:947)
 at
 org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:716)
 at
 org.apache.cassandra.service.StorageService.initServer(StorageService.java:554)
 at
 org.apache.cassandra.service.StorageService.initServer(StorageService.java:451)
 at
 org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:348)
 at
 org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:447)
 at
 org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:490)
 Exception encountered during startup: Unable to contact any seeds!
 ERROR 17:31:34,322 Exception in thread
 Thread[StorageServiceShutdownHook,5,main]
 java.lang.NullPointerException
 at
 org.apache.cassandra.service.StorageService.stopRPCServer(StorageService.java:321)
 at
 org.apache.cassandra.service.StorageService.shutdownClientServers(StorageService.java:370)
 at
 org.apache.cassandra.service.StorageService.access$000(StorageService.java:88)




 =
 My yaml configuration files have these modified


 first node yaml
 ---
 initial_token: -9223372036854775808 # generated this using tokengen
 tool
 seeds: 10.96.19.207 # which is the IP of first node
 listen_address: 10.96.19.207 # which is the IP of first node itself
 rpc_address: 0.0.0.0

 second node yaml
 
 initial_token: 0
 seeds: 10.96.19.207 # which is the IP of first node
 listen_address: 10.96.10.223 # which is the IP of second node
 rpc_address: 0.0.0.0


 ==

 Can anyone please help me what went wrong with my configuration?

 Regards
 Dinesh









 --
 Regards
 Dinesh




 --
 John Pyeatt
 Singlewire Software, LLC
 www.singlewire.com
 --
 608.661.1184
 john.pye...@singlewire.com




 --
 Regards
 Dinesh



Re: Stable Priam version with Cassandra 1.2.5

2013-08-20 Thread Andrey Ilinykh
The latest versions of Priam use the default properties defined in this file:
https://github.com/Netflix/Priam/blob/master/priam/src/main/resources/Priam.properties

You can override all of them in SDB. I had a problem with
priam.cass.startscript, which points to /mnt/cassandra. Also check the Tomcat
process permissions; Priam is supposed to modify the cassandra.yaml file. The
seed list may be one more issue: Priam uses public IPs instead of names, so the
seeds are not reachable unless you open port 7000 to everyone. Honestly, it was
quite painful to use Priam 1.2.

Thank you,
Andrey


On Tue, Aug 20, 2013 at 1:50 PM, Suruchi Deodhar 
suruchi.deod...@generalsentiment.com wrote:

 Hi,

 This is more of a Priam question, but asking it in the Cassandra forum
 since many of you may be using Priam to backup data from Cassandra.

 We are planning to migrate to Cassandra 1.2.5 in production. Which is the
 most stable version of Priam which is compatible with Cassandra 1.2.5 and
 is production-ready?

 I am currently testing Priam version 1.2 (from the git branch -
 https://github.com/Netflix/Priam/tree/1.2) with cassandra 1.2.5. I
 followed all the setup instructions from the Priam wiki. I have changed my
 default property priam.cass.home to point to my cassandra installation
 located at /usr/local/cassandra.

 When I launch my auto scale group and log-in to the node, I see that
 cassandra is not up. In the catalina.out, I see a line Couldnt execute the
 task because of /mnt/cassandra/conf/cassandra.yaml (No such file or
 directory). I believe that this is because the default cassandra home is
 /mnt/cassandra in Priam version 1.2. But I have changed the priam.cass.home
 attribute value in PriamProperties domain in SDB. Reading all the attribute
 values for the items in PriamProperties domain in sdb returns correct
 values. Is there something else that I need to change for Priam to locate
 the correct cassandra home directory?

 Further in the catalina.out log, I see the error - Exception -- Status
 Code: 403, AWS Service: Amazon S3, AWS Request ID: ., AWS Error Code:
 AccessDenied, AWS Error Message: Access Denied.
 What other permissions do I need to set on my AWS instance for Priam to
 startup with Cassandra?

 Also, does anyone have a good architecture diagram of Priam with
 Cassandra. It would be really useful to get a sense of how the system
 works. Could not find it anywhere.

 Thanks in advance for your help!

 ~Suruchi



Re: Cassandra nodetool repair question

2013-08-08 Thread Andrey Ilinykh
nodetool repair just triggers the repair procedure. You can kill nodetool
after it starts; it doesn't change anything. To stop a running repair you have
to use nodetool stop VALIDATION|COMPACTION.


Thank you,

  Andrey



On Thu, Aug 8, 2013 at 1:00 PM, Andy Losey and...@addthis.com wrote:

  Afternoon,

 We are noticing nodetool repair processes are not completing after a weeks
 worth of time, and have resulted in some Cassandra nodes having more than
 one process running do to cron scheduled. We are also chasing some
 performance degradation after upgrading all nodes to version 1.2.8 last
 Friday and would like to resolve this multiple repairs running at once
 issue in an effort to troubleshoot our performance issues.

 We'd like to know more about what is happening with the repair option. Is
 there a way to gracefully terminate them or any adverse affect to killing
 the processes we should look out for?

 Thanks,
 --
 Andy L




Re: lots of small nodes vs fewer big nodes

2013-08-07 Thread Andrey Ilinykh
You still have the same total amount of RAM, so you cache the same amount of
data; I don't think you gain much there. On the other hand, maintenance
procedures (compaction, repair) may hit your 2-CPU boxes hard. I wouldn't do it.

Thank you,
  Andrey


On Wed, Aug 7, 2013 at 10:24 AM, Paul Ingalls paulinga...@gmail.com wrote:

 Quick question about systems architecture.

 Would it be better to run 5 nodes with 7GB RAM and 4CPU's or 10 nodes with
 3.5GB RAM and 2CPUS?

 I'm currently running the former, but am considering the latter.  My goal
 would be to improve overall performance by spreading the IO across more
 disks.  My currently cluster has low CPU utilization but does spend a good
 amount of time in iowait.  Would moving to more smaller nodes help with
 that?  Or would I run into trouble with the smaller ram and cpu?

 Thanks!

 Paul


Re: How often to run `nodetool repair`

2013-08-01 Thread Andrey Ilinykh
On Thu, Aug 1, 2013 at 12:26 PM, Robert Coli rc...@eventbrite.com wrote:

 On Thu, Aug 1, 2013 at 9:35 AM, Carl Lerche m...@carllerche.com wrote:

 I read in the docs that `nodetool repair` should be regularly run unless
 no delete is ever performed. In my app, I never delete, but I heavily use
 the ttl feature. Should repair still be run regularly? Also, does repair
 take less time if it is run regularly? If not, is there a way to
 incrementally run it? It seems that when I do run repair, it takes a long
 time and causes high amounts CPU usage and iowait.


 TTL is effectively DELETE; you need to run a repair once every
 gc_grace_seconds. If you don't, data might un-delete itself.


How is that possible? Every replica has the TTL, so when it expires every
replica has a tombstone. I don't see how you can get data with no tombstone.
What am I missing?

Andrey


Re: Recommended data size for Reads/Writes in Cassandra

2013-07-18 Thread Andrey Ilinykh
There is a limit on the Thrift message size (thrift_max_message_length_in_mb);
by default it is 64 MB, if I'm not mistaken. This is your limit.
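
If you really need larger messages, the related knobs are in cassandra.yaml; a
sketch only, since the exact values and defaults depend on your version (keep
the max message length a bit larger than the framed transport size):

    thrift_framed_transport_size_in_mb: 64
    thrift_max_message_length_in_mb: 65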


On Thu, Jul 18, 2013 at 2:03 PM, hajjat haj...@purdue.edu wrote:

 Hi,

 Is there a recommended data size for Reads/Writes in Cassandra? I tried
 inserting 10 MB objects and the latency I got was pretty high. Also, I was
 never able to insert larger objects (say 50 MB) since Cassandra kept
 crashing when I tried that.

 Here is my experiment setup:
 I used two Large VMs in EC2 within the same data-center. Inserts have ALL
 consistency (strong consistency).  The latencies were as follows:
 Data size:  10 MB   1 MB100 Bytes
 Latency:250ms   50ms8ms

 I've also done the same for two Large VMs across two data-centers. The
 latencies were around:
 Data size:  10 MB   1 MB100 Bytes
 Latency:1200ms  800ms   80ms

 1) Ain't the 10 MB latency extremely high?
 2) Is there a recommended data size to use with Cassandra (e.g., a few
 bytes
 up to 1 MB)?
 3) Also, I tried inserting 50 MB data but Cassandra kept crashing. Does
 anybody know why? I thought the max data size should be up to 2 GB?

 Thanks,
 Mohammad

 PS. Here is my python code I use to insert into Cassandra. I put my
 stopwatch timers around the insert statement:
 fh = open(TEST_FILE,'r')
 data = str(fh.read())

 POOL = ConnectionPool(keyspace, server_list=['localhost:9160'],
 timeout=None)
 USER = ColumnFamily(POOL, 'User')
 USER.insert('Ali', {'data':

 data},write_consistency_level=pycassa.cassandra.ttypes.ConsistencyLevel.ALL)




 --
 View this message in context:
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Recommended-data-size-for-Reads-Writes-in-Cassandra-tp7589141.html
 Sent from the cassandra-u...@incubator.apache.org mailing list archive at
 Nabble.com.



Re: what happen if coordinator node fails during write

2013-06-25 Thread Andrey Ilinykh
It depends on the Cassandra version. As far as I know, in 1.2 the coordinator
logs the request before it updates the replicas; if it fails, it will replay
the log on startup. In 1.1 you may end up with an inconsistent state, because
only part of your request is propagated to the replicas.

Thank you,
  Andrey


On Tue, Jun 25, 2013 at 5:11 PM, Jiaan Zeng ji...@bloomreach.com wrote:

 Hi there,

 I am writing data to Cassandra by thrift client (not hector) and
 wonder what happen if the coordinator node fails. The same question
 applies for bulk loader which uses gossip protocol instead of thrift
 protocol. In my understanding, the HintedHandoff only takes care of
 the replica node fails.

 Thanks.

 --
 Regards,
 Jiaan



Re: Gossiper in Cassandra using unicast/broadcast/multicast ?

2013-06-20 Thread Andrey Ilinykh
Cassandra works very well in the EC2 environment, and gossip uses ordinary
unicast TCP connections on the storage port. EC2 doesn't support
broadcast/multicast, so you should be fine.

Thank you,
  Andrey


On Thu, Jun 20, 2013 at 7:22 PM, Jason Tang ares.t...@gmail.com wrote:

 Hi

We are considering using Cassandra in virtualization environment. I
 wonder is Cassandra using unicast/broadcast/multicast for node discover or
 communication?

   From the code, I find the broadcast address is used for heartbeat in
 Gossiper.java, but I don't know how actually it works when node
 communication and when node start up (not for new node added in)

 BRs



Re: Cleanup understastanding

2013-05-28 Thread Andrey Ilinykh
cleanup removes data which doesn't belong to the current node. You only have
to run it if you move nodes (or add new ones). In your case there is no reason
to do it.


On Tue, May 28, 2013 at 7:39 AM, Víctor Hugo Oliveira Molinar 
vhmoli...@gmail.com wrote:

 Hello everyone.
 I have a daily maintenance task at c* which does:

 -truncate cfs
 -clearsnapshots
 -repair
 -cleanup

 The reason I need to clean things is that I won't need most of my inserted
 data on the next day. It's kind of a business requirement.

 Well, the problem I'm running into is a misunderstanding of the cleanup
 operation.
 I have 2 nodes, each using less than half of the disk, which is more or less 13GB;

 But, over the last few days, each node has arbitrarily reported a cleanup
 error indicating that the disk was full. Which is not true.

 *Error occured during cleanup*
 *java.util.concurrent.ExecutionException: java.io.IOException: disk full*


 So I'd like to know more about what happens in a cleanup operation.
 Appreciate any help.



Re: Usage of getKeyRange method

2013-05-24 Thread Andrey Ilinykh
You can specify startKey/endKey only if you use ByteOrderedPartitioner. In that
case startToken/endToken are null.
I guess (but am not sure) that with RandomPartitioner you have to specify
startToken/endToken, and the keys are null then.
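
For illustration, roughly like this (an untested sketch; the column family,
keys and token values below are made up):

import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.connectionpool.OperationResult;
import com.netflix.astyanax.connectionpool.exceptions.ConnectionException;
import com.netflix.astyanax.model.ColumnFamily;
import com.netflix.astyanax.model.Rows;
import com.netflix.astyanax.serializers.StringSerializer;

public class KeyRangeSketch {
    private static final ColumnFamily<String, String> CF =
            new ColumnFamily<String, String>("MyCF",
                    StringSerializer.get(), StringSerializer.get());

    // ByteOrderedPartitioner: pass the keys, leave the tokens null
    static Rows<String, String> byKeys(Keyspace ks) throws ConnectionException {
        OperationResult<Rows<String, String>> r = ks.prepareQuery(CF)
                .getKeyRange("key-000", "key-999", null, null, 100)
                .execute();
        return r.getResult();
    }

    // RandomPartitioner: pass the tokens (as strings), leave the keys null
    static Rows<String, String> byTokens(Keyspace ks) throws ConnectionException {
        OperationResult<Rows<String, String>> r = ks.prepareQuery(CF)
                .getKeyRange(null, null, "0",
                        "85070591730234615865843651857942052864", 100)
                .execute();
        return r.getResult();
    }
}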

Thank you,
  Andrey


On Fri, May 24, 2013 at 6:53 AM, Renato Marroquín Mogrovejo 
renatoj.marroq...@gmail.com wrote:

 Hi all,

 I am trying to migrate some some Hector's RangeSlicesQuery to Astyanax,
 but the only method I have found is getKeyRange[1] which in turn has four
 parameters  startKey,endKey, startToken,  endToken, and count.
 The thing is that I am not sure what are the startToken and endToken
 parameters used for if I am already controlling the range with the startKey
 and endKey  parameters. Could anyone show me some light on this manner
 please?
 Thanks in advance.


 Renato M.

 [1]
 http://www.srcrr.com/java/astyanax/1.0.5/reference/com/netflix/astyanax/thrift/ThriftColumnFamilyQueryImpl-source.html



Re: cfhistograms

2013-03-25 Thread Andrey Ilinykh
What I don't understand here is the Row Size column. Why is it always 0?

Thank you,
   Andrey


On Mon, Mar 25, 2013 at 9:36 AM, Brian Tarbox tar...@cabotresearch.com wrote:

 I think we all go through this learning curve.  Here is the answer I gave
 last time this question was asked:

 The output of this command seems to make no sense unless I think of it as
 5 completely separate histograms that just happen to be displayed
 together.

 Using this example output, should I read it as: my reads all took either 1
 or 2 sstables. And separately, I had write latencies of 3, 7, 19. And
 separately I had read latencies of 2, 8, 69, etc.?

 In other words...each row isn't really a row...i.e. on those 16033 reads
 from a single SSTable I didn't have 0 write latency, 0 read latency, 0 row
 size and 0 column count.  Is that right?

 Offset  SSTables  Write Latency  Read Latency  Row Size  Column Count
 1          16033              0             0         0             0
 2            303              0             0         0             1
 3              0              0             0         0             0
 4              0              0             0         0             0
 5              0              0             0         0             0
 6              0              0             0         0             0
 7              0              0             0         0             0
 8              0              0             2         0             0
 10             0              0             0         0          6261
 12             0              0             2         0           117
 14             0              0             8         0             0
 17             0              3            69         0           255
 20             0              7           163         0             0
 24             0             19          1369         0             0


  On Mon, Mar 25, 2013 at 11:52 AM, Kanwar Sangha kan...@mavenir.com wrote:

  Can someone explain how to read the cfhistograms o/p ?


 [root@db4 ~]# nodetool cfhistograms usertable data

 usertable/data histograms

 Offset  SSTables  Write Latency  Read Latency  Row Size  Column Count
 1        2857444           4051             0         0        342711
 2        6355104          27021             0         0        201313
 3        2579941          61600             0         0        130489
 4         374067         119286             0         0         91378
 5           9175         210934             0         0         68548
 6              0         321098             0         0         54479
 7              0         476677             0         0         45427
 8              0         734846             0         0         38814
 10             0        2867967             4         0         65512
 12             0        5366844            22         0         59967
 14             0        6911431            36         0         63980
 17             0       10155740           127         0        115714
 20             0        7432318           302         0        138759
 24             0        5231047           969         0        193477
 29             0        2368553          2790         0        209998
 35             0         859591          4385         0        204751
 42             0         456978          3790         0        214658
 50             0         306084          2465         0        151838
 60             0         223202          2158         0         40277
 72             0         122906          2896         0          1735

 Thanks

 Kanwar





Re: Can't replace dead node

2013-03-15 Thread Andrey Ilinykh
-1-Data.db')]
 INFO [main] 2013-03-15 18:22:18,930 ColumnFamilyStore.java (line 679)
Enqueuing flush of Memtable-LocationInfo@744919514(53/66 serialized/live
bytes, 2 ops)
 INFO [FlushWriter:2] 2013-03-15 18:22:18,931 Memtable.java (line 264)
Writing Memtable-LocationInfo@744919514(53/66 serialized/live bytes, 2 ops)
 INFO [FlushWriter:2] 2013-03-15 18:22:18,944 Memtable.java (line 305)
Completed flushing
/var/lib/cassandra/data/system/LocationInfo/system-LocationInfo-hf-5-Data.db
(163 bytes) for commitlog position ReplayPosition(segmentId=1363371648611,
position=1143)
 INFO [main] 2013-03-15 18:22:18,969 StorageService.java (line 1133) Node
ip-10-147-174-27.ec2.internal/10.147.174.27 state jump to normal
 INFO [main] 2013-03-15 18:22:18,969 StorageService.java (line 701)
Bootstrap/Replace/Move completed! Now serving reads.
 INFO [main] 2013-03-15 18:22:19,011 CassandraDaemon.java (line 125)
Binding thrift service to ip-10-147-174-27.ec2.internal/10.147.174.27:9160
 INFO [main] 2013-03-15 18:22:19,015 CassandraDaemon.java (line 134) Using
TFastFramedTransport with a max frame size of 15728640 bytes.
 INFO [main] 2013-03-15 18:22:19,019 CassandraDaemon.java (line 161) Using
synchronous/threadpool thrift server on ip-10-147-174-27.ec2.internal/
10.147.174.27 : 9160
 INFO [Thread-6] 2013-03-15 18:22:19,020 CassandraDaemon.java (line 213)
Listening for thrift clients...
 INFO [CompactionExecutor:3] 2013-03-15 18:22:19,031 CompactionTask.java
(line 230) Compacted to
[/var/lib/cassandra/data/system/LocationInfo/system-LocationInfo-hf-6-Data.db,].
 573 to 468 (~81% of original) bytes for 4 keys at 0.004649MB/s.  Time:
96ms.



On Fri, Mar 8, 2013 at 8:22 AM, aaron morton aa...@thelastpickle.com wrote:

 If it does not have the schema check the logs for errors and ensure it is
 actually part of the cluster.

 You may have better luck with Priam specific questions on
 https://github.com/Netflix/Priam

 Cheers

 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 7/03/2013, at 11:11 AM, Andrey Ilinykh ailin...@gmail.com wrote:

 Hello everybody!

 I used to run cassandra 1.1.5 with Priam. To replace dead node priam
 launches cassandra with cassandra.replace_token property. It works smoothly
 with 1.1.5. Couple days ago I moved to 1.1.10 and have a problem now. New
 cassandra successfully starts, joins the ring but it doesn't see my
 keyspaces. It doesn't try to stream data from other nodes. I see only
 system keyspace. Any idea what is the difference between 1.1.5 and 1.1.10?
 How am I supposed to replace dead node?

 Thank you,
Andrey





Can't replace dead node

2013-03-07 Thread Andrey Ilinykh
Hello everybody!

I used to run cassandra 1.1.5 with Priam. To replace dead node priam
launches cassandra with cassandra.replace_token property. It works smoothly
with 1.1.5. Couple days ago I moved to 1.1.10 and have a problem now. New
cassandra successfully starts, joins the ring but it doesn't see my
keyspaces. It doesn't try to stream data from other nodes. I see only
system keyspace. Any idea what is the difference between 1.1.5 and 1.1.10?
How am I supposed to replace dead node?

Thank you,
   Andrey


Re: what addresses to use in EC2 cluster (whenever an instance restarts it gets a new private ip)?

2013-02-11 Thread Andrey Ilinykh
You have to use private IPs, but if an instance dies you have to bootstrap its
replacement with the replace_token flag. If you use EC2 I'd recommend Netflix's
Priam tool. It manages all that stuff, plus you get S3 backups.


Andrey


On Mon, Feb 11, 2013 at 11:35 AM, Brian Tarbox tar...@cabotresearch.com wrote:

 How do I configure my cluster to run in EC2?  In my cassandra.yaml I have
 IP addresses under seed_provider, listen_address and rpc_address.

 I tried setting up my cluster using just the EC2 private addresses but
 when one of my instances failed and I restarted it there was a new private
 address.  Suddenly my cluster thought it have five nodes rather than four.

 Then I tried using Elastic IP addresses (permanent addresses) but it turns
 out you get charged for network traffic between elastic addresses even if
 they are within the cluster.

 So...how do you configure the cluster when the IP addresses can change out
 from under you?

 Thanks.

 Brian Tarbox



Re: Clarification on num_tokens setting

2013-02-05 Thread Andrey Ilinykh
On Tue, Feb 5, 2013 at 12:42 PM, aaron morton aa...@thelastpickle.com wrote:

  With N nodes, the ring is divided into N*num_tokens. Correct?

 There is always num_tokens tokens in the ring.
 Each node has (num_tokens / N) * RF ranges on it.

 Does that mean every node should have the same num_tokens parameter? In other
words, it is a cluster-wide parameter. Correct?

Thank you,
  Andrey


Re: astyanax connection ring describe discovery

2013-01-25 Thread Andrey Ilinykh
I use astyanax 1.56.18 with cassandra 1.1.5. Everything works as expected.
What does ThreadPoolMonitor report?
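
For reference, a minimal TOKEN_AWARE plus RING_DESCRIBE setup looks roughly
like this (a from-memory sketch; the cluster, keyspace and seed names are
placeholders):

import com.netflix.astyanax.AstyanaxContext;
import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.connectionpool.NodeDiscoveryType;
import com.netflix.astyanax.connectionpool.impl.ConnectionPoolConfigurationImpl;
import com.netflix.astyanax.connectionpool.impl.ConnectionPoolType;
import com.netflix.astyanax.connectionpool.impl.CountingConnectionPoolMonitor;
import com.netflix.astyanax.impl.AstyanaxConfigurationImpl;
import com.netflix.astyanax.thrift.ThriftFamilyFactory;

public class TokenAwareSetup {
    public static Keyspace connect() {
        AstyanaxContext<Keyspace> context = new AstyanaxContext.Builder()
            .forCluster("TestCluster")
            .forKeyspace("MyKeyspace")
            .withAstyanaxConfiguration(new AstyanaxConfigurationImpl()
                // ask the cluster for the ring instead of relying only on the seed list
                .setDiscoveryType(NodeDiscoveryType.RING_DESCRIBE)
                .setConnectionPoolType(ConnectionPoolType.TOKEN_AWARE))
            .withConnectionPoolConfiguration(new ConnectionPoolConfigurationImpl("MyPool")
                .setPort(9160)
                .setMaxConnsPerHost(20)
                .setSeeds("seed1:9160,seed2:9160,seed3:9160"))
            .withConnectionPoolMonitor(new CountingConnectionPoolMonitor())
            .buildKeyspace(ThriftFamilyFactory.getInstance());
        context.start();
        return context.getEntity();
    }
}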

Andrey


On Fri, Jan 25, 2013 at 10:26 AM, Hiller, Dean dean.hil...@nrel.gov wrote:

 IS anyone using astyanax with their cassandra along with TOKEN AWARE as in
 here (cassandra version 1.1.4)

 (see Token Aware section)
 https://github.com/Netflix/astyanax/wiki/Configuration

 We have maxConnsPerHost 20 right now and 3 seeds for our cluster but
 astyanax is not discovering any other nodes

 If so, what version of cassandra and astyanax are you using?

 For now, we had to add all nodes to the seeds list instead so it
 distributes amongst all nodes.

 Thanks,
 Dean



Re: Cassandra at Amazon AWS

2013-01-17 Thread Andrey Ilinykh
I'd recommend Priam.

http://techblog.netflix.com/2012/02/announcing-priam.html

Andrey


On Thu, Jan 17, 2013 at 5:44 AM, Adam Venturella aventure...@gmail.com wrote:

 Jared, how do you guys handle data backups for your ephemeral based
 cluster?

 I'm trying to move to ephemeral drives myself, and that was my last
 sticking point; asking how others in the community deal with backup in case
 the VM explodes.



 On Wed, Jan 16, 2013 at 1:21 PM, Jared Biel jared.b...@bolderthinking.com
  wrote:

 We're currently using Cassandra on EC2 at very low scale (a 2 node
 cluster on m1.large instances in two regions.) I don't believe that
 EBS is recommended for performance reasons. Also, it's proven to be
 very unreliable in the past (most of the big/notable AWS outages were
 due to EBS issues.) We've moved 99% of our instances off of EBS.

 As other have said, if you require more space in the future it's easy
 to add more nodes to the cluster. I've found this page
 (http://www.ec2instances.info/) very useful in determining the amount
 of space each instance type has. Note that by default only one
 ephemeral drive is attached and you must specify all ephemeral drives
 that you want to use at launch time. Also, you can create a RAID 0 of
 all local disks to provide maximum speed and space.


 On 16 January 2013 20:42, Marcelo Elias Del Valle mvall...@gmail.com
 wrote:
  Hello,
 
 I am currently using hadoop + cassandra at amazon AWS. Cassandra
 runs on
  EC2 and my hadoop process runs at EMR. For cassandra storage, I am using
  local EC2 EBS disks.
 My system is running fine for my tests, but to me it's not a good
 setup
  for production. I need my system to perform well for specially for
 writes on
  cassandra, but the amount of data could grow really big, taking several
 Tb
  of total storage.
  My first guess was using S3 as a storage and I saw this can be done
 by
  using Cloudian package, but I wouldn't like to become dependent on a
  pre-package solution and I found it's kind of expensive for more than
 100Tb:
  http://www.cloudian.com/pricing.html
  I saw some discussion at internet about using EBS or ephemeral
 disks for
  storage at Amazon too.
 
  My question is: does someone on this list have the same problem as
 me?
  What are you using as solution to Cassandra's storage when running it at
  Amazon AWS?
 
  Any thoughts would be highly appreciatted.
 
  Best regards,
  --
  Marcelo Elias Del Valle
  http://mvalle.com - @mvallebr





Re: Cassandra at Amazon AWS

2013-01-16 Thread Andrey Ilinykh
Storage size is not a problem, you can always add more nodes. Anyway, it is
not recommended to have nodes with more than 500G (compaction and repair take
forever). EC2 m1.large has 800G of ephemeral storage, EC2 m1.xlarge 1.6T.
I'd recommend xlarge: it has 4 CPUs, so maintenance procedures don't affect
performance a lot.

Andrey


On Wed, Jan 16, 2013 at 12:42 PM, Marcelo Elias Del Valle 
mvall...@gmail.com wrote:

 Hello,

I am currently using hadoop + cassandra at amazon AWS. Cassandra runs
 on EC2 and my hadoop process runs at EMR. For cassandra storage, I am using
 local EC2 EBS disks.
My system is running fine for my tests, but to me it's not a good setup
 for production. I need my system to perform well for specially for writes
 on cassandra, but the amount of data could grow really big, taking several
 Tb of total storage.
 My first guess was using S3 as a storage and I saw this can be done by
 using Cloudian package, but I wouldn't like to become dependent on a
 pre-package solution and I found it's kind of expensive for more than
 100Tb: http://www.cloudian.com/pricing.html
 I saw some discussion at internet about using EBS or ephemeral disks
 for storage at Amazon too.

 My question is: does someone on this list have the same problem as me?
 What are you using as solution to Cassandra's storage when running it at
 Amazon AWS?

 Any thoughts would be highly appreciatted.

 Best regards,
 --
 Marcelo Elias Del Valle
 http://mvalle.com - @mvallebr



Re: LCS not removing rows with all TTL expired columns

2013-01-16 Thread Andrey Ilinykh
To get a column removed you have to meet two requirements:
1. the column should be expired
2. after that, the SSTable containing it gets compacted

I guess your expired columns have been promoted to the higher levels, which get
compacted rarely.
So you have to wait until those higher-level SSTables get compacted.

Andrey



On Wed, Jan 16, 2013 at 11:39 AM, Bryan Talbot btal...@aeriagames.com wrote:

 On cassandra 1.1.5 with a write heavy workload, we're having problems
 getting rows to be compacted away (removed) even though all columns have
 expired TTL.  We've tried size tiered and now leveled and are seeing the
 same symptom: the data stays around essentially forever.

 Currently we write all columns with a TTL of 72 hours (259200 seconds) and
 expect to add 10 GB of data to this CF per day per node.  Each node
 currently has 73 GB for the affected CF and shows no indications that old
 rows will be removed on their own.

 Why aren't rows being removed?  Below is some data from a sample row which
 should have been removed several days ago but is still around even though
 it has been involved in numerous compactions since being expired.

 $ ./bin/nodetool -h localhost getsstables metrics request_summary
 459fb460-5ace-11e2-9b92-11d67b6163b4

 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db

 $ ls -alF
 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
 -rw-rw-r-- 1 sandra sandra 5252320 Jan 16 08:42
 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db

 $ ./bin/sstable2json
 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
 -k $(echo -n 459fb460-5ace-11e2-9b92-11d67b6163b4 | hexdump  -e '36/1 %x')
 {
 34353966623436302d356163652d313165322d396239322d313164363762363136336234:
 [[app_name,50f21d3d,1357785277207001,d],
 [client_ip,50f21d3d,1357785277207001,d],
 [client_req_id,50f21d3d,1357785277207001,d],
 [mysql_call_cnt,50f21d3d,1357785277207001,d],
 [mysql_duration_us,50f21d3d,1357785277207001,d],
 [mysql_failure_call_cnt,50f21d3d,1357785277207001,d],
 [mysql_success_call_cnt,50f21d3d,1357785277207001,d],
 [req_duration_us,50f21d3d,1357785277207001,d],
 [req_finish_time_us,50f21d3d,1357785277207001,d],
 [req_method,50f21d3d,1357785277207001,d],
 [req_service,50f21d3d,1357785277207001,d],
 [req_start_time_us,50f21d3d,1357785277207001,d],
 [success,50f21d3d,1357785277207001,d]]
 }


 Decoding the column timestamps to shows that the columns were written at
 Thu, 10 Jan 2013 02:34:37 GMT and that their TTL expired at Sun, 13 Jan
 2013 02:34:37 GMT.  The date of the SSTable shows that it was generated on
 Jan 16 which is 3 days after all columns have TTL-ed out.


 The schema shows that gc_grace is set to 0 since this data is write-once,
 read-seldom and is never updated or deleted.

 create column family request_summary
   with column_type = 'Standard'
   and comparator = 'UTF8Type'
   and default_validation_class = 'UTF8Type'
   and key_validation_class = 'UTF8Type'
   and read_repair_chance = 0.1
   and dclocal_read_repair_chance = 0.0
   and gc_grace = 0
   and min_compaction_threshold = 4
   and max_compaction_threshold = 32
   and replicate_on_write = true
   and compaction_strategy =
 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'
   and caching = 'NONE'
   and bloom_filter_fp_chance = 1.0
   and compression_options = {'chunk_length_kb' : '64',
 'sstable_compression' :
 'org.apache.cassandra.io.compress.SnappyCompressor'};


 Thanks in advance for help in understanding why rows such as this are not
 removed!

 -Bryan




Re: Last Modified Time Series in cassandra

2012-12-21 Thread Andrey Ilinykh
You can select a column slice (specify a time range which is sure to contain
the latest data), but ask cassandra to return only one column: that is the
latest one. For the best performance use a reversed sorting order.
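
With astyanax, for example, that could look roughly like this (an untested
sketch; the CF name and serializers are placeholders):

import java.util.UUID;
import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.connectionpool.exceptions.ConnectionException;
import com.netflix.astyanax.model.Column;
import com.netflix.astyanax.model.ColumnFamily;
import com.netflix.astyanax.model.ColumnList;
import com.netflix.astyanax.serializers.StringSerializer;
import com.netflix.astyanax.serializers.TimeUUIDSerializer;
import com.netflix.astyanax.util.RangeBuilder;

public class LatestColumn {
    // row key -> TimeUUID column names, values hold the PKID
    private static final ColumnFamily<String, UUID> CF_REV =
            new ColumnFamily<String, UUID>("ExampleCF",
                    StringSerializer.get(), TimeUUIDSerializer.get());

    static Column<UUID> latest(Keyspace ks, String rowKey) throws ConnectionException {
        ColumnList<UUID> cols = ks.prepareQuery(CF_REV)
                .getKey(rowKey)
                .withColumnRange(new RangeBuilder()
                        .setReversed(true)  // read from the end of the row: with a TimeUUIDType comparator that is the newest column
                        .setLimit(1)        // only one column comes back
                        .build())
                .execute()
                .getResult();
        return cols.isEmpty() ? null : cols.getColumnByIndex(0);
    }
}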

Andrey


On Fri, Dec 21, 2012 at 6:40 AM, Ravikumar Govindarajan 
ravikumar.govindara...@gmail.com wrote:

 How do we model a timeseries data in cassandra for last modified time?

 -- ExampleCF
| -- SomeKey = Key
 | -- TimeUUID = Column-Name
 | -- PKID = Column-Value

 -- ExampleReverseIndexCF
| -- SomeKey = Key
 | -- PKID = Column-Name
 | -- TimeUUID = Column-Value

 To correctly reflect last-modified-time, I need to read existing
 timeuuid, delete it and add incoming timeuuid

 Are there alternatives to the above approach, because it looks a bit
 heavy-weight

 --
 Ravi



Re: Too Many Open files error

2012-12-20 Thread Andrey Ilinykh
This bug is fixed in 1.1.5

Andrey


On Thu, Dec 20, 2012 at 12:01 AM, santi kumar santi.ku...@gmail.com wrote:

 While running the nodetool repair , we are running into
 FileNotFoundException with too many open files error. We increased the
 ulimit value to 32768, and still we have seen this issue.

 THe number of files in the data directory is around 29500+.

 If we further increase the limit of ulimt, would it help?

 While tracking the log file for specific file for which it threw the
 FileNotFoundException, observed that it was part of Compaction. Does it
 have any thing to do with it?

 We are using 1.1.4.




Re: Too Many Open files error

2012-12-20 Thread Andrey Ilinykh
On Thu, Dec 20, 2012 at 1:17 AM, santi kumar santi.ku...@gmail.com wrote:

 Can you please give more details about this bug? bug id or something

https://issues.apache.org/jira/browse/CASSANDRA-4571


 Now if I want to upgrade, is there any specific process or best practices.

Migration from 1.1.4 to 1.1.5 is straightforward: install 1.1.5, drain and stop
1.1.4 (nodetool drain), then start 1.1.5.
http://www.datastax.com/docs/1.0/install/upgrading#completing-upgrade

Andrey



 Thanks
 Santi




  On Thu, Dec 20, 2012 at 1:44 PM, Andrey Ilinykh ailin...@gmail.com wrote:

 This bug is fixed in 1.1.5

 Andrey


  On Thu, Dec 20, 2012 at 12:01 AM, santi kumar santi.ku...@gmail.com wrote:

 While running the nodetool repair , we are running into
 FileNotFoundException with too many open files error. We increased the
 ulimit value to 32768, and still we have seen this issue.

 THe number of files in the data directory is around 29500+.

 If we further increase the limit of ulimt, would it help?

 While tracking the log file for specific file for which it threw the
 FileNotFoundException, observed that it was part of Compaction. Does it
 have any thing to do with it?

 We are using 1.1.4.






Re: Partition maintenance

2012-12-18 Thread Andrey Ilinykh
Just make a month timestamp part of the row key. Then once a month select the
old data, move it, and delete it.
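
Something along these lines, purely as an illustration (the key format and
names are made up):

import java.text.SimpleDateFormat;
import java.util.Date;

public class MonthlyRowKeys {
    // e.g. "2012-12:customer-42" -- every row written in December 2012 for
    // this customer carries the month in its key, so a monthly job knows
    // which keys hold a given month's data and can archive and delete them
    // once they are 18 months old.
    static String rowKey(String entityId, Date when) {
        SimpleDateFormat month = new SimpleDateFormat("yyyy-MM");
        return month.format(when) + ":" + entityId;
    }
}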

Andrey


On Tue, Dec 18, 2012 at 8:08 AM, stephen.m.thomp...@wellsfargo.com wrote:

 Hi folks.  Still working through the details of building out a Cassandra
 solution and I have an interesting requirement that I’m not sure how to
 implement in Cassandra:


 In our current Oracle world, we have the data for this system partitioned
 by month, and each month the data that are now 18-months old are archived
 to tape/cold storage and then the partition for that month is dropped.  Is
 there a way to do something similar with Cassandra without destroying our
 overall performance?


 Thanks in advance,

 Steve



Re: Selecting rows efficiently from a Cassandra CF containing time series data

2012-12-11 Thread Andrey Ilinykh
I would consider using wide rows. If you add the timestamp to your column name
you get naturally sorted data, and you can easily select any time range without
any indexes.
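
For example, roughly like this with astyanax (an untested sketch; the CF name,
the long timestamp comparator and the count are assumptions):

import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.MutationBatch;
import com.netflix.astyanax.connectionpool.exceptions.ConnectionException;
import com.netflix.astyanax.model.ColumnFamily;
import com.netflix.astyanax.model.ColumnList;
import com.netflix.astyanax.serializers.LongSerializer;
import com.netflix.astyanax.serializers.StringSerializer;

public class EventLog {
    // row key = user id, column name = event timestamp in millis, value = payload
    private static final ColumnFamily<String, Long> CF_EVENTS =
            new ColumnFamily<String, Long>("UserEvents",
                    StringSerializer.get(), LongSerializer.get());

    static void append(Keyspace ks, String userId, long ts, String payload)
            throws ConnectionException {
        MutationBatch m = ks.prepareMutationBatch();
        m.withRow(CF_EVENTS, userId).putColumn(ts, payload);
        m.execute();
    }

    // all events for one user between two timestamps, oldest first
    static ColumnList<Long> between(Keyspace ks, String userId, long from, long to)
            throws ConnectionException {
        return ks.prepareQuery(CF_EVENTS)
                .getKey(userId)
                .withColumnRange(from, to, false, 10000)  // start, end, reversed, count
                .execute()
                .getResult();
    }
}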

Thank you,
  Andrey


On Tue, Dec 11, 2012 at 6:23 AM, Chin Ko cko2...@gmail.com wrote:

 I would like to get some opinions on how to select an incremental range of
 rows efficiently from a Cassandra CF containing time series data.

 Background:
 We have a web application that uses a Cassandra CF as logging storage. We
 insert a row into the CF for every event of each user of the web
 application. The row key is timestamp+userid. The column values are
 unstructured data. We only insert rows but never update or delete any rows
 in the CF.

 Data volume:
 The CF grows by about 0.5 million rows per day. We have a 4 node cluster
 and use the RandomPartitioner to spread the rows across the nodes.

 Requirements:
 There is a need to transfer the Cassandra data to another relational
 database periodically. Due to the large size of the CF, instead of
 truncating the relational table and reloading all rows into it each time,
 we plan to run a job to select the delta rows since the last run and
 insert them into the relational database.

 We would like to have some flexibility in how often the data transfer job
 is done. It may be run several times each day, or it may be not run at all
 on a day.

 Options considered:
 - We are using RandomPartitioner, so range scan by row key is not feasible.
 - Add a secondary index on the timestamp column, but reading rows via
 secondary index still requires an equality condition and does not support
 range scan.
 - Add a secondary index on a column containing the date and hour of the
 timestamp. Iterate each hour between the time job was last run and now.
 Fetch all rows of each hour.

 I would appreciate any ideas of other design options of the Cassandra CF
 to enable extracting the rows efficiently.

 Besides Java, has anyone used any ETL tools to do this kind of delta
 extraction from Cassandra?

 Thanks,
 Chin


Re: Batch mutation streaming

2012-12-07 Thread Andrey Ilinykh
Cassandra uses thrift messages to pass data to and from the server. A batch is
just a convenient way to build such a message on the client; nothing happens on
the server until you send it. That is probably what you mean by "closing" the
batch.
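
With a client like astyanax, for example, this is explicit (an untested sketch;
the CF and column layout are made up):

import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.MutationBatch;
import com.netflix.astyanax.connectionpool.exceptions.ConnectionException;
import com.netflix.astyanax.model.ColumnFamily;
import com.netflix.astyanax.serializers.StringSerializer;

public class BatchExample {
    private static final ColumnFamily<String, String> CF =
            new ColumnFamily<String, String>("Events",
                    StringSerializer.get(), StringSerializer.get());

    static void writeChunk(Keyspace ks, Iterable<String[]> events) throws ConnectionException {
        // Everything below only builds an in-memory thrift message on the client...
        MutationBatch m = ks.prepareMutationBatch();
        for (String[] e : events) {                 // e = {rowKey, columnName, value}
            m.withRow(CF, e[0]).putColumn(e[1], e[2]);
        }
        // ...and nothing reaches the server until this call. If the client dies
        // before execute(), the server never sees any of it.
        m.execute();
    }
}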

Thank you,
  Andrey


On Fri, Dec 7, 2012 at 5:34 AM, Ben Hood 0x6e6...@gmail.com wrote:

 Hi,

 I'd like my app to stream a large number of events into Cassandra that
 originate from the same network input stream. If I create one batch
 mutation, can I just keep appending events to the Cassandra batch until I'm
 done, or are there some practical considerations about doing this (e.g. too
 much stuff buffering up on the client or server side, visibility of the
 data within the batch that hasn't been closed by the client yet)? Barring
 any discussion about atomicity, if I were able to stream a largish source
 into Cassandra, what would happen if the client crashed and didn't close
 the batch? Or is this kind of thing just a normal occurrence that Cassandra
 has to be aware of anyway?

 Cheers,

 Ben


Re: how to take consistant snapshot?

2012-12-07 Thread Andrey Ilinykh
That's right. But when I have incremental backup on, each CF gets flushed
independently. I have a hot CF which gets flushed every few minutes and a
regular CF which gets flushed every hour or so. They have references to each
other, so the data in the backed-up sstables is definitely inconsistent.



On Fri, Dec 7, 2012 at 9:28 AM, Tyler Hobbs ty...@datastax.com wrote:

 Snapshots trigger a flush first, so data that's currently in the commit
 log will be covered by the snapshot.


  On Thu, Dec 6, 2012 at 11:52 PM, Andrey Ilinykh ailin...@gmail.com wrote:




  On Thu, Dec 6, 2012 at 7:34 PM, aaron morton aa...@thelastpickle.com wrote:

 For background


 http://wiki.apache.org/cassandra/Operations?highlight=%28snapshot%29#Consistent_backupshttp://wiki.apache.org/cassandra/Operations?highlight=(snapshot)#Consistent_backups

 If you it for a single node then yes there is a chance of inconsistency
 across CF's.

 If you have mulitple nodes the snashots you take on the later nodes will
 help. If you use CL QUOURM for reads you *may* be ok (cannot work it out
 quickly.). If you use CL ALL for reads you will be ok. Or you can use
 nodetool repair to ensure the data is consistent.

 I'm talking about restoring whole cluster, so all nodes are restored
 from backup and all of them are inconsistent because they lost data  from
 commit logs.  It doesn't matter what CL I use, some data may be lost.
 Cassandra 1.1 supports commit log archiving
 http://www.datastax.com/docs/1.1/configuration/commitlog_archiving
  I think if I store both flushed sstables and commit logs it should solve
 my problem. I'm wondering if someone has any experience with this feature?

 Thank you,
   Andrey




 --
 Tyler Hobbs
 DataStax http://datastax.com/




Re: how to take consistant snapshot?

2012-12-07 Thread Andrey Ilinykh
Agreed.



On Fri, Dec 7, 2012 at 12:38 PM, Tyler Hobbs ty...@datastax.com wrote:

 Right.  I don't personally think incremental backup is useful beyond
 restoring individual nodes unless none of your data happens to reference
 any other rows.


  On Fri, Dec 7, 2012 at 11:37 AM, Andrey Ilinykh ailin...@gmail.com wrote:

 That's right. But when I have incremental backup on each CF gets flushed
 independently. I have hot CF which gets flushed every several minutes and
 regular CF which gets flushed every hour or so. They have references to
 each other and data in sstables is definitely inconsistent.



 On Fri, Dec 7, 2012 at 9:28 AM, Tyler Hobbs ty...@datastax.com wrote:

 Snapshots trigger a flush first, so data that's currently in the commit
 log will be covered by the snapshot.


  On Thu, Dec 6, 2012 at 11:52 PM, Andrey Ilinykh ailin...@gmail.com wrote:




 On Thu, Dec 6, 2012 at 7:34 PM, aaron morton 
  aa...@thelastpickle.com wrote:

 For background


 http://wiki.apache.org/cassandra/Operations?highlight=%28snapshot%29#Consistent_backupshttp://wiki.apache.org/cassandra/Operations?highlight=(snapshot)#Consistent_backups

 If you it for a single node then yes there is a chance of
 inconsistency across CF's.

 If you have mulitple nodes the snashots you take on the later nodes
 will help. If you use CL QUOURM for reads you *may* be ok (cannot work it
 out quickly.). If you use CL ALL for reads you will be ok. Or you can use
 nodetool repair to ensure the data is consistent.

 I'm talking about restoring whole cluster, so all nodes are restored
 from backup and all of them are inconsistent because they lost data  from
 commit logs.  It doesn't matter what CL I use, some data may be lost.
 Cassandra 1.1 supports commit log archiving
 http://www.datastax.com/docs/1.1/configuration/commitlog_archiving
  I think if I store both flushed sstables and commit logs it should
 solve my problem. I'm wondering if someone has any experience with this
 feature?

 Thank you,
   Andrey




 --
 Tyler Hobbs
 DataStax http://datastax.com/





 --
 Tyler Hobbs
 DataStax http://datastax.com/




Re: how to take consistant snapshot?

2012-12-06 Thread Andrey Ilinykh
On Thu, Dec 6, 2012 at 7:34 PM, aaron morton aa...@thelastpickle.com wrote:

 For background


 http://wiki.apache.org/cassandra/Operations?highlight=%28snapshot%29#Consistent_backupshttp://wiki.apache.org/cassandra/Operations?highlight=(snapshot)#Consistent_backups

 If you it for a single node then yes there is a chance of inconsistency
 across CF's.

 If you have mulitple nodes the snashots you take on the later nodes will
 help. If you use CL QUOURM for reads you *may* be ok (cannot work it out
 quickly.). If you use CL ALL for reads you will be ok. Or you can use
 nodetool repair to ensure the data is consistent.

 I'm talking about restoring the whole cluster, so all nodes are restored from
backup and all of them are inconsistent because they lost the data from their
commit logs. It doesn't matter what CL I use; some data may be lost.
Cassandra 1.1 supports commit log archiving:
http://www.datastax.com/docs/1.1/configuration/commitlog_archiving
I think if I store both the flushed sstables and the commit logs it should
solve my problem. I'm wondering if someone has any experience with this feature?

Thank you,
  Andrey


how to take consistant snapshot?

2012-12-05 Thread Andrey Ilinykh
Hello, everybody!
I have a production cluster with incremental backup on and I want to clone it
(create a test one). I don't understand one thing: each column family gets
flushed (and copied to backup storage) independently, which means the total
snapshot is inconsistent. If I restore from such a snapshot I have a totally
useless system. To be more specific, let's say I have two CFs, one serving as
an index for the other. Every time I update one CF I update the index CF. There
is a good chance that all replicas flush the index CF first. Then I move it
into backup storage, restore, and get a CF which has pointers to
non-existent data in the other CF. What is the way to avoid this situation?

Thank you,
  Andrey


Re: splitting large sstables

2012-12-03 Thread Andrey Ilinykh
Could you provide more details on how to use it? Let's say I already have a
huge sstable. What am I supposed to do to split it?

Thank you,
  Andrey


On Sat, Dec 1, 2012 at 11:29 AM, Radim Kolar h...@filez.com wrote:

 from time to time people ask here for splitting large sstables, here is
 patch doing that

 https://issues.apache.org/**jira/browse/CASSANDRA-4897https://issues.apache.org/jira/browse/CASSANDRA-4897



Re: Java high-level client

2012-11-28 Thread Andrey Ilinykh
+1


On Tue, Nov 27, 2012 at 10:10 AM, Michael Kjellman
mkjell...@barracuda.comwrote:

 Netflix has a great client

 https://github.com/Netflix/astyanax




Re: Java high-level client

2012-11-28 Thread Andrey Ilinykh
First of all, it is backed by Netflix. They have used it in production for a
long time, so it is pretty solid. Also they have a nice tool (Priam) which makes
cassandra cloud (AWS) friendly. This is important for us.

Andrey


On Wed, Nov 28, 2012 at 11:53 AM, Wei Zhu wz1...@yahoo.com wrote:

 We are using Hector now. What is the major advantage of astyanax over
 Hector?

 Thanks.
 -Wei

   --
 *From:* Andrey Ilinykh ailin...@gmail.com
 *To:* user@cassandra.apache.org
 *Sent:* Wednesday, November 28, 2012 9:37 AM

 *Subject:* Re: Java high-level client

 +1


 On Tue, Nov 27, 2012 at 10:10 AM, Michael Kjellman 
 mkjell...@barracuda.com wrote:

 Netflix has a great client

 https://github.com/Netflix/astyanax






Re: Strange delay in query

2012-11-08 Thread Andrey Ilinykh
What is the size of columns? Probably those two are huge.


On Thu, Nov 8, 2012 at 4:01 AM, André Cruz andre.c...@co.sapo.pt wrote:

 On Nov 7, 2012, at 12:15 PM, André Cruz andre.c...@co.sapo.pt wrote:

  This error also happens on my application that uses pycassa, so I don't
 think this is the same bug.

 I have narrowed it down to a slice between two consecutive columns.
 Observe this behaviour using pycassa:

 
 DISCO_CASS.col_fam_nsrev.get(uuid.UUID('3cd88d97-ffde-44ca-8ae9-5336caaebc4e'),
 column_count=2,
 column_start=uuid.UUID('13957152-234b-11e2-92bc-e0db550199f4')).keys()
 DEBUG 2012-11-08 11:55:51,170 pycassa_library.pool:30 6849 139928791262976
 Connection 52905488 (xxx:9160) was checked out from pool 51715344
 DEBUG 2012-11-08 11:55:53,415 pycassa_library.pool:37 6849 139928791262976
 Connection 52905488 (xxx:9160) was checked in to pool 51715344
 [UUID('13957152-234b-11e2-92bc-e0db550199f4'),
 UUID('40b7ae4e-2449-11e2-8610-e0db550199f4')]

 A two column slice took more than 2s to return. If I request the next 2
 column slice:

 
 DISCO_CASS.col_fam_nsrev.get(uuid.UUID('3cd88d97-ffde-44ca-8ae9-5336caaebc4e'),
 column_count=2,
 column_start=uuid.UUID('40b7ae4e-2449-11e2-8610-e0db550199f4')).keys()
 DEBUG 2012-11-08 11:57:32,750 pycassa_library.pool:30 6849 139928791262976
 Connection 52904912 (xxx:9160) was checked out from pool 51715344
 DEBUG 2012-11-08 11:57:32,774 pycassa_library.pool:37 6849 139928791262976
 Connection 52904912 (xxx:9160) was checked in to pool 51715344
 [UUID('40b7ae4e-2449-11e2-8610-e0db550199f4'),
 UUID('a364b028-2449-11e2-8882-e0db550199f4')]

 This takes 20msec... Is there a rational explanation for this different
 behaviour? Is there some threshold that I'm running into? Is there any way
 to obtain more debugging information about this problem?

 Thanks,
 André


Re: problem encrypting keys and data

2012-11-07 Thread Andrey Ilinykh
Honestly, I don't understand what encoding you are talking about. Just
write/read the data as a byte array. You will read back exactly what you write.
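
For example, with astyanax (an untested sketch; the CF and column names are
made up) the bytes go in and come back untouched:

import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.MutationBatch;
import com.netflix.astyanax.connectionpool.exceptions.ConnectionException;
import com.netflix.astyanax.model.ColumnFamily;
import com.netflix.astyanax.serializers.StringSerializer;

public class BlobRoundTrip {
    private static final ColumnFamily<String, String> CF =
            new ColumnFamily<String, String>("Encrypted",
                    StringSerializer.get(), StringSerializer.get());

    static void write(Keyspace ks, String rowKey, byte[] ciphertext) throws ConnectionException {
        MutationBatch m = ks.prepareMutationBatch();
        // store the raw bytes; negative byte values are preserved as-is
        m.withRow(CF, rowKey).putColumn("data", ciphertext, null);
        m.execute();
    }

    static byte[] read(Keyspace ks, String rowKey) throws ConnectionException {
        return ks.prepareQuery(CF).getKey(rowKey).getColumn("data")
                .execute().getResult().getByteArrayValue();
    }
}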

Thank you,
  Andrey


On Wed, Nov 7, 2012 at 1:43 PM, Brian Tarbox tar...@cabotresearch.com wrote:

 We have a requirement to store our data encrypted.
 Our encryption system turns our various strings into byte arrays.  So far
 so good.

 The problem is that the bytes in our byte arrays are sometimes
 negative...but when we look at them in the cassandra-cli (or try
 to programmatically retrieve them) the bytes are all positive so we of
 course don't find the expected data.

 We have tried Byte encoding and UTF8 encoding without luck.  In looking at
 the Byte validator in particular I see nothing that ought to care about the
 sign of the bytes, but perhaps I'm missing something.

 Any suggestions would be appreciated, thanks.

 Brian Tarbox



Re: Replication factor and performance questions

2012-11-05 Thread Andrey Ilinykh
You will have one extra hop. Not a big deal, actually. And many client
libraries (astyanax for example) are token aware, so they are smart
enough to call the right node.

On Mon, Nov 5, 2012 at 9:12 AM, Oleg Dulin oleg.du...@gmail.com wrote:
 Should be all under 400Gig on each.

 My question is -- is there additional overhead with replicas making requests
 to one another for keys they don't have ? how much of an overhead is that ?


 On 2012-11-05 17:00:37 +, Michael Kjellman said:

 Rule of thumb is to try to keep nodes under 400GB.
 Compactions/Repairs/Move operations etc become a nightmare otherwise. How
 much data do you expect to have on each node? Also depends on caches,
 bloom filters etc

 On 11/5/12 8:57 AM, Oleg Dulin oleg.du...@gmail.com wrote:

 I have 4 nodes at my disposal.

 I can configure them like this:

 1) RF=1, each node has 25% of the data. On random-reads, how big is the
 performance penalty if a node needs to look for data on another replica
 ?

 2) RF=2, each node has 50% of the data. Same question ?



 --
 Regards,
 Oleg Dulin
 NYC Java Big Data Engineer
 http://www.olegdulin.com/









 --
 Regards,
 Oleg Dulin
 NYC Java Big Data Engineer
 http://www.olegdulin.com/




Re: Benifits by adding nodes to the cluster

2012-10-29 Thread Andrey Ilinykh
This is how cassandra scales out. More nodes means more capacity and higher
throughput, in addition to better availability.

thank you,
  Andrey

On Mon, Oct 29, 2012 at 2:57 PM, Roshan codeva...@gmail.com wrote:
 Hi All

 This may be a silly question, but what kind of benefits we can get by adding
 new nodes to the cluster?

 Some may be high availability. Any others?

 /Roshan





Re: why does my Effective-Ownership and Load from ring give such different answers?

2012-10-19 Thread Andrey Ilinykh
Did you run cleanup?

Andrey

On Fri, Oct 19, 2012 at 10:23 AM, Brian Tarbox tar...@cabotresearch.com wrote:
 I had a two node cluster that I expanded to four nodes.  I ran the token
 generation script and moved all the nodes so that when I run nodetool ring
 each node reports 25% Effective-Ownership.

 However, my load numbers map out to 39%, 30%, 15%, 17%.

 How can that be?

 Thanks.


hadoop consistency level

2012-10-18 Thread Andrey Ilinykh
Hello, everybody!
I'm thinking about running hadoop jobs on top of the cassandra
cluster. My understanding is that hadoop jobs read data from local nodes
only. Does that mean the consistency level is always ONE?

Thank you,
  Andrey


Re: hadoop consistency level

2012-10-18 Thread Andrey Ilinykh
On Thu, Oct 18, 2012 at 12:00 PM, Michael Kjellman
mkjell...@barracuda.com wrote:
 Unless you have Brisk (however as far as I know there was one fork that got
 it working on 1.0 but nothing for 1.1 and is not being actively maintained
 by Datastax) or go with CFS (which comes with DSE) you are not guaranteed
 all data is on that hadoop node. You can take a look at the forks if
 interested here: https://github.com/riptano/brisk/network but I'd personally
 be afraid to put my eggs in a basket that is certainly not super supported
 anymore.

  job.getConfiguration().set("cassandra.consistencylevel.read", "QUORUM");
  should get you started.
This is what I don't understand. With QUORUM you read data from at
least two nodes. If so, you don't benefit from data locality. What's
the point of using hadoop? I can run an application on any machine(s) and
iterate through the column family. What is the difference?

Thank you,
  Andrey


Re: hadoop consistency level

2012-10-18 Thread Andrey Ilinykh
On Thu, Oct 18, 2012 at 1:24 PM, Michael Kjellman
mkjell...@barracuda.com wrote:
 Well there is *some* data locality, it's just not guaranteed. My
 understanding (and someone correct me if I'm wrong) is that
 ColumnFamilyInputFormat implements InputSplit and the getLocations()
 method.

 http://hadoop.apache.org/docs/mapreduce/current/api/org/apache/hadoop/mapre
 duce/InputSplit.html

 ColumnFamilySplit.java contains logic to do it's best to determine what
 node that particular hadoop node contains the data for that mapper.

But there is no guarantee the local data is in sync with the other nodes, which
means you effectively have CL ONE. If you want CL QUORUM you have to make a
remote call, no matter whether the data is local or not.


Re: hadoop consistency level

2012-10-18 Thread Andrey Ilinykh
On Thu, Oct 18, 2012 at 1:34 PM, Michael Kjellman
mkjell...@barracuda.com wrote:
 Not sure I understand your question (if there is one..)

 You are more than welcome to do CL ONE and assuming you have hadoop nodes
 in the right places on your ring things could work out very nicely. If you
 need to guarantee that you have all the data in your job then you'll need
 to use QUORUM.

 If you don't specify a CL in your job config it will default to ONE (at
 least that's what my read of the ConfigHelper source for 1.1.6 shows)

I have two questions.
1. I can benefit from data locality (and Hadoop) only with CL ONE. Is
it correct?
2. With CL QUORUM cassandra reads data from all replicas. In this case
Hadoop doesn't give me any  benefits. Application running outside the
cluster has the same performance. Is it correct?

Thank you,
  Andrey


Re: hadoop consistency level

2012-10-18 Thread Andrey Ilinykh
On Thu, Oct 18, 2012 at 2:31 PM, Jeremy Hanna
jeremy.hanna1...@gmail.com wrote:

 On Oct 18, 2012, at 3:52 PM, Andrey Ilinykh ailin...@gmail.com wrote:

 On Thu, Oct 18, 2012 at 1:34 PM, Michael Kjellman
 mkjell...@barracuda.com wrote:
 Not sure I understand your question (if there is one..)

 You are more than welcome to do CL ONE and assuming you have hadoop nodes
 in the right places on your ring things could work out very nicely. If you
 need to guarantee that you have all the data in your job then you'll need
 to use QUORUM.

 If you don't specify a CL in your job config it will default to ONE (at
 least that's what my read of the ConfigHelper source for 1.1.6 shows)

 I have two questions.
 1. I can benefit from data locality (and Hadoop) only with CL ONE. Is
 it correct?

 Yes and at QUORUM it's quasi local.  The job tracker finds out where a range 
 is and sends a task to a replica with the data (local).  In the case of 
 CL.QUORUM (see the Read Path section of 
 http://wiki.apache.org/cassandra/ArchitectureInternals), it will do an actual 
 read of the data on the node closest (local).  Then it will get a digest from 
 other nodes to verify that they have the same data.  So in the case of RF=3 
 and QUORUM, it will read the data on the local node where the task is running 
 and will check the next closest replica for a digest to verify that it is 
 consistent.  Information is sent across the wire and there is the latency of 
 that, but it's not the data that's sent.

 2. With CL QUORUM cassandra reads data from all replicas. In this case
 Hadoop doesn't give me any  benefits. Application running outside the
 cluster has the same performance. Is it correct?

 CL QUORUM does not read data from all replicas.  Applications running outside 
 the cluster have to copy the data from the cluster, a much more copy/network 
 intensive operation than using CL.QUORUM with the built-in Hadoop support.


Thank you very much, guys! I have a much clearer picture now.

Andrey


Re: run repair on each node or every R nodes?

2012-10-17 Thread Andrey Ilinykh

 In my mind it does make sense, and what you're saying is correct. But I read
 that it was better to run repair in each node with a -pr option.

 Alain

Yes, that's correct. By running repair -pr on each node you repair the whole
cluster without duplicating any work.

Andrey


Re: Cassandra nodes loaded unequally

2012-10-17 Thread Andrey Ilinykh
Some of your column families are not fully compacted. That is pretty
normal, I would not worry about it. Eventually it will happen.

On Wed, Oct 17, 2012 at 1:46 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:
 I've got the same problem, and other people in the mailing list are
 reporting the same issue.

 I don't know what is happening here.

 RF 2, 2 nodes :

 10.59.21.241eu-west 1b  Up Normal  137.53 GB
 50.00%  0
 10.58.83.109eu-west 1b  Up Normal  102.46 GB
 50.00%  85070591730234615865843651857942052864

 I have no idea how to fix it.

 Alain

 2012/10/17 Ben Kaehne ben.kae...@sirca.org.au

 Nothing unusual.

 All servers are exactly the same. Nothing unusual in the log files. Is
 there any level of logging that I should be turning on?

 Regards,


 On Wed, Oct 17, 2012 at 9:51 AM, Andrey Ilinykh ailin...@gmail.com
 wrote:

 With your environment (3 nodes, RF=3) it is very difficult to get
 uneven load. Each node receives the same number of read/write
 requests. Probably something is wrong on low level, OS or VM. Do you
 see anything unusual in log files?

 Andrey

 On Tue, Oct 16, 2012 at 3:40 PM, Ben Kaehne ben.kae...@sirca.org.au
 wrote:
  Not connecting to the same node every time. Using Hector to ensure an
  even
  distribution of connections accross the cluster.
 
  Regards,
 
  On Sat, Oct 13, 2012 at 4:15 AM, B. Todd Burruss bto...@gmail.com
  wrote:
 
  are you connecting to the same node every time?  if so, spread out
  your connections across the ring
 
  On Fri, Oct 12, 2012 at 1:22 AM, Alexey Zotov
  azo...@griddynamics.com
  wrote:
   Hi Ben,
  
   I suggest you to compare amount of queries for each node. May be the
   problem
   is on the client side.
   Yoy can do that using JMX:
   org.apache.cassandra.db:type=ColumnFamilies,keyspace=YOUR
   KEYSPACE,columnfamily=YOUR CF,ReadCount
   org.apache.cassandra.db:type=ColumnFamilies,keyspace=YOUR
   KEYSPACE,columnfamily=YOUR CF,WriteCount
  
   Also I suggest to check output of nodetool compactionstats.
  
   --
   Alexey
  
  
 
 
 
 
  --
  -Ben




 --
 -Ben




Re: what happens while node is bootstrapping?

2012-10-16 Thread Andrey Ilinykh


 No.  The bootstrapping node will receive writes for its new range while
 bootstrapping as a consistency optimization (more or less), but does not
 contribute to the replication factor or consistency level; all of the
 original replicas for that range still receive writes, serve reads, and are
 the nodes that count for consistency level.  Basically, the bootstrapping
 node has no effect on the existing replicas in terms of RF or CL until the
 bootstrap completes.

I see. So if I add new nodes to increase the number of writes my cluster
can handle, I will not see any improvement until the bootstrap process
finishes, which may take hours. Is that correct?

Thank you,
  Andrey


Re: Is Anti Entropy repair idempotent with respect to transferred data?

2012-10-16 Thread Andrey Ilinykh
 In my experience running repair on some counter data, the size of
 streamed data is much bigger than the cluster could possibly have lost
 messages or would be due to snapshotting at different times.

 I know the data will eventually be in sync on every repair, but I'm
 more interested in whether Cassandra transfers excess data and how to
 minimize this.

 Does any body have insights on this?

The problem is the granularity of the Merkle tree. Cassandra streams the
regions whose hash values differ, and such a region can be much bigger than a
single row.

Andrey


Re: Cassandra nodes loaded unequally

2012-10-16 Thread Andrey Ilinykh
With your environment (3 nodes, RF=3) it is very difficult to get
uneven load. Each node receives the same number of read/write
requests. Probably something is wrong on low level, OS or VM. Do you
see anything unusual in log files?

Andrey

On Tue, Oct 16, 2012 at 3:40 PM, Ben Kaehne ben.kae...@sirca.org.au wrote:
 Not connecting to the same node every time. Using Hector to ensure an even
 distribution of connections accross the cluster.

 Regards,

 On Sat, Oct 13, 2012 at 4:15 AM, B. Todd Burruss bto...@gmail.com wrote:

 are you connecting to the same node every time?  if so, spread out
 your connections across the ring

 On Fri, Oct 12, 2012 at 1:22 AM, Alexey Zotov azo...@griddynamics.com
 wrote:
  Hi Ben,
 
  I suggest you to compare amount of queries for each node. May be the
  problem
  is on the client side.
  Yoy can do that using JMX:
  org.apache.cassandra.db:type=ColumnFamilies,keyspace=YOUR
  KEYSPACE,columnfamily=YOUR CF,ReadCount
  org.apache.cassandra.db:type=ColumnFamilies,keyspace=YOUR
  KEYSPACE,columnfamily=YOUR CF,WriteCount
 
  Also I suggest to check output of nodetool compactionstats.
 
  --
  Alexey
 
 




 --
 -Ben


Re: what happens while node is bootstrapping?

2012-10-15 Thread Andrey Ilinykh
Does it mean that during the bootstrapping process only the existing replicas
serve read requests for the new node's range? In other words, the effective
replication factor is RF-1?

On Mon, Oct 15, 2012 at 12:20 PM, John Lewis lewili...@gmail.com wrote:
 Bootstrapping nodes do not handle reads requests until the bootstrap process 
 is complete.

 JLewis

 On Oct 13, 2012, at 11:19 PM, Andrey Ilinykh ailin...@gmail.com wrote:

 Hello, everybody!
 I'd like to clarify a bootstrapping process. As far as I understand,
 bootstrapping node starts to accept writes immediately.  What about
 reads?
 Bootstrapping node doesn't have all information, only replica nodes
 have. Does it mean read operations with CL ALL may fail during
 bootstrapping process?

 Thank you,
  Andrey



Re: run repair on each node or every R nodes?

2012-10-15 Thread Andrey Ilinykh
Only the one range that node-00 is responsible for (its primary range) will get
repaired, on all three replicas.
Andrey
On Mon, Oct 15, 2012 at 11:56 AM, Alexis Midon alexismi...@gmail.com wrote:

 Hi all,

 I have a 9-node cluster with a replication factor R=3. When I run repair -pr
 on node-00, I see the exact same load and activity on node-{01,02}.
 Specifically, compactionstats shows the same Validation tasks.
 Does this mean that all 3 nodes will be repaired when nodetool returns? or
 do I still have to trigger a nodetool-repair on node-{01,02}?

 Thanks,

 Alexis


what happens while node is bootstrapping?

2012-10-14 Thread Andrey Ilinykh
Hello, everybody!
I'd like to clarify the bootstrapping process. As far as I understand, a
bootstrapping node starts to accept writes immediately. What about reads?
The bootstrapping node doesn't have all the data yet, only the replica nodes
have it. Does that mean read operations with CL ALL may fail during the
bootstrapping process?

Thank you,
  Andrey


Re: Why data is not even distributed.

2012-10-08 Thread Andrey Ilinykh
The problem was that I calculated 3 tokens for the random partitioner but
used them with BOP, so the nodes were not supposed to be loaded evenly.
That's ok, I got it.
But what I don't understand is why nodetool ring shows equal ownership.
This is an example:
I created small cluster with BOP and three tokens
00



then I put some random data which is nicely distributed:

Address DC  RackStatus State   Load
Effective-Ownership Token

Token(bytes[])
127.0.0.1   datacenter1 rack1   Up Normal  1.92 MB
33.33%  Token(bytes[00])
127.0.0.2   datacenter1 rack1   Up Normal  1.93 MB
33.33%  Token(bytes[])
127.0.0.3   datacenter1 rack1   Up Normal  1.99 MB
33.33%  Token(bytes[])

then I moved node 2 to 0100 and node 3 to 0200. Which
means node 1 owns almost everything.

Address DC  RackStatus State   Load
Effective-Ownership Token

Token(bytes[0200])
127.0.0.1   datacenter1 rack1   Up Normal  5.76 MB
33.33%  Token(bytes[00])
127.0.0.2   datacenter1 rack1   Up Normal  30.37 KB
33.33%  Token(bytes[0100])
127.0.0.3   datacenter1 rack1   Up Normal  25.78 KB
33.33%  Token(bytes[0200])


As you can see all data is located on node 1. But nodetool ring still
shows 33.33% for each node. No matter how I move nodes, it always
gives me 33.33%.

It looks like a bug to me.

Thank you,
  Andrey


Re: what's the most 1.1 stable version?

2012-10-05 Thread Andrey Ilinykh
In 1.1.5 a file descriptor leak was fixed. In my case it was critical:
nodes went down every few days. But not everyone had this problem.

Thank you,
  Andrey

On Fri, Oct 5, 2012 at 7:42 AM, Alexandru Sicoe adsi...@gmail.com wrote:
 Hello,
  We are planning to upgrade from version 1.0.7 to the 1.1 branch. Which is
 the stable version that people are using? I see the latest release is 1.1.5
 but maybe it's not fully wise to use this. Is 1.1.4 the one to use?

 Cheers,
 Alex


Re: Why data is not even distributed.

2012-10-04 Thread Andrey Ilinykh
That was my first thought too.
Then I MD5-hashed the uuid and used the digest as the key:

MessageDigest md = MessageDigest.getInstance("MD5");

//in the loop
UUID uuid = UUID.randomUUID();
byte[] bytes = md.digest(asByteArray(uuid));  // asByteArray() converts the UUID into its 16 raw bytes

The result is exactly the same: the first node takes 66%, the second 33% and
the third one is empty. For some reason rows which should be placed on the
third node end up on the first one.

Address DC  RackStatus State   Load
Effective-Ownership Token


Token(bytes[56713727820156410577229101238628035242])
127.0.0.1   datacenter1 rack1   Up Normal  7.68 MB
33.33%  Token(bytes[00])
127.0.0.3   datacenter1 rack1   Up Normal  79.17 KB
33.33%
Token(bytes[0113427455640312821154458202477256070485])
127.0.0.2   datacenter1 rack1   Up Normal  3.81 MB
33.33%
Token(bytes[56713727820156410577229101238628035242])



On Thu, Oct 4, 2012 at 12:33 AM, Tom fivemile...@gmail.com wrote:
 Hi Andrey,

 while the data values you generated might be following a true random
 distribution, your row key, UUID, is not (because it is created on the same
 machines by the same software with a certain window of time)

 For example, if you were using the UUID class in Java, these would be
 composed from several components (related to dimensions such as time and
 version), so you can not expect a random distribution over the whole space.


 Cheers
 Tom




 On Wed, Oct 3, 2012 at 5:39 PM, Andrey Ilinykh ailin...@gmail.com wrote:

 Hello, everybody!

 I'm observing very strange behavior. I have 3 node cluster with
 ByteOrderPartitioner. (I run 1.1.5)
 I created a key space with replication factor of 1.
 Then I created one column family and populated it with random data.
 I use UUID as a row key, and Integer as a column name.
 Row keys were generated as

 UUID uuid = UUID.randomUUID();

 I populated about 10 rows with 100 column each.

 I would expect equal load on each node, but the result is totally
 different. This is what nodetool gives me:

 Address DC  RackStatus State   Load
 Effective-Ownership Token


 Token(bytes[56713727820156410577229101238628035242])
 127.0.0.1   datacenter1 rack1   Up Normal  27.61 MB
 33.33%  Token(bytes[00])
 127.0.0.3   datacenter1 rack1   Up Normal  206.47 KB
 33.33%
 Token(bytes[0113427455640312821154458202477256070485])
 127.0.0.2   datacenter1 rack1   Up Normal  13.86 MB
 33.33%
 Token(bytes[56713727820156410577229101238628035242])


 one node (127.0.0.3) is almost empty.
 Any ideas what is wrong?


 Thank you,
   Andrey




Why data is not even distributed.

2012-10-03 Thread Andrey Ilinykh
Hello, everybody!

I'm observing very strange behavior. I have 3 node cluster with
ByteOrderPartitioner. (I run 1.1.5)
I created a key space with replication factor of 1.
Then I created one column family and populated it with random data.
I use UUID as a row key, and Integer as a column name.
Row keys were generated as

UUID uuid = UUID.randomUUID();

I populated about 10 rows with 100 column each.

I would expect equal load on each node, but the result is totally
different. This is what nodetool gives me:

Address DC  RackStatus State   Load
Effective-Ownership Token


Token(bytes[56713727820156410577229101238628035242])
127.0.0.1   datacenter1 rack1   Up Normal  27.61 MB
33.33%  Token(bytes[00])
127.0.0.3   datacenter1 rack1   Up Normal  206.47 KB
33.33%
Token(bytes[0113427455640312821154458202477256070485])
127.0.0.2   datacenter1 rack1   Up Normal  13.86 MB
33.33%
Token(bytes[56713727820156410577229101238628035242])


one node (127.0.0.3) is almost empty.
Any ideas what is wrong?


Thank you,
  Andrey


Re: Why data tripled in size after repair?

2012-10-02 Thread Andrey Ilinykh
On Tue, Oct 2, 2012 at 12:05 AM, Sylvain Lebresne sylv...@datastax.com wrote:
 It's in the 1.1 branch; I don't remember if it went into a release
 yet. If not, it'll be in the next 1.1.x release.

 As the ticket says, this is in since 1.1.1. I don't pretend this is
 well documented, but it's in.

Nope. It is in 1.1.1 only. I run 1.1.5, it doesn't have it.


Re: Why data tripled in size after repair?

2012-09-27 Thread Andrey Ilinykh
On Thu, Sep 27, 2012 at 9:52 AM, Sylvain Lebresne sylv...@datastax.com wrote:
 I don't understand why it copied data twice. In worst case scenario it
 should copy everything (~90G)

 Sadly no, repair is currently peer-to-peer based (there is a ticket to
 fix it: https://issues.apache.org/jira/browse/CASSANDRA-3200, but
 that's not trivial). This mean that you can end up with RF times the
 data after a repair. Obviously that should be a worst case scenario as
 it implies everything is repaired, but at least the triplicate part is
 a problem, but a know and not so easy to fix one.

I see. That explains why I get 85G + 85G instead of 90G. But after the next
repair I have six extra files of 75G each; how is that possible? It looks like
repair is done per sstable, not per CF. Is that possible?


 Is it possible that each time you've ran repair, one of the node in
 the cluster was very out of sync with the other nodes. Maybe a node
 that has crashed for a long time?

No, nodes go down from time to time (OOM), but I restart them
automatically. What is specific to my case is that I have an order-preserving
partitioner and I update every 5th or 10th row intensively.
As far as I understand, because of that, when the Merkle tree is calculated
every range has several hot rows. These rows are good candidates to be
inconsistent. There is one thing I don't understand: does the Merkle tree
calculation use only the sstables flushed to disk, or does it use the memtables
as well? Let's say I have a hot row which sits in memory on one node but has
been flushed on another. Is there any difference in the Merkle trees?

Thank you,
  Andrey


Why data tripled in size after repair?

2012-09-26 Thread Andrey Ilinykh
Hello everybody!
I have a 3 node cluster with a replication factor of 3.
Each node has an 800G disk and used to hold about 100G of data.
What is strange is that every time I run repair the data takes almost 3 times
more space - 270G - then I run compaction and get back to 100G.
Unfortunately, yesterday I forgot to compact and ran repair again (at
that moment I had around 270G). As a result I have 720G on each node.
I run compaction again and get a lot of warnings like this

WARN [CompactionExecutor:732] 2012-09-26 16:13:00,745
CompactionTask.java (line 84) insufficient space to compact all
requested files

which makes sense, because I'm almost out of disk space.

So, I have two questions.

1. Why repair almost triples data size?

2. How to compact my data back to 100G?

Thank you,
  Andrey


Re: Why data tripled in size after repair?

2012-09-26 Thread Andrey Ilinykh
On Wed, Sep 26, 2012 at 11:07 AM, Rob Coli rc...@palominodb.com wrote:
 On Wed, Sep 26, 2012 at 9:30 AM, Andrey Ilinykh ailin...@gmail.com wrote:
 [ repair ballooned my data size ]
 1. Why repair almost triples data size?

 You didn't mention what version of cassandra you're running. In some
 old versions of cassandra (prior to 1.0), repair often creates even
 more extraneous data than it should by design.

Thank you for reply.

I run 1.1.5

Honestly, I don't understand what is going on.

I ran major compaction on Sep 15
as result I had one big sstable and several smalls. This is one biggest:

-rw-rw-r-- 1 ubuntu ubuntu  90G Sep 15 12:56 Bidgely-rawstreams-he-8475-Data.db

On Sep 22 (one week later)I ran repair and get two more sstables:

-rw-rw-r-- 1 ubuntu ubuntu  85G Sep 22 00:41 Bidgely-rawstreams-he-8605-Data.db
-rw-rw-r-- 1 ubuntu ubuntu  86G Sep 22 00:45 Bidgely-rawstreams-he-8606-Data.db

I don't understand why it copied the data twice. In the worst case scenario it
should copy everything (~90G), but the data is triplicated (90G + 85G + 85G).
Yesterday I ran repair one more time and six(!) more big sstables were
added. It doesn't make any sense! What am I missing?

-rw-rw-r-- 1 ubuntu ubuntu  75G Sep 26 09:43 Bidgely-rawstreams-he-8785-Data.db
-rw-rw-r-- 1 ubuntu ubuntu  77G Sep 26 09:45 Bidgely-rawstreams-he-8788-Data.db
-rw-rw-r-- 1 ubuntu ubuntu  76G Sep 26 11:54 Bidgely-rawstreams-he-8793-Data.db
-rw-rw-r-- 1 ubuntu ubuntu  75G Sep 26 11:55 Bidgely-rawstreams-he-8797-Data.db
-rw-rw-r-- 1 ubuntu ubuntu  76G Sep 26 14:03 Bidgely-rawstreams-he-8804-Data.db
-rw-rw-r-- 1 ubuntu ubuntu  75G Sep 26 14:03 Bidgely-rawstreams-he-8807-Data.db

Even if I somehow compact it back to 100G, I will have the same problem again
very soon. What did I do wrong?

Andrey