Dropped Mutation Messages in two DCs at different sites

2017-01-03 Thread Benyi Wang
I need to batch load a lot of data every day into a keyspace that spans two DCs: one DC is on the west coast and the other is on the east coast. I assume that the network delay between two DCs at different sites will cause a lot of dropped mutation messages if I write too fast in the local DC using LOCAL_QUORUM.
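One common way to keep a batch loader from writing "too fast" is to pace the writes on the client side. The sketch below is only an illustration of that idea, not something proposed in this thread: `WriteThrottle` and its parameters are hypothetical names, and a sensible rate would have to be found by watching the dropped-mutation counters on the remote DC.

```python
import time


class WriteThrottle:
    """Paces calls so that at most `max_per_second` writes start each second.

    Hypothetical helper for illustration; it is not part of any Cassandra driver.
    """

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        if max_per_second <= 0:
            raise ValueError("max_per_second must be positive")
        self.interval = 1.0 / max_per_second  # minimum spacing between writes
        self.clock = clock
        self.sleep = sleep
        self.next_slot = clock()  # earliest time the next write may start

    def acquire(self):
        """Block until the next write slot is free; return the seconds waited."""
        now = self.clock()
        wait = self.next_slot - now
        if wait > 0:
            self.sleep(wait)
        # Schedule the following slot one interval after this one.
        self.next_slot = max(now, self.next_slot) + self.interval
        return max(wait, 0.0)
```

A loader would call `throttle.acquire()` before each write (or each small batch) and lower `max_per_second` if dropped mutations show up on the other coast.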

Re: Is it safe to change RF in this situation?

2016-09-08 Thread Benyi Wang
). So it should be safe to do. > > When the repair has run you can start with the plan I suggested and run > repairs afterwards. > > Hannu > > On 8 Sep 2016, at 18:01, Benyi Wang <bewang.t...@gmail.com> wrote: > > Thanks. What about this situation: > > * Cha

Re: Is it safe to change RF in this situation?

2016-09-08 Thread Benyi Wang
<hkro...@gmail.com> wrote: > Yep, you can fix it by running repair or even faster by changing the > consistency level to local_quorum and deploying the new version of the app. > > Hannu > > On 8 Sep 2016, at 17:51, Benyi Wang <bewang.t...@gmail.com> wrote: > > Thanks Han

Re: Is it safe to change RF in this situation?

2016-09-08 Thread Benyi Wang
RUM is > effectively ALL in your case (RF=2) if you have just one DC, so you can > change the batch CL later. > > Cheers, > Hannu > > > On 8 Sep 2016, at 14:42, Benyi Wang <bewang.t...@gmail.com> wrote: > > > > * I have a keyspace with RF=2; > >

Is it safe to change RF in this situation?

2016-09-08 Thread Benyi Wang
* I have a keyspace with RF=2; * The client reads the table using LOCAL_ONE; * There is a batch job loading data into the tables using ALL. I want to change RF to 3 and have both the client and the batch job use LOCAL_QUORUM. My question is "Will the client still read the correct data when the repair
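The consistency arithmetic behind this question can be checked with a few lines. This is a minimal sketch using the standard quorum formula, floor(RF / 2) + 1; the function names are made up for illustration. Note that the overlap rule assumes every replica has already received the writes, which is not true for a newly added replica until repair has finished.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a QUORUM request: floor(RF / 2) + 1."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def read_overlaps_write(read_replicas: int, write_replicas: int, rf: int) -> bool:
    """True when every read set must share at least one replica with every write set."""
    return read_replicas + write_replicas > rf


# Current setup: RF=2, write at ALL (2 replicas), read at LOCAL_ONE (1 replica).
print("RF=2 quorum:", quorum(2))
print("RF=2, ONE read + ALL write overlap:", read_overlaps_write(1, 2, 2))

# Planned setup: RF=3, both sides at LOCAL_QUORUM.
q3 = quorum(3)
print("RF=3 quorum:", q3)
print("RF=3, QUORUM read + QUORUM write overlap:", read_overlaps_write(q3, q3, 3))
```

With RF=2 the quorum is 2, which is why QUORUM behaves like ALL in a single DC with two replicas.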

Re: Client Read Latency is too high during repair

2016-08-19 Thread Benyi Wang
Never mind. I found the root cause. This has nothing to do with Cassandra and repair. Some web services called by the client caused the problem. On Fri, Aug 19, 2016 at 11:53 AM, Benyi Wang <bewang.t...@gmail.com> wrote: > I'm using cassandra java driver to access a small cassandr

Client Read Latency is too high during repair

2016-08-19 Thread Benyi Wang
I'm using the Cassandra Java driver to access a small Cassandra cluster * The cluster has 3 nodes in DC1 and 3 nodes in DC2 * The keyspace was originally created in DC1 only with RF=2 * The client had good read latency, about 40 ms at the 99th percentile under 100 requests/sec (measured at the client side)

Re: How to tune Cassandra or Java Driver to get lower latency when there are a lot of writes?

2015-09-25 Thread Benyi Wang
g it inside > something like foreach partition). > > Also you can easily tune up and down the size of those tasks and therefore > batches to minimize harm on the prod system. > > On Sep 24, 2015, at 5:37 PM, Benyi Wang <bewang.t...@gmail.com> wrote: > > I use Spark

How to tune Cassandra or Java Driver to get lower latency when there are a lot of writes?

2015-09-24 Thread Benyi Wang
I have a Cassandra cluster that provides data to a web service, and there is a daily batch load writing data into the cluster. - Without the batch loading, the service’s 99th-percentile latency is 3ms, but during the load it jumps to 90ms. - I checked cassandra keyspace’s

Re: How to tune Cassandra or Java Driver to get lower latency when there are a lot of writes?

2015-09-24 Thread Benyi Wang
- Write to Cassandra. I know using CQLBulkOutputFormat would be better, but it doesn't support DELETE. On Thu, Sep 24, 2015 at 1:27 PM, Gerard Maas <gerard.m...@gmail.com> wrote: > How are you loading the data? I mean, what insert method are you using? > > On Thu, Sep 24, 2015 at 9:58

Will virtual nodes have worse performance?

2015-07-15 Thread Benyi Wang
I have a small cluster with 3 nodes and installed Cassandra 2.1.2 from the DataStax YUM repository. I know 2.1.2 is not recommended for production. The problem I observed is: - When I use vnodes with num_token=256, the read latency is about 20ms at the 50th percentile. - If I disable vnodes, the

Re: How to stop nodetool repair in 2.1.2?

2015-04-15 Thread Benyi Wang
It didn't work. I ran the command on all nodes, but I still can see the repair activities. On Wed, Apr 15, 2015 at 3:20 PM, Sebastian Estevez sebastian.este...@datastax.com wrote: nodetool stop *VALIDATION* On Apr 15, 2015 5:16 PM, Benyi Wang bewang.t...@gmail.com wrote: I ran nodetool

How to stop nodetool repair in 2.1.2?

2015-04-15 Thread Benyi Wang
I ran nodetool repair -- keyspace table for a table, and it is still running after 4 days. I know there is an issue with repair and vnodes: https://issues.apache.org/jira/browse/CASSANDRA-5220. My question is how can I kill this sequential repair? I killed the process which I ran the repair

Re: How to stop nodetool repair in 2.1.2?

2015-04-15 Thread Benyi Wang
Using JMX worked. Thanks a lot. On Wed, Apr 15, 2015 at 3:57 PM, Robert Coli rc...@eventbrite.com wrote: On Wed, Apr 15, 2015 at 3:30 PM, Benyi Wang bewang.t...@gmail.com wrote: It didn't work. I ran the command on all nodes, but I still can see the repair activities. Your input

Re: Do I need to run repair and compaction every node?

2015-04-13 Thread Benyi Wang
It sounds like -pr runs on one node? I still don't understand what "the first range returned by the partitioner for a node" means. On Mon, Apr 13, 2015 at 1:40 PM, Robert Coli rc...@eventbrite.com wrote: On Mon, Apr 13, 2015 at 1:36 PM, Benyi Wang bewang.t...@gmail.com wrote: - I need to run compaction

Do I need to run repair and compaction every node?

2015-04-13 Thread Benyi Wang
I have read the documentation several times, but I'm still not quite sure how to run repair and compaction. To my understanding, - I need to run compaction on each node, - To repair a table (column family), I only need to run repair on any one of the nodes. Am I right? Thanks.

Re: Why select returns tombstoned results?

2015-04-01 Thread Benyi Wang
? Are you by any chance running repairs on your data? On Mon, Mar 30, 2015 at 5:39 PM, Benyi Wang bewang.t...@gmail.com wrote: Thanks for replying. In cqlsh, if I change to Quorum (Consistency quorum), sometimes the select returns the deleted row, sometimes not. I have two virtual data centers

Re: Why select returns tombstoned results?

2015-04-01 Thread Benyi Wang
going on (especially if you have an example of the wrong data being returned) is to do a sstable2json on all your tables and simply grep for an example key. On Mon, Mar 30, 2015 at 4:39 PM, Benyi Wang bewang.t...@gmail.com wrote: Thanks for replying. In cqlsh, if I change to Quorum

Why select returns tombstoned results?

2015-03-30 Thread Benyi Wang
Create table tomb_test ( guid text, content text, range text, rank int, id text, cnt int, primary key (guid, content, range, rank) ) Sometimes I delete rows with the Cassandra Java driver using this query: DELETE FROM tomb_test WHERE guid=? and content=? and range=? in Batch

Re: Why select returns tombstoned results?

2015-03-30 Thread Benyi Wang
results. How many nodes do you have in the cluster and what is the replication factor for the keyspace? On Mon, Mar 30, 2015 at 7:41 PM, Benyi Wang bewang.t...@gmail.com wrote: Create table tomb_test ( guid text, content text, range text, rank int, id text, cnt int

Delete columns

2015-02-27 Thread Benyi Wang
In C* 2.1.2, is there a way to delete without specifying the row key? create table ( guid text, key1 text, key2 text, data int, primary key (guid, key1, key2) ); delete from a_table where key1='kv1' and key2='kv2'; I'm trying to avoid doing it like this: * query the table to get

Re: How to bulkload into a specific data center?

2015-01-10 Thread Benyi Wang
On Fri, Jan 9, 2015 at 3:55 PM, Robert Coli rc...@eventbrite.com wrote: On Fri, Jan 9, 2015 at 11:38 AM, Benyi Wang bewang.t...@gmail.com wrote: - Is it possible to modify SSTableLoader to allow it access one data center? Even if you only write to nodes in DC A, if you replicate

Re: How to bulkload into a specific data center?

2015-01-09 Thread Benyi Wang
in this case would be 3x more with the bulk loader than with using a traditional cql or thrift based client. On Wed, Jan 7, 2015 at 6:32 PM, Benyi Wang bewang.t...@gmail.com wrote: I set up two virtual data centers, one for analytics and one for REST service. The analytics data center

How to bulkload into a specific data center?

2015-01-07 Thread Benyi Wang
I set up two virtual data centers, one for analytics and one for the REST service. The analytics data center sits on top of a Hadoop cluster. I want to bulk load my ETL results into the analytics data center so that the REST service won't bear the heavy load. I'm using CQLTableInputFormat in my Spark

Is it possible to delete columns or row using CQLSSTableWriter?

2015-01-07 Thread Benyi Wang
CQLSSTableWriter only accepts an INSERT or UPDATE statement. I'm wondering whether it could be made to accept a DELETE statement. I need to update my Cassandra table with a lot of data every day. * I may need to delete a row (given the partition key) * I may need to delete some columns. For example, there are

Is it possible to flush memtable in one virtual center?

2014-12-15 Thread Benyi Wang
We have one ring and two virtual data centers in our Cassandra cluster: one is for Real-Time and the other is for analytics. My questions are: 1. Are there memtables in the Analytics Data Center? To my understanding, there are. 2. Is it possible to flush memtables, if they exist, in Analytics Data

What happens at server side when using SSTableLoader

2014-12-01 Thread Benyi Wang
Is there a page explaining what happens on the server side when using SSTableLoader? I'm looking for answers to the following questions: 1. What about the existing data in the table? From my test, the data in sstable files will be applied to the existing data. Am I right? - The new