RE: Repairs at scale in Cassandra 2.1.13

2016-09-29 Thread Anubhav Kale
Thanks !

For subrange repairs I have seen two approaches. For our specific requirement, 
we want to do repairs on a small set of keyspaces.


1.   Use Thrift describe_local_ring(keyspace), parse it to get the token ranges 
for a given node, split those ranges for the given keyspace + table using 
describe_splits_ex, and call nodetool repair on the resulting subranges.

a.   https://github.com/pauloricardomg/cassandra-list-subranges does it 
this way.

2.   Get the node's tokens using nodetool info -T, split those, and call 
nodetool repair with the subranges (see the sketch after this list).

a.   https://github.com/BrianGallew/cassandra_range_repair does it this way.
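
For reference, a minimal sketch of what either approach boils down to once the 
token ranges have been split (the token values and keyspace/table names below 
are placeholders, not output from either tool):

  # list this node's tokens (approach 2 starts from here)
  nodetool info -T | grep Token

  # repair a single subrange of keyspace "my_ks", table "my_table";
  # -st/-et bound the repair to one token range, repeat once per split
  nodetool repair -st 3074457345618258602 -et 3074457345618258702 my_ks my_table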

Can experts please help me understand the nuances between these APIs and which 
one is better / more efficient? Since the first one is keyspace aware, I like 
it better, as it lets us target repairs at specific keyspaces more directly. I 
am leaning toward that at the moment.

Thanks !

From: Paulo Motta [mailto:pauloricard...@gmail.com]
Sent: Wednesday, September 28, 2016 5:16 AM
To: user@cassandra.apache.org
Subject: Re: Repairs at scale in Cassandra 2.1.13

There were a few streaming bugs fixed between 2.1.13 and 2.1.15 (see 
CHANGES.txt for more details), so I'd recommend upgrading to 2.1.15 in order 
to avoid them.

2016-09-28 9:08 GMT-03:00 Alain RODRIGUEZ 
>:
Hi Anubhav,

I’m considering doing subrange repairs 
(https://github.com/BrianGallew/cassandra_range_repair/blob/master/src/range_repair.py)

I used this script a lot, and quite successfully.

Another working option that people are using is:

https://github.com/spotify/cassandra-reaper

Alexander, a coworker, integrated an existing UI and made it compatible with 
incremental repairs:

Incremental repairs on Reaper: 
https://github.com/adejanovski/cassandra-reaper/tree/inc-repair-that-works
UI integration with incremental repairs on Reaper: 
https://github.com/adejanovski/cassandra-reaper/tree/inc-repair-support-with-ui

as I’ve heard from folks that incremental repairs simply don’t work even in 3.x 
(yeah, that’s a strong statement, but I heard it from multiple folks at the 
Summit).

Alexander also gave a talk about repairs at the Summit (including incremental 
repairs), and someone from Netflix gave a good one as well, not covering 
incremental repairs but with some benchmarks and tips for running repairs. You 
might want to check one of those (or both):

https://www.youtube.com/playlist?list=PLm-EPIkBI3YoiA-02vufoEj4CgYvIQgIk

I believe the talks haven't been released by Datastax yet; they probably will 
be sometime soon.

Repair is something all companies with large setups are struggling with: 
Spotify built Reaper, and Netflix gave a talk about repairs presenting the 
range_repair.py script and much more. But I know there is some work going on 
to improve things.

Meanwhile, given the load per node (600 GB is big but not that huge) and the 
number of nodes (400 is quite a high number), I would say that the hardest 
part for you will be handling the scheduling, to avoid harming the cluster and 
to make sure all the nodes get repaired. I believe Reaper might be a better 
match in your case, as it reportedly does that quite well, though I am not 
really sure.

C*heers,
---
Alain Rodriguez - @arodream - 
al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting

Re: Way to write to dc1 but keep data only in dc2

2016-09-29 Thread Edward Capriolo
You can do something like this, though your use of terminology like "queue"
does not really apply.

You can set up your keyspace with replication in only one data center.

CREATE KEYSPACE NTSkeyspace WITH REPLICATION = { 'class' :
'NetworkTopologyStrategy', 'dc2' : 3 };

This will make the NTSkeyspace live in only one data center. You can still
write to any Cassandra node, since nodes will transparently proxy the writes
to the proper place. You can configure your client to ONLY bind to specific
hosts, or to the hosts in data center DC1.
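
As a quick sanity check, here is a hedged sketch (the table name "events" and
the key "k1" are made-up examples) showing that the replicas for such a
keyspace all land in dc2, no matter which node or DC coordinates the write:

  # prints the replica IPs for one partition key; all should belong to dc2
  nodetool getendpoints NTSkeyspace events k1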

You can use a write consistency level like ANY. If you use a consistency
level like ONE, the write will block anyway, waiting for completion on the
other datacenter.

Since you mentioned the words "like a queue", I would suggest an alternative:
write the data to a distributed commit log like Kafka. At that point you can
decouple the two systems either through producer/consumer or through a tool
like Kafka's MirrorMaker.


On Thu, Sep 29, 2016 at 5:24 PM, Dorian Hoxha 
wrote:

> I have dc1 and dc2.
> I want to keep a keyspace only on dc2.
> But I only have my app on dc1.
> And I want to write to dc1 (lower latency) which will not keep data
> locally but just push it to dc2.
> While reading will only work for dc2.
> Since my app is mostly writes, my app ~will be faster while not having to
> deploy the app to dc2 or write directly to dc2 with higher latency.
>
> dc1 would act like a queue or something and just push data + delete
> locally.
>
> Does this make sense ?
>
> Thank You
>


Docs Contribution (was: Re: [RELEASE] Apache Cassandra 3.9 released)

2016-09-29 Thread Michael Shuler
On 09/29/2016 04:08 PM, Dorian Hoxha wrote:
> So how does documentation work? Example: I'm interested in Change Data
> Capture.

The documentation is in-tree, under doc/source, so create a patch and
upload it to a JIRA, just as with any source change. :)

The docs on patches focus on testing details, so perhaps you might also
add a documentation patch contribution section there that better suits what
you are doing, for the next person.

http://cassandra.apache.org/doc/latest/development/patches.html
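
For the mechanics, a rough sketch of the generic patch workflow (the GitHub
mirror is used here for convenience, and the .rst path is purely illustrative
of where a new CDC page could live):

  git clone https://github.com/apache/cassandra.git   # read-only mirror
  cd cassandra
  # add or edit .rst files under doc/source
  $EDITOR doc/source/operating/cdc.rst
  git add doc/source/operating/cdc.rst
  git diff --cached > trunk-cdc-docs.patch
  # attach the patch to a ticket at https://issues.apache.org/jira/browse/CASSANDRA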

-- 
Kind regards,
Michael


Way to write to dc1 but keep data only in dc2

2016-09-29 Thread Dorian Hoxha
I have dc1 and dc2.
I want to keep a keyspace only on dc2.
But I only have my app on dc1.
And I want to write to dc1 (lower latency) which will not keep data locally
but just push it to dc2.
While reading will only work for dc2.
Since my app is mostly writes, my app ~will be faster while not having to
deploy the app to dc2 or write directly to dc2 with higher latency.

dc1 would act like a queue or something and just push data + delete locally.

Does this make sense ?

Thank You


Re: [RELEASE] Apache Cassandra 3.9 released

2016-09-29 Thread Dorian Hoxha
So how does documentation work? Example: I'm interested in Change Data
Capture.

*I do appreciate the work done.

On Thu, Sep 29, 2016 at 11:02 PM, Michael Shuler 
wrote:

> The Cassandra team is pleased to announce the release of Apache
> Cassandra version 3.9.
>
> Apache Cassandra is a fully distributed database. It is the right choice
> when you need scalability and high availability without compromising
> performance.
>
>  http://cassandra.apache.org/
>
> Downloads of source and binary distributions are listed in our download
> section:
>
>  http://cassandra.apache.org/download/
>
> This version is a bug fix release[1] on the 3.9 series. As always,
> please pay attention to the release notes[2] and let us know[3] if you
> encounter any problems.
>
> Enjoy!
>
> [1]: (CHANGES.txt) https://goo.gl/SCtmhc
> [2]: (NEWS.txt) https://goo.gl/brKot5
> [3]: https://issues.apache.org/jira/browse/CASSANDRA
>


[RELEASE] Apache Cassandra 3.9 released

2016-09-29 Thread Michael Shuler
The Cassandra team is pleased to announce the release of Apache
Cassandra version 3.9.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 3.9 series. As always,
please pay attention to the release notes[2] and let us know[3] if you
encounter any problems.

Enjoy!

[1]: (CHANGES.txt) https://goo.gl/SCtmhc
[2]: (NEWS.txt) https://goo.gl/brKot5
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 3.8 released

2016-09-29 Thread Michael Shuler
The Cassandra team is pleased to announce the release of Apache
Cassandra version 3.8.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 3.8 series. As always,
please pay attention to the release notes[2] and let us know[3] if you
encounter any problems.

Enjoy!

[1]: (CHANGES.txt) https://goo.gl/QYFPm1
[2]: (NEWS.txt) https://goo.gl/f9y9ZV
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Re: Nodetool repair

2016-09-29 Thread Li, Guangxing
Romain,

I was trying what you mentioned as below:

a. nodetool stop VALIDATION
b. echo run -b org.apache.cassandra.db:type=StorageService forceTerminateAllRepairSessions \
   | java -jar /tmp/jmxterm/jmxterm-1.0-alpha-4-uber.jar -l 127.0.0.1:7199

to stop a seemingly never-ending repair, but I am seeing really odd behavior
with C* 2.0.9. Here is what I did:
1. First, I ran 'nodetool tpstats' on all nodes in the cluster and saw only
one node with 1 active/pending AntiEntropySessions. All the other nodes did
not have any pending or active AntiEntropySessions.
2. Then I grepped for 'Repair' in the logs on all nodes and saw absolutely no
repair-related activity in those logs for the past day.
3. Then, on the node that had the active AntiEntropySessions, I did steps 'a'
and 'b' above. All of a sudden I started seeing repair activity. On nodes
that did not have pending AntiEntropySessions, I am seeing the following in
their logs:
INFO [NonPeriodicTasks:1] 2016-09-29 17:12:53,469 StreamingRepairTask.java
(line 87) [repair #e80e17d0-8667-11e6-a801-e172d7a67134] streaming task
succeed, returning response to /10.253.2.166
On node 10.253.2.166 which has active pending AntiEntropySessions, I am
seeing the following in the log:
INFO [AntiEntropySessions:136] 2016-09-29 17:03:02,405 RepairSession.java
(line 282) [repair #812dafe0-8666-11e6-a801-e172d7a67134] session completed
successfully

So it seems to me that doing forceTerminateAllRepairSessions actually 'wakes
up' the dormant repair so it runs again. So far, the only way I have found to
stop a repair is to restart the C* node where the repair command was
initiated.

Thanks.

George.

On Fri, Sep 23, 2016 at 6:20 AM, Romain Hardouin 
wrote:

> OK. If you still have issues after setting streaming_socket_timeout_in_ms
> != 0, consider increasing request_timeout_in_ms to a high value, say 1 or 2
> minutes. See comments in
> https://issues.apache.org/jira/browse/CASSANDRA-7904
> Regarding 2.1, be sure to test incremental repair on your data before
> running it in production ;-)
>
> Romain
>


How to find the reason for mutation drops ??

2016-09-29 Thread James Joseph
I am seeing mutation drops on one of my nodes in the cluster. The load is
low, there are no GC pauses and no wide partitions either, so how can I debug
what the reason for the mutation drops is??

I ran nodetool tpstats: only one node out of 9 is dropping; the other 8 nodes
in the cluster have 0 mutation drops.

How can I debug this??


Thanks
James


when taking backups using snapshot, if the sstable gets compacted will nodetool snapshot hang ??

2016-09-29 Thread James Joseph
Hi, we are taking backups using nodetool snapshots, but I occasionally see
that my script pauses while taking a snapshot of a CF. Is this because, while
the snapshot is being taken, the sstables get compacted into a different one,
so it can't find the particular sstable it is snapshotting and therefore
pauses at that particular CF ??


Thanks
James.


Re: High load on few nodes in a DC.

2016-09-29 Thread Pranay akula
Yes we are using token aware but not shuffling replicas.

On Wed, Sep 21, 2016 at 10:04 AM, Romain Hardouin 
wrote:

> Hi,
>
> Do you shuffle the replicas with TokenAwarePolicy?
> TokenAwarePolicy(LoadBalancingPolicy childPolicy, boolean
> shuffleReplicas)
>
> Best,
>
> Romain
> On Tuesday, 20 September 2016 at 15:47, Pranay akula
> wrote:
>
>
> I was able to find the hotspots causing the load, but the size of these
> partitions is in KB, there are no tombstones, and the number of sstables is
> only 2. What else do I need to debug to find the reason for the high load on
> some nodes?
>   We are also using unlogged batches; can that be the reason ?? How do I
> find which node is serving as the coordinator for unlogged batches ?? We
> are using the token-aware policy.
>
> thanks
>
>
>
> On Mon, Sep 19, 2016 at 12:29 PM, Pranay akula  > wrote:
>
> I was able to see the most used partitions, but the nodes with less load are
> serving more read and write requests for those particular partitions when
> compared to the nodes with high load. How can I find out whether these nodes
> are serving as coordinators for those read and write requests ?? How can I
> find the token range for these particular partitions and which node is the
> primary for these partitions ??
>
>
> Thanks
>
> On Mon, Sep 19, 2016 at 11:04 AM, Pranay akula  > wrote:
>
> Hi Jeff,
>
> Thanks, we are using RF 3 and Cassandra version 2.1.8.
>
> Thanks
> Pranay.
>
> On Mon, Sep 19, 2016 at 10:55 AM, Jeff Jirsa 
> wrote:
>
> Is your replication_factor 2? Or is it 3?  What version are you using?
>
> The most likely answer is some individual partition that’s either being
> written/read more than others, or is somehow impacting the cluster (wide
> rows are a natural candidate).
>
> You don’t mention your version, but most modern versions of Cassandra ship
> with ‘nodetool toppartitions’, which will help you identify frequently
> written/read partitions – perhaps you can use that to identify a hotspot
> due to some external behavior (some partition being read thousands of
> times, over and over could certainly drive up load).
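>
> For example, a hedged sketch of such a run (keyspace/table names are
> placeholders; the last argument is the sampling duration in milliseconds):
>
>   nodetool toppartitions my_keyspace my_table 10000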
>
> -  Jeff
>
> *From: *Pranay akula 
> *Reply-To: *"user@cassandra.apache.org" 
> *Date: *Monday, September 19, 2016 at 7:53 AM
> *To: *"user@cassandra.apache.org" 
> *Subject: *High load on few nodes in a DC.
>
> When our cluster was under load, I saw that 1 or 2 nodes are consistently
> under more load compared to the others in the DC. I am not seeing any GC
> pauses or wide partitions. Could it be that those nodes are continuously
> serving as coordinators ?? How can I find what the reason for the high load
> on those two nodes is ?? We are using vnodes.
>
>
> Thanks
> Pranay.
>
>
>
>
>
>
>


Re: [cassandra 3.6.] Nodetool Repair + tombstone behaviour

2016-09-29 Thread Alexander Dejanovski
Atul,

our fork has been tested on 2.1 and 3.0.x clusters.
I've just tested with a CCM 3.6 cluster and it worked with no issue.

With Reaper, if you set incremental to false, it'll perform a full subrange
repair with no anticompaction.
You'll see this message in the logs : INFO  [AntiEntropyStage:1] 2016-09-29
16:11:34,950 ActiveRepairService.java:378 - Not a global repair, will not
do anticompaction

If you set incremental to true, it'll perform an incremental repair, one
node at a time, with anticompaction (set Parallelism to Parallel
exclusively with inc repair).

Let me know how it goes.


On Thu, Sep 29, 2016 at 3:06 PM Atul Saroha 
wrote:

> Hi Alexander,
>
> There is a compatibility issue raised with spotify/cassandra-reaper for
> cassandra version 3.x.
> Is the thelastpickle/cassandra-reaper fork compatible with 3.6 ?
>
> There are some suggestions mentioned by *brstgt* which we can try on our
> side.
>
> On Thu, Sep 29, 2016 at 5:42 PM, Atul Saroha 
> wrote:
>
>> Thanks Alexander.
>>
>> Will look into all these.
>>
>> On Thu, Sep 29, 2016 at 4:39 PM, Alexander Dejanovski <
>> a...@thelastpickle.com> wrote:
>>
>>> Atul,
>>>
>>> since you're using 3.6, by default you're running incremental repair,
>>> which doesn't like concurrency very much.
>>> Validation errors are not occurring on a partition or partition range
>>> base, but if you're trying to run both anticompaction and validation
>>> compaction on the same SSTable.
>>>
>>> Like advised to Robert yesterday, and if you want to keep on running
>>> incremental repair, I'd suggest the following :
>>>
>>>- run nodetool tpstats on all nodes in search for running/pending
>>>repair sessions
>>>- If you have some, and to be sure you will avoid conflicts, roll
>>>restart your cluster (all nodes)
>>>- Then, run "nodetool repair" on one node.
>>>- When repair has finished on this node (track messages in the log
>>>and nodetool tpstats), check if other nodes are running anticompactions
>>>- If so, wait until they are over
>>>- If not, move on to the other node
>>>
>>> You should be able to run concurrent incremental compactions on
>>> different tables if you wish to speed up the complete repair of the
>>> cluster, but do not try to repair the same table/full keyspace from two
>>> nodes at the same time.
>>>
>>> If you do not want to keep on using incremental repair, and fallback to
>>> classic full repair, I think the only way in 3.6 to avoid anticompaction
>>> will be to use subrange repair (Paulo mentioned that in 3.x full repair
>>> also triggers anticompaction).
>>>
>>> You have two options here : cassandra_range_repair (
>>> https://github.com/BrianGallew/cassandra_range_repair) and Spotify
>>> Reaper (https://github.com/spotify/cassandra-reaper)
>>>
>>> cassandra_range_repair might scream about subrange + incremental not
>>> being compatible (not sure here), but you can modify the repair_range()
>>> method by adding a --full switch to the command line used to run repair.
>>>
>>> We have a fork of Reaper that handles both full subrange repair and
>>> incremental repair here :
>>> https://github.com/thelastpickle/cassandra-reaper
>>> It comes with a tweaked version of the UI made by Stephan Podkowinski (
>>> https://github.com/spodkowinski/cassandra-reaper-ui) - that eases
>>> interactions to schedule, run and track repair - which adds fields to run
>>> incremental repair (accessible via ...:8080/webui/ in your browser).
>>>
>>> Cheers,
>>>
>>>
>>>
>>> On Thu, Sep 29, 2016 at 12:33 PM Atul Saroha 
>>> wrote:
>>>
 Hi,

 We are not sure whether this issue is linked to that node or not. Our
 application does frequent delete and insert.

 May be our approach is not correct for nodetool repair. Yes, we
 generally fire repair on all boxes at same time. Till now, it was manual
 with default configuration ( command: "nodetool repair").
 Yes, we saw validation error but that is linked to already running
 repair of  same partition on other box for same partition range. We saw
 error validation failed with some ip as repair in already running for the
 same SSTable.
 Just few days back, we had 2 DCs with 3 nodes each and replication was
 also 3. It means all data on each node.

 On Thu, Sep 29, 2016 at 2:49 PM, Alexander Dejanovski <
 a...@thelastpickle.com> wrote:

> Hi Atul,
>
> could you be more specific on how you are running repair ? What's the
> precise command line for that, does it run on several nodes at the same
> time, etc...
> What is your gc_grace_seconds ?
> Do you see errors in your logs that would be linked to repairs
> (Validation failure or failure to create a merkle tree)?
>
> You seem to mention a single node that went down but say the whole
> cluster seem to have zombie data.
> What is the connection you see 

Re: TRUNCATE throws OperationTimedOut randomly

2016-09-29 Thread Romain Hardouin
Hi,
@Edward: > "In older versions you can not control when this call will timeout"
truncate_request_timeout_in_ms has been available for many years, starting 
from 1.2. Maybe you have another setting parameter in mind?

@George: Try to put the Cassandra logs in debug.
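
To make the two knobs discussed in this thread concrete, a hedged sketch (the
host name is a placeholder, the yaml path depends on your install, and
60000 ms is the stock default):

  # client side: raise cqlsh's own request timeout for the session (seconds)
  cqlsh my-cassandra-host --request-timeout=60

  # server side: the coordinator-side truncate timeout (ms) in cassandra.yaml
  grep truncate_request_timeout_in_ms /etc/cassandra/cassandra.yaml
  # truncate_request_timeout_in_ms: 60000
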
Best,
Romain
 

On Wednesday, 28 September 2016 at 20:31, George Sigletos wrote:
 

 Even when I set a lower request-timeout in order to trigger a timeout, still 
no WARN or ERROR in the logs

On Wed, Sep 28, 2016 at 8:22 PM, George Sigletos  wrote:

Hi Joaquin,

Unfortunately, neither WARN nor ERROR is found in the system logs across the 
cluster when executing truncate. Sometimes it executes immediately; other 
times it takes 25 seconds, given that I have connected with 
--request-timeout=30 seconds. 

The nodes are a bit busy compacting. On a freshly restarted cluster, truncate 
seems to work without problems.

Some warnings that I see around that time but not exactly when executing 
truncate are:
WARN  [CompactionExecutor:2] 2016-09-28 20:03:29,646 SSTableWriter.java:241 - 
Compacting large partition system/hints:6f2c3b31-4975-470b-8f91-e706be89a83a 
(133819308 bytes)

Kind regards,
George

On Wed, Sep 28, 2016 at 7:54 PM, Joaquin Casares  
wrote:

Hi George,
Try grepping for WARN and ERROR on the system.logs across all nodes when you 
run the command. Could you post any of the recent stacktraces that you see?
Cheers,
Joaquin Casares
Consultant
Austin, TX
Apache Cassandra Consulting
http://www.thelastpickle.com
On Wed, Sep 28, 2016 at 12:43 PM, George Sigletos  
wrote:

Thanks a lot for your reply.

I understand that truncate is an expensive operation. But throwing a timeout 
while truncating a table that is already empty?

A workaround is to set a high --request-timeout when connecting. Even 20 
seconds is not always enough

Kind regards,
George


On Wed, Sep 28, 2016 at 6:59 PM, Edward Capriolo  wrote:

Truncate does a few things (based on version):
  - truncate takes snapshots
  - truncate causes a flush
  - in very old versions truncate causes a schema migration

In newer versions like cassandra 3.4 you have this knob:

# How long the coordinator should wait for truncates to complete
# (This can be much longer, because unless auto_snapshot is disabled
# we need to flush first so we can snapshot before removing the data.)
truncate_request_timeout_in_ms: 60000

In older versions you can not control when this call will timeout, it is fairly 
normal that it does!

On Wed, Sep 28, 2016 at 12:50 PM, George Sigletos  
wrote:

Hello,

I keep executing a TRUNCATE command on an empty table and it throws 
OperationTimedOut randomly:

cassandra@cqlsh> truncate test.mytable;
OperationTimedOut: errors={}, last_host=cassiebeta-01
cassandra@cqlsh> truncate test.mytable;
OperationTimedOut: errors={}, last_host=cassiebeta-01

I have a 3 node cluster running 2.1.14, with no connectivity problems. Has anybody 
come across the same error?

Thanks,
George


Re: Optimising the data model for reads

2016-09-29 Thread Romain Hardouin
Hi Julian,
The problem with any deletes here is that you can *read* potentially many 
tombstones. I mean you have two concerns:
1. Avoiding reading tombstones during a query
2. Evicting tombstones as quickly as possible to reclaim disk space

The first point is a data model consideration. Generally speaking, to avoid 
reading tombstones we have to think about order. Let's take an example not 
related to your data model: say you have an "updated_at" column, and you 
always want to read the newest data (e.g. < 7 days) while the oldest data will 
be TTL'ed (tombstones). If you order your data by "updated_at DESC" (and 
TTL > 7 days and there are no manual deletes) you won't read tombstones.

The second point depends on many factors: gc_grace, compaction strategy, 
compaction throughput, number of compactors, IO performance, #CPUs, ...

Also, with such a data model, you will have unbalanced data distribution. What 
if a user has 1,000,000 files or more? You can use a composite partition key 
to avoid that: PRIMARY KEY ((userid, fileid), ...). The data distribution will 
be much better, and on top of that you won't read tombstones when a file is 
deleted (because you won't query that partition key at all). *However, if you 
always read many files per user, each query will hit many nodes.* You have to 
decide depending on the query pattern, the average/max number of files per 
user, the average/max file size, etc.

Regarding the compaction strategy, LCS is good for a read-heavy workload but 
you need good disk IO and enough CPUs/vCPUs (watch out if your write workload 
is quite heavy). LCS will compact frequently so, *if tombstones are 
evictable*, they will be evicted faster than with STCS. As you mentioned, you 
have 10 days of gc_grace, so you might consider lowering this value if 
maintenance repairs run in a few hours/days.

LCS does a good job with updates, and that gives me an idea: what about soft 
deletes? A clustering column "status int" could do the trick. Let's say 
1 => "live file", 2 => "to delete". When a user deletes a file, you set the 
"status" to 2 and write the userid and fileid into a table "files_to_delete" 
(the partition key can be the date of the day if there are not millions of 
deletions per day). Then a batch job can run during off-peak hours to delete, 
i.e. add a tombstone on, the files to delete. In read queries you would have 
to add "WHERE status = 1 AND ...". Again, it's just an idea that crossed my 
mind, I never tested this model, but maybe you can think about it. The bonus 
is that you can "undelete" a file as long as the batch job has not been 
triggered.

Best,
Romain
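
To make those two ideas concrete, a hedged sketch (the keyspace "my_ks", the
table names, the text-typed day bucket and the column set are illustrative,
untested suggestions rather than a drop-in replacement; the soft-delete
"status" column described above would additionally go on the main table):

  cqlsh -k my_ks -e '
  -- idea 1: composite partition key, one partition per (user, file)
  CREATE TABLE "UserFileV2" (
    "USERID" bigint, "FILEID" text,
    "FILETYPE" int, "FOLDER_UID" text, "FILEPATHINFO" text, "JSONCOLUMN" text,
    PRIMARY KEY (("USERID", "FILEID")));

  -- idea 2: queue of soft-deleted files, purged by an off-peak batch job
  CREATE TABLE "files_to_delete" (
    "DAY" text, "USERID" bigint, "FILEID" text,
    PRIMARY KEY ("DAY", "USERID", "FILEID"));'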

On Thursday, 29 September 2016 at 11:31, Thomas Julian wrote:
 

 Hello,

I have created a column family for User File Management.
CREATE TABLE "UserFile" ("USERID" bigint,"FILEID" text,"FILETYPE" 
int,"FOLDER_UID" text,"FILEPATHINFO" text,"JSONCOLUMN" text,PRIMARY KEY 
("USERID","FILEID"));

Sample Entry

(4*003, 3f9**6a1, null, 2 , 
[{"FOLDER_TYPE":"-1","UID":"1","FOLDER":"\"HOME\""}] 
,{"filename":"untitled","size":1,"kind":-1,"where":""})


Queries :

Select "USERID","FILEID","FILETYPE","FOLDER_UID","JSONCOLUMN" from "UserFile" 
where "USERID"= and "FILEID" in (,,...)

Select "USERID","FILEID","FILEPATHINFO" from "UserFile" where "USERID"= 
and "FILEID" in (,,...) 

This column family was perfectly working in our lab. I was able to fetch the 
results for the queries stated at less than 10ms. I deployed this in 
production(Cassandra 2.1.13), It was working perfectly for a month or two. But 
now at times the queries are taking 5s to 10s. On analysing further, I found 
that few users are deleting the files too frequently. This generates too many 
tombstones. I have set the gc_grace_seconds to the default 10 days and I have 
chosen SizeTieredCompactionStrategy. I want to optimise this Data Model for 
read efficiency. 

Any help is much appreciated.

Best Regards,
Julian.


Re: [cassandra 3.6.] Nodetool Repair + tombstone behaviour

2016-09-29 Thread Atul Saroha
Hi Alexander,

There is a compatibility issue raised with spotify/cassandra-reaper for
cassandra version 3.x.
Is the thelastpickle/cassandra-reaper fork compatible with 3.6 ?

There are some suggestions mentioned by *brstgt* which we can try on our
side.

On Thu, Sep 29, 2016 at 5:42 PM, Atul Saroha 
wrote:

> Thanks Alexander.
>
> Will look into all these.
>
> On Thu, Sep 29, 2016 at 4:39 PM, Alexander Dejanovski <
> a...@thelastpickle.com> wrote:
>
>> Atul,
>>
>> since you're using 3.6, by default you're running incremental repair,
>> which doesn't like concurrency very much.
>> Validation errors are not occurring on a partition or partition range
>> base, but if you're trying to run both anticompaction and validation
>> compaction on the same SSTable.
>>
>> Like advised to Robert yesterday, and if you want to keep on running
>> incremental repair, I'd suggest the following :
>>
>>- run nodetool tpstats on all nodes in search for running/pending
>>repair sessions
>>- If you have some, and to be sure you will avoid conflicts, roll
>>restart your cluster (all nodes)
>>- Then, run "nodetool repair" on one node.
>>- When repair has finished on this node (track messages in the log
>>and nodetool tpstats), check if other nodes are running anticompactions
>>- If so, wait until they are over
>>- If not, move on to the other node
>>
>> You should be able to run concurrent incremental compactions on different
>> tables if you wish to speed up the complete repair of the cluster, but do
>> not try to repair the same table/full keyspace from two nodes at the same
>> time.
>>
>> If you do not want to keep on using incremental repair, and fallback to
>> classic full repair, I think the only way in 3.6 to avoid anticompaction
>> will be to use subrange repair (Paulo mentioned that in 3.x full repair
>> also triggers anticompaction).
>>
>> You have two options here : cassandra_range_repair (
>> https://github.com/BrianGallew/cassandra_range_repair) and Spotify
>> Reaper (https://github.com/spotify/cassandra-reaper)
>>
>> cassandra_range_repair might scream about subrange + incremental not
>> being compatible (not sure here), but you can modify the repair_range()
>> method by adding a --full switch to the command line used to run repair.
>>
>> We have a fork of Reaper that handles both full subrange repair and
>> incremental repair here : https://github.com/thelastpi
>> ckle/cassandra-reaper
>> It comes with a tweaked version of the UI made by Stephan Podkowinski (
>> https://github.com/spodkowinski/cassandra-reaper-ui) - that eases
>> interactions to schedule, run and track repair - which adds fields to run
>> incremental repair (accessible via ...:8080/webui/ in your browser).
>>
>> Cheers,
>>
>>
>>
>> On Thu, Sep 29, 2016 at 12:33 PM Atul Saroha 
>> wrote:
>>
>>> Hi,
>>>
>>> We are not sure whether this issue is linked to that node or not. Our
>>> application does frequent delete and insert.
>>>
>>> May be our approach is not correct for nodetool repair. Yes, we
>>> generally fire repair on all boxes at same time. Till now, it was manual
>>> with default configuration ( command: "nodetool repair").
>>> Yes, we saw validation error but that is linked to already running
>>> repair of  same partition on other box for same partition range. We saw
>>> error validation failed with some ip as repair in already running for the
>>> same SSTable.
>>> Just few days back, we had 2 DCs with 3 nodes each and replication was
>>> also 3. It means all data on each node.
>>>
>>> On Thu, Sep 29, 2016 at 2:49 PM, Alexander Dejanovski <
>>> a...@thelastpickle.com> wrote:
>>>
 Hi Atul,

 could you be more specific on how you are running repair ? What's the
 precise command line for that, does it run on several nodes at the same
 time, etc...
 What is your gc_grace_seconds ?
 Do you see errors in your logs that would be linked to repairs
 (Validation failure or failure to create a merkle tree)?

 You seem to mention a single node that went down but say the whole
 cluster seem to have zombie data.
 What is the connection you see between the node that went down and the
 fact that deleted data comes back to life ?
 What is your strategy for cyclic maintenance repair (schedule, command
 line or tool, etc...) ?

 Thanks,

 On Thu, Sep 29, 2016 at 10:40 AM Atul Saroha 
 wrote:

> Hi,
>
> We have seen a weird behaviour in cassandra 3.6.
> Once our node was went down more than 10 hrs. After that, we had ran
> Nodetool repair multiple times. But tombstone are not getting sync 
> properly
> over the cluster. On day- today basis, on expiry of every grace period,
> deleted records start surfacing again in cassandra.
>
> It seems Nodetool repair in not syncing tomebstone across cluster.
> FYI, we have 3 data centres 

Re: How to get rid of "Cannot start multiple repair sessions over the same sstables" exception

2016-09-29 Thread Robert Sicoie
Thanks Alexander,

After the rolling restart, the blocked repair job stopped and I was able to
run repair again.

Regards,
Robert

Robert Sicoie

On Wed, Sep 28, 2016 at 6:46 PM, Alexander Dejanovski <
a...@thelastpickle.com> wrote:

> Robert,
>
> You can restart them in any order, that doesn't make a difference afaik.
>
> Cheers
>
> Le mer. 28 sept. 2016 17:10, Robert Sicoie  a
> écrit :
>
>> Thanks Alexander,
>>
>> Yes, with tpstats I can see the hanging active repair(s) (output
>> attached). For one node there are 31 pending repairs. On others there are
>> fewer pending repairs (min 12). Is there any recommendation for the restart
>> order? The ones with fewer pending repairs first, perhaps?
>>
>> Thanks,
>> Robert
>>
>> Robert Sicoie
>>
>> On Wed, Sep 28, 2016 at 5:35 PM, Alexander Dejanovski <
>> a...@thelastpickle.com> wrote:
>>
>>> They will show up in nodetool compactionstats :
>>> https://issues.apache.org/jira/browse/CASSANDRA-9098
>>>
>>> Did you check nodetool tpstats to see if you didn't have any running
>>> repair session ?
>>> Just to make sure (and if you can actually do it), roll restart the
>>> cluster and try again. Repair sessions can get sticky sometimes.
>>>
>>> On Wed, Sep 28, 2016 at 4:23 PM Robert Sicoie 
>>> wrote:
>>>
 I am using nodetool compactionstats to check for pending compactions
 and it shows me 0 pending on all nodes, seconds before running nodetool
 repair.
 I am also monitoring PendingCompactions on jmx.

 Is there other way I can find out if is there any anticompaction
 running on any node?

 Thanks a lot,
 Robert

 Robert Sicoie

 On Wed, Sep 28, 2016 at 4:44 PM, Alexander Dejanovski <
 a...@thelastpickle.com> wrote:

> Robert,
>
> you need to make sure you have no repair session currently running on
> your cluster, and no anticompaction.
> I'd recommend doing a rolling restart in order to stop all running
> repair for sure, then start the process again, node by node, checking that
> no anticompaction is running before moving from one node to the other.
>
> Please do not use the -pr switch as it is both useless (token ranges
> are repaired only once with inc repair, whatever the replication factor)
> and harmful as all anticompactions won't be executed (you'll still have
> sstables marked as unrepaired even if the process has run entirely with no
> error).
>
> Let us know how that goes.
>
> Cheers,
>
> On Wed, Sep 28, 2016 at 2:57 PM Robert Sicoie 
> wrote:
>
>> Thanks Alexander,
>>
>> Now I started to run the repair with -pr arg and with keyspace and
>> table args.
>> Still, I got the "ERROR [RepairJobTask:1] 2016-09-28 11:34:38,288
>> RepairRunnable.java:246 - Repair session 
>> 89af4d10-856f-11e6-b28f-df99132d7979
>> for range [(8323429577695061526,8326640819362122791],
>> ..., (4212695343340915405,4229348077081465596]]] Validation failed
>> in /10.45.113.88"
>>
>> for one of the tables. 10.45.113.88 is the ip of the machine I am
>> running the nodetool on.
>> I'm wondering if this is normal...
>>
>> Thanks,
>> Robert
>>
>>
>>
>>
>> Robert Sicoie
>>
>> On Wed, Sep 28, 2016 at 11:53 AM, Alexander Dejanovski <
>> a...@thelastpickle.com> wrote:
>>
>>> Hi,
>>>
>>> nodetool scrub won't help here, as what you're experiencing is most
>>> likely that one SSTable is going through anticompaction, and then 
>>> another
>>> node is asking for a Merkle tree that involves it.
>>> For understandable reasons, an SSTable cannot be anticompacted and
>>> validation compacted at the same time.
>>>
>>> The solution here is to adjust the repair pressure on your cluster
>>> so that anticompaction can end before you run repair on another node.
>>> You may have a lot of anticompaction to do if you had high volumes
>>> of unrepaired data, which can take a long time depending on several 
>>> factors.
>>>
>>> You can tune your repair process to make sure no anticompaction is
>>> running before launching a new session on another node or you can try my
>>> Reaper fork that handles incremental repair : https://github.com/
>>> adejanovski/cassandra-reaper/tree/inc-repair-support-with-ui
>>> I may have to add a few checks in order to avoid all collisions
>>> between anticompactions and new sessions, but it should be helpful if 
>>> you
>>> struggle with incremental repair.
>>>
>>> In any case, check if your nodes are still anticompacting before
>>> trying to run a new repair session on a node.
>>>
>>> Cheers,
>>>
>>>
>>> On Wed, Sep 28, 2016 at 10:31 AM Robert Sicoie <
>>> robert.sic...@gmail.com> wrote:
>>>
 Hi guys,

 I have a 

Re: WARN Writing large partition for materialized views

2016-09-29 Thread Robert Sicoie
Thanks!

Robert Sicoie

On Thu, Sep 29, 2016 at 12:49 PM, Alexander Dejanovski <
a...@thelastpickle.com> wrote:

> Hi Robert,
>
> Materialized Views are regular C* tables underneath, so based on their PK
> they can generate big partitions.
> It is often advised to keep partition size under 100MB because larger
> partitions are hard to read and compact. They usually put pressure on the
> heap and lead to long GC pauses  + laggy compactions.
> You could possibly OOM while trying to fully read a partition that is way
> too big for your heap.
>
> It is indeed a schema problem and you most likely have to bucket your MV
> in order to split those partitions into smaller chunks. In the case of MV,
> you possibly need to add a bucketing field to the table it relies on (if
> you don't have one already), and add it to the MV partition key.
>
> You should try to use cassandra-stress to test your bucket sizes :
> https://docs.datastax.com/en/cassandra/3.x/cassandra/
> tools/toolsCStress.html
> In your schema definition you can now specify the creation of a MV.
>
> Cheers,
>
>
> On Wed, Sep 28, 2016 at 7:35 PM Robert Sicoie 
> wrote:
>
>> Hi guys,
>>
>> I run a cluster with 5 nodes, cassandra version 3.0.5.
>>
>> I get this warning:
>> 2016-09-28 17:22:18,480 BigTableWriter.java:171 - Writing large
>> partition...
>>
>> for some materialized views. Some have values over 500MB. How does this
>> affect performance? What can/should be done? I suppose it is a problem in
>> the schema design.
>>
>> Thanks,
>> Robert Sicoie
>>
> --
> -
> Alexander Dejanovski
> France
> @alexanderdeja
>
> Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>


Re: [cassandra 3.6.] Nodetool Repair + tombstone behaviour

2016-09-29 Thread Atul Saroha
Thanks Alexander.

Will look into all these.

On Thu, Sep 29, 2016 at 4:39 PM, Alexander Dejanovski <
a...@thelastpickle.com> wrote:

> Atul,
>
> since you're using 3.6, by default you're running incremental repair,
> which doesn't like concurrency very much.
> Validation errors are not occurring on a partition or partition range
> base, but if you're trying to run both anticompaction and validation
> compaction on the same SSTable.
>
> Like advised to Robert yesterday, and if you want to keep on running
> incremental repair, I'd suggest the following :
>
>- run nodetool tpstats on all nodes in search for running/pending
>repair sessions
>- If you have some, and to be sure you will avoid conflicts, roll
>restart your cluster (all nodes)
>- Then, run "nodetool repair" on one node.
>- When repair has finished on this node (track messages in the log and
>nodetool tpstats), check if other nodes are running anticompactions
>- If so, wait until they are over
>- If not, move on to the other node
>
> You should be able to run concurrent incremental compactions on different
> tables if you wish to speed up the complete repair of the cluster, but do
> not try to repair the same table/full keyspace from two nodes at the same
> time.
>
> If you do not want to keep on using incremental repair, and fallback to
> classic full repair, I think the only way in 3.6 to avoid anticompaction
> will be to use subrange repair (Paulo mentioned that in 3.x full repair
> also triggers anticompaction).
>
> You have two options here : cassandra_range_repair (https://github.com/
> BrianGallew/cassandra_range_repair) and Spotify Reaper (
> https://github.com/spotify/cassandra-reaper)
>
> cassandra_range_repair might scream about subrange + incremental not being
> compatible (not sure here), but you can modify the repair_range() method
> by adding a --full switch to the command line used to run repair.
>
> We have a fork of Reaper that handles both full subrange repair and
> incremental repair here : https://github.com/
> thelastpickle/cassandra-reaper
> It comes with a tweaked version of the UI made by Stephan Podkowinski (
> https://github.com/spodkowinski/cassandra-reaper-ui) - that eases
> interactions to schedule, run and track repair - which adds fields to run
> incremental repair (accessible via ...:8080/webui/ in your browser).
>
> Cheers,
>
>
>
> On Thu, Sep 29, 2016 at 12:33 PM Atul Saroha 
> wrote:
>
>> Hi,
>>
>> We are not sure whether this issue is linked to that node or not. Our
>> application does frequent delete and insert.
>>
>> May be our approach is not correct for nodetool repair. Yes, we generally
>> fire repair on all boxes at same time. Till now, it was manual with default
>> configuration ( command: "nodetool repair").
>> Yes, we saw validation error but that is linked to already running repair
>> of  same partition on other box for same partition range. We saw error
>> validation failed with some ip as repair in already running for the same
>> SSTable.
>> Just few days back, we had 2 DCs with 3 nodes each and replication was
>> also 3. It means all data on each node.
>>
>> On Thu, Sep 29, 2016 at 2:49 PM, Alexander Dejanovski <
>> a...@thelastpickle.com> wrote:
>>
>>> Hi Atul,
>>>
>>> could you be more specific on how you are running repair ? What's the
>>> precise command line for that, does it run on several nodes at the same
>>> time, etc...
>>> What is your gc_grace_seconds ?
>>> Do you see errors in your logs that would be linked to repairs
>>> (Validation failure or failure to create a merkle tree)?
>>>
>>> You seem to mention a single node that went down but say the whole
>>> cluster seem to have zombie data.
>>> What is the connection you see between the node that went down and the
>>> fact that deleted data comes back to life ?
>>> What is your strategy for cyclic maintenance repair (schedule, command
>>> line or tool, etc...) ?
>>>
>>> Thanks,
>>>
>>> On Thu, Sep 29, 2016 at 10:40 AM Atul Saroha 
>>> wrote:
>>>
 Hi,

 We have seen a weird behaviour in cassandra 3.6.
 Once our node was went down more than 10 hrs. After that, we had ran
 Nodetool repair multiple times. But tombstone are not getting sync properly
 over the cluster. On day- today basis, on expiry of every grace period,
 deleted records start surfacing again in cassandra.

 It seems Nodetool repair in not syncing tomebstone across cluster.
 FYI, we have 3 data centres now.

 Just want the help how to verify and debug this issue. Help will be
 appreciated.


 --
 Regards,
 Atul Saroha

 *Lead Software Engineer | CAMS*

 M: +91 8447784271
 Plot #362, ASF Center - Tower A, 1st Floor, Sec-18,
 Udyog Vihar Phase IV,Gurgaon, Haryana, India

 --
>>> -
>>> Alexander Dejanovski
>>> France
>>> @alexanderdeja
>>>
>>> Consultant
>>> Apache 

Re: [cassandra 3.6.] Nodetool Repair + tombstone behaviour

2016-09-29 Thread Alexander Dejanovski
Atul,

since you're using 3.6, by default you're running incremental repair, which
doesn't like concurrency very much.
Validation errors do not occur on a per-partition or partition-range basis,
but when you try to run both anticompaction and validation compaction
on the same SSTable.

Like advised to Robert yesterday, and if you want to keep on running
incremental repair, I'd suggest the following :

   - run nodetool tpstats on all nodes in search for running/pending repair
   sessions
   - If you have some, and to be sure you will avoid conflicts, roll
   restart your cluster (all nodes)
   - Then, run "nodetool repair" on one node.
   - When repair has finished on this node (track messages in the log and
   nodetool tpstats), check if other nodes are running anticompactions
   - If so, wait until they are over
   - If not, move on to the other node

You should be able to run concurrent incremental compactions on different
tables if you wish to speed up the complete repair of the cluster, but do
not try to repair the same table/full keyspace from two nodes at the same
time.
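
For the "check for pending repair sessions / running anticompactions" steps, a
hedged sketch of what to look for on each node (the host list is a placeholder):

  for h in node1 node2 node3; do
    echo "== $h =="
    # pending/active repair sessions show up in the AntiEntropy / Repair pools
    nodetool -h "$h" tpstats | grep -i -E 'antientropy|repair'
    # anticompactions show up as compactions of type "Anticompaction after repair"
    nodetool -h "$h" compactionstats | grep -i anticompaction
  done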

If you do not want to keep on using incremental repair, and fallback to
classic full repair, I think the only way in 3.6 to avoid anticompaction
will be to use subrange repair (Paulo mentioned that in 3.x full repair
also triggers anticompaction).

You have two options here : cassandra_range_repair (
https://github.com/BrianGallew/cassandra_range_repair) and Spotify Reaper (
https://github.com/spotify/cassandra-reaper)

cassandra_range_repair might scream about subrange + incremental not being
compatible (not sure here), but you can modify the repair_range() method by
adding a --full switch to the command line used to run repair.

We have a fork of Reaper that handles both full subrange repair and
incremental repair here : https://github.com/thelastpickle/cassandra-reaper
It comes with a tweaked version of the UI made by Stephan Podkowinski (
https://github.com/spodkowinski/cassandra-reaper-ui) - that eases
interactions to schedule, run and track repair - which adds fields to run
incremental repair (accessible via ...:8080/webui/ in your browser).

Cheers,



On Thu, Sep 29, 2016 at 12:33 PM Atul Saroha 
wrote:

> Hi,
>
> We are not sure whether this issue is linked to that node or not. Our
> application does frequent delete and insert.
>
> May be our approach is not correct for nodetool repair. Yes, we generally
> fire repair on all boxes at same time. Till now, it was manual with default
> configuration ( command: "nodetool repair").
> Yes, we saw validation error but that is linked to already running repair
> of  same partition on other box for same partition range. We saw error
> validation failed with some ip as repair in already running for the same
> SSTable.
> Just few days back, we had 2 DCs with 3 nodes each and replication was
> also 3. It means all data on each node.
>
> On Thu, Sep 29, 2016 at 2:49 PM, Alexander Dejanovski <
> a...@thelastpickle.com> wrote:
>
>> Hi Atul,
>>
>> could you be more specific on how you are running repair ? What's the
>> precise command line for that, does it run on several nodes at the same
>> time, etc...
>> What is your gc_grace_seconds ?
>> Do you see errors in your logs that would be linked to repairs
>> (Validation failure or failure to create a merkle tree)?
>>
>> You seem to mention a single node that went down but say the whole
>> cluster seem to have zombie data.
>> What is the connection you see between the node that went down and the
>> fact that deleted data comes back to life ?
>> What is your strategy for cyclic maintenance repair (schedule, command
>> line or tool, etc...) ?
>>
>> Thanks,
>>
>> On Thu, Sep 29, 2016 at 10:40 AM Atul Saroha 
>> wrote:
>>
>>> Hi,
>>>
>>> We have seen a weird behaviour in cassandra 3.6.
>>> Once our node was went down more than 10 hrs. After that, we had ran
>>> Nodetool repair multiple times. But tombstone are not getting sync properly
>>> over the cluster. On day- today basis, on expiry of every grace period,
>>> deleted records start surfacing again in cassandra.
>>>
>>> It seems Nodetool repair in not syncing tomebstone across cluster.
>>> FYI, we have 3 data centres now.
>>>
>>> Just want the help how to verify and debug this issue. Help will be
>>> appreciated.
>>>
>>>
>>> --
>>> Regards,
>>> Atul Saroha
>>>
>>> *Lead Software Engineer | CAMS*
>>>
>>> M: +91 8447784271
>>> Plot #362, ASF Center - Tower A, 1st Floor, Sec-18,
>>> Udyog Vihar Phase IV,Gurgaon, Haryana, India
>>>
>>> --
>> -
>> Alexander Dejanovski
>> France
>> @alexanderdeja
>>
>> Consultant
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>
>
>
> --
> Regards,
> Atul Saroha
>
> *Lead Software Engineer | CAMS*
>
> M: +91 8447784271
> Plot #362, ASF Center - Tower A, 1st Floor, Sec-18,
> Udyog Vihar Phase IV,Gurgaon, Haryana, India
>
> --
-
Alexander Dejanovski
France

Re: [cassandra 3.6.] Nodetool Repair + tombstone behaviour

2016-09-29 Thread Atul Saroha
Hi,

We are not sure whether this issue is linked to that node or not. Our
application does frequent deletes and inserts.

Maybe our approach to nodetool repair is not correct. Yes, we generally
fire repair on all boxes at the same time. Until now it was manual, with the
default configuration (command: "nodetool repair").
Yes, we saw a validation error, but that is linked to an already running
repair of the same partition range on another box: we saw an error saying
validation failed with some IP, as a repair was already running for the same
SSTable.
Just a few days back we had 2 DCs with 3 nodes each and replication was also
3, which means all data was on each node.

On Thu, Sep 29, 2016 at 2:49 PM, Alexander Dejanovski <
a...@thelastpickle.com> wrote:

> Hi Atul,
>
> could you be more specific on how you are running repair ? What's the
> precise command line for that, does it run on several nodes at the same
> time, etc...
> What is your gc_grace_seconds ?
> Do you see errors in your logs that would be linked to repairs (Validation
> failure or failure to create a merkle tree)?
>
> You seem to mention a single node that went down but say the whole cluster
> seem to have zombie data.
> What is the connection you see between the node that went down and the
> fact that deleted data comes back to life ?
> What is your strategy for cyclic maintenance repair (schedule, command
> line or tool, etc...) ?
>
> Thanks,
>
> On Thu, Sep 29, 2016 at 10:40 AM Atul Saroha 
> wrote:
>
>> Hi,
>>
>> We have seen a weird behaviour in cassandra 3.6.
>> Once our node was went down more than 10 hrs. After that, we had ran
>> Nodetool repair multiple times. But tombstone are not getting sync properly
>> over the cluster. On day- today basis, on expiry of every grace period,
>> deleted records start surfacing again in cassandra.
>>
>> It seems Nodetool repair in not syncing tomebstone across cluster.
>> FYI, we have 3 data centres now.
>>
>> Just want the help how to verify and debug this issue. Help will be
>> appreciated.
>>
>>
>> --
>> Regards,
>> Atul Saroha
>>
>> *Lead Software Engineer | CAMS*
>>
>> M: +91 8447784271
>> Plot #362, ASF Center - Tower A, 1st Floor, Sec-18,
>> Udyog Vihar Phase IV,Gurgaon, Haryana, India
>>
>> --
> -
> Alexander Dejanovski
> France
> @alexanderdeja
>
> Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>



-- 
Regards,
Atul Saroha

*Lead Software Engineer | CAMS*

M: +91 8447784271
Plot #362, ASF Center - Tower A, 1st Floor, Sec-18,
Udyog Vihar Phase IV,Gurgaon, Haryana, India


Re: WARN Writing large partition for materialized views

2016-09-29 Thread Alexander Dejanovski
Hi Robert,

Materialized Views are regular C* tables underneath, so based on their PK
they can generate big partitions.
It is often advised to keep partition size under 100MB because larger
partitions are hard to read and compact. They usually put pressure on the
heap and lead to long GC pauses  + laggy compactions.
You could possibly OOM while trying to fully read a partition that is way
too big for your heap.

It is indeed a schema problem and you most likely have to bucket your MV in
order to split those partitions into smaller chunks. In the case of MV, you
possibly need to add a bucketing field to the table it relies on (if you
don't have one already), and add it to the MV partition key.

You should try to use cassandra-stress to test your bucket sizes :
https://docs.datastax.com/en/cassandra/3.x/cassandra/tools/toolsCStress.html
In your schema definition you can now specify the creation of a MV.
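
For reference, a hedged sketch of such a run (the profile file name and the op
label are made up; the yaml profile has to describe your table, the MV and the
column/partition size distributions you want to test):

  cassandra-stress user profile=mv_bucket_test.yaml "ops(insert=1)" n=1000000 \
      -rate threads=50 -node 127.0.0.1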

Cheers,


On Wed, Sep 28, 2016 at 7:35 PM Robert Sicoie 
wrote:

> Hi guys,
>
> I run a cluster with 5 nodes, cassandra version 3.0.5.
>
> I get this warning:
> 2016-09-28 17:22:18,480 BigTableWriter.java:171 - Writing large
> partition...
>
> for some materialized views. Some have values over 500MB. How does this
> affect performance? What can/should be done? I suppose it is a problem in
> the schema design.
>
> Thanks,
> Robert Sicoie
>
-- 
-
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


Optimising the data model for reads

2016-09-29 Thread Thomas Julian
Hello,

I have created a column family for User File Management.

CREATE TABLE "UserFile" ("USERID" bigint,"FILEID" text,"FILETYPE" 
int,"FOLDER_UID" text,"FILEPATHINFO" text,"JSONCOLUMN" text,PRIMARY KEY 
("USERID","FILEID"));

Sample Entry

(4*003, 3f9**6a1, null, 2 , 
[{"FOLDER_TYPE":"-1","UID":"1","FOLDER":"\"HOME\""}] 
,{"filename":"untitled","size":1,"kind":-1,"where":""})

Queries:

Select "USERID","FILEID","FILETYPE","FOLDER_UID","JSONCOLUMN" from "UserFile" 
where "USERID"=value and "FILEID" in (value,value,...)

Select "USERID","FILEID","FILEPATHINFO" from "UserFile" where "USERID"=value 
and "FILEID" in (value,value,...)

This column family was working perfectly in our lab. I was able to fetch the 
results for the stated queries in less than 10ms. I deployed this in 
production (Cassandra 2.1.13), and it was working perfectly for a month or 
two. But now, at times, the queries take 5s to 10s. On analysing further, I 
found that a few users are deleting files too frequently. This generates too 
many tombstones. I have set gc_grace_seconds to the default 10 days and I 
have chosen SizeTieredCompactionStrategy. I want to optimise this data model 
for read efficiency.

Any help is much appreciated.

Best Regards,

Julian.








Re: [cassandra 3.6.] Nodetool Repair + tombstone behaviour

2016-09-29 Thread Alexander Dejanovski
Hi Atul,

could you be more specific on how you are running repair ? What's the
precise command line for that, does it run on several nodes at the same
time, etc...
What is your gc_grace_seconds ?
Do you see errors in your logs that would be linked to repairs (Validation
failure or failure to create a merkle tree)?

You seem to mention a single node that went down, but say the whole cluster
seems to have zombie data.
What is the connection you see between the node that went down and the fact
that deleted data comes back to life ?
What is your strategy for cyclic maintenance repair (schedule, command line
or tool, etc...) ?

Thanks,

On Thu, Sep 29, 2016 at 10:40 AM Atul Saroha 
wrote:

> Hi,
>
> We have seen a weird behaviour in cassandra 3.6.
> Once our node was went down more than 10 hrs. After that, we had ran
> Nodetool repair multiple times. But tombstone are not getting sync properly
> over the cluster. On day- today basis, on expiry of every grace period,
> deleted records start surfacing again in cassandra.
>
> It seems Nodetool repair in not syncing tomebstone across cluster.
> FYI, we have 3 data centres now.
>
> Just want the help how to verify and debug this issue. Help will be
> appreciated.
>
>
> --
> Regards,
> Atul Saroha
>
> *Lead Software Engineer | CAMS*
>
> M: +91 8447784271
> Plot #362, ASF Center - Tower A, 1st Floor, Sec-18,
> Udyog Vihar Phase IV,Gurgaon, Haryana, India
>
> --
-
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


[cassandra 3.6.] Nodetool Repair + tombstone behaviour

2016-09-29 Thread Atul Saroha
Hi,

We have seen some weird behaviour in cassandra 3.6.
Once, our node went down for more than 10 hrs. After that, we ran
nodetool repair multiple times, but tombstones are not getting synced properly
over the cluster. On a day-to-day basis, on expiry of every grace period,
deleted records start surfacing again in cassandra.

It seems nodetool repair is not syncing tombstones across the cluster.
FYI, we have 3 data centres now.

We just want help on how to verify and debug this issue. Help will be
appreciated.

-- 
Regards,
Atul Saroha

*Lead Software Engineer | CAMS*

M: +91 8447784271
Plot #362, ASF Center - Tower A, 1st Floor, Sec-18,
Udyog Vihar Phase IV,Gurgaon, Haryana, India