Re: Fast Writes to Cassandra Failing Through Python Script

2018-03-15 Thread Jon Haddad
TWCS does SizeTieredCompaction within the window, so it’s not likely to make a difference. I’m +1’ing what Jeff said, 128ms memtable_flush_period_in_ms is almost certainly your problem, unless you’ve changed other settings and haven’t told us about them. > On Mar 15, 2018, at 9:54 AM, Affan

Re: nodetool repair and compact

2018-04-01 Thread Jon Haddad
You’ll find the answers to your questions (and quite a bit more) in this blog post from my coworker: http://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html Repair doesn’t clean up tombstones, they’r

Backup & Restore w/ AWS Blog Post

2018-04-03 Thread Jon Haddad
Hey folks. We (The Last Pickle) have helped a number of clients set up backup & restore on AWS over the last couple of years. Alain has been working on a thorough blog post over the last several months to try to document pros, cons and techniques. Hopefully it proves to be helpful to the comm

Re: Text or....

2018-04-04 Thread Jon Haddad
Depending on the compression rate, I think it would generate less garbage on the Cassandra side if you compressed it client side. Something to test out. > On Apr 4, 2018, at 7:19 AM, Jeff Jirsa wrote: > > Compressing server side and validating checksums is hugely important in the > more freq

JVM Tuning post

2018-04-11 Thread Jon Haddad
Hey folks, We (The Last Pickle) have helped a lot of teams with JVM tuning over the years, finally managed to write some stuff down. We’re hoping the community finds it helpful. http://thelastpickle.com/blog/2018/04/11/gc-tuning.html

Re: read request is slow

2019-03-15 Thread Jon Haddad
1. What was the read request? Are you fetching a single row, a million, something else? 2. What are your GC settings? 3. What's the hardware in use? What resources have been allocated to each instance? 4. Did you see this issue after a single request or is the cluster under heavy load? If you're

Re: read request is slow

2019-03-16 Thread Jon Haddad
HEAP_NEWSIZE to 100 MB >> >> >> >> And >> >> >> >> heap with 50% of that as a starting point? How do I do this? >> >> >> >> Thanks >> >> >> >> >> >> *From:* Dieudonné Madishon NGAYA [mailto:dmng...@gmail.com

Re: Fw: read request is slow

2019-03-18 Thread Jon Haddad
>> Subject: Re: read request is slow >>> From: Dieudonné Madishon NGAYA >>> To: user@cassandra.apache.org >>> Cc: >>> >>> >>> >>> For your information, since Cassandra 3.0, it includes ttop and other >>> options inside sjk >

Re: Garbage Collector

2019-03-19 Thread Jon Haddad
G1 is optimized for high throughput with higher pause times. It's great if you have mixed / unpredictable workloads, and as Elliott mentioned is mostly set & forget. ZGC requires Java 11, which is only supported on trunk. I plan on messing with it soon, but I haven't had time yet. We'll share t

Re: Cassandra Possible read/write race condition in LOCAL_ONE?

2019-03-28 Thread Jon Haddad
I'm reading the OP as doing this from a single server, if that's the case QUORUM / LOCAL_QUORUM will work. On Thu, Mar 28, 2019 at 3:29 PM Jeff Jirsa wrote: > > Yes it can race; if you don't want to race, you'd want to use SERIAL or > LOCAL_SERIAL. > > On Thu, Mar 28, 2019 at 3:04 PM Richard Xin

Re: Assassinate fails

2019-04-04 Thread Jon Haddad
Ken, Alain is right about the system tables. What you're describing only works on non-local tables. Changing the CL doesn't help with keyspaces that use LocalStrategy. Here's the definition of the system keyspace: CREATE KEYSPACE system WITH replication = {'class': 'LocalStrategy'} AND durable

Re: Assassinate fails

2019-04-04 Thread Jon Haddad
ssandra user. > Within cqlsh, there is code that forces the default cassandra user to connect > by querying system_auth at QUORUM consistency. This can be problematic in > larger clusters, and is another reason why you should never use the default > cassandra user. > > >

Re: Assassinate fails

2019-04-04 Thread Jon Haddad
> > > This problem is often seen when logging in with the default cassandra user. > > Within cqlsh, there is code that forces the default cassandra user to > > connect by querying system_auth at QUORUM consistency. This can be > > problematic in larger clusters, and is an

Re: How to monitor datastax driver compression performance?

2019-04-08 Thread Jon Haddad
If it were me, I'd look at raw request rates (in terms of requests / second as well as request latency), network throughput and then some flame graphs of both the server and your application: https://github.com/jvm-profiling-tools/async-profiler. I've created an issue in tlp-stress to add compress

Re: How to monitor datastax driver compression performance?

2019-04-09 Thread Jon Haddad
another one with less payload could not. > > Thanks for your help Jon. > > > El lun., 8 abr. 2019 a las 19:13, Jon Haddad () escribió: >> >> If it were me, I'd look at raw request rates (in terms of requests / >> second as well as request latency), network throu

Re: Questions about C* performance related to tombstone

2019-04-09 Thread Jon Haddad
Normal deletes are fine. Sadly there's a lot of hand wringing about tombstones in the generic sense which leads people to try to work around *every* case where they're used. This is unnecessary. A tombstone over a single row isn't a problem, especially if you're only fetching that one row back.

Re: 2.1.9 --> 2.2.13 upgrade node startup after upgrade very slow

2019-04-17 Thread Jon Haddad
Run the async java profiler on the node to determine what it's doing: https://github.com/jvm-profiling-tools/async-profiler On Wed, Apr 17, 2019 at 11:31 AM Carl Mueller wrote: > > No, we just did the package upgrade 2.1.9 --> 2.2.13 > > It definitely feels like some indexes are being recalculate

Re: 2.1.9 --> 2.2.13 upgrade node startup after upgrade very slow

2019-04-17 Thread Jon Haddad
Let me be more specific - run the async java profiler and generate a flame graph to determine where CPU time is spent. On Wed, Apr 17, 2019 at 11:36 AM Jon Haddad wrote: > > Run the async java profiler on the node to determine what it's doing: > https://github.com/jvm-profili

Re: [EXTERNAL] multiple Cassandra instances per server, possible?

2019-04-18 Thread Jon Haddad
Agreed with Jeff here. The whole "community recommends no more than 1TB" has been around, and inaccurate, for a long time. The biggest issue with dense nodes is how long it takes to replace them. 4.0 should help with that under certain circumstances. On Thu, Apr 18, 2019 at 6:57 AM Jeff Jirsa

Re: Increasing the size limits implications

2019-04-30 Thread Jon Haddad
Just curious - why are you using such large batches? Most of the time when someone asks this question, it's because they're using batches as they would in an RDBMS, because larger transactions improve performance. That doesn't apply with Cassandra. Batches are OK at keeping multiple tables in sy

Re: How to set up a cluster with allocate_tokens_for_keyspace?

2019-05-04 Thread Jon Haddad
That line is only relevant for when you're starting your cluster and you need to define your initial tokens in a non-random way. Random token distribution doesn't work very well when you only use 4 tokens. Once you get the cluster set up you don't need to specify tokens anymore, you can just use

Re: Priority in IN () cqlsh comand

2019-05-05 Thread Jon Haddad
Do separate queries for each partition you want. There's no benefit in using the IN() clause here, and performance is significantly worse with multi-partition IN(), especially if the partitions are small. On Sun, May 5, 2019 at 4:52 AM Soheil Pourbafrani wrote: > > Hi, > > I want to run cqlsh qu

Re: Re: How to set up a cluster with allocate_tokens_for_keyspace?

2019-05-05 Thread Jon Haddad
. > > Sent using Zoho Mail > > > > ==== Forwarded message > From: Jon Haddad > To: > Date: Sat, 04 May 2019 22:10:39 +0430 > Subject: Re: How to set up a cluster with allocate_tokens_for_keyspace? > Forwarded message > > That line is only

Re: Collecting Latency Metrics

2019-05-30 Thread Jon Haddad
Yep. I would *never* use mean when it comes to performance to make any sort of decisions. I prefer to graph all the p99 latencies as well as the max. Some good reading on the topic: https://bravenewgeek.com/everything-you-know-about-latency-is-wrong/ On Thu, May 30, 2019 at 7:35 AM Chris Lohfin
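A minimal sketch of why the mean hides what p99 and max reveal. The latency values are made up for illustration, and the nearest-rank percentile helper is an assumption of this example, not something taken from the thread or from any monitoring tool:

```python
def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct in 1..100."""
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]

# Hypothetical request latencies in milliseconds: 98 fast requests, 2 slow ones.
latencies_ms = [5] * 98 + [900, 1200]

mean_ms = sum(latencies_ms) / len(latencies_ms)
p99_ms = percentile(latencies_ms, 99)
max_ms = max(latencies_ms)

print(f"mean={mean_ms:.1f}ms p99={p99_ms}ms max={max_ms}ms")
```

With this sample the mean stays under 30 ms even though the slowest requests take around a second, which is why graphing p99 and max gives a more honest picture.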

Re: Recover lost node from backup or evict/re-add?

2019-06-12 Thread Jon Haddad
100% agree with Sean. I would only use Cassandra backups in a case where you need to restore from full cluster loss. Example: An entire DC burns down, tornado, flooding. Your routine node replacement after a failure should be replace_address_first_boot. To ensure this goes smoothly, run regular

Re: Running Node Repair After Changing RF or Replication Strategy for a Keyspace

2019-06-28 Thread Jon Haddad
Yep - not to mention the increased complexity and overhead of going from ONE to QUORUM, or the increased cost of QUORUM in RF=5 vs RF=3. If you're in a cloud provider, I've found you're almost always better off adding a new DC with a higher RF, assuming you're on NTS like Jeff mentioned. On Fri,
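As a rough illustration of the cost difference mentioned above: a quorum is a majority of replicas, so the number of replicas that must respond grows with RF. The helper names below are hypothetical and exist only for this sketch:

```python
def quorum(replication_factor):
    """Replicas that must respond for QUORUM: a strict majority of RF."""
    return replication_factor // 2 + 1

def tolerated_failures(replication_factor):
    """Replicas that can be down while QUORUM still succeeds."""
    return replication_factor - quorum(replication_factor)

for rf in (3, 5):
    print(f"RF={rf}: QUORUM waits for {quorum(rf)} replicas, "
          f"tolerates {tolerated_failures(rf)} down")
```

Going from RF=3 to RF=5 means every QUORUM request waits on three replicas instead of two, in exchange for surviving two unavailable replicas instead of one.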

Re: Compaction throughput

2019-07-19 Thread Jon Haddad
It's a limit on the total compaction throughput. On Fri, Jul 19, 2019 at 10:39 AM Vlad wrote: > Hi, > > is 'nodetool setcompactionthroughput' sets limit for all compactions on > the node, or is it per compaction thread? > > Thanks. >

Re: Materialized View's additional PrimaryKey column

2019-07-24 Thread Jon Haddad
I really, really advise against using MVs. I've had to help a number of teams move off them. Not sure what list of bugs you read, but if the list didn't include "will destabilize your cluster to the point of constant downtime" then the list was incomplete. Jon On Wed, Jul 24, 2019 at 6:32 AM me

Re: Materialized View's additional PrimaryKey column

2019-07-25 Thread Jon Haddad
for using MV was avoiding updates (delete + > create) on primaryKey columns because we suppose that cassandra developers > can manage this unpreferred operation better then us. I'm really confused > now. > > > > On Wednesday, July 24, 2019, 11:30:15 PM GMT+3, Jon Haddad

Re: Performance impact with ALLOW FILTERING clause.

2019-07-25 Thread Jon Haddad
If you're thinking about rewriting your data to be more performant when doing analytics, you might as well go the distance and put it in an analytics friendly format like Parquet. My 2 cents. On Thu, Jul 25, 2019 at 11:01 AM ZAIDI, ASAD A wrote: > Thank you all for your insights. > > > > When s

Re: Cheat Sheet for Unix based OS, Performance troubleshooting

2019-07-28 Thread Jon Haddad
http://www.brendangregg.com/linuxperf.html On Sat, Jul 27, 2019 at 2:45 AM Paul Chandler wrote: > I have always found Amy's Cassandra 2.1 tuning guide great for the Linux > performance tuning: > https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html > > Sent from my iPhone > > On 26 J

Re: Cassandra read requests not getting timeout

2019-08-05 Thread Jon Haddad
I think this might be because the timeout only applies to each request, and the driver is paginating in the background. Each page is a new request. On Mon, Aug 5, 2019, 12:08 AM Oleksandr Shulgin < oleksandr.shul...@zalando.de> wrote: > On Mon, Aug 5, 2019 at 8:50 AM nokia ceph > wrote: > >> Hi

Re: Datafile Corruption

2019-08-08 Thread Jon Haddad
Any chance you're using NVMe with an older Linux kernel? I've seen a *lot* of filesystem errors from using older CentOS versions. You'll want to be using a version > 4.15. On Thu, Aug 8, 2019 at 9:31 AM Philip Ó Condúin wrote: > *@Jeff *- If it was hardware that would explain it all, but do you t

Re: New column

2019-08-18 Thread Jon Haddad
If you're giving the partition key you won't scan the whole table. The overhead will depend on the size of the partition. Would be an interesting workload for our tlp-stress tool, I'll code something up for the next release. On Sun, Aug 18, 2019, 12:58 PM Rahul Reddy wrote: > Hello, > > We have

Re: New column

2019-08-19 Thread Jon Haddad
> Jon, > > If we expect non of our partition key to have more than 100 records and > pass partition key in where clause we wouldnt see issues using new column > and allow filtering? Can you please point me to any doc how allow > filtering works. I was in assumption of it goes through

Re: Disk space utilization by from some Cassandra

2019-08-21 Thread Jon Haddad
This advice hasn't been valid for a long time now for most use cases. The only time you need to reserve 50% disk space is if you're going to be running major compactions against a table in your cluster that occupies 50% of its total disk space. Nowadays, that's far less common than it was when yo

Re: New column

2019-08-22 Thread Jon Haddad
Just to close the loop on this, I did a release of tlp-stress last night, which now has this workload (AllowFiltering). You can grab a deb, rpm, tarball or docker image. Docs are here: http://thelastpickle.com/tlp-stress/ Jon On Mon, Aug 19, 2019 at 2:21 PM Jon Haddad wrote: > It'll

Re: Is it possible to build multi cloud cluster for Cassandra

2019-09-05 Thread Jon Haddad
Technically, not a problem. Use GossipingPropertyFileSnitch to keep things simple and you can go across whatever cloud providers you want without issue. The biggest issue you're going to have isn't going to be Cassandra, it's having the expertise in the different cloud providers to understand the

Re: Update/where statement Adds Row

2019-09-11 Thread Jon Haddad
Probably not a great idea unless you're using it sparingly. Using LWTs without knowing all the caveats is likely to lead to terrible cluster performance. On Wed, Sep 11, 2019, 10:59 PM A wrote: > Is it ok if I do this? > > ... where email = em AND company_id = id IF EXISTS > > > > > > Sent fr

Re: cluster rolling restart

2019-10-16 Thread Jon Haddad
I agree with Jeff here. Ideally you should be so comfortable with rolling restarts that they become second nature. Cassandra is designed to handle them and you should not be afraid to do them regularly. On Wed, Oct 16, 2019, 8:06 AM Jeff Jirsa wrote: > > Personally I encourage you to rolling res

Re: Elevated response times from all nodes in a data center at the same time.

2019-10-16 Thread Jon Haddad
It's possible the queries you're normally running are served out of page cache, and during the latency spike you're hitting your disks. If you're using read ahead you might be hitting a throughput limit on the disks. I've got some numbers and graphs I can share later when I'm not on my phone. Jon

Re: GC Tuning https://thelastpickle.com/blog/2018/04/11/gc-tuning.html

2019-10-21 Thread Jon Haddad
I still use ParNew + CMS over G1GC with Java 8. I haven't done a comparison with JDK 11 yet, so I'm not sure if it's any better. I've heard it is, but I like to verify first. The pause times with ParNew + CMS are generally lower than G1 when tuned right, but as Chris said it can be tricky. If y

Re: [EXTERNAL] Re: GC Tuning https://thelastpickle.com/blog/2018/04/11/gc-tuning.html

2019-10-21 Thread Jon Haddad
testing harnesses. It isn’t worth our time. As a previous > writer mentioned, there is usually better return on our time tuning the > schema (aka helping developers understand Cassandra’s strengths). > > > > We use 16 – 32 GB heaps, nothing smaller than that. > > >

Re: [EXTERNAL] Re: GC Tuning https://thelastpickle.com/blog/2018/04/11/gc-tuning.html

2019-10-21 Thread Jon Haddad
ospel truth that -XX:+UseNUMA is a good >> thing on AWS (or anything virtualized), you’d have to run your own tests >> and find out. >> >> >> >> R >> >> *From: *Jon Haddad >> *Reply-To: *"user@cassandra.apache.org" >> *Date: *Mo

Re: Cassandra 2.1.18 - Question on stream/bootstrap throughput

2019-10-22 Thread Jon Haddad
CPU waiting on memory will look like CPU overhead. There's a good post on the topic by Brendan Gregg: http://www.brendangregg.com/blog/2017-05-09/cpu-utilization-is-wrong.html Regarding GC, I agree with Reid. You're probably not going to saturate your network card no matter what your settings,

Re: Cassandra Rack - Datacenter Load Balancing relations

2019-10-23 Thread Jon Haddad
Personally, I wouldn't ever do this. I recommend separate DCs if you want to keep workloads separate. On Wed, Oct 23, 2019 at 4:06 PM Sergio wrote: > I forgot to comment for > >OPTION C) >1. Node DC RACK AZ 1 read ONE us-east-1a 2 read ONE us-east-1b >2. 3 read ONE us-east

Re: Cassandra Rack - Datacenter Load Balancing relations

2019-10-23 Thread Jon Haddad
t-1a >- 4 write TWO us-east-1b 5 write TWO us-east-1b >- 6 write TWO us-east-1b > > > Here we have 2 DC read and write > One Rack per DC > One Availability Zone per DC > > Thanks, > > Sergio > > > On Wed, Oct 23, 2019, 1:11 PM Jon Haddad wr

Re: merge two cluster

2019-10-23 Thread Jon Haddad
Probably not beneficial, I wouldn't do it. Not a fan of multi-tenancy with Cassandra unless the use cases are so small that your noisy neighbor problem is not very noisy at all. For those cases I don't know what you get from Cassandra other than a cool resume. On Wed, Oct 23, 2019 at 12:41 PM Re

Re: Repair Issues

2019-10-24 Thread Jon Haddad
There's some major warning signs for me with your environment. 4GB heap is too low, and Cassandra 3.7 isn't something I would put into production. Your surface area for problems is massive right now. Things I'd do: 1. Never use incremental repair. Seems like you've already stopped doing them,

Re: TWCS and gc_grace_seconds

2019-10-26 Thread Jon Haddad
My coworker Radovan wrote up a post on the relationship between gc grace and hinted handoff: https://thelastpickle.com/blog/2018/03/21/hinted-handoff-gc-grace-demystified.html Jon On Sat, Oct 26, 2019 at 6:45 AM Hossein Ghiyasi Mehr wrote: > It needs to change gc_grace_seconds carefully because

Re: What is the status of counters? Should I use them?

2019-10-30 Thread Jon Haddad
Counters are good for things like page views, bad for money. Yes they can under or overcount in certain situations. If your cluster is stable, you'll see very little of it in practice. I've done quite a bit of tuning of counters. Here's the main takeaways: * They do a read before a write, so u

Re: What is the status of counters? Should I use them?

2019-10-30 Thread Jon Haddad
table on a regular basis as any > other? > > > > ‐‐‐ Original Message ‐‐‐ > On Wednesday, 30 October 2019 16:26, Jon Haddad wrote: > > Counters are good for things like page views, bad for money. Yes they can > under or overcount in certain situations. If your cluster is

Re: Where to get old RPMs?

2019-10-30 Thread Jon Haddad
Archives are here: http://archive.apache.org/dist/cassandra/ For example, the RPM for 3.11.x you can find here: http://archive.apache.org/dist/cassandra/redhat/311x/ The old releases are removed by Apache automatically as part of their policy, it's not specific to Cassandra. On Wed, Oct 30, 201

Re: Cassandra 4 alpha/alpha2

2019-10-31 Thread Jon Haddad
What artifact did you use and what OS are you on? On Thu, Oct 31, 2019 at 12:40 PM Abdul Patel wrote: > Hey Everyone > > Did anyone was successfull to install either alpha or alpha2 version for > cassandra 4.0? > Found 2 issues : > 1> cassandra-env.sh: > JAVA_VERSION varianle is not defined. > J

Re: Cassandra 4 alpha/alpha2

2019-11-01 Thread Jon Haddad
A new thing like this would be much better served by the community through several iterations. For instance, over the last year I've developed a tool for spinning up lab clusters, it's here: https://thelastpickle.com/tlp-cluster/ I had to make a *lot* of tradeoffs here. Everything Jeff mentioned

Re: Priority for cassandra nodes in cluster

2016-11-12 Thread Jon Haddad
Agreed w/ Benjamin. Trying to diagnose issues in prod will be a nightmare. Keep your DB servers homogeneous. > On Nov 12, 2016, at 1:52 PM, Benjamin Roth wrote: > > 1. From a 15 year experience of running distributed Services: dont Mix > Services on machines if you don't have to. Dedicate

Re: Storing videos in cassandra

2016-11-14 Thread Jon Haddad
You’ve asked a lot of questions on this mailing list, and you’ve gotten help on a ton of beginner issues. Making fun of someone for asking similar beginner questions is not cool at all. Cut it out. > On Nov 14, 2016, at 10:13 AM, Ali Akhtar wrote: > > Another solution could be to print the

Re: Storing videos in cassandra

2016-11-14 Thread Jon Haddad
Jon > On Nov 14, 2016, at 10:25 AM, Ali Akhtar wrote: > > Excuse me? I did not make fun of anyone. I gave valid suggestions that are > all theoretically possible. > > If it came off in a condescending way, i am genuinely sorry. > > > On 14 Nov 2016 11:22 pm,

Re: Storing videos in cassandra

2016-11-14 Thread Jon Haddad
While Cassandra *can* be used this way, I don’t recommend it. It’s going to be far cheaper and easier to maintain to store data in an Object store like S3, like Oskar recommended. > On Nov 14, 2016, at 10:16 AM, l...@airstreamcomm.net wrote: > > We store videos and files in Cassandra by chunki

Re: How does cassandra achieve Linearizability?

2017-02-09 Thread Jon Haddad
LWT != Last Write Wins. They are totally different. LWTs give you (assuming you also read at SERIAL) “atomic consistency”, meaning you are able to perform operations atomically and in isolation. That’s the safety blanket everyone wants but is extremely expensive, especially in Cassandra. T

Re: Cassandra version numbering

2017-02-23 Thread Jon Haddad
No > On Feb 23, 2017, at 1:59 PM, Rakesh Kumar wrote: > > Is ver 3.0.10 same as 3.10. > > Cassandra website mentions this: Cassandra 3.10 Changelog > > But in other places 3.0.10 is mentioned.

Re: G1 GC settings

2015-10-13 Thread Jon Haddad
You may want to read Al Tobey’s Cassandra tuning guide. It’s got a section on G1. It’s being widely used, successfully, at massive scale. https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html > On Oct 13, 2015, a

Re: LOCAL_SERIAL

2015-10-15 Thread Jon Haddad
ZK seems a little overkill for just 1 feature though. LOCAL_SERIAL is fine if all you want to do is keep a handful of keys up to date. There’s a massive cost in adding something new to your infrastructure, and imo, very little gain in this case. > On Oct 15, 2015, at 8:29 AM, Eric Stevens w

Re: Read query taking a long time

2015-10-19 Thread Jon Haddad
I wrote a blog post a while back you may find helpful on diagnosing problems in production. There's a lot of potential things that could be wrong with your cluster and going back and forth on the ML to pin down the right one will take forever. http://rustyrazorblade.com/2014/09/cassandra-summi

Re: Data visualization tools for Cassandra

2015-10-20 Thread Jon Haddad
PySpark (dataframes) + Pandas + Seaborn/Matplotlib > On Oct 20, 2015, at 11:22 AM, Charles Rich wrote: > > Take a look at jKool, a DataStax partner at jKoolCloud.com > . It provides visualization for data in DSE. > > Regards, > > Charley > > From: Gene [mailto:gh5

Re: timestamp as clustering key doesn't work as expected

2015-10-23 Thread Jon Haddad
What version of Cassandra? I can’t think of a reason why you’d see this output. If you can reliably reproduce, this should be filed as a JIRA. https://issues.apache.org/jira > On Oct 23, 2015, at 8:55 AM, Kai Wang wrote: > > Hi, > > I use a timestamp column as the last clustering key so t

Re: Oracle TIMESTAMP(9) equivalent in Cassandra

2015-10-29 Thread Jon Haddad
Keep in mind that in a distributed environment you probably have so much variance that nanosecond precision is pointless. Even google notes that in the paper, Dapper, a Large-Scale Distributed Systems Tracing Infrastructure [http://research.google.com/pubs/pub36356.html

Re: compression cpu overhead

2015-11-03 Thread Jon Haddad
You won't see any overhead on writes because you don't actually write to sstables when performing a write. Just the commit log & memtable. Memtables are flushed asynchronously. > On Nov 4, 2015, at 1:57 AM, Tushar Agrawal wrote: > > For writes it's negligible. For reads it makes a significan

Re: scylladb

2015-11-05 Thread Jon Haddad
Nope, no one I know. Let me know if you try it I'd love to hear your feedback. > On Nov 5, 2015, at 9:22 AM, tommaso barbugli wrote: > > Hi guys, > > did anyone already try Scylladb (yet another fastest NoSQL database in town) > and has some thoughts/hands-on experience to share? > > Cheers,

Re: Deletes Reappeared even when nodes are not down

2015-11-13 Thread Jon Haddad
Any chance your clocks are off? > On Nov 13, 2015, at 1:09 PM, Peddi, Praveen wrote: > > Hi, > We are using Cassandra 2.0.8, with replication factor of 3. > > We are seeing a scenario where some of the rows in the table reappears even > after they are deleted. We have seen this in Prod 3 tim

Re: Deletes Reappeared even when nodes are not down

2015-11-13 Thread Jon Haddad
deleted (based on last modified date field). We > are definitely not talking about few millis here. > > Praveen > > From: Jon Haddad <mailto:jonathan.had...@gmail.com>> > Reply-To: "user@cassandra.apache.org <mailto:user@cassandra.apache.org>" > mailto:

Re: Deletes Reappeared even when nodes are not down

2015-11-13 Thread Jon Haddad
clocks are not 20 minutes off. > > > From: Jon Haddad <mailto:jonathan.had...@gmail.com>> > Reply-To: "user@cassandra.apache.org <mailto:user@cassandra.apache.org>" > mailto:user@cassandra.apache.org>> > Date: Friday, November 13, 2015 at 4:24 PM

Re: Overriding timestamp with light weight transactions

2015-11-16 Thread Jon Haddad
Perhaps you should fix your clock drift issues instead of trying to use a workaround? > On Nov 16, 2015, at 11:39 AM, Peddi, Praveen wrote: > > Hi, > We are using Cassandra 2.0.9 and we currently have “using timestamp” clause > in all our update queries. We did this to fix occasional issues wi

Re: Overriding timestamp with light weight transactions

2015-11-16 Thread Jon Haddad
We override > the timestamp only if we see current timestamp on the row is in future. Why > do you think overriding timestamp is a work around? It seems like a valid > reason to override timestamps. > > Thanks > Praveen > > > From: Jon Haddad <mailto:jon

Re: Triggering Deletion/Updation

2015-11-22 Thread Jon Haddad
There's no built in way of doing cascading deletes in Cassandra, I really wouldn't recommend using triggers for this either. My advice is to manage it in your app code. > On Nov 22, 2015, at 9:59 AM, Prem Yadav wrote: > > if it is cassandra 2.0+, > you can implement your trigger. Please check

Re: SELECT some_column vs SELECT *

2015-11-24 Thread Jon Haddad
If it's sparsely populated you'll get the same benefit from the schema definition. You don't pay for fields you don't use. > On Nov 24, 2015, at 12:17 PM, Jack Krupansky wrote: > > Are all or ost of the 1000+ columns populated for a given row? If they are > sparse you can replace them with a

Re: How does clustering key works with TimeWindowCompactionStrategy (TWCS)

2017-04-07 Thread Jon Haddad
Alex Dejanovski wrote a good post on how the LIMIT clause works and why it doesn’t (until 3.4) work the way you think it would. http://thelastpickle.com/blog/2017/03/07/The-limit-clause-in-cassandra-might-not-work-as-you-think.html > On Apr 7, 2017, at 7:23 AM, Jerry Lam wrote: > > Hi Jan, >

Re: Slow writes and Frequent timeouts

2017-04-17 Thread Jon Haddad
What are your hardware specs? Where are you running the cluster? Is every node in the same physical datacenter? What command are you using to run stress? > On Apr 17, 2017, at 9:57 AM, Akshay Suresh > wrote: > > Hi > > I have not done much. Just created a schema with SimpleStrategy and a

Re: Query on Data Modelling of a specific usecase

2017-04-19 Thread Jon Haddad
How much data do you plan to store in each table? I’ll be honest, this doesn’t sound like a Cassandra use case at first glance. 1 table per report x 1000 is going to be a bad time. Odds are with different queries, you’ll need multiple views, so lets call that a handful of tables per report.

Re: Downside to running multiple nodetool repairs at the same time?

2017-04-21 Thread Jon Haddad
We (The Last Pickle) forked reaper a while ago and added support for 3.0. https://github.com/thelastpickle/cassandra-reaper We set up a mailing list here for Reaper specific questions: https://groups.google.com/forum/#!forum/tlp-apache-cassand

Re: Service discovery in the Cassandra cluster

2017-05-01 Thread Jon Haddad
Sure, you could use DNS. Where does it say IP addresses are a requirement? > On May 1, 2017, at 1:36 PM, Roman Naumenko wrote: > > If I understand how Cassandra nodes work, they must contain a list of seed’s > IP addressed in config file. > > This requirement makes cluster setup unnecessarily

Re: Service discovery in the Cassandra cluster

2017-05-01 Thread Jon Haddad
mmended value: 256 >> -seeds: internal IP address of each seed node > > I saw also hostnames mentioned few times, but it just makes it even more > confusing. > > — > Roman > >> On May 1, 2017, at 3:50 PM, Jon Haddad > <mailto:jonathan.had...@gmail.com

Re: Service discovery in the Cassandra cluster

2017-05-01 Thread Jon Haddad
Well, I guess I have to figure out what’s up with IPs/hostnames by experiment. > Information about service discovery is practically absent. > Not to mention all important details about fqdns/hostnames, automatic > replacing seed nodes or what not. > > — > Roman > >> On

Re: Service discovery in the Cassandra cluster

2017-05-02 Thread Jon Haddad
n some ec2 instances, drop some cassandra deb packages on 'em - > the thing will figure out how to run... > > Also, how would you get "initial state of the cluster" if the cluster... is > being initialized? > Or that's easy, according to the docs - just hardco

Re: token distribution in multi-dc

2017-05-03 Thread Jon Haddad
It’s important to note that whatever you find on DataStax Academy belongs to DataStax. It’s not community run, and is technically not part of this project. To correct an error there, you should get in touch with DataStax, not ask people on the Apache Cassandra ML to fix it, as we have zero con

Re: Totally unbalanced cluster

2017-05-04 Thread Jon Haddad
Adding nodes with NTS is easier, in my opinion. You don’t need to worry about replica placement, if you do it right. > On May 4, 2017, at 7:43 AM, Cogumelos Maravilha > wrote: > > Hi Alain thanks for your kick reply. > > > Regarding SimpleStrategy perhaps you are right but it's so easy to a

Re: DTCS to TWCS

2017-05-04 Thread Jon Haddad
We (The Last Pickle) wrote a blog post on using TWCS pre-3.0: http://thelastpickle.com/blog/2017/01/10/twcs-part2.html Alex Dejanovski wrote a very comprehensive guide to TWCS I recommend reading before putting it in prod: http://thela

Re: Smart Table creation for 2D range query

2017-05-05 Thread Jon Haddad
I think you’ll want to model your table similar to how an R-Tree [1] / Quad tree [2] works. Let’s suppose you had a 10x10 meter land area and you wanted to put stuff in there. In order to find “all the things in point x,y”, you could break your land area into a grid. A partition would contain
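One way the grid idea could look in application code, as a sketch only. The 1-meter cell size, the function names, and the choice of the cell coordinates as the partition key are all assumptions made for illustration; the original message is cut off before it describes the actual layout:

```python
import math

CELL_SIZE_M = 1.0  # assumed edge length of one grid cell, in meters

def cell_for(x, y, cell_size=CELL_SIZE_M):
    """Map a point to the grid cell (a candidate partition key) containing it."""
    return (math.floor(x / cell_size), math.floor(y / cell_size))

def cells_in_box(x_min, y_min, x_max, y_max, cell_size=CELL_SIZE_M):
    """List every grid cell that a bounding box overlaps."""
    cx0, cy0 = cell_for(x_min, y_min, cell_size)
    cx1, cy1 = cell_for(x_max, y_max, cell_size)
    return [(cx, cy)
            for cx in range(cx0, cx1 + 1)
            for cy in range(cy0, cy1 + 1)]

point_cell = cell_for(3.7, 8.2)
box_cells = cells_in_box(2.5, 2.5, 4.1, 3.0)
print(point_cell, len(box_cells))
```

A point lookup then reads a single cell, while a range query reads the handful of cells returned by `cells_in_box` and filters the exact coordinates client side.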

Re: manual deletes with TWCS

2017-05-05 Thread Jon Haddad
You cannot. From Alex’s TLP post: http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html TWCS is no fit for workload that perform deletes on non TTLed data. Consider that SSTables from different time windows will never be compacted to

Re: Smart Table creation for 2D range query

2017-05-08 Thread Jon Haddad
It gets a little tricky when you try to add in the coordinates to the clustering key if you want to do operations that are more complex. For instance, finding all the elements within a radius of point (x,y) isn’t particularly fun with Cassandra. I recommend moving that logic into the applicat

Re: Cassandra 3.10 has partial partition key search but does it result in a table scan?

2017-05-09 Thread Jon Haddad
I don’t see any way it wouldn’t. Have you tried tracing it? > On May 9, 2017, at 8:32 AM, Kant Kodali wrote: > > Hi All, > > It looks like Cassandra 3.10 has partial partition key search but does it > result in a table scan? for example I can have the following > > create table hello( > a te

Re: Cassandra 3.10 has partial partition key search but does it result in a table scan?

2017-05-09 Thread Jon Haddad
tion key and get the max b ? > > > ​ > > > On Tue, May 9, 2017 at 6:33 AM, Jon Haddad <mailto:jonathan.had...@gmail.com>> wrote: > I don’t see any way it wouldn’t. Have you tried tracing it? > > > On May 9, 2017, at 8:32 AM, Kant Kodali > <mailto:

Re: Cassandra 3.10 has partial partition key search but does it result in a table scan?

2017-05-09 Thread Jon Haddad
Output from both queries, demonstrating full cluster scans: https://gist.github.com/rustyrazorblade/c4947fc37da85bca50e08aa1ef3c7a06 Jon > On May 9, 2017, at 9:24 AM, Jon Haddad wrote: > > Nope, I di

Re: Smart Table creation for 2D range query

2017-05-09 Thread Jon Haddad
.i0>. > As Jon mentions, this puts more work on the client, but might give you a lot > of querying flexibility when using Cassandra. > > Jim > > On Mon, May 8, 2017 at 11:13 PM, Jon Haddad <mailto:jonathan.had...@gmail.com>> wrote: > It gets a little tricky wh

Re: Smart Table creation for 2D range query

2017-05-09 Thread Jon Haddad
d > be efficiently queried. > > Jim > > On Tue, May 9, 2017 at 11:19 AM, Jon Haddad > wrote: > >> The problem with using geohashes is that you can’t efficiently do ranges >> with random token distribution. So even if your scalar values are close to >> each

Re: Multi datacenter node loss

2017-07-21 Thread Jon Haddad
SimpleStrategy doesn’t take DC or rack into account at all. It simply places replicas on subsequent tokens. You could end up with 3 copies in 1 DC and zero in another. /** * This class returns the nodes responsible for a given * key but does not respect rack awareness. Basically * return
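A simplified model of the placement described above, not Cassandra's actual implementation. The ring layout, node names, and function name are invented for this sketch, and the ring is deliberately arranged so that one datacenter's tokens are adjacent:

```python
from bisect import bisect_left

def simple_strategy_replicas(ring, key_token, rf):
    """Walk the ring clockwise from the key's token and take the next rf nodes.

    ring is a list of (token, node, dc) tuples sorted by token. The dc value
    is carried along only so the result can be inspected; it is never
    consulted when choosing replicas.
    """
    tokens = [token for token, _, _ in ring]
    start = bisect_left(tokens, key_token) % len(ring)
    replicas = []
    for offset in range(len(ring)):
        _, node, dc = ring[(start + offset) % len(ring)]
        if (node, dc) not in replicas:
            replicas.append((node, dc))
        if len(replicas) == rf:
            break
    return replicas

# Hypothetical six-node ring spanning two datacenters.
ring = [(0, "a1", "dc1"), (10, "a2", "dc1"), (20, "a3", "dc1"),
        (30, "b1", "dc2"), (40, "b2", "dc2"), (50, "b3", "dc2")]

replicas = simple_strategy_replicas(ring, key_token=-3, rf=3)
print(replicas)
```

In this contrived ring all three replicas for the key land in dc1 and none in dc2, which is the failure mode the message warns about.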

Re: Data Loss irreparabley so

2017-07-27 Thread Jon Haddad
We (The Last Pickle) maintain an open source tool to help manage repairs across your clusters called Reaper. It’s a lot easier to set up and manage than trying to manage it through cron. http://thelastpickle.com/reaper.html > On Jul 27, 2017, at 12:38 AM,

Re: Cassandra Data migration from 2.2.3 to 3.7

2017-08-01 Thread Jon Haddad
Just curious, why go to 3.7? 3.11 has hundreds of bug fixes that 3.7 doesn’t and will continue to receive fixes. > On Aug 1, 2017, at 3:44 PM, Harika Vangapelli -T (hvangape - AKRAYA INC at > Cisco) wrote: > > Jeff, I tried the below steps for just 3 rows of data, It looks to be > working. B

Re: Migrate from DSE (Datastax) to Apache Cassandra

2017-08-15 Thread Jon Haddad
I agree with Jeff, it’s not necessary to launch a new cluster for this operation. > On Aug 15, 2017, at 7:39 PM, Jeff Jirsa wrote: > > Or just alter the key space replication strategy and remove the DSE specific > strategies in favor of network topology strategy > > > -- > Jeff Jirsa > >

Re: cqlsh -e output - How to change the default delimiter '|' in the output

2017-08-15 Thread Jon Haddad
Using COPY .. TO you can export using the DELIMITER option, does that help? > On Aug 15, 2017, at 9:01 PM, Harikrishnan A wrote: > > Thank you all > > Regards, > Hari > > > On Tuesday, August 15, 2017 12:55 AM, Erick Ramirez > wrote: > > > +1 to Jim and Tobin. cqlsh wasn't designed for w
