Re: cold vs hot data
I guess the OS-level page cache will also help out implicitly to make sure your common pages aren't touching disk.

On Fri, Sep 14, 2018 at 2:46 AM Alaa Zubaidi (PDF) wrote:
> Hi,
>
> We are using Apache Cassandra 3.11.2 on RedHat 7.
> The data can grow to 100+ TB, but the hot data will in most cases be less
> than 10 TB. We still need to keep the rest of the data accessible.
> Has anyone had this problem?
> What is the best way to make the cluster more efficient?
> Is there a way to somehow automatically move the old data to different
> storage (rack, DC, etc.)?
> Any ideas?
>
> Regards,
>
> Alaa

--
Jens Rantil
Backend engineer
Tink AB

Email: jens.ran...@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se

Facebook <https://www.facebook.com/#!/tink.se> Linkedin <http://www.linkedin.com/company/2735919> Twitter <https://twitter.com/tink>
Re: Current active queries and status/type
You can do sampling of tracing on a table to avoid some of the overhead.

On Fri, Mar 2, 2018, 00:23 D. Salvatore <dd.salvat...@gmail.com> wrote:
> Hi Nicolas,
> Thank you very much for the response.
> I am looking into something with a smaller time frame than a minute.
> Tracing is a good way to get this information, but it introduces a huge
> overhead in the system that I'd like to avoid.
>
> Thanks
> Salvatore
>
> 2018-03-01 15:08 GMT+00:00 Nicolas Guyomar <nicolas.guyo...@gmail.com>:
>> Hi,
>>
>> With org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Latency
>> and OneMinuteRate you can get such a metric.
>>
>> As for the state of a request with regard to other nodes, I do not think
>> you can get that via JMX (it is available using TRACING per request).
>>
>> On 1 March 2018 at 15:50, D. Salvatore <dd.salvat...@gmail.com> wrote:
>>> Hello!
>>> Is there any way to know how many queries a node is currently serving
>>> through JMX (or other tools)? And the state of the request, for example
>>> whether the request is waiting for data from another node?
>>>
>>> Thanks
>>> Salvatore

--
Jens Rantil
Backend Developer @ Tink

Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden

For urgent matters you can reach me at +46-708-84 18 32.
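For reference, probabilistic trace sampling can be switched on per node with `nodetool`; the probability below is just an example value, and tracing a tiny fraction of requests keeps the overhead low:

```shell
# Trace roughly 0.1% of requests handled by this node (run on each node you
# want sampled; set the probability back to 0 to turn sampling off again).
nodetool settraceprobability 0.001

# Sampled traces end up in the system_traces keyspace, e.g. in cqlsh:
#   SELECT session_id, duration, request FROM system_traces.sessions LIMIT 10;
```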
Re: One time major deletion/purge vs periodic deletion
Sounds like you are using Cassandra as a queue. That's an anti-pattern. What I would do is rely on TTLs for removal of data and use the TWCS compaction strategy to handle the actual deletion, so you can focus on insertion only.

On Tue, Mar 6, 2018, 07:39 Charulata Sharma (charshar) <chars...@cisco.com> wrote:
> Hi,
>
> Wanted the community's feedback on deciding the schedule of the Archive
> and Purge job.
>
> Is it better to purge a large volume of data at regular intervals (e.g.
> run the job once every 3 months) or purge smaller amounts more frequently
> (run the job weekly)?
>
> Some estimates on the number of deletes performed: up to 80-90K rows
> purged in 3 months vs. 10K deletes every week.
>
> Thanks,
> Charu

--
Jens Rantil
Backend Developer @ Tink

Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden

For urgent matters you can reach me at +46-708-84 18 32.
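A minimal sketch of that approach (the table, columns, and 7-day TTL are made-up examples; the compaction options are TWCS's standard ones). With a table-level TTL and time-windowed compaction, whole expired SSTables get dropped instead of being purged tombstone by tombstone:

```sql
CREATE TABLE events (
    bucket text,
    ts timeuuid,
    payload blob,
    PRIMARY KEY (bucket, ts)
) WITH compaction = {
      'class': 'TimeWindowCompactionStrategy',
      'compaction_window_unit': 'DAYS',
      'compaction_window_size': 1}
  AND default_time_to_live = 604800;  -- 7 days, in seconds
```

With this in place there is no delete job at all: rows expire on their own and the strategy removes entire day-sized SSTables once everything in them has expired.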
Re: Multiple nodes decommission
AFAIK, the fastest way to add multiple nodes is: make sure your clients are only reading/writing to/from your current datacenter, create a new datacenter with replication factor 0 for your keyspaces, add nodes to the new datacenter, increase the replication factor for the new datacenter, run `nodetool rebuild` on all nodes in the new datacenter, point your clients to the new DC, and finally decommission the old one. I've done that multiple times and it's been much faster than adding a few nodes at a time. Obviously, this depends on how much data you have...

/J

On Sat, Apr 15, 2017 at 10:19 AM, Vlad <qa23d-...@yahoo.com> wrote:
> *>range reassignments which become effective after a successful decommission.*
>
> But during leaving, nodes announce themselves as "leaving". Do other
> leaving nodes take this into account and not stream data to them?
> (applicable also for joining). I hope so ))
>
> I guess the problem with sequentially adding/removing nodes is data
> overstreaming and uneven load distribution. I mean, if we have three racks
> it's better to add/remove three nodes at a time (one in each rack) and to
> avoid a state with four nodes, for example.
>
> Any thoughts?
>
> On Tuesday, April 11, 2017 7:55 PM, benjamin roth <brs...@gmail.com> wrote:
>
> I did not test it but I'd bet that parallel decommission will lead to
> inconsistencies.
> Each decommission results in range movements and range reassignments which
> become effective after a successful decommission.
> If you start several decommissions at once, I guess the calculated
> reassignments are invalid for at least one node after the first node
> finishes the decommission process.
>
> I hope someone will correct me if I am wrong.
>
> 2017-04-11 18:43 GMT+02:00 Jacob Shadix <jacobsha...@gmail.com>:
>
> Are you using vnodes? I typically do one-by-one as the decommission will
> create additional load/network activity streaming data to the other nodes
> as the token ranges are reassigned.
> -- Jacob Shadix
>
> On Sat, Apr 8, 2017 at 10:55 AM, Vlad <qa23d-...@yahoo.com> wrote:
>
> Hi,
>
> how should multiple nodes be decommissioned with "nodetool decommission" -
> one by one or in parallel?
>
> Thanks.
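The add-a-datacenter route from the top of this thread, sketched as commands. Keyspace and DC names here are invented examples; run the `ALTER` once from cqlsh and the `nodetool` commands on the nodes indicated:

```shell
# 1. Keep clients pinned to the old DC (DC-aware load balancing, LOCAL_* CL).

# 2. Once the new DC's nodes are up, start replicating to it (in cqlsh):
#      ALTER KEYSPACE my_ks WITH replication =
#          {'class': 'NetworkTopologyStrategy', 'old_dc': 3, 'new_dc': 3};

# 3. On every node in the new DC, stream the existing data from the old DC:
nodetool rebuild -- old_dc

# 4. After repointing clients at new_dc, retire old-DC nodes one at a time:
nodetool decommission
```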
`nodetool verify` outcome check
Hi,

We've been discussing internally whether to start running `nodetool verify` periodically to test for bitrot. Does anyone know how I could check whether the verification failed or succeeded from, say, a script? Is there an error exit code or some output I could grep for?

Thanks,
Jens
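A sketch of how such a script could look, assuming `nodetool` propagates a verification failure through a non-zero exit status. That assumption is exactly the open question here, so confirm it against your Cassandra version first; the keyspace/table names and log path are placeholders:

```shell
# Assumes a failed verify yields a non-zero exit code; confirm before relying on it.
if nodetool verify -e my_keyspace my_table; then
    echo "verify passed"
else
    status=$?
    echo "verify FAILED (exit code $status)" >&2
    # Belt and braces: the system log may also name the corrupt SSTable.
    grep -i corrupt /var/log/cassandra/system.log | tail -n 5 >&2
    exit "$status"
fi
```

(`-e` is `--extended-verify`, which checks all data rather than just the SSTable checksums.)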
Re: Backup restore with a different name
Bryan,

On Wed, Nov 2, 2016 at 11:38 AM, Bryan Cheng <br...@blockcypher.com> wrote:
> do you mean restoring the cluster to that state, or just exposing that
> state for reference while keeping the (corrupt) current state in the live
> cluster?

I mean "exposing that state for reference while keeping the (corrupt) current state in the live cluster".

Cheers,
Jens
Re: Backup restore with a different name
Thanks Anubhav,

Looks like a Java project without any documentation whatsoever ;) How do I use the tool? What does it do?

Cheers,
Jens

On Wed, Nov 2, 2016 at 11:36 AM, Anubhav Kale <anubhav.k...@microsoft.com> wrote:
> You would have to build some logic on top of what's natively supported.
>
> Here is an option:
> https://github.com/anubhavkale/CassandraTools/tree/master/BackupRestore
>
> *From:* Jens Rantil [mailto:jens.ran...@tink.se]
> *Sent:* Wednesday, November 2, 2016 2:21 PM
> *To:* Cassandra Group <user@cassandra.apache.org>
> *Subject:* Backup restore with a different name
>
> Hi,
>
> Let's say I am periodically making snapshots of a table, say "users", for
> backup purposes. Let's say a developer makes a mistake and corrupts the
> table. Is there an easy way for me to restore a replica, say
> "users_20161102", of the original table for the developer to look at the
> old copy?
>
> Cheers,
> Jens
Backup restore with a different name
Hi,

Let's say I am periodically making snapshots of a table, say "users", for backup purposes. Let's say a developer makes a mistake and corrupts the table. Is there an easy way for me to restore a replica, say "users_20161102", of the original table for the developer to look at the old copy?

Cheers,
Jens
Re: Cassandra Poor Read Performance Response Time
Hi,

I am by no means an expert on Cassandra, nor on DateTieredCompactionStrategy. However, looking in "Query 2.xlsx" I see a lot of

    Partition index with 0 entries found for sstable 186

To me, that looks like Cassandra is looking at a lot of sstables and realizing too late that they don't contain any relevant data. Are you using TTLs when you write data? Do the TTLs vary? If they do, there's a risk Cassandra has to inspect a lot of sstables that turn out to hold only expired data.

Also, have you checked `nodetool cfstats` and bloom filter false positives? Does `nodetool cfhistograms` give you any insights? I'm mostly thinking in terms of unbalanced partition keys. Have you checked the logs for how long the GC pauses are?

Somewhat implementation specific: would adjusting the time bucket to a smaller time resolution be an option? Also, since you are using DateTieredCompactionStrategy, have you considered using a TIMESTAMP constraint[1]? That might actually help you a lot.

[1] https://issues.apache.org/jira/browse/CASSANDRA-5514

Cheers,
Jens

On Mon, Oct 31, 2016 at 11:10 PM, _ _ <rage...@hotmail.com> wrote:
> Hi
>
> Currently I am running a Cassandra cluster of 3 nodes (with it replicating
> to both other nodes) and am experiencing poor performance, usually getting
> second-long response times when running queries where I am
> expecting/needing millisecond response times.
Currently i have a table which looks like: > > CREATE TABLE tracker.all_ad_impressions_counter_1d ( > time_bucket bigint, > ad_id text, > uc text, > count counter, > PRIMARY KEY ((time_bucket, ad_id), uc) > ) WITH CLUSTERING ORDER BY (uc ASC) > AND bloom_filter_fp_chance = 0.01 > AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'} > AND comment = '' > AND compaction = {'base_time_seconds': '3600', 'class': > 'org.apache.cassandra.db.compaction.DateTieredCompactionStrategy', > 'max_sstable_age_days': '30', 'max_threshold': '32', 'min_threshold': '4', > 'timestamp_resolution': 'MILLISECONDS'} > AND compression = {'chunk_length_in_kb': '64', 'class': ' > org.apache.cassandra.io.compress.LZ4Compressor'} > AND crc_check_chance = 1.0 > AND dclocal_read_repair_chance = 0.1 > AND default_time_to_live = 0 > AND gc_grace_seconds = 864000 > AND max_index_interval = 2048 > AND memtable_flush_period_in_ms = 0 > AND min_index_interval = 128 > AND read_repair_chance = 0.0 > AND speculative_retry = '99PERCENTILE'; > > > and queries which look like: > > SELECT > time_bucket, > uc, > count > FROM > all_ad_impressions_counter_1d > > WHERE ad_id = ? > AND time_bucket = ? > > the cluster is running on servers with 16 GB RAM, and 4 CPU cores and 3 > 100GB datastores, the storage is not local and these VMs are being managed > through openstack. There are roughly 200 million records being written per > day (1 time_bucket) and maybe a few thousand records per partition > (time_bucket, ad_id) at most. The amount of writes is not having a > significant effect on our read performance as when writes are stopped, the > read response time does not improve noticeably. I have attached a trace of > one query i ran which took around 3 seconds which i would expect to take > well below a second. I have also included the cassandra.yaml file and jvm > options file. 
> We do intend to change the storage to local storage and expect this will
> have a significant impact, but I was wondering if there's anything else
> which could be changed that would also have a significant impact on read
> performance?
>
> Thanks
> Ian
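The checks suggested above, as concrete commands. The table name is taken from the quoted schema, and the log path varies by installation:

```shell
# Bloom filter false positives, sstable count, partition size estimates:
nodetool cfstats tracker.all_ad_impressions_counter_1d

# Latency percentiles, sstables touched per read, partition cell counts:
nodetool cfhistograms tracker all_ad_impressions_counter_1d

# Dropped messages and backed-up thread pools:
nodetool tpstats

# Long GC pauses are reported in the system log by GCInspector:
grep GCInspector /var/log/cassandra/system.log | tail
```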
Re: Does anyone store larger values in Cassandra E.g. 500 KB?
If I were to do this, I would have two tables, file_chunks and chunks:

CREATE TABLE file_chunks (
    filename text,
    chunk int,
    size int,  -- optional, if you want to query the total size of a file
    PRIMARY KEY (filename, chunk)
);

CREATE TABLE chunks (
    filename text,
    chunk int,
    data blob,
    PRIMARY KEY ((filename, chunk))
);

By keeping the data chunks in a separate table, you make sure the data is spread more evenly across the cluster. If the sizes of the files vary a lot, this is a much better approach. Also, using `(filename, chunk)` as the key of the `chunks` table makes it possible to have a background process that deletes rows in `chunks` that no longer exist in `file_chunks`.

Jens

On Friday, October 21, 2016, jason zhao yang <zhaoyangsingap...@gmail.com> wrote:
> 1. Usually, before storing an object, serialization is needed, so we know
> the size.
> 2. Add "chunk id" as the last clustering key.
>
> Vikas Jaiman <er.vikasjai...@gmail.com> wrote on Fri, Oct 21, 2016 at 11:46 PM:
>> Thanks for your answer but I am just curious about:
>>
>> i) How do you identify the size of the object which you are going to chunk?
>>
>> ii) While reading or updating, how is it going to read all those chunks?
>>
>> Vikas
>>
>> On Thu, Oct 20, 2016 at 9:25 PM, Justin Cameron <jus...@instaclustr.com> wrote:
>>> You can, but it is not really very efficient or cost-effective. You may
>>> encounter issues with streaming, repairs and compaction if you have very
>>> large blobs (100MB+), so try to keep them under 10MB if possible.
>>>
>>> I'd suggest storing blobs in something like Amazon S3 and keeping just
>>> the bucket name & blob id in Cassandra.
>>> On Thu, 20 Oct 2016 at 12:03, Vikas Jaiman <er.vikasjai...@gmail.com> wrote:
>>>> Hi,
>>>>
>>>> Normally people like to store smaller values in Cassandra. Is there
>>>> anyone using it to store larger values (e.g. 500 KB or more), and if so,
>>>> what are the issues you are facing? I would also like to know what
>>>> tweaks you are considering.
>>>>
>>>> Thanks,
>>>> Vikas
>>>
>>> --
>>> Justin Cameron
>>> Senior Software Engineer | Instaclustr
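The split/reassemble mechanics behind the two-table scheme can be prototyped outside Cassandra with plain coreutils. This is only a sketch (the file names and the 100 KB chunk size are arbitrary), with the alphabetical suffix playing the role of the `chunk` column:

```shell
# Sketch: cut a blob into fixed-size chunks and reassemble it, mirroring the
# file_chunks/chunks idea above. Names and chunk size are arbitrary examples.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# A 300 KB blob of random data stands in for the uploaded file.
dd if=/dev/urandom of=blob bs=1024 count=300 2>/dev/null

# 100 KB chunks: chunk_aa, chunk_ab, chunk_ac (the suffix is the chunk id).
split -b 102400 blob chunk_

# Reassembly: the shell glob sorts suffixes, i.e. reads chunks in order.
cat chunk_* > reassembled
cmp blob reassembled && echo "blob reassembled intact"
```

Against real Cassandra tables, the reader would instead page through the `chunk` values for a given `filename` in order and concatenate the `data` blobs.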
Re: understanding partitions and # of nodes
By "partitions" I assume you mean "partition keys". Generally, the more partition keys, the better. Having more partition keys means your data is generally spread out more evenly across the cluster, makes repairs run faster (or so I've heard), makes adding new nodes smoother, and makes it less likely that you hit tombstone limits. Also, 100 partition keys in a Cassandra table is nothing. If you don't have more partition keys than that, Cassandra might not be the right fit.

Cheers,
Jens

On Wednesday, September 21, 2016, S Ahmed <sahmed1...@gmail.com> wrote:
> Hello,
>
> If you have a 10 node cluster, how does having 10 partitions or 100
> partitions change how Cassandra will perform?
>
> With 10 partitions you will have 1 partition per node.
> With 100 partitions you will have 10 partitions per node.
>
> With 100 partitions I guess it helps because when you add more nodes to
> your cluster, the data can be redistributed since you have more nodes.
>
> What else are things to consider?
>
> Thanks.
Re: Nodetool repair
On Mon, Sep 19, 2016 at 3:07 PM Alain RODRIGUEZ <arodr...@gmail.com> wrote:
...
> - The size of your data
> - The number of vnodes
> - The compaction throughput
> - The streaming throughput
> - The hardware available
> - The load of the cluster
> - ...

I've also heard that the number of clustering keys per partition key could have an impact. Might be worth investigating.

Cheers,
Jens
Re: Nodetool repair
Hi Lokesh,

Which version of Cassandra are you using? Which compaction strategy are you using? AFAIK, a repair doesn't trigger a major compaction, but I might be wrong here.

What you could do is run a repair for a subset of the ring (see the `-st` and `-et` parameters of `nodetool repair`). If you repair 1/1000 of the ring, repairing the whole ring will take roughly 1000 times longer than your sample. Also, you might want to look at incremental repairs.

If you kill the process in the middle, the repair will not start again. You will need to reissue it.

Cheers,
Jens

On Sun, Sep 18, 2016 at 2:58 PM Lokesh Shrivastava <lokesh.shrivast...@gmail.com> wrote:
> Hi,
>
> I tried to run the nodetool repair command on one of my keyspaces and found
> that it took a lot more time than I anticipated. Is there a way to know the
> ETA of a manual repair in advance, before triggering it? I believe repair
> performs the following operations:
>
> 1) Major compaction
> 2) Exchange of merkle trees with neighbouring nodes
>
> Is there any other operation performed during manual repair? What if I
> kill the process in the middle?
>
> Thanks.
> Lokesh
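Subrange repair, sketched. The tokens below are arbitrary example values from the Murmur3 token space, not ones computed for any real ring, and the keyspace name is a placeholder:

```shell
# Repair only the token range (start, end] on this node; timing a small
# slice like this lets you extrapolate a full-ring estimate.
nodetool repair -st -9223372036854775808 -et -9200000000000000000 my_keyspace
```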
Re: How Fast Does Information Spread With Gossip?
> Is a minute a reasonable upper bound for most clusters?

I have no numbers, and I'm sure this differs depending on how large your cluster is. We have a small cluster of around 12 nodes, and statuses generally propagate in under 5 seconds for sure. So it will definitely be less than 1 minute.

Cheers,
Jens

On Wed, Sep 14, 2016 at 8:49 PM jerome <jeromefroel...@hotmail.com> wrote:
> Hi,
>
> I was curious if anyone had any kind of statistics or ballpark figures on
> how long it takes information to propagate through a cluster with Gossip?
> I'm particularly interested in how fast information about the liveness of a
> node spreads. For example, in an n-node cluster the median amount of time
> it takes for all nodes to learn that a node went down is f(n) seconds. Is a
> minute a reasonable upper bound for most clusters? Too high, too low?
>
> Thanks,
> Jerome
Re: Maximum number of columns in a table
l used in rdbms. But I >>>>>>> need rows together to work with them (indexing etc). >>>>>>> >>>>>>> @sfespace >>>>>>> The map is needed when you have a dynamic schema. I don't have a >>>>>>> dynamic schema (may have, and will use the map if I do). I just have >>>>>>> thousands of schemas. One user needs 10 integers, while another user >>>>>>> needs >>>>>>> 20 booleans, and another needs 30 integers, or a combination of them >>>>>>> all. >>>>>>> >>>>>>> On Thu, Sep 15, 2016 at 7:46 PM, DuyHai Doan <doanduy...@gmail.com> >>>>>>> wrote: >>>>>>> >>>>>>>> "Another possible alternative is to use a single map column" >>>>>>>> >>>>>>>> --> how do you manage the different types then ? Because maps in >>>>>>>> Cassandra are strongly typed >>>>>>>> >>>>>>>> Unless you set the type of map value to blob, in this case you >>>>>>>> might as well store all the object as a single blob column >>>>>>>> >>>>>>>> On Thu, Sep 15, 2016 at 6:13 PM, sfesc...@gmail.com < >>>>>>>> sfesc...@gmail.com> wrote: >>>>>>>> >>>>>>>>> Another possible alternative is to use a single map column. >>>>>>>>> >>>>>>>>> >>>>>>>>> On Thu, Sep 15, 2016 at 7:19 AM Dorian Hoxha < >>>>>>>>> dorian.ho...@gmail.com> wrote: >>>>>>>>> >>>>>>>>>> Since I will only have 1 table with that many columns, and the >>>>>>>>>> other tables will be "normal" tables with max 30 columns, and the >>>>>>>>>> memory of >>>>>>>>>> 2K columns won't be that big, I'm gonna guess I'll be fine. >>>>>>>>>> >>>>>>>>>> The data model is too dynamic, the alternative would be to create >>>>>>>>>> a table for each user which will have even more overhead since the >>>>>>>>>> number >>>>>>>>>> of users is in the several thousands/millions. 
>>>>>>>>>> On Thu, Sep 15, 2016 at 3:04 PM, DuyHai Doan <doanduy...@gmail.com> wrote:
>>>>>>>>>>> There is no real limit on the number of columns in a table.
>>>>>>>>>>> I would say that the impact of having a lot of columns is the
>>>>>>>>>>> amount of metadata C* needs to keep in memory for
>>>>>>>>>>> encoding/decoding each row.
>>>>>>>>>>>
>>>>>>>>>>> Now, if you have a table with 1000+ columns, the problem is
>>>>>>>>>>> probably your data model...
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Sep 15, 2016 at 2:59 PM, Dorian Hoxha <dorian.ho...@gmail.com> wrote:
>>>>>>>>>>>> Is there a lot of overhead in having a big number of columns
>>>>>>>>>>>> in a table? Not unbounded, but say, would 2000 be a problem
>>>>>>>>>>>> (I think that's the maximum I'll need)?
>>>>>>>>>>>>
>>>>>>>>>>>> Thank You
Re: Is it ok to restart DECOMMISSION
Also have a look at `nodetool netstats` to check whether streaming is progressing or has halted.

Cheers,
Jens

On Fri, Sep 16, 2016 at 3:18 AM Mark Rose <markr...@markrose.ca> wrote:
> I've done that several times. Kill the process, restart it, let it
> sync, decommission.
>
> You'll need enough space on the receiving nodes for the full set of
> data, on top of the other data that was already sent earlier, plus
> room to cleanup/compact it.
>
> Before you kill it, check system.log to see if it died on anything. If
> so, the decommission process will never finish. If not, let it
> continue. Of particular note is that by default, transferring large
> sstables will time out. You can fix that by adjusting
> streaming_socket_timeout_in_ms to a sufficiently large value (I set it
> to a day).
>
> -Mark
>
> On Thu, Sep 15, 2016 at 9:28 AM, laxmikanth sadula <laxmikanth...@gmail.com> wrote:
> > I started decommissioning a node in our Cassandra cluster.
> > But it's taking too long (more than 12 hrs), so I would like to
> > restart it (stop/kill the node & run 'nodetool decommission' again).
> >
> > Will killing the node/stopping the decommission and restarting it
> > cause any issues to the cluster?
> >
> > Using C* 2.0.17, 2 datacenters, each DC with 3 groups, each group
> > with 3 nodes, with RF=3.
> >
> > --
> > Thanks...!
Re: [ANNOUNCEMENT] Website update
Are there equivalent JIRAs for the TODOs somewhere?

Jens

On Mon, Sep 12, 2016 at 9:58 AM Brice Dutheil <brice.duth...@gmail.com> wrote:
> Really nice update!
>
> There are still some todos ;)
> http://cassandra.apache.org/doc/latest/architecture/storage_engine.html
> http://cassandra.apache.org/doc/latest/architecture/guarantees.html
> http://cassandra.apache.org/doc/latest/operating/read_repair.html
> ...
>
> -- Brice
>
> On Mon, Sep 12, 2016 at 6:38 AM, Ashish Disawal <ashish.disa...@evivehealth.com> wrote:
>> The website looks great.
>> Good job guys.
>>
>> --
>> Ashish Disawal
>>
>> On Mon, Sep 12, 2016 at 3:00 AM, Jens Rantil <jens.ran...@tink.se> wrote:
>>> Nice! The website also feels snappier!
>>>
>>> On Friday, July 29, 2016, Sylvain Lebresne <sylv...@datastax.com> wrote:
>>>> Wanted to let everyone know that if you go to the Cassandra website
>>>> (cassandra.apache.org), you'll notice that there has been some change.
>>>> Outside of a face lift, the main change is a much improved documentation
>>>> section (http://cassandra.apache.org/doc/). As indicated, that
>>>> documentation is a work-in-progress and still has a few missing sections.
>>>> The documentation is maintained in-tree and contributions (through JIRA,
>>>> as with any other contribution) are more than welcome.
>>>>
>>>> Best,
>>>> On behalf of the Apache Cassandra developers.
Re: Cassandra and Kubernetes and scaling
David,

Were you the one who wrote the article? I just finished reading it. It's excellent! I'm also excited that running mutable infrastructure on containers is maturing. I have a few specific questions you (or someone else!) might be able to answer.

1. In the article you state

> We deployed 1,009 minion nodes to Google Compute Engine <https://cloud.google.com/compute/> (GCE), spread across 4 zones, running a custom version of the Kubernetes 1.3 beta.

Did you deploy a custom Kubernetes on GCE because 1.3 wasn't available? Or was it because the Pet Sets alpha feature was disabled on Google Cloud Platform's hosted Kubernetes[1]?

[1] http://serverfault.com/q/802437/37237

2. The article stated

> Yes we deployed 1,000 pets, but one really did not want to join the party!

Do you have any speculation as to why this happened? By default Cassandra doesn't allow concurrent nodes joining the cluster, but Pet Sets are added serially by definition, right?

3. The article doesn't mention downscaling. Do you have any idea how that would/could be done? I consider myself a Kubernetes/container noob. Is there an equivalent of `readinessProbe` for shutting down containers? Or would an external agent have to be deployed that orchestrates a `nodetool decommission` of an instance and then reduces the number of replicas of the Pet Set by one?

4. For a smaller number of Cassandra nodes, would you feel comfortable running it on Kubernetes 1.3? ;)

Cheers,
Jens

On Monday, September 12, 2016, David Aronchick <aronch...@gmail.com> wrote:
> Please let me know if I can help at all!
>
> On Sun, Sep 11, 2016 at 2:55 PM, Jens Rantil <jens.ran...@tink.se> wrote:
>> Hi Aiman,
>>
>> I noticed you never got any reply.
This might be of interest: >> http://blog.kubernetes.io/2016/07/thousand-instances-of-cassandra-using- >> kubernetes-pet-set.html >> >> Cheers, >> Jens >> >> On Tuesday, May 24, 2016, Aiman Parvaiz <ai...@flipagram.com >> <javascript:_e(%7B%7D,'cvml','ai...@flipagram.com');>> wrote: >> >>> Looking forward to hearing from the community about this. >>> >>> Sent from my iPhone >>> >>> > On May 24, 2016, at 10:19 AM, Mike Wojcikiewicz <m...@withkash.com> >>> wrote: >>> > >>> > I saw a thread from April 2016 talking about Cassandra and Kubernetes, >>> and have a few follow up questions. It seems that especially after v1.2 of >>> Kubernetes, and the upcoming 1.3 features, this would be a very viable >>> option of running Cassandra on. >>> > >>> > My questions pertain to HostIds and Scaling Up/Down, and are related: >>> > >>> > 1. If a container's host dies and is then brought up on another host, >>> can you start up with the same PersistentVolume as the original container >>> had? Which begs the question would the new container get a new HostId, >>> implying it would need to bootstrap into the environment? If it's a >>> bootstrap, does the old one get deco'd/assassinated? >>> > >>> > 2. Scaling up/down. Scaling up would be relatively easy, as it should >>> just kick off Bootstrapping the node into the cluster, but what if you need >>> to scale down? Would the Container get deco'd by the scaling down process? >>> or just terminated, leaving you with potential missing replicas >>> > >>> > 3. Scaling up and increasing the RF of a particular keyspace, would >>> there be a clean way to do this with the kubernetes tooling? 
>>> > >>> > In the end I'm wondering how much of the Kubernetes + Cassandra >>> involves nodetool, and how much is just a Docker image where you need to >>> manage that all yourself (painfully) >>> > >>> > -- >>> > --mike >>> >> >> >> -- >> Jens Rantil >> Backend engineer >> Tink AB >> >> Email: jens.ran...@tink.se >> <javascript:_e(%7B%7D,'cvml','jens.ran...@tink.se');> >> Phone: +46 708 84 18 32 >> Web: www.tink.se >> >> Facebook <https://www.facebook.com/#!/tink.se> Linkedin >> <http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_photo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary> >> Twitter <https://twitter.com/tink> >> >> > -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook <https://www.facebook.com/#!/tink.se> Linkedin <http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_photo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary> Twitter <https://twitter.com/tink>
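For what it's worth, the scale-down flow asked about in question 3 above could be sketched as below. This is only a guess at the orchestration, not something from the article; the set name, pod ordinal, replica count, and PVC name are all placeholders, and on Kubernetes 1.3 `kubectl scale` may not support Pet Sets, hence the `patch`:

```shell
# Hypothetical scale-down of a 5-node Cassandra Pet Set to 4 nodes.
# 1. Drain the highest-ordinal pet's data to the rest of the ring:
kubectl exec cassandra-4 -- nodetool decommission
# 2. Shrink the set so Kubernetes does not recreate the pod:
kubectl patch petset cassandra -p '{"spec":{"replicas":4}}'
# 3. Optionally reclaim the now-unused PersistentVolumeClaim:
kubectl delete pvc cassandra-data-cassandra-4
```

An external controller would have to run step 1 before step 2, since Kubernetes itself knows nothing about Cassandra's ring membership.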
Re: Cassandra and Kubernetes and scaling
Hi Aiman, I noticed you never got any reply. This might be of interest: http://blog.kubernetes.io/2016/07/thousand-instances-of-cassandra-using-kubernetes-pet-set.html Cheers, Jens On Tuesday, May 24, 2016, Aiman Parvaiz <ai...@flipagram.com> wrote: > Looking forward to hearing from the community about this. > > Sent from my iPhone > > > On May 24, 2016, at 10:19 AM, Mike Wojcikiewicz <m...@withkash.com > <javascript:;>> wrote: > > > > I saw a thread from April 2016 talking about Cassandra and Kubernetes, > and have a few follow up questions. It seems that especially after v1.2 of > Kubernetes, and the upcoming 1.3 features, this would be a very viable > option of running Cassandra on. > > > > My questions pertain to HostIds and Scaling Up/Down, and are related: > > > > 1. If a container's host dies and is then brought up on another host, > can you start up with the same PersistentVolume as the original container > had? Which begs the question would the new container get a new HostId, > implying it would need to bootstrap into the environment? If it's a > bootstrap, does the old one get deco'd/assassinated? > > > > 2. Scaling up/down. Scaling up would be relatively easy, as it should > just kick off Bootstrapping the node into the cluster, but what if you need > to scale down? Would the Container get deco'd by the scaling down process? > or just terminated, leaving you with potential missing replicas > > > > 3. Scaling up and increasing the RF of a particular keyspace, would > there be a clean way to do this with the kubernetes tooling? 
> > > > In the end I'm wondering how much of the Kubernetes + Cassandra involves > nodetool, and how much is just a Docker image where you need to manage that > all yourself (painfully) > > > > -- > > --mike > -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook <https://www.facebook.com/#!/tink.se> Linkedin <http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_photo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary> Twitter <https://twitter.com/tink>
Re: Schema Disagreement vs Nodetool resetlocalschema
Hi Michael, Did you ever get an answer on this? I'm curious to hear for future reference. Thanks, Jens On Monday, June 20, 2016, Michael Fong <michael.f...@ruckuswireless.com> wrote: > Hi, > > > > We have recently encountered several schema disagreement issues while > upgrading Cassandra. In one of the cases, the 2-node cluster idled for over > 30 minutes and their schemas remained unsynced. Due to other logic flows, > Cassandra cannot be restarted, and hence we need to come up with an alternative > on the fly. We are thinking of doing a nodetool resetlocalschema to force the > schema synchronization. How safe is this method? Do we need to disable > the thrift/gossip protocols before performing this operation, and enable them > again after the resync completes? > > > > Thanks in advance! > > > > Sincerely, > > > > Michael Fong >
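The thread never answers the thrift/gossip question. As a sketch only (not a verified procedure), one conservative sequence on the affected node might be the following, leaving gossip up so the node can presumably pull the schema back from its peer:

```shell
# Stop serving clients while the local schema is rebuilt:
nodetool disablethrift
# Drop the node's local schema and re-request it from the other node:
nodetool resetlocalschema
# Check that both nodes now report a single schema version:
nodetool describecluster
# Resume serving clients:
nodetool enablethrift
```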
Re: [ANNOUNCEMENT] Website update
Nice! The website also feels snappier! On Friday, July 29, 2016, Sylvain Lebresne <sylv...@datastax.com> wrote: > Wanted to let everyone know that if you go to the Cassandra website > (cassandra.apache.org), you'll notice that there has been some change. > Outside > of a face lift, the main change is a much improved documentation section > (http://cassandra.apache.org/doc/). As indicated, that documentation is a > work in progress and still has a few missing sections. That documentation is > maintained in-tree and contributions (through JIRA, as for any other > contribution) > are more than welcome. > > Best, > On behalf of the Apache Cassandra developers. >
Re: Bootstrapping multiple cassandra nodes simultaneously in existing dc
Yes. `nodetool setstreamthroughput` is your friend. On Sunday, September 11, 2016, sai krishnam raju potturi < pskraj...@gmail.com> wrote: > Make sure there is no spike in the load-avg on the existing nodes, as that > might affect your application read request latencies. > > On Sun, Sep 11, 2016, 17:10 Jens Rantil <jens.ran...@tink.se> wrote: > >> Hi Bhuvan, >> >> I have done such expansion multiple times and can really recommend >> bootstrapping a new DC and pointing your clients to it. The process is so >> much faster and the documentation you referred to has worked out fine for >> me. >> >> Cheers, >> Jens >> >> >> On Sunday, September 11, 2016, Bhuvan Rawal <bhu1ra...@gmail.com> wrote: >> >>> Hi, >>> >>> We are running Cassandra 3.6 and want to bump up Cassandra nodes in an >>> existing datacenter from 3 to 12 (plan to move to r3.xlarge machines to >>> leverage more memory instead of m4.2xlarge). Bootstrapping a node would >>> take 7-8 hours. >>> >>> If this activity is performed serially then it will take 5-6 days. I had >>> a look at CASSANDRA-7069 >>> <https://issues.apache.org/jira/browse/CASSANDRA-7069> and a bit of >>> discussion in the past at - http://grokbase.com/t/ >>> cassandra/user/147gcqvybg/adding-more-nodes-into-the-cluster. Wanted to >>> know if the limitation is still applicable and race condition could occur >>> in 3.6 version. >>> >>> If this is not the case can we add a new datacenter as mentioned here >>> opsAddDCToCluster >>> <https://docs.datastax.com/en/cassandra/3.x/cassandra/operations/opsAddDCToCluster.html> >>> and >>> bootstrap multiple nodes simultaneously by keeping auto_bootstrap false in >>> cassandra.yaml and rebuilding nodes simultaneously in the new dc?
>>> >>> >>> Thanks & Regards, >>> Bhuvan >>> >> >> >> -- >> Jens Rantil >> Backend engineer >> Tink AB >> >> Email: jens.ran...@tink.se >> <javascript:_e(%7B%7D,'cvml','jens.ran...@tink.se');> >> Phone: +46 708 84 18 32 >> Web: www.tink.se >> >> Facebook <https://www.facebook.com/#!/tink.se> Linkedin >> <http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_photo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary> >> Twitter <https://twitter.com/tink> >> >> -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook <https://www.facebook.com/#!/tink.se> Linkedin <http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_photo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary> Twitter <https://twitter.com/tink>
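As a concrete sketch of the throttling Jens mentions (the value here is illustrative; units are megabits per second):

```shell
# Cap streaming on each existing node so bootstrap/rebuild traffic
# does not starve client reads; 0 removes the cap entirely.
nodetool setstreamthroughput 100
# Inspect the current setting:
nodetool getstreamthroughput
```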
Re: large number of pending compactions, sstables steadily increasing
I just want to chime in and say that we also had trouble keeping up with compaction once (with vnodes/SSD disks), and I also want to recommend keeping track of your open file limit, which might bite you. Cheers, Jens On Friday, August 19, 2016, Mark Rose <markr...@markrose.ca> wrote: > Hi Ezra, > > Are you making frequent changes to your rows (including TTL'ed > values), or mostly inserting new ones? If you're only inserting new > data, it's probable that size-tiered compaction would work better for > you. If you are TTL'ing whole rows, consider date-tiered. > > If leveled compaction is still the best strategy, one way to catch up > with compactions is to have less data per node -- in other words, > use more machines. Leveled compaction is CPU expensive. You are CPU > bottlenecked currently, or from the other perspective, you have too > much data per node for leveled compaction. > > At this point, compaction is so far behind that you'll likely be > getting high latency if you're reading old rows (since dozens to > hundreds of uncompacted sstables will likely need to be checked for > matching rows). You may be better off with size-tiered compaction, > even if it will mean always reading several sstables per read (higher > latency than when leveled can keep up). > > How much data do you have per node? Do you update/insert to/delete > rows? Do you TTL? > > Cheers, > Mark > > On Wed, Aug 17, 2016 at 2:39 PM, Ezra Stuetzel <ezra.stuet...@riskiq.net> wrote: > > I have one node in my cluster 2.2.7 (just upgraded from 2.2.6 hoping to fix > > the issue) which seems to be stuck in a weird state -- with a large number of > > pending compactions and sstables. The node is compacting about 500gb/day, > > number of pending compactions is going up at about 50/day. It is at about > > 2300 pending compactions now.
I have tried increasing number of > compaction > > threads and the compaction throughput, which doesn't seem to help > eliminate > > the many pending compactions. > > > > I have tried running 'nodetool cleanup' and 'nodetool compact'. The > latter > > has fixed the issue in the past, but most recently I was getting OOM > errors, > > probably due to the large number of sstables. I upgraded to 2.2.7 and am > no > > longer getting OOM errors, but also it does not resolve the issue. I do > see > > this message in the logs: > > > >> INFO [RMI TCP Connection(611)-10.9.2.218] 2016-08-17 01:50:01,985 > >> CompactionManager.java:610 - Cannot perform a full major compaction as > >> repaired and unrepaired sstables cannot be compacted together. These > two set > >> of sstables will be compacted separately. > > > > Below are the 'nodetool tablestats' comparing a normal and the > problematic > > node. You can see problematic node has many many more sstables, and they > are > > all in level 1. What is the best way to fix this? Can I just delete those > > sstables somehow then run a repair? > >> > >> Normal node > >>> > >>> keyspace: mykeyspace > >>> > >>> Read Count: 0 > >>> > >>> Read Latency: NaN ms. > >>> > >>> Write Count: 31905656 > >>> > >>> Write Latency: 0.051713177939359714 ms. > >>> > >>> Pending Flushes: 0 > >>> > >>> Table: mytable > >>> > >>> SSTable count: 1908 > >>> > >>> SSTables in each level: [11/4, 20/10, 213/100, 1356/1000, 306, > 0, > >>> 0, 0, 0] > >>> > >>> Space used (live): 301894591442 > >>> > >>> Space used (total): 301894591442 > >>> > >>> > >>> > >>> Problematic node > >>> > >>> Keyspace: mykeyspace > >>> > >>> Read Count: 0 > >>> > >>> Read Latency: NaN ms. > >>> > >>> Write Count: 30520190 > >>> > >>> Write Latency: 0.05171286705620116 ms. 
> >>> > >>> Pending Flushes: 0 > >>> > >>> Table: mytable > >>> > >>> SSTable count: 14105 > >>> > >>> SSTables in each level: [13039/4, 21/10, 206/100, 831, 0, 0, 0, > >>> 0, 0] > >>> > >>> Space used (live): 561143255289 > >>> > >>> Space used (total): 561143255289 > > > > Thanks, > > > > Ezra > -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook <https://www.facebook.com/#!/tink.se> Linkedin <http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_photo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary> Twitter <https://twitter.com/tink>
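A rough sketch of the knobs discussed above for letting a node catch up on compactions (the values are illustrative, not recommendations):

```shell
# How far behind is the node?
nodetool compactionstats
# Temporarily remove the compaction throughput cap (MB/s; 0 = unthrottled):
nodetool setcompactionthroughput 0
# Once the pending count drains, restore a steady-state cap:
nodetool setcompactionthroughput 16
```

The number of compaction threads is governed by `concurrent_compactors` in cassandra.yaml, which requires a restart on this version.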
Re: Bootstrapping multiple cassandra nodes simultaneously in existing dc
Hi Bhuvan, I have done such expansion multiple times and can really recommend bootstrapping a new DC and pointing your clients to it. The process is so much faster and the documentation you referred to has worked out fine for me. Cheers, Jens On Sunday, September 11, 2016, Bhuvan Rawal <bhu1ra...@gmail.com> wrote: > Hi, > > We are running Cassandra 3.6 and want to bump up Cassandra nodes in an > existing datacenter from 3 to 12 (plan to move to r3.xlarge machines to > leverage more memory instead of m4.2xlarge). Bootstrapping a node would > take 7-8 hours. > > If this activity is performed serially then it will take 5-6 days. I had a > look at CASSANDRA-7069 > <https://issues.apache.org/jira/browse/CASSANDRA-7069> and a bit of > discussion in the past at - http://grokbase.com/t/ > cassandra/user/147gcqvybg/adding-more-nodes-into-the-cluster. Wanted to > know if the limitation is still applicable and race condition could occur > in 3.6 version. > > If this is not the case can we add a new datacenter as mentioned here > opsAddDCToCluster > <https://docs.datastax.com/en/cassandra/3.x/cassandra/operations/opsAddDCToCluster.html> > and > bootstrap multiple nodes simultaneously by keeping auto_bootstrap false in > cassandra.yaml and rebuilding nodes simultaneously in the new dc? > > > Thanks & Regards, > Bhuvan > -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook <https://www.facebook.com/#!/tink.se> Linkedin <http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_photo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary> Twitter <https://twitter.com/tink>
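The new-DC route in the linked docs boils down to roughly the following; keyspace and DC names here are placeholders, and `auto_bootstrap: false` must already be set on the new nodes:

```shell
# Add replicas for the new DC (run from anywhere in the cluster):
cqlsh -e "ALTER KEYSPACE mykeyspace WITH replication =
  {'class': 'NetworkTopologyStrategy', 'dc_old': 3, 'dc_new': 3};"
# Then, on every node in the new DC, stream existing data in parallel:
nodetool rebuild -- dc_old
```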
Re: Isolation in case of Single Partition Writes and Batching with LWT
Hi, This might be off-topic, but you could always use Zookeeper locking and/or Apache Kafka topic keys for doing things like this. Cheers, Jens On Tuesday, September 6, 2016, Bhuvan Rawal <bhu1ra...@gmail.com> wrote: > Hi, > > We are working on a multi-threaded distributed design in > which a thread reads current state from Cassandra (single partition, ~20 > rows), does some computation and saves it back. It needs to be > ensured that, between the read and the write by that thread, no other thread > has saved any operation on that partition. > > We have thought of a solution for the same - *having a write_time column* > in the schema and making it static. Every time a thread picks up a job, the > read will be performed with LOCAL_QUORUM. When writing into Cassandra, the > batch will contain an LWT (IF write_time equals the read time); otherwise the read will > be performed and the computation done again, and so on. This ensures > that, at save time, the partition is still in the state it was read in. > > In order to avoid race conditions we need to ensure a couple of things: > > 1. When saving data in a batch on a single partition (*rows may be > updates, deletes, inserts)*, are they isolated per replica node (not > necessarily on the cluster as a whole)? Is there a possibility of a client > reading partial rows? > > 2. If we do LOCAL_QUORUM reads and LOCAL_QUORUM writes, could > there be a chance of inconsistency (when LWT is being used in > batches)? > > 3. Is it possible to use multiple LWTs in a single batch? In general, how > does LWT perform with batches, and is Paxos acted on before batch execution? > > Can someone help us with this?
> > Thanks & Regards, > Bhuvan > > -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook <https://www.facebook.com/#!/tink.se> Linkedin <http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_photo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary> Twitter <https://twitter.com/tink>
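To make the write_time scheme described above concrete, a conditional single-partition batch might look like this. The table, columns, and timestamp values are illustrative, and `write_time` is assumed to be a static column so one condition covers the whole partition:

```sql
BEGIN BATCH
  UPDATE jobs SET state = 'done' WHERE pk = 'p1' AND ck = 1;
  DELETE FROM jobs WHERE pk = 'p1' AND ck = 2;
  -- The LWT condition: only apply if nobody wrote since our read.
  UPDATE jobs SET write_time = 1473150000
    WHERE pk = 'p1' IF write_time = 1473149000;
APPLY BATCH;
```

Note that a conditional batch must target a single partition, which matches the design being proposed.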
Re: Finding records that exist on Cassandra but not externally
Hi again Chris, Another option would be to have a look at using a Merkle tree to quickly drill down to the differences. This is actually what Cassandra uses internally when running a repair between different nodes. Cheers, Jens On Wed, Sep 7, 2016 at 9:47 AM <ch...@cmartinit.co.uk> wrote: > First off, I hope this is appropriate here - I couldn't decide whether this was > a question for Cassandra users or Spark users, so if you think it's in the > wrong place feel free to redirect me. > > I have a system that does a load of data manipulation using Spark. The > output of this program is effectively the new state that I want my > Cassandra table to be in, and the final step is to update Cassandra so that > it matches this state. > > At present I'm inserting all rows in my generated state into > Cassandra. This works for new rows and also for updating existing rows, but > of course it doesn't delete any rows that were already in Cassandra but not in > my new state. > > The problem I have now is how best to delete these missing rows. Options I > have considered are: > > 1. Setting a TTL on inserts which is roughly the same as my data refresh > period. This would probably be pretty performant, but I really don't want to > do this because it would mean that all data in my database would disappear > if I had issues running my refresh task! > > 2. Every time I refresh the data, first fetch all primary > keys from Cassandra and compare them to the primary keys locally to create a > list of pks to delete before the insert. This seems the most logically > correct option but is going to result in reading vast amounts of data from > Cassandra. > > 3. Truncating the entire table before refreshing Cassandra. This has the > benefit of being pretty simple in code, but I'm not sure of the performance > implications of this and what will happen if I truncate while a node is > offline.
> > For reference the table is on the order of 10s of millions of rows and for > any data refresh only a very small fraction (<.1%) will actually need > deleting. 99% of the time I'll just be overwriting existing keys. > > I'd be grateful if anyone could shed some advice on the best solution here > or whether there's some better way I haven't thought of. > > Thanks, > > Chris > -- Jens Rantil Backend Developer @ Tink Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden For urgent matters you can reach me at +46-708-84 18 32.
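A toy illustration of the Merkle idea suggested above (hedged: this is a one-level hash comparison, not Cassandra's actual tree): both sides hash their primary keys into fixed buckets, compare only the bucket digests, and exchange keys solely for buckets that differ.

```python
import hashlib

def bucket_digests(keys, buckets=16):
    """Assign each key to a bucket by key hash, then digest each bucket."""
    members = [[] for _ in range(buckets)]
    for key in sorted(keys):
        b = int(hashlib.sha256(key.encode()).hexdigest(), 16) % buckets
        members[b].append(key)
    digests = [hashlib.sha256("|".join(m).encode()).hexdigest()
               for m in members]
    return digests, members

def differing_keys(local_keys, remote_keys, buckets=16):
    """Return keys present on one side but not the other, inspecting
    only buckets whose digests disagree."""
    ld, lm = bucket_digests(local_keys, buckets)
    rd, rm = bucket_digests(remote_keys, buckets)
    diff = set()
    for i in range(buckets):
        if ld[i] != rd[i]:
            diff |= set(lm[i]) ^ set(rm[i])
    return diff

# Example: one key missing remotely is found without exchanging all keys.
local = ["user:%d" % i for i in range(1000)]
remote = [k for k in local if k != "user:123"]
print(differing_keys(local, remote))  # {'user:123'}
```

A real Merkle tree recurses instead of using one flat level, so only logarithmically many digests cross the wire before drilling down; for the <0.1% divergence described above, that keeps the exchanged data tiny.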
Re: Finding records that exist on Cassandra but not externally
Hi Chris, Without fully knowing your use case: can't you keep track of which keys have changed in the external system somehow? Otherwise, 2) sounds like the way to go to me. Cheers, Jens On Wed, Sep 7, 2016 at 9:47 AM <ch...@cmartinit.co.uk> wrote: > First off, I hope this is appropriate here - I couldn't decide whether this was > a question for Cassandra users or Spark users, so if you think it's in the > wrong place feel free to redirect me. > > I have a system that does a load of data manipulation using Spark. The > output of this program is effectively the new state that I want my > Cassandra table to be in, and the final step is to update Cassandra so that > it matches this state. > > At present I'm inserting all rows in my generated state into > Cassandra. This works for new rows and also for updating existing rows, but > of course it doesn't delete any rows that were already in Cassandra but not in > my new state. > > The problem I have now is how best to delete these missing rows. Options I > have considered are: > > 1. Setting a TTL on inserts which is roughly the same as my data refresh > period. This would probably be pretty performant, but I really don't want to > do this because it would mean that all data in my database would disappear > if I had issues running my refresh task! > > 2. Every time I refresh the data, first fetch all primary > keys from Cassandra and compare them to the primary keys locally to create a > list of pks to delete before the insert. This seems the most logically > correct option but is going to result in reading vast amounts of data from > Cassandra. > > 3. Truncating the entire table before refreshing Cassandra. This has the > benefit of being pretty simple in code, but I'm not sure of the performance > implications of this and what will happen if I truncate while a node is > offline.
> > For reference the table is on the order of 10s of millions of rows and for > any data refresh only a very small fraction (<.1%) will actually need > deleting. 99% of the time I'll just be overwriting existing keys. > > I'd be grateful if anyone could shed some advice on the best solution here > or whether there's some better way I haven't thought of. > > Thanks, > > Chris > -- Jens Rantil Backend Developer @ Tink Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden For urgent matters you can reach me at +46-708-84 18 32.
Re: Ring connection timeouts with 2.2.6
Hi, Could it be garbage collection occurring on nodes that are more heavily loaded? Cheers, Jens On Sun, Jun 26, 2016 05:22, Mike Heffner <m...@librato.com> wrote: > One thing to add, if we do a rolling restart of the ring the timeouts > disappear entirely for several hours and performance returns to normal. > It's as if something is leaking over time, but we haven't seen any > noticeable change in heap. > > On Thu, Jun 23, 2016 at 10:38 AM, Mike Heffner <m...@librato.com> wrote: > >> Hi, >> >> We have a 12 node 2.2.6 ring running in AWS, single DC with RF=3, that is >> sitting at <25% CPU, doing mostly writes, and not showing any particular >> long GC times/pauses. By all observed metrics the ring is healthy and >> performing well. >> >> However, we are noticing a pretty consistent number of connection >> timeouts coming from the messaging service between various pairs of nodes >> in the ring. The "Connection.TotalTimeouts" meter metric shows 100k's of >> timeouts per minute, usually between two pairs of nodes for several hours >> at a time. It seems to occur for several hours at a time, then may stop or >> move to other pairs of nodes in the ring. The metric >> "Connection.SmallMessageDroppedTasks." will also grow for one pair of >> the nodes in the TotalTimeouts metric. >> >> Looking at the debug log typically shows a large number of messages like >> the following on one of the nodes: >> >> StorageProxy.java:1033 - Skipped writing hint for /172.26.33.177 (ttl 0) >> >> We have cross node timeouts enabled, but ntp is running on all nodes and >> no node appears to have time drift. >> >> The network appears to be fine between nodes, with iperf tests showing >> that we have a lot of headroom. >> >> Any thoughts on what to look for? Can we increase thread count/pool sizes >> for the messaging service? >> >> Thanks, >> >> Mike >> >> -- >> >> Mike Heffner <m...@librato.com> >> Librato, Inc. >> >> > > > -- > > Mike Heffner <m...@librato.com> > Librato, Inc.
> > -- Jens Rantil Backend Developer @ Tink Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden For urgent matters you can reach me at +46-708-84 18 32.
Re: some questions
You forgot FROM in your CQL query. Jens On Sun, Jun 26, 2016 08:30, lowping <lowp...@163.com> wrote: > Hi: > > > Question 1: > > I got an error with this CQL; has it been fixed already? > select collection_type where id in ('a', 'b') > > Question 2: > > I want to use a UDF in an UPDATE, but this CQL can't execute. Any advice? > > update table_name set field=my_function(field) where … > > > Thank you so much >
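The corrected form of the query in question 1 would be (table name is a placeholder):

```sql
SELECT collection_type FROM my_table WHERE id IN ('a', 'b');
```

As for question 2: as far as I know, a CQL UPDATE cannot read a column's current value, so `SET field = my_function(field)` is not expressible; the read-modify-write has to happen client-side (a UDF can still transform values in a SELECT).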
Re: Multi DC setup question
I'm AFK, but you might be able to query the system.peers table to see which nodes are up. Cheers, Jens On Tue, Jun 28, 2016 06:44, Charulata Sharma (charshar) <chars...@cisco.com> wrote: > Hi All, > > We are setting up another data center and have the following > question: > > 6 nodes in each DC's Cassandra cluster. > > All keyspaces have an RF of 3. > > *Our scenario is:* > > > > App nodes connect to the Cassandra cluster using LOCAL_QUORUM consistency. > > > > We want to ensure that if 5 nodes out of the 6 are available then the > application enters the primary DC, else the application URL is directed to > another DC. > > > > What is the best option to achieve this? > > > > Thanks, > > Charu >
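A sketch of the system.peers probe, run against any reachable coordinator in the primary DC (note this reflects that coordinator's gossip view, not a hard up/down guarantee):

```sql
SELECT peer, data_center, rack, release_version FROM system.peers;
```

Counting reachable rows per data_center (plus the coordinator itself) approximates the 5-of-6 check, though a load-balancer health check against each node is a common alternative.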
Re: Motivation for a DHT ring
Some reasons I can come up with: - it would be hard to have tunable read/write consistency and replica counts when interfacing with a file system. - data locality support would require strong coupling to the distributed file system interface (if at all possible, given that certain sstables should live on the same data node). - operator complexity: administering both a distributed file system and a Cassandra cluster. This was a personal reason why I chose Cassandra instead of HBase for a project. Cheers, Jens On Wed, Jun 29, 2016 13:01, jean paul <researche...@gmail.com> wrote: > > > 2016-06-28 22:29 GMT+01:00 jean paul <researche...@gmail.com>: > >> Hi all, >> >> Please, what is the motivation for choosing a DHT ring in Cassandra? Why >> not use a normal parallel or distributed file system that supports >> replication? >> >> Thank you so much for clarification. >> >> Kind regards. >
Re: tuning repairs and compaction options
Hi Reik, You could always throttle your repair by running smaller chunks of the repair. See https://github.com/BrianGallew/cassandra_range_repair. Regarding the compaction, you can always change the compactionthroughput using `nodetool setcompactionthroughput`. Hope this helps, Jens On Fri, May 6, 2016 at 9:47 AM Reik Schatz <reik.sch...@gmail.com> wrote: > Hi, we are running a 9 node cluster under load. The nodes are running in > EC2 on i2.2xlarge instances. Cassandra version is 2.2.4. One node was down > yesterday for more than 3 hours. So we manually started an incremental > repair this morning via nodetool (anti-entropy repair?) > > What we can see is that user CPU on that node goes up to over 95% and also > goes up on all other nodes. Also the number of SSTables is exploding, I > guess due to anticompaction. > > What are my tuning options to have a more gentle repair behaviour? Which > settings should I look at if I want CPU to stay below 50% for instance. My > worry is always to impact the read/write performance during times when we > do anti-entropy repairs. > > Cheers, > Reik > -- Jens Rantil Backend Developer @ Tink Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden For urgent matters you can reach me at +46-708-84 18 32.
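Concretely, the two knobs look like this; the token range and throughput values are placeholders, and the linked range_repair script automates generating the subranges:

```shell
# Repair one small token subrange at a time instead of the whole node:
nodetool repair -st -9223372036854775808 -et -9200000000000000000 my_keyspace
# Cap compaction I/O (MB/s) so validation/anticompaction stays gentle:
nodetool setcompactionthroughput 16
```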
Re: Alternative approach to setting up new DC
Hi, I never got any response here, but just wanted to share that I went to a Cassandra meet-up in Stockholm yesterday where I talked to two knowledgable Cassandra people that verified that the approach below should work. The most important thing is that the backup must be fully imported before gc_grace_seconds after when the backup is taken. As of me, I managed to a get a more stable VPN setup and did not have to go down this path. Cheers, Jens On Mon, Apr 18, 2016 at 10:15 AM Jens Rantil <jens.ran...@tink.se> wrote: > Hi, > > I am provisioning a new datacenter for an existing cluster. A rather shaky > VPN connection is hindering me from making a "nodetool rebuild" bootstrap > on the new DC. Interestingly, I have a full fresh database snapshot/backup > at the same location as the new DC (transferred outside of the VPN). I am > now considering the following approach: > >1. Make sure my clients are using the old DC. >2. Provision the new nodes in new DC. >3. ALTER the keyspace to enable replicas on the new DC. This will >start replicating all writes from old DC to new DC. >4. Before gc_grace_seconds after operation 3) above, use sstableloader >to stream my backup to the new nodes. >5. For safety precaution, do a full repair. > > Could you see any issues with doing this? > > Cheers, > Jens > -- > > Jens Rantil > Backend Developer @ Tink > > Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden > For urgent matters you can reach me at +46-708-84 18 32. > -- Jens Rantil Backend Developer @ Tink Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden For urgent matters you can reach me at +46-708-84 18 32.
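For anyone trying step 4, the sstableloader invocation is roughly as follows; hosts and paths are placeholders, and the last two path components must be the keyspace and table names:

```shell
# Stream the snapshot into the new DC's live nodes:
sstableloader -d 10.0.0.1,10.0.0.2 /backups/mykeyspace/mytable
```

The crucial constraint from above still applies: the load must finish within gc_grace_seconds of when the backup was taken, or deleted data may be resurrected.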
Re: When are hints written?
Hi again Bo, I assume this is the piece of documentation you are referring to? http://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_about_hh_c.html?scroll=concept_ds_ifg_jqx_zj__performance > If a replica node is overloaded or unavailable, and the failure detector has not yet marked it down, then expect most or all writes to that node to fail after the timeout triggered by write_request_timeout_in_ms, <http://docs.datastax.com/en/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html#reference_ds_qfg_n1r_1k__write_request_timeout_in_ms> which defaults to 10 seconds. During that time, Cassandra writes the hint when the timeout is reached. I'm not an expert on this, but the way I've seen it, hints are stored as soon as there is _any_ issue writing a mutation (insert/update/delete) to a node. By "issue", that essentially means that a node hasn't acknowledged back to the coordinator that the write succeeded within write_request_timeout_in_ms. This includes TCP/socket timeouts, connection issues or that the node is down. The hints are stored for a maximum timespan defaulting to 3 hours. Cheers, Jens On Thu, Apr 21, 2016 at 8:06 AM Bo Finnerup Madsen <bo.gunder...@gmail.com> wrote: > Hi Jens, > > Thank you for the tip! > ALL would definitely cure our hints issue, but as you note, it is not > optimal as we are unable to take down nodes without clients failing. > > I am most probably overlooking something in the documentation, but I > cannot see any description of when hints are written other than when a node > is marked as being down. And since none of our nodes have been marked as > being down (at least according to the logs), I suspect that there is some > timeout that governs when hints are written? > > Regarding your other post: Yes, 3.0.3 is pretty new. But we are new to > this cassandra game, and our schema-fu is not strong enough for us to > create a schema without using materialized views :) > > > ons. 20. apr. 2016 kl.
17.09 skrev Jens Rantil <jens.ran...@tink.se>: > >> Hi Bo, >> >> > In our case, I would like for the cluster to wait for the write to be >> persisted on the relevant nodes before returning an ok to the client. >> But I don't know which knobs to turn to accomplish this? or if it is even >> possible :) >> >> This is what write consistency option is for. Have a look at >> https://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html. >> Note, however that if you use ALL, your clients will fail (throw exception, >> depending on language) as soon as a single partition can't be written. This >> means you can't do online maintenance of a Cassandra node (such as >> upgrading it etc.) without experiencing write issues. >> >> Cheers, >> Jens >> >> On Wed, Apr 20, 2016 at 3:39 PM Bo Finnerup Madsen < >> bo.gunder...@gmail.com> wrote: >> >>> Hi, >>> >>> We have a small 5 node cluster of m4.xlarge clients that receives writes >>> from ~20 clients. The clients will write as fast as they can, and the whole >>> process is limited by the write performance of the cassandra cluster. >>> After we have tweaked our schema to avoid large partitions, the load is >>> going ok and we don't see any warnings or errors in the cassandra logs. But >>> we do see quite a lot of hint handoff activity. During the load, the >>> cassandra nodes are quite loaded, with linux reporting a load as high as 20. >>> >>> I have read the available documentation on how hints works, and to my >>> understanding hints should only be written if a node is down. But as far as >>> I can see, none of the nodes are marked as down during the load. So I >>> suspect I am missing something :) >>> We have configured the servers with write_request_timeout_in_ms: 12 >>> and the clients with a timeout of 13, but still get hints stored. >>> >>> In our case, I would like for the cluster to wait for the write to be >>> persisted on the relevant nodes before returning an ok to the client. 
But I >>> don't know which knobs to turn to accomplish this? or if it is even >>> possible :) >>> >>> We are running cassandra 3.0.3, with 8Gb heap and a replication factor >>> of 3. >>> >>> Thank you in advance! >>> >>> Yours sincerely, >>> Bo Madsen >>> >> -- >> >> Jens Rantil >> Backend Developer @ Tink >> >> Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden >> For urgent matters you can reach me at +46-708-84 18 32. >> > -- Jens Rantil Backend Developer @ Tink Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden For urgent matters you can reach me at +46-708-84 18 32.
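The timeout-driven hinting described above can be made concrete with a toy model (illustrative Python, not Cassandra's actual code; the function and constant names are assumptions, and the timeout value here mirrors the quoted documentation rather than any particular cassandra.yaml): a coordinator stores a hint for any replica that fails to acknowledge within write_request_timeout_in_ms, even if the node was never marked down.

```python
# Toy model of hint storage on a coordinator. Values are illustrative;
# see write_request_timeout_in_ms and max_hint_window_in_ms in cassandra.yaml.
WRITE_REQUEST_TIMEOUT_MS = 10_000
MAX_HINT_WINDOW_MS = 3 * 60 * 60 * 1000  # hints kept for at most 3 hours by default

def coordinate_write(ack_latency_ms, required_acks):
    """Return (met_consistency, hinted_replicas). A replica that never
    answers (None) or answers too late gets a hint stored for it."""
    acked = [node for node, latency in ack_latency_ms.items()
             if latency is not None and latency <= WRITE_REQUEST_TIMEOUT_MS]
    hinted = [node for node in ack_latency_ms if node not in acked]
    return len(acked) >= required_acks, hinted

# RF=3, QUORUM write: succeeds with 2 acks, but n3 still gets a hint,
# i.e. hints are written even though n3 was never marked down.
ok, hinted = coordinate_write({"n1": 3, "n2": 7, "n3": None}, required_acks=2)
```

This matches the behaviour Bo observed: no node was ever marked down, yet hint handoff activity appeared, because overloaded replicas missed the ack deadline.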
Re: When are hints written?
Hi Bo, > In our case, I would like for the cluster to wait for the write to be persisted on the relevant nodes before returning an ok to the client. But I don't know which knobs to turn to accomplish this? or if it is even possible :) This is what write consistency option is for. Have a look at https://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html. Note, however that if you use ALL, your clients will fail (throw exception, depending on language) as soon as a single partition can't be written. This means you can't do online maintenance of a Cassandra node (such as upgrading it etc.) without experiencing write issues. Cheers, Jens On Wed, Apr 20, 2016 at 3:39 PM Bo Finnerup Madsen <bo.gunder...@gmail.com> wrote: > Hi, > > We have a small 5 node cluster of m4.xlarge clients that receives writes > from ~20 clients. The clients will write as fast as they can, and the whole > process is limited by the write performance of the cassandra cluster. > After we have tweaked our schema to avoid large partitions, the load is > going ok and we don't see any warnings or errors in the cassandra logs. But > we do see quite a lot of hint handoff activity. During the load, the > cassandra nodes are quite loaded, with linux reporting a load as high as 20. > > I have read the available documentation on how hints works, and to my > understanding hints should only be written if a node is down. But as far as > I can see, none of the nodes are marked as down during the load. So I > suspect I am missing something :) > We have configured the servers with write_request_timeout_in_ms: 12 > and the clients with a timeout of 13, but still get hints stored. > > In our case, I would like for the cluster to wait for the write to be > persisted on the relevant nodes before returning an ok to the client. But I > don't know which knobs to turn to accomplish this? 
or if it is even > possible :) > > We are running cassandra 3.0.3, with 8Gb heap and a replication factor of > 3. > > Thank you in advance! > > Yours sincerely, > Bo Madsen > -- Jens Rantil Backend Developer @ Tink Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden For urgent matters you can reach me at +46-708-84 18 32.
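The consistency trade-off discussed in this thread follows from simple arithmetic: a read is guaranteed to overlap the latest successful write whenever the write and read replica counts together exceed the replication factor. A minimal sketch (illustrative Python, not a driver API):

```python
def quorum(rf):
    """Replicas needed for a QUORUM operation: a majority of RF."""
    return rf // 2 + 1

def overlaps(write_replicas, read_replicas, rf):
    """True when every read is guaranteed to see the latest successful
    write: the two replica sets must intersect (W + R > RF)."""
    return write_replicas + read_replicas > rf

RF = 3
# QUORUM writes + QUORUM reads overlap. ALL (W=RF) makes even ONE reads
# overlap, at the cost of failing writes whenever a single replica is down.
```

This is why ALL "cures" the hints issue but breaks online maintenance: with W = RF there is no slack for a single unavailable replica.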
Alternative approach to setting up new DC
Hi, I am provisioning a new datacenter for an existing cluster. A rather shaky VPN connection is preventing me from doing a "nodetool rebuild" bootstrap on the new DC. Interestingly, I have a full fresh database snapshot/backup at the same location as the new DC (transferred outside of the VPN). I am now considering the following approach:
1. Make sure my clients are using the old DC.
2. Provision the new nodes in the new DC.
3. ALTER the keyspace to enable replicas on the new DC. This will start replicating all writes from the old DC to the new DC.
4. Within gc_grace_seconds of step 3) above, use sstableloader to stream my backup to the new nodes.
5. As a safety precaution, do a full repair.
Could you see any issues with doing this? Cheers, Jens -- Jens Rantil Backend Developer @ Tink Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden For urgent matters you can reach me at +46-708-84 18 32.
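The timing constraint in step 4 can be sketched as a deadline check (illustrative Python; the 10-day gc_grace_seconds is only the table default and the keyspace in question may differ): the backup must finish loading before tombstones written since the ALTER can be purged, or deleted data could be resurrected.

```python
from datetime import datetime, timedelta, timezone

GC_GRACE_SECONDS = 864_000  # table-level default: 10 days; check your schema

def sstableloader_deadline(alter_keyspace_at: datetime) -> datetime:
    """Latest safe completion time for the sstableloader step: loading an
    older backup after tombstones from step 3 may have been purged risks
    resurrecting deleted rows."""
    return alter_keyspace_at + timedelta(seconds=GC_GRACE_SECONDS)

altered = datetime(2015, 6, 1, tzinfo=timezone.utc)
deadline = sstableloader_deadline(altered)
```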
Hanging pending compactions
Hi, After executing `nodetool cleanup` on some nodes they are all showing lots (123, 97, 64) of pending compaction tasks, but not a single active task. I'm running Cassandra 2.0.14 with Leveled Compaction Strategy on most of our tables. Anyone experienced this before? Also, is there any way for me to extract debugging information to file a bug report before restarting the nodes? Cheers, Jens -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook https://www.facebook.com/#!/tink.se Linkedin http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_phototrkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary Twitter https://twitter.com/tink
Re: Consistent reads and first write wins
Hi John, The general answer: Each cell in a CQL table has a corresponding timestamp which is taken from the clock on the Cassandra node that orchestrates the write. When you are reading from a Cassandra cluster, the node that coordinates the read will compare the timestamps of the values it fetches. Last write (= highest timestamp) wins and will be returned to the client. As you may now understand, the above is why it is crucial that you NTP-sync your Cassandra nodes. > If time_uuid_1 comes before time_uuid_2 and if both clients follow up the writes with quorum reads, then will both clients see the value 'bar' for prop1? As you might have understood by now, the values of your timeuuids aren't really relevant here - the timestamp transparently taken from the clock of the coordinating node is. This is because you could supply your own timeuuid from the client, which might have a differing clock. However, it will basically correspond to the timestamp if you use the helper function `now()` in CQL. Anyway, if you make a quorum write (that succeeds) and then make a successful quorum read, you can be 100% sure that you will get the latest value. > Are there situations in which clients might see different values? I can see three scenarios where that could happen: 1. If you write with a weaker consistency such as ONE and read with quorum. 2. If you write with quorum and read with a weaker consistency such as ONE. 3. If you make a quorum write that fails. That write might still have been applied to some node. Cassandra does not guarantee atomic writes (that is, either applied or not at all). In other words, a failed write will not roll back partially applied writes in any way.
Cheers, Jens On Wed, Jul 8, 2015 at 3:35 AM, John Sanda john.sa...@gmail.com wrote: Suppose I have the following schema,

CREATE TABLE foo (
    id text,
    time timeuuid,
    prop1 text,
    PRIMARY KEY (id, time)
) WITH CLUSTERING ORDER BY (time ASC);

And I have two clients who execute quorum writes, e.g.,

// client 1
INSERT INTO FOO (id, time, prop1) VALUES ('test', time_uuid_1, 'bar');
// client 2
INSERT INTO FOO (id, time, prop1) VALUES ('test', time_uuid_2, 'bam');

If time_uuid_1 comes before time_uuid_2 and if both clients follow up the writes with quorum reads, then will both clients see the value 'bar' for prop1? Are there situations in which clients might see different values? -- - John -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook https://www.facebook.com/#!/tink.se Linkedin http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_phototrkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary Twitter https://twitter.com/tink
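The last-write-wins reconciliation described in the answer above can be sketched in a few lines (illustrative Python, not the server's or driver's API): each replica returns its cell together with the cell's write timestamp, and the coordinator keeps the highest one, regardless of any timeuuid stored in the row.

```python
def reconcile(replica_cells):
    """Pick the winning cell among replica responses, each given as
    (write_timestamp_micros, value): highest write timestamp wins.
    The timeuuid clustering column plays no part in this comparison."""
    timestamp, value = max(replica_cells, key=lambda cell: cell[0])
    return value

# Two replicas disagree because one of them missed the later write:
winner = reconcile([(1500, "bar"), (1700, "bam")])
```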
RE: nodetool repair
Hi, For the record I've successfully used https://github.com/BrianGallew/cassandra_range_repair to make repairing smooth. Maybe it could also be of interest here, I don't know... Cheers, Jens – Skickat från Mailbox On Fri, Jun 19, 2015 at 8:36 PM, null sean_r_dur...@homedepot.com wrote: It seems to me that running repair on any given node may also induce repairs to related replica nodes. For example, if I run repair on node A and node B has some replicas, data might stream from A to B (assuming A has newer/more data). Now, that does NOT mean that node B will be fully repaired. You still need to run repair -pr on all nodes before gc_grace_seconds. You can run repairs on multiple nodes at the same time. However, you might end up with a large amount of streaming, if many repairs are needed. So, you should be aware of a performance impact. I run weekly repairs on one node at a time, if possible. On larger rings, though, I run repairs on multiple nodes staggered by a few hours. Once your routine maintenance is established, repairs will not run for very long. But, if you have a large ring that hasn’t been repaired, those first repairs may take days (but should get faster as you get further through the ring). Sean Durity From: Alain RODRIGUEZ [mailto:arodr...@gmail.com] Sent: Friday, June 19, 2015 3:56 AM To: user@cassandra.apache.org Subject: Re: nodetool repair Hi, This is not necessarily true. Repair will induce compactions only if you have entropy in your cluster. If not, it will just read your data to compare all the replicas of each piece of data (using indeed cpu and disk IO). If there is some data missing it will repair it. Though, due to merkle tree size, you will generally stream more data than just the data needed. To limit this downside and the compactions amount, use range repairs -- http://www.datastax.com/dev/blog/advanced-repair-techniques. About tombstones, they will be evicted only after gc_grace_period and only if all the parts of the row are part of the compaction.
C*heers, Alain 2015-06-19 9:08 GMT+02:00 arun sirimalla arunsi...@gmail.commailto:arunsi...@gmail.com: Yes compactions will remove tombstones On Thu, Jun 18, 2015 at 11:46 PM, Jean Tremblay jean.tremb...@zen-innovations.commailto:jean.tremb...@zen-innovations.com wrote: Perfect thank you. So making a weekly nodetool repair -pr” on all nodes one after the other will repair my cluster. That is great. If it does a compaction, does it mean that it would also clean up my tombstone from my LeveledCompactionStrategy tables at the same time? Thanks for your help. On 19 Jun 2015, at 07:56 , arun sirimalla arunsi...@gmail.commailto:arunsi...@gmail.com wrote: Hi Jean, Running nodetool repair on a node will repair only that node in the cluster. It is recommended to run nodetool repair on one node at a time. Few things to keep in mind while running repair 1. Running repair will trigger compactions 2. Increase in CPU utilization. Run node tool repair with -pr option, so that it will repair only the range that node is responsible for. On Thu, Jun 18, 2015 at 10:50 PM, Jean Tremblay jean.tremb...@zen-innovations.commailto:jean.tremb...@zen-innovations.com wrote: Thanks Jonathan. But I need to know the following: If you issue a “nodetool repair” on one node will it repair all the nodes in the cluster or only the one on which we issue the command? If it repairs only one node, do I have to wait that the nodetool repair ends, and only then issue another “nodetool repair” on the next node? Kind regards On 18 Jun 2015, at 19:19 , Jonathan Haddad j...@jonhaddad.commailto:j...@jonhaddad.com wrote: If you're using DSE, you can schedule it automatically using the repair service. If you're open source, check out Spotify cassandra reaper, it'll manage it for you. 
https://github.com/spotify/cassandra-reaper On Thu, Jun 18, 2015 at 12:36 PM Jean Tremblay jean.tremb...@zen-innovations.commailto:jean.tremb...@zen-innovations.com wrote: Hi, I want to make on a regular base repairs on my cluster as suggested by the documentation. I want to do this in a way that the cluster is still responding to read requests. So I understand that I should not use the -par switch for that as it will do the repair in parallel and consume all available resources. If you issue a “nodetool repair” on one node will it repair all the nodes in the cluster or only the one on which we issue the command? If it repairs only one node, do I have to wait that the nodetool repair ends, and only then issue another “nodetool repair” on the next node? If we had down time periods I would issue a nodetool -par, but we don’t have down time periods. Sorry for the stupid questions. Thanks for your help. -- Arun Senior Hadoop/Cassandra Engineer Cloudwick 2014 Data Impact Award Winner (Cloudera)
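The range-repair technique recommended in this thread (and implemented by tools like cassandra_range_repair) boils down to splitting a node's token range into many small contiguous pieces and repairing them one at a time, so each Merkle tree covers little data and over-streaming is limited. A sketch of the splitting step (illustrative Python):

```python
def split_token_range(start, end, parts):
    """Split the token range (start, end] into `parts` contiguous
    subranges; each piece can then be repaired separately using
    nodetool repair's start/end token options."""
    width = end - start
    bounds = [start + (width * i) // parts for i in range(parts + 1)]
    return list(zip(bounds[:-1], bounds[1:]))

subranges = split_token_range(0, 1000, 4)
```

Repairing subranges sequentially keeps the performance impact of each repair small and bounded, at the cost of issuing many more repair commands.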
Re: Question regarding concurrent bootstrapping
Rob, Thanks for a great answer. While I'm at it, thanks for all the time you put into answering people on this mailing list. I'm sure I'm not the only one appreciating it. Cheers, Jens – Skickat från Mailbox On Sat, Jun 13, 2015 at 12:37 AM, Robert Coli rc...@eventbrite.com wrote: On Fri, Jun 12, 2015 at 5:21 AM, Jens Rantil jens.ran...@tink.se wrote: Let's say I have an existing cluster and do the following: 1. I start a new joining node (A). It enters state Up/Joining. Streaming automatically starts to this node. 2. I wait two minutes (best practice for bootstrapping). 3. I start a second node (B) to join the cluster. It allocates some of A's previous parts of the ring and enters state Up/Joining. Streaming automatically starts to this node. Will streaming of data that A is no longer responsible for (after B joined) stop immediately? That is, after (3), will the data streamed to A only be what it is responsible for? It depends on the version of Cassandra. A will get data it shouldn't get in any version that doesn't contain the CASSANDRA-2434 patch, and will keep it if you do not run cleanup on A when A is done bootstrapping. In a version containing 2434, the attempt to bootstrap B will fail and will not work until A is done bootstrapping, unless you set the property -Dcassandra.consistent.rangemovement=false while starting it. In general, one DOES NOT WANT TO SET -Dcassandra.consistent.rangemovement! It fixes 2434, and 2434 is bad for consistency. Instead, consider expanding clusters to initial size when they are empty, and disabling bootstrapping while doing so. Lots and lots of background on: https://issues.apache.org/jira/browse/CASSANDRA-2434 Related ticket: https://issues.apache.org/jira/browse/CASSANDRA-7069 =Rob PS - BTW, the fact that 2434 existed for so long, in versions where repair was often broken/unused, is the strongest single item of information in support of the Coli Conjecture...
Question about nodetool status ... output
Hi, I have one node in my 5-node cluster that effectively owns 100%, and it looks like my cluster is rather imbalanced. Is it common to have it this imbalanced for 4-5 nodes? My current output for a keyspace is:

$ nodetool status myks
Datacenter: Cassandra
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load       Tokens  Owns (effective)  Host ID                               Rack
UN  X.X.X.33  203.92 GB  256     41.3%             871968c9-1d6b-4f06-ba90-8b3a8d92dcf0  RAC1
UN  X.X.X.32  200.44 GB  256     34.2%             d7cacd89-8613-4de5-8a5e-a2c53c41ea45  RAC1
UN  X.X.X.51  197.17 GB  256     100.0%            344b0adf-2b5d-47c8-8881-9a3f56be6f3b  RAC1
UN  X.X.X.52  113.63 GB  1       46.3%             55daa807-af49-44c5-9742-fe456df621a1  RAC1
UN  X.X.X.31  204.49 GB  256     78.3%             48cb0782-6c9a-4805-9330-38e192b6b680  RAC1

My keyspace has RF=3 and originally I added X.X.X.52 (num_tokens=1 was a mistake) and then X.X.X.51. I haven't executed `nodetool cleanup` on any nodes yet. For the curious, the full ring can be found here: https://gist.github.com/JensRantil/57ee515e647e2f154779 Cheers, Jens -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook https://www.facebook.com/#!/tink.se Linkedin http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_phototrkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary Twitter https://twitter.com/tink
Question regarding concurrent bootstrapping
Hi, Let's say I have an existing cluster and do the following:
1. I start a new joining node (A). It enters state Up/Joining. Streaming automatically starts to this node.
2. I wait two minutes (best practice for bootstrapping).
3. I start a second node (B) to join the cluster. It allocates some of A's previous parts of the ring and enters state Up/Joining. Streaming automatically starts to this node.
Will streaming of data that A is no longer responsible for (after B joined) stop immediately? That is, after (3), will the data streamed to A only be what it is responsible for? This is of importance for planning when one is expanding a cluster to multiple smaller nodes. Thanks, Jens -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook https://www.facebook.com/#!/tink.se Linkedin http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_phototrkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary Twitter https://twitter.com/tink
Re: Question about nodetool status ... output
Hi Carlos, Yes, I should have been more specific about that; basically all my primary ID:s are random UUIDs so I find that very hard to believe that my data model should be the problem here. I will run a full repair of the cluster, execute a cleanup and recommission the node, then. Thanks, Jens On Fri, Jun 12, 2015 at 2:38 PM, Carlos Rolo r...@pythian.com wrote: Your data model also contributes to the balance (or lack of) of the cluster. If you have a really bad data partitioning Cassandra will not do any magic. Regarding that cluster, I would decommission the x.52 node and add it again with the correct configuration. After the bootstrap, run a cleanup. If is still that off-balance, you need to look into your data model. Regards, Carlos Juzarte Rolo Cassandra Consultant Pythian - Love your data rolo@pythian | Twitter: cjrolo | Linkedin: *linkedin.com/in/carlosjuzarterolo http://linkedin.com/in/carlosjuzarterolo* Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649 www.pythian.com On Fri, Jun 12, 2015 at 11:58 AM, Jens Rantil jens.ran...@tink.se wrote: Hi, I have one node in my 5-node cluster that effectively owns 100% and it looks like my cluster is rather imbalanced. Is it common to have it this imbalanced for 4-5 nodes? My current output for a keyspace is: $ nodetool status myks Datacenter: Cassandra = Status=Up/Down |/ State=Normal/Leaving/Joining/Moving -- Address Load Tokens Owns (effective) Host ID Rack UN X.X.X.33 203.92 GB 256 41.3% 871968c9-1d6b-4f06-ba90-8b3a8d92dcf0 RAC1 UN X.X.X.32 200.44 GB 256 34.2% d7cacd89-8613-4de5-8a5e-a2c53c41ea45 RAC1 UN X.X.X.51 197.17 GB 256 100.0% 344b0adf-2b5d-47c8-8881-9a3f56be6f3b RAC1 UN X.X.X.52 113.63 GB 1 46.3% 55daa807-af49-44c5-9742-fe456df621a1 RAC1 UN X.X.X.31 204.49 GB 256 78.3% 48cb0782-6c9a-4805-9330-38e192b6b680 RAC1 My keyspace has RF=3 and originally I added X.X.X.52 (num_tokens=1 was a mistake) and then X.X.X.51. I haven't executed `nodetool cleanup` on any nodes yet. 
For the curious, the full ring can be found here: https://gist.github.com/JensRantil/57ee515e647e2f154779 Cheers, Jens -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook https://www.facebook.com/#!/tink.se Linkedin http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_phototrkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary Twitter https://twitter.com/tink -- -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook https://www.facebook.com/#!/tink.se Linkedin http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_phototrkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary Twitter https://twitter.com/tink
Re: Hbase vs Cassandra
8) Hbase can do range scans, and one can attack many problems with range scans. Cassandra can't do range scans. 9) HBase is a distributed, consistent, sorted key value store. The sorted bit allows for range scans in addition to the point gets that all K/V stores support. Nothing more, nothing less. It happens to store its data in HDFS by default, and we provide convenient input and output formats for map reduce. *Neutral:* 1) http://khangaonkar.blogspot.com/2013/09/cassandra-vs-hbase-which-nosql-store-do.html 2) The fundamental differences that come to mind are:
* HBase is always consistent. Machine outages lead to inability to read or write data on that machine. With Cassandra you can always write.
* Cassandra defaults to a random partitioner, so range scans are not possible (by default).
* HBase has a range partitioner (if you don't want that, the client has to prefix the rowkey with a prefix of a hash of the rowkey). The main feature that sets HBase apart is range scans.
* HBase is much more tightly integrated with Hadoop/MapReduce/HDFS, etc. You can map reduce directly into HFiles and map those into HBase instantly.
* Cassandra has a dedicated company supporting (and promoting) it.
* Getting started is easier with Cassandra. For HBase you need to run HDFS and Zookeeper, etc.
* I've heard lots of anecdotes about Cassandra working nicely with small clusters (< 50 nodes) and quickly degenerating above that.
* HBase does not have a query language (but you can use Phoenix for full SQL support).
* HBase does not have secondary indexes (having an eventually consistent index, similar to what Cassandra has, is easy in HBase, but making it as consistent as the rest of HBase is hard).
Thanks Ajay On May 29, 2015, at 12:09 PM, Ajay ajay.ga...@gmail.com wrote: Hi, I need some info on Hbase vs Cassandra as a data store (in general plus specific to time series data).
The comparison in the following helps: 1: features 2: deployment and monitoring 3: performance 4: anything else Thanks Ajay -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook https://www.facebook.com/#!/tink.se Linkedin http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_phototrkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary Twitter https://twitter.com/tink
Re: Hbase vs Cassandra
On Mon, Jun 8, 2015 at 11:16 AM, Ajay ajay.ga...@gmail.com wrote: If I understand correctly, you mean when we write with QUORUM and Cassandra writes to few machines and fails to write to few machines and throws exception if it doesn't satisfy QUORUM, leaving it inconsistent and doesn't rollback?. Yes. /Jens -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook https://www.facebook.com/#!/tink.se Linkedin http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_phototrkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary Twitter https://twitter.com/tink
Re: Decommission datacenter - repair?
Ah, that explains things. Thanks! On Fri, Jun 5, 2015 at 10:59 PM, Robert Coli rc...@eventbrite.com wrote: On Fri, Jun 5, 2015 at 5:15 AM, Jens Rantil jens.ran...@tink.se wrote: Datastax's documentation on Decommissioning a data center http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_decomission_dc_t.html tells me to run a full repair and then decommission each node. Isn't decommissioning going to hand over all data anyway? Then why is the repair necessary? In step 3 of those instructions you reduce the number of replicas in the departing DC to 0. The departing DC no longer owns ranges at this point, and no longer is responsible for replicas. It therefore does no streaming (except maybe hints?) when you decommission nodes. =Rob -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook https://www.facebook.com/#!/tink.se Linkedin http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_phototrkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary Twitter https://twitter.com/tink
Re: Newly added node getting more data than expected
Hi again, I should also point out that `nodetool ring ...` only has one entry for X.X.X.4 and that that token range is equally large as the other token ranges for the virtual nodes. Let me know if you need any more information from me. Cheers, Jens On Sun, Jun 7, 2015 at 11:19 PM, Jens Rantil jens.ran...@tink.se wrote: Hi, I had a 3-node (à 256 vnodes each) cluster with RF=3. I mistakenly added a fourth node with num_tokens: 1 (that is, one vnode). I've always seen number of vnodes to be proportional to the amount of data a node would receive. Therefor, I was expecting the node to receive something like 1/(1+3*256) of the cluster's data. However, this is not the case: $ nodetool status mydatacenter Datacenter: Cassandra = Status=Up/Down |/ State=Normal/Leaving/Joining/Moving -- Address Load Tokens Owns (effective) Host ID Rack UN X.X.X.2 200.42 GB 256 87.6% 871968c9-1d6b-4f06-ba90-8b3a8d92dcf0 RAC1 UN X.X.X.3 198.03 GB 256 53.7% d7cacd89-8613-4de5-8a5e-a2c53c41ea45 RAC1 UN X.X.X.4 110.57 GB 1 58.7% 55daa807-af49-44c5-9742-fe456df621a1 RAC1 UN X.X.X.5 199.81 GB 256 100.0% 48cb0782-6c9a-4805-9330-38e192b6b680 RAC1 The new node added is X.X.X.4. Note that I haven't executed `nodetool cleanup` on the old nodes yet. Additional information: * I am using GossipingPropertyFileSnitch. All nodes are the same datacenter and rack. * There are no pending compactions on the node. Could anyone explain to me my new node is receiving more data than expected? Does this have to do with the way the GossipingPropertyFileSnitch decides where to put secondary/tertiary replicas (ie. always next physical node in ring)? Do I need to execute `nodetool cleanup` also on newly commissioned nodes? 
Thanks, Jens -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook https://www.facebook.com/#!/tink.se Linkedin http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_phototrkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary Twitter https://twitter.com/tink -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook https://www.facebook.com/#!/tink.se Linkedin http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_phototrkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary Twitter https://twitter.com/tink
Newly added node getting more data than expected
Hi, I had a 3-node (à 256 vnodes each) cluster with RF=3. I mistakenly added a fourth node with num_tokens: 1 (that is, one vnode). I've always understood the number of vnodes to be proportional to the amount of data a node would receive. Therefore, I was expecting the node to receive something like 1/(1+3*256) of the cluster's data. However, this is not the case:

$ nodetool status mydatacenter
Datacenter: Cassandra
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address  Load       Tokens  Owns (effective)  Host ID                               Rack
UN  X.X.X.2  200.42 GB  256     87.6%             871968c9-1d6b-4f06-ba90-8b3a8d92dcf0  RAC1
UN  X.X.X.3  198.03 GB  256     53.7%             d7cacd89-8613-4de5-8a5e-a2c53c41ea45  RAC1
UN  X.X.X.4  110.57 GB  1       58.7%             55daa807-af49-44c5-9742-fe456df621a1  RAC1
UN  X.X.X.5  199.81 GB  256     100.0%            48cb0782-6c9a-4805-9330-38e192b6b680  RAC1

The new node added is X.X.X.4. Note that I haven't executed `nodetool cleanup` on the old nodes yet. Additional information:
* I am using GossipingPropertyFileSnitch. All nodes are in the same datacenter and rack.
* There are no pending compactions on the node.
Could anyone explain to me why my new node is receiving more data than expected? Does this have to do with the way the GossipingPropertyFileSnitch decides where to put secondary/tertiary replicas (i.e. always the next physical node in the ring)? Do I need to execute `nodetool cleanup` also on newly commissioned nodes? Thanks, Jens -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook https://www.facebook.com/#!/tink.se Linkedin http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_phototrkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary Twitter https://twitter.com/tink
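The 1/(1+3*256) expectation can be checked with a quick simulation (illustrative Python; it models random token assignment and *primary* ownership only, whereas the "Owns (effective)" column additionally counts the secondary/tertiary replicas placed on subsequent physical nodes, which is where the discrepancy comes from):

```python
import random

RING = 2 ** 64  # token space size, modelled as [0, RING)

def primary_share(vnode_counts, trials=100, seed=7):
    """Average fraction of the ring for which the 'new' node is the
    primary owner, given each node's vnode count."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        tokens = {rng.randrange(RING): node
                  for node, count in vnode_counts.items()
                  for _ in range(count)}
        ordered = sorted(tokens)
        prev = ordered[-1] - RING  # wrap around the ring
        owned = 0
        for tok in ordered:
            if tokens[tok] == "new":
                owned += tok - prev  # a token owns the range since its predecessor
            prev = tok
        total += owned / RING
    return total / trials

# Three 256-vnode nodes plus one single-vnode node, as in the email:
share = primary_share({"n1": 256, "n2": 256, "n3": 256, "new": 1})
# Close to 1/769, i.e. about 0.13% of primary ranges.
```

With primary ownership this small, the ~58.7% effective ownership observed must come from replica placement rather than the node's own token range.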
Decommission datacenter - repair?
Hi, I asked this on IRC earlier today, but didn't get any response; Datastax's documentation on Decommissioning a data center http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_decomission_dc_t.html tells me to run a full repair and then decommission each node. Isn't decommissioning going to hand over all data anyway? Then why is the repair necessary? Cheers, Jens -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook https://www.facebook.com/#!/tink.se Linkedin http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_phototrkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary Twitter https://twitter.com/tink
Re: Decommission datacenter - repair?
Hi Kiran, So, am I understanding you correctly that a decommissioning node only will hand over its data to a single node? If it would hand it over to all other replica nodes, I see that essentially as an implicit repair. Am I wrong? Thanks, Jens On Fri, Jun 5, 2015 at 2:27 PM, Kiran mk coolkiran2...@gmail.com wrote: Hi Jens, If you decommission a data center, The data residing in the Data Center which you are planning for decommission has to be balanced to the nodes of the other data center satisfying RF. Hence Repair is required. Best Regards, Kiran.M.K. On Fri, Jun 5, 2015 at 5:45 PM, Jens Rantil jens.ran...@tink.se wrote: Hi, I asked this on IRC earlier today, but didn't get any response; Datastax's documentation on Decommissioning a data center http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_decomission_dc_t.html tells me to run a full repair and then decommission each node. Isn't decommissioning going to hand over all data anyway? Then why is the repair necessary? Cheers, Jens -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook https://www.facebook.com/#!/tink.se Linkedin http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_phototrkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary Twitter https://twitter.com/tink -- Best Regards, Kiran.M.K. -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook https://www.facebook.com/#!/tink.se Linkedin http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_phototrkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary Twitter https://twitter.com/tink
Re: Concurrent schema creation/change strategy
Hi, Generally it can take a couple of seconds before a schema change has propagated to all nodes. The schema will in most cases converge, but as far as I've understood, concurrent schema changes are considered a bad practice and can lead to inconsistent schemas down the road. IIRC, if one executes all schema changes from the same node one can be certain that the schema changes will converge. After executing a schema change you can execute `nodetool describecluster` to make sure all nodes have the same schema. What I'd suggest is that you either
- introduce a queue to execute the schema changes from a single node; or
- come up with a schema that works generically over time; or
- somehow introduce a global lock if you are to programmatically alter the schema. Lock, make the change, poll until all nodes have the same schema, release the lock.
Those are my 5 cents. There are probably other solutions. Cheers, Jens On Mon, May 25, 2015 at 10:58 AM, Magnus Vojbacke magnus.vojba...@digitalroute.com wrote: I have a lot of clients that will try to create the same schema (a keyspace with multiple tables) concurrently during application startup. The idea is that the first time the application starts, the clients will create the schema needed to run (create if not exists, etc...) From what I’ve read, I think that Cassandra has support for concurrent schema creation and modification, but I assume there will be conflicts of some sort. Is there any known strategy for handling this? Specifically considering conflicts. In case of a conflict (e.g., two clients trying to create the exact same table), will the client call return with an error? (Datastax driver) Would a plausible strategy be (for each client) 1) try to create the table, 2) examine any error coming back to determine if a conflict happened, 3) if conflict, move on to next table? Or is it just better to add a separate step to create the schema at some point in time before the clients can be allowed to work (i.e.
move schema creation out of the clients)? Thanks /Magnus -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook https://www.facebook.com/#!/tink.se Linkedin http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_photo&trkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary Twitter https://twitter.com/tink
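The lock-then-poll step described above can be sketched as a small helper. This is a hypothetical sketch, not a driver API: `fetch_schema_versions` is a placeholder callable you would back with your driver's cluster metadata or by parsing `nodetool describecluster` output.

```python
import time

def wait_for_schema_agreement(fetch_schema_versions, timeout=30.0,
                              poll_interval=0.5, sleep=time.sleep):
    """Poll until every node reports the same schema version, or time out.

    `fetch_schema_versions` is a caller-supplied callable (hypothetical)
    returning a {node: schema_version} mapping.
    """
    deadline = time.monotonic() + timeout
    while True:
        versions = set(fetch_schema_versions().values())
        if len(versions) == 1:
            return True    # schema has converged on all nodes
        if time.monotonic() >= deadline:
            return False   # nodes still disagree; caller decides what to do
        sleep(poll_interval)
```

Executing every change from one node, then running something like this before releasing the lock, is the gist of the single-writer approach.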
Re: Performance penalty of multiple UPDATEs of non-pk columns
Artur, That's not entirely true. Writes to Cassandra are first written to a memtable (in-memory table) which is periodically flushed to disk. If multiple writes are coming in before the flush, then only a single record will be written to the disk/sstable. If you have writes that aren't coming within the same flush, they will get removed when you are compacting, just like you say. Unfortunately I can't answer this regarding Counters as I haven't worked with them. Hope this helped at least. Cheers, Jens On Fri, May 15, 2015 at 11:16 AM, Artur Siekielski a...@vhex.net wrote: I've seen some discussions about the topic on the list recently, but I would like to get clearer answers. Given the table: CREATE TABLE t1 ( f1 text, f2 text, f3 text, PRIMARY KEY(f1, f2) ); and assuming I will execute UPDATE of f3 multiple times (say, 1000) for the same key values k1, k2 and different values of 'newval': UPDATE t1 SET f3=newval WHERE f1=k1 AND f2=k2; How will the performance of selecting the current 'f3' value be affected?: SELECT f3 FROM t1 WHERE f1=k1 AND f2=k2; It looks like all the previous values are preserved until compaction, but does executing the SELECT read all the values (O(n), n - number of updates) or only the current one (O(1))? How does the situation look for Counter types? -- Jens Rantil Backend engineer Tink AB
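Jens's point about overwrites coalescing in the memtable can be illustrated with a toy model (a plain dict standing in for the memtable; no real Cassandra involved):

```python
def apply_writes(writes):
    """Toy memtable: the last write per (partition, clustering) key wins in
    memory, so only one cell per key reaches the sstable at flush time."""
    memtable = {}
    for key, value in writes:
        memtable[key] = value   # overwrite in place; no tombstone is created
    return memtable             # what a flush would persist

# 1000 UPDATEs of f3 for the same (k1, k2) key, as in Artur's example:
flushed = apply_writes([(("k1", "k2"), f"newval{i}") for i in range(1000)])
assert len(flushed) == 1
assert flushed[("k1", "k2")] == "newval999"
```

Updates that land in different flush windows produce one cell per sstable instead, and reads pay for reconciling them until compaction merges the sstables.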
Re: Hive support on Cassandra
Hi Ajay, I just Googled your question and ended up here: http://stackoverflow.com/q/11850186/260805 The only solution seems to be Datastax Enterprise. Cheers, Jens On Wed, May 6, 2015 at 7:57 AM, Ajay ajay.ga...@gmail.com wrote: Hi, Does Apache Cassandra (not DSE) support Hive integration? I found a couple of open source efforts but nothing is available currently. Thanks Ajay -- Jens Rantil Backend engineer Tink AB
Re: Query returning tombstones
Hi Christian, I just know Sylvain explicitly stated he wasn't a fan of exposing tombstones here: https://issues.apache.org/jira/browse/CASSANDRA-8574?focusedCommentId=14292063&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14292063 Cheers, Jens On Wed, Apr 29, 2015 at 12:43 PM, horschi hors...@gmail.com wrote: Hi, did anybody ever raise a feature request for selecting tombstones in CQL/thrift? It would be nice if I could use CQLSH to see where my tombstones are coming from. This would be much more convenient than using sstable2json. Maybe someone can point me to an existing jira-ticket, but I also appreciate any other feedback :-) regards, Christian -- Jens Rantil Backend engineer Tink AB
Re: When to use STCS/DTCS/LCS
Divya, Please start a new thread for that. Or is your question related specifically to this thread? Thanks, Jens On Thu, Apr 9, 2015 at 11:34 AM, Divya Divs divya.divi2...@gmail.com wrote: hi sir.. I'm an M.Tech student. My academic project is on Cassandra. I have run the source code of Cassandra in Eclipse Juno using an ant build. https://github.com/apache/cassandra. I have to do some feature enhancement in Cassandra and I have to analyze my application in Cassandra. So please tell me what kind of feature enhancement I can do in Cassandra. Tell me a simple feature enhancement, that's enough. Please guide me. Thanks in advance. Thanks and Regards, Divya -- Jens Rantil Backend engineer Tink AB
Re: Help understanding aftermath of death by GC
Hi Robert, On Tue, Mar 31, 2015 at 2:22 PM, Robert Wille rwi...@fold3.com wrote: Can anybody help me understand why Cassandra wouldn’t recover? One issue when you are running a JVM and start running out of memory is that the JVM can start throwing `OutOfMemoryError` in any thread - not necessarily in the thread which is taking all the memory. I've seen this happen multiple times. If this happened to you, a critical Cassandra thread could have died and brought the whole Cassandra DB down with it. Just an idea - cheers, Jens -- Jens Rantil Backend engineer Tink AB
Re: Really high read latency
Also, two control questions: - Are you using EBS for data storage? It might introduce additional latencies. - Are you doing proper paging when querying the keyspace? Cheers, Jens On Mon, Mar 23, 2015 at 5:56 AM, Dave Galbraith david92galbra...@gmail.com wrote: Hi! So I've got a table like this: CREATE TABLE default.metrics (row_time int,attrs varchar,offset int,value double, PRIMARY KEY(row_time, attrs, offset)) WITH COMPACT STORAGE AND bloom_filter_fp_chance=0.01 AND caching='KEYS_ONLY' AND comment='' AND dclocal_read_repair_chance=0 AND gc_grace_seconds=864000 AND index_interval=128 AND read_repair_chance=1 AND replicate_on_write='true' AND populate_io_cache_on_flush='false' AND default_time_to_live=0 AND speculative_retry='NONE' AND memtable_flush_period_in_ms=0 AND compaction={'class':'DateTieredCompactionStrategy','timestamp_resolution':'MILLISECONDS'} AND compression={'sstable_compression':'LZ4Compressor'}; and I'm running Cassandra on an EC2 m3.2xlarge out in the cloud, with 4 GB of heap space. So it's timeseries data that I'm doing so I increment row_time each day, attrs is additional identifying information about each series, and offset is the number of milliseconds into the day for each data point. So for the past 5 days, I've been inserting 3k points/second distributed across 100k distinct attrses. And now when I try to run queries on this data that look like SELECT * FROM default.metrics WHERE row_time = 5 AND attrs = 'potatoes_and_jam' it takes an absurdly long time and sometimes just times out. I did nodetool cfstats default and here's what I get: Keyspace: default Read Count: 59 Read Latency: 397.12523728813557 ms. Write Count: 155128 Write Latency: 0.3675690719921613 ms.
Pending Flushes: 0 Table: metrics SSTable count: 26 Space used (live): 35146349027 Space used (total): 35146349027 Space used by snapshots (total): 0 SSTable Compression Ratio: 0.10386468749216264 Memtable cell count: 141800 Memtable data size: 31071290 Memtable switch count: 41 Local read count: 59 Local read latency: 397.126 ms Local write count: 155128 Local write latency: 0.368 ms Pending flushes: 0 Bloom filter false positives: 0 Bloom filter false ratio: 0.0 Bloom filter space used: 2856 Compacted partition minimum bytes: 104 Compacted partition maximum bytes: 36904729268 Compacted partition mean bytes: 986530969 Average live cells per slice (last five minutes): 501.66101694915255 Maximum live cells per slice (last five minutes): 502.0 Average tombstones per slice (last five minutes): 0.0 Maximum tombstones per slice (last five minutes): 0.0 Ouch! 400ms of read latency, orders of magnitude higher than it has any right to be. How could this have happened? Is there something fundamentally broken about my data model? Thanks! -- Jens Rantil Backend engineer Tink AB
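A plausible contributor to the 36 GB maximum partition size in the cfstats above is that `row_time` alone is the partition key, so every one of the 100k series for a whole day lands in a single partition. A sketch of a composite key along the lines of `PRIMARY KEY ((row_time, attrs), offset)` (an assumed schema change, not something from the thread):

```python
MS_PER_DAY = 86_400_000

def partition_key(epoch_ms, attrs):
    """Composite key sketch: (day number, attrs) becomes the partition key
    and the millisecond offset into the day becomes the clustering key,
    mirroring PRIMARY KEY ((row_time, attrs), offset) instead of the
    original PRIMARY KEY (row_time, attrs, offset)."""
    day = epoch_ms // MS_PER_DAY
    offset = epoch_ms % MS_PER_DAY
    return (day, attrs), offset

key, offset = partition_key(MS_PER_DAY * 5 + 1234, "potatoes_and_jam")
assert key == (5, "potatoes_and_jam")
assert offset == 1234
```

Each (day, series) pair then gets its own bounded partition, and the `WHERE row_time = 5 AND attrs = 'potatoes_and_jam'` query reads one small partition instead of slicing a giant one.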
Re: Store data with cassandra
Jean, I'm not sure you will receive any reply unless you ask specific questions about those links. Cheers, Jens – Sent from Mailbox On Fri, Mar 20, 2015 at 5:08 PM, Sibbald, Charles charles.sibb...@bskyb.com wrote: Sounds like this is a job for Jackrabbit? http://jackrabbit.apache.org From: Ali Akhtar ali.rac...@gmail.com Reply-To: user@cassandra.apache.org Date: Friday, 20 March 2015 15:58 To: user@cassandra.apache.org Subject: Re: Store data with cassandra (I apologize, I'm only joking. To answer your question, Cassandra tends to cache the first 300MB or so of data in memory; only when it grows beyond that does it start to write it to files. But Cassandra is not the right choice for storing files. In the screenshot you linked, it's only storing the filenames, not the actual contents of the files). On Fri, Mar 20, 2015 at 8:54 PM, Ali Akhtar ali.rac...@gmail.com wrote: It has been decided that the file cannot be allowed to be stored, sorry. However, if a sacrifice to the gods is prepared, it may be possible to change things. On Fri, Mar 20, 2015 at 8:49 PM, jean paul researche...@gmail.com wrote: I'd like to store MyFile.txt using Cassandra (replication factor = 2) and see on what node the file and its replicas are stored on my cluster of 10 nodes. It is a simple file with simple content (text). Is that possible? 2015-03-20 16:44 GMT+01:00 Ali Akhtar ali.rac...@gmail.com: The files you store have to personally be vetted by the cassandra community. Only if they're found to not contain anything inappropriate, does cassandra let you store them. (A 3/4 majority vote is necessary).
Please send your files for approval to j...@reallycereal.com On Fri, Mar 20, 2015 at 8:41 PM, jean paul researche...@gmail.com wrote: What about this, then: http://www.datastax.com/wp-content/uploads/2012/02/Screen-Shot-2012-02-10-at-11.21.55-AM.png I also read some documents about storing blobs with Cassandra! 2015-03-20 15:04 GMT+01:00 Michael Dykman mdyk...@gmail.com: You seem to be missing the point here. Cassandra does not manage files, it manages data in a highly distributed cluster. If you are attempting to manage files, you are quite simply using the wrong tool and Cassandra is not for you. On Fri, Mar 20, 2015 at 9:10 AM, jean paul researche...@gmail.com wrote: I have used this tutorial to create my database: http://planetcassandra.org/insert-select-records/ /var/lib/cassandra/data# ls demo system system_traces :/var/lib/cassandra/data# cd demo/ :/var/lib/cassandra/data/demo# ls users :/var/lib/cassandra/data/demo# cd users/ :/var/lib/cassandra/data/demo/users# ls :/var/lib/cassandra/data/demo/users# I find nothing in /var/lib/cassandra/data/demo/users! 2015-03-20 13:06 GMT+01:00 jean paul researche...@gmail.com: Hello All; Please, I have created this table. lastname | age | city | email | firstname --+-+---+-+--- Doe | 36 | Beverly Hills | jane...@email.com | Jane Byrne | 24 | San Diego | robby...@email.com | Rob Smith | 46 | Sacramento | johnsm...@email.com | John So, my question: where is this data saved? In /var/lib/cassandra/data? My end goal is to store a file with Cassandra and to see on which node my file is stored. Thanks a lot for the help. Best Regards. -- - michael dykman - mdyk...@gmail.com May the Source be with you.
Information in this email including any attachments may be privileged, confidential and is intended exclusively for the addressee. The views expressed may not be official policy, but the personal views of the originator. If you have received it in error, please notify the sender by return e-mail and delete it from your system. You should not reproduce, distribute, store, retransmit, use or disclose its contents to anyone. Please note we reserve the right to monitor all e-mail communication through our internal and external networks. SKY and the SKY marks are trademarks of Sky plc and Sky International AG and are used under licence. Sky UK Limited (Registration No. 2906991), Sky-In-Home Service Limited (Registration No. 2067075) and Sky Subscribers Services Limited (Registration No. 2340150) are direct or indirect subsidiaries of Sky plc (Registration No. 2247735). All of the companies mentioned in this paragraph are incorporated in
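For readers who do want to keep small files in Cassandra despite the advice above, the usual pattern is chunking the blob across clustered rows. A sketch under an assumed table like `(file_id, chunk_no, data, PRIMARY KEY (file_id, chunk_no))`, which is not something from the thread:

```python
def chunk_blob(data: bytes, chunk_size: int = 65536):
    """Split a file into fixed-size chunks, one per clustering row, so no
    single blob cell (or partition) grows unbounded."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def reassemble(chunks):
    """Read the chunks back in chunk_no order and join them."""
    return b"".join(chunks)

payload = bytes(range(256)) * 1000          # 256 kB of sample data
chunks = chunk_blob(payload, chunk_size=65536)
assert len(chunks) == 4
assert reassemble(chunks) == payload
```

Each chunk would be written as one INSERT keyed by (file_id, chunk_no); for files of any real size, an object store plus a Cassandra table of metadata is the better fit, as the thread concludes.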
Re: Timeout error in fetching million rows as results using clustering keys
Hi, Try setting the fetch size before querying. Assuming you don't set it too high, and you don't have too many tombstones, that should do it. Cheers, Jens – Sent from Mailbox On Wed, Mar 18, 2015 at 2:58 AM, Mehak Mehta meme...@cs.stonybrook.edu wrote: Hi, I have a requirement to fetch a million rows as the result of my query, which is giving timeout errors. I am fetching results by selecting clustering columns, so why are the queries taking so long? I can change the timeout settings but I need the data to be fetched faster as per my requirement. My table definition is: *CREATE TABLE images.results (uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid, x double, y double, loc varchar, w double, h double, normalized varchar, type varchar, filehost varchar, filename varchar, image_uuid uuid, image_uri varchar, image_caseid varchar, image_mpp_x double, image_mpp_y double, image_width double, image_height double, objective double, cancer_type varchar, Area float, submit_date timestamp, points list<double>, PRIMARY KEY ((image_caseid),Area,uuid));* Here each row is uniquely identified on the basis of a unique uuid. But since my data is generally queried based upon *image_caseid* I have made it the partition key. I am currently using the Datastax Java API to fetch the results.
But the query is taking a lot of time resulting in timeout errors: Exception in thread "main" com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1:9042 (com.datastax.driver.core.exceptions.DriverException: Timed out waiting for server response)) at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:84) at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:289) at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:205) at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:52) at QueryDB.queryArea(TestQuery.java:59) at TestQuery.main(TestQuery.java:35) Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1:9042 (com.datastax.driver.core.exceptions.DriverException: Timed out waiting for server response)) at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:108) at com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:179) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Also when I try the same query on the console, even while using a limit of 2000 rows: cqlsh:images> select count(*) from results where image_caseid='TCGA-HN-A2NL-01Z-00-DX1' and Area<100 and Area>20 limit 2000; errors={}, last_host=127.0.0.1 Thanks and Regards, Mehak
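Jens's fetch-size advice amounts to pulling rows page by page instead of asking the coordinator to materialise a million rows in one response. The real drivers do this for you once a fetch size is set; the mechanics can be sketched with a stand-in `run_query` (hypothetical, returning a page of rows plus a paging state):

```python
def fetch_in_pages(run_query, fetch_size=500):
    """Generator yielding rows one page at a time. `run_query(paging_state)`
    is a placeholder for a driver call returning (rows, next_paging_state);
    a None state means the result set is exhausted."""
    state = None
    while True:
        rows, state = run_query(state)
        yield from rows
        if state is None:
            return

# Fake "server": 1200 rows served 500 at a time.
data = list(range(1200))

def run_query(state, size=500):
    start = state or 0
    page = data[start:start + size]
    nxt = start + size if start + size < len(data) else None
    return page, nxt

assert list(fetch_in_pages(run_query)) == data
```

Each round trip then stays small enough to finish inside the server timeout, which is exactly what setting the fetch size buys you.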
Re: Inconsistent count(*) and distinct results from Cassandra
Frens, What consistency level are you querying with? It could be you are simply receiving results from different nodes each time. Jens – Sent from Mailbox On Wed, Mar 4, 2015 at 7:08 PM, Mikhail Strebkov streb...@gmail.com wrote: We have observed the same issue in our production Cassandra cluster (5 nodes in one DC). We use Cassandra 2.1.3 (I joined the list too late to realize we shouldn’t use 2.1.x yet) on Amazon machines (created from a community AMI). In addition to count variations of 5 to 10% we observe variations in the results of the query “select * from table1 where time > '$fromDate' and time < '$toDate' allow filtering”. We iterated through the results multiple times using the official Java driver. We used that query for a huge data migration and were unpleasantly surprised that it is unreliable. In our case “nodetool repair” didn’t fix the issue. So I echo Frens' questions. Thanks, Mikhail On Wed, Mar 4, 2015 at 3:55 AM, Rumph, Frens Jan m...@frensjan.nl wrote: Hi, Is it to be expected that select count(*) from ... and select distinct partition-key-columns from ... yield inconsistent results between executions even though the table at hand isn't written to? I have a table in a keyspace with replication_factor = 1 which is something like: CREATE TABLE tbl ( id frozen<id_type>, bucket bigint, offset int, value double, PRIMARY KEY ((id, bucket), offset) ) The frozen UDT is: CREATE TYPE id_type ( tags map<text, text> ); When I do select count(*) from tbl several times the actual count varies by 5 to 10%. Also when performing select distinct id, bucket from tbl the results aren't consistent over several query executions. The table is not being written to at the time I performed the queries. Is this to be expected? Or is this a bug? Is there an alternative method / workaround? I'm using cqlsh 5.0.1 with Cassandra 2.1.2 on 64-bit Fedora 21 with Oracle Java 1.8.0_31. Thanks in advance, Frens Jan
Re: Input/Output Error
Hi, Check your Cassandra and kernel (if on Linux) log files for errors. Cheers, Jens – Sent from Mailbox On Wed, Mar 4, 2015 at 2:18 AM, 曹志富 cao.zh...@gmail.com wrote: Sometimes my C* 2.1.3 cluster's compaction or streaming hits this error. Could this be because of a disk or filesystem problem? Thanks all. -- Ranger Tsao
Re: best practices for time-series data with massive amounts of records
Hi, I have not done something similar, however I have some comments: On Mon, Mar 2, 2015 at 8:47 PM, Clint Kelly clint.ke...@gmail.com wrote: The downside of this approach is that we can no longer do a simple continuous scan to get all of the events for a given user. Sure, but would you really do that in real time anyway? :) If you have billions of events that's not going to scale anyway. Also, if you have 10 events per bucket, the latency introduced by batching should be manageable. Some users may log lots and lots of interactions every day, while others may interact with our application infrequently, This is another reason to split them up into buckets, to make the cluster partitions more manageable and homogeneous. so I'd like a quick way to get the most recent interaction for a given user. For this you could actually have a second table that stores the last_time_bucket for a user. Upon event write, you could simply do an update of the last_time_bucket. You could even have an index of all time buckets per user if you want. Has anyone used different approaches for this problem? The only thing I can think of is to use the second table schema described above, but switch to an order-preserving hashing function, and then manually hash the id field. This is essentially what we would do in HBase. Like you might already know, this order-preserving hashing is _not_ considered best practice in the Cassandra world. Cheers, Jens -- Jens Rantil Backend engineer Tink AB
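The second-table idea (tracking each user's `last_time_bucket`) can be sketched with in-memory dicts standing in for the two tables; the weekly bucket size is an assumption, not something fixed by the thread:

```python
WEEK_MS = 7 * 24 * 3600 * 1000   # assumed bucket width: one week in ms

def record_event(events, last_bucket, user, ts_ms, payload):
    """Write the event into its (user, bucket) partition and upsert the
    user's most recent bucket, so 'latest interaction' is one lookup plus
    one bounded partition read rather than a scan over all events."""
    bucket = ts_ms // WEEK_MS
    events.setdefault((user, bucket), []).append((ts_ms, payload))
    if bucket > last_bucket.get(user, -1):
        last_bucket[user] = bucket

events, last_bucket = {}, {}
record_event(events, last_bucket, "u1", 100, "a")
record_event(events, last_bucket, "u1", WEEK_MS * 3 + 5, "b")
assert last_bucket["u1"] == 3
assert events[("u1", 3)] == [(WEEK_MS * 3 + 5, "b")]
```

In Cassandra terms, `events` maps to a table partitioned on (user, bucket) and `last_bucket` to the second table Jens suggests, updated on every write (upserts make the update idempotent).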
Re: using or in select query in cassandra
Hi Rahul, No, you can't do this in a single query. You will need to execute two separate queries if the requirements are on different columns. However, if you'd like to select multiple rows with a restriction on the same column you can do that using the `IN` construct: select * from table where id IN (123,124); See [1] for reference. [1] http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/select_r.html Cheers, Jens On Mon, Mar 2, 2015 at 7:06 AM, Rahul Srivastava srivastava.robi...@gmail.com wrote: Hi I want to enforce uniqueness for my data so I need to add an OR clause in my WHERE clause. ex: select * from table where id =123 OR name ='abc' So above, I want to get data if my id is 123 or my name is abc. Is there any possibility in Cassandra to achieve this? -- Jens Rantil Backend engineer Tink AB
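If you really need OR semantics across different columns, the two-query approach Jens describes can be merged client-side. A sketch over plain dicts, where each list comprehension stands in for one Cassandra query (one by id, one by an assumed index or lookup table on name):

```python
def select_or(rows, id_value, name_value):
    """Emulate `WHERE id = ? OR name = ?` client-side: run two
    single-column lookups and union the results by primary key."""
    by_id = [r for r in rows if r["id"] == id_value]      # query 1
    by_name = [r for r in rows if r["name"] == name_value]  # query 2
    merged = {r["id"]: r for r in by_id + by_name}        # dedupe on the pk
    return list(merged.values())

rows = [{"id": 123, "name": "abc"},
        {"id": 124, "name": "abc"},
        {"id": 125, "name": "xyz"}]
result = select_or(rows, 123, "abc")
assert {r["id"] for r in result} == {123, 124}
```

The dedupe step matters: a row matching both predicates would otherwise appear twice in the union.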
Re: How to extract all the user id from a single table in Cassandra?
Hi Check, Please avoid double posting on mailing lists. It leads to double work (respect people's time!) and makes it hard for people in the future having the same issue as you to follow discussions and answers. That said, if you have a lot of primary keys, select user_id from testkeyspace.user_record; will most definitely time out. Have a look at `SELECT DISTINCT` at [1]. More importantly, for larger datasets you will also need to split the token space into smaller segments and iteratively select your primary keys. See [2]. [1] http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/select_r.html [2] http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/select_r.html?scroll=reference_ds_d35_v2q_xj__paging-through-unordered-results If you are having specific issues with the Java Driver I suggest you ask on that mailing list (only). Cheers, Jens On Sun, Mar 1, 2015 at 6:38 PM, Check Peck comptechge...@gmail.com wrote: Sending again as I didn't get any response on this. Any thoughts? On Fri, Feb 27, 2015 at 8:24 PM, Check Peck comptechge...@gmail.com wrote: I have a Cassandra table like this - create table user_record (user_id text, record_name text, record_value blob, primary key (user_id, record_name)); What is the best way to extract all the user_ids from this table? As of now, I cannot change my data model to do this exercise so I need to find a way by which I can extract all the user_ids from the above table. I am using the Datastax Java driver in my project. Is there any other easy way, apart from code, to extract all the user_ids from the above table through some cqlsh utility and dump them into some file?
I am thinking the below code might time out after some time -

public class TestCassandra {
    private Session session = null;
    private Cluster cluster = null;

    private static class ConnectionHolder {
        static final TestCassandra connection = new TestCassandra();
    }

    public static TestCassandra getInstance() {
        return ConnectionHolder.connection;
    }

    private TestCassandra() {
        Builder builder = Cluster.builder();
        builder.addContactPoints("127.0.0.1");
        PoolingOptions opts = new PoolingOptions();
        opts.setCoreConnectionsPerHost(HostDistance.LOCAL,
                opts.getCoreConnectionsPerHost(HostDistance.LOCAL));
        cluster = builder.withRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE)
                .withPoolingOptions(opts)
                .withLoadBalancingPolicy(new TokenAwarePolicy(new DCAwareRoundRobinPolicy("PI")))
                .withReconnectionPolicy(new ConstantReconnectionPolicy(100L))
                .build();
        session = cluster.connect();
    }

    private Set<String> getRandomUsers() {
        Set<String> userList = new HashSet<String>();
        String sql = "select user_id from testkeyspace.user_record;";
        try {
            SimpleStatement query = new SimpleStatement(sql);
            query.setConsistencyLevel(ConsistencyLevel.ONE);
            ResultSet res = session.execute(query);
            Iterator<Row> rows = res.iterator();
            while (rows.hasNext()) {
                Row r = rows.next();
                String user_id = r.getString("user_id");
                userList.add(user_id);
            }
        } catch (Exception e) {
            System.out.println("error=" + e);
        }
        return userList;
    }
}

Adding the java-driver group and Cassandra group as well to see whether there is any better way to execute this? -- Jens Rantil Backend engineer Tink AB
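The token-space splitting mentioned in [2] can be sketched like this, assuming the default Murmur3Partitioner whose tokens span [-2^63, 2^63 - 1]; each segment would back one `SELECT DISTINCT user_id FROM user_record WHERE token(user_id) >= ? AND token(user_id) <= ?` query instead of one giant full scan:

```python
MIN_TOKEN, MAX_TOKEN = -2**63, 2**63 - 1   # Murmur3Partitioner token space

def token_ranges(n_splits):
    """Split the full token ring into n contiguous, non-overlapping
    [start, end] segments (inclusive bounds) for segmented key scans."""
    span = (MAX_TOKEN - MIN_TOKEN) // n_splits
    ranges, start = [], MIN_TOKEN
    for i in range(n_splits):
        end = MAX_TOKEN if i == n_splits - 1 else start + span
        ranges.append((start, end))
        start = end + 1
    return ranges

rs = token_ranges(16)
assert len(rs) == 16
assert rs[0][0] == MIN_TOKEN and rs[-1][1] == MAX_TOKEN
# segments are contiguous and non-overlapping
assert all(rs[i + 1][0] == rs[i][1] + 1 for i in range(15))
```

Issuing the per-segment queries one at a time (or a few in parallel) keeps each request small enough not to time out, which is the problem the quoted Java code runs into.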
Re: how many rows can one partion key hold?
Also, note that repairs will be slower for larger rows and AFAIK also require slightly more memory. Also, to avoid many tombstones it could be worth considering bucketing your partitions by time. Cheers, Jens On Fri, Feb 27, 2015 at 7:44 AM, wateray wate...@163.com wrote: Hi all, My team is using Cassandra as our database. We have one question, as below. As we know, rows with the same partition key will be stored on the same node. But how many rows can one partition key hold? What does it depend on? The node's volume, the partition data size, or the number of rows in the partition? When one partition's data is extremely large, will writes/reads become slow? Can anyone show me some existing use cases? Thanks! -- Jens Rantil Backend engineer Tink AB
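As a back-of-the-envelope answer to the question above: Cassandra caps a partition at two billion cells, but the practical advice is to stay far below that (a commonly cited rule of thumb is on the order of 100 MB per partition). A small sizing sketch, with the thresholds as rules of thumb rather than hard numbers:

```python
def estimate_partition(rows_per_partition, cells_per_row, avg_cell_bytes):
    """Rough partition sizing: total cells (hard cap: 2 billion) and
    approximate on-disk size (soft guidance: keep well under ~100 MB)."""
    cells = rows_per_partition * cells_per_row
    size_bytes = cells * avg_cell_bytes
    return cells, size_bytes

cells, size = estimate_partition(rows_per_partition=1_000_000,
                                 cells_per_row=10, avg_cell_bytes=50)
assert cells == 10_000_000        # far under the 2-billion-cell cap
assert size == 500_000_000        # but ~500 MB: already an oversized partition
```

The example shows why the cell cap alone is a poor guide: a partition can be nowhere near the hard limit and still be large enough to slow down reads, compaction, and repair, which is where time bucketing helps.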
Re: Added new nodes to cluster but no streams
Hi Batranut, A few minutes between each node will do. Cheers, Jens On Fri, Feb 13, 2015 at 1:12 PM, Batranut Bogdan batra...@yahoo.com wrote: Hello, When adding a new node to the cluster, do I need to wait for each node to receive all the data from other nodes in the cluster, or just wait a few minutes before I start each node? On Thursday, February 12, 2015 7:21 PM, Robert Coli rc...@eventbrite.com wrote: On Thu, Feb 12, 2015 at 3:20 AM, Batranut Bogdan batra...@yahoo.com wrote: I have added new nodes to the existing cluster. In Opscenter I do not see any streams... I presume that the new nodes get the data from the rest of the cluster via streams. The existing cluster has TB magnitude, and space used in the new nodes is ~90 GB. I must admit that I have restarted the new nodes several times after adding them. Does this affect bootstrap? AFAIK the new nodes should start loading a part of all the data in the existing cluster. If it stays like this for a while, it sounds like your bootstraps have hung. Note that in general you should add nodes one at a time; especially if you are on a version without the fix for CASSANDRA-2434, in theory adding multiple nodes at once might contribute to their bootstraps hanging. Stop cassandra on the joining nodes, wipe/move aside their data directories, and try again one at a time. =Rob -- Jens Rantil Backend engineer Tink AB
Re: How to speed up SELECT * query in Cassandra
If you are using Spark you need to be _really_ careful about your tombstones. In our experience a single partition with too many tombstones can take down the whole batch job (until something like https://issues.apache.org/jira/browse/CASSANDRA-8574 is fixed). This was a major obstacle for us to overcome when using Spark. Cheers, Jens On Wed, Feb 11, 2015 at 5:12 PM, Jiri Horky ho...@avast.com wrote: Well, I always wondered how Cassandra can be used in a Hadoop-like environment where you basically need to do full table scans. I need to say that our experience is that Cassandra is perfect for writing and for reading specific values by key, but definitely not for reading all of the data out of it. Some of our projects found out that doing that with a non-trivial amount of data in a timely manner is close to impossible in many situations. We are slowly moving to storing the data in HDFS and possibly reprocessing them on a daily basis for such use cases (statistics). This is nothing against Cassandra; it can not be perfect for everything. But I am really interested in how it can work well with Spark/Hadoop where you basically need to read all the data as well (as far as I understand it). Jirka H. On 02/11/2015 01:51 PM, DuyHai Doan wrote: The very nature of cassandra's distributed nature vs partitioning data on hadoop makes spark on hdfs actually faster than on cassandra Prove it. Did you ever have a look into the source code of the Spark/Cassandra connector to see how data locality is achieved before throwing out such a statement? On Wed, Feb 11, 2015 at 12:42 PM, Marcelo Valle (BLOOMBERG/ LONDON) mvallemil...@bloomberg.net wrote: cassandra makes a very poor data warehouse or long term time series store Really? This is not the impression I have... I think Cassandra is good for storing large amounts of data and historical information; it's only not good for storing temporary data. Netflix has a large amount of data and it's all stored in Cassandra, AFAIK.
The very nature of cassandra's distributed nature vs partitioning data on hadoop makes spark on hdfs actually faster than on cassandra. I am not sure about the current state of Spark support for Cassandra, but I guess if you create a map reduce job, the intermediate map results will still be stored in HDFS, as happens with Hadoop. Is this right? I think the problem with Spark + Cassandra or with Hadoop + Cassandra is that the hard part Spark or Hadoop does, the shuffling, could be done out of the box with Cassandra, but no one takes advantage of that. What if a map/reduce job used a temporary CF in Cassandra to store intermediate results? From: user@cassandra.apache.org Subject: Re: How to speed up SELECT * query in Cassandra I use spark with cassandra, and you don't need DSE. I see a lot of people ask this same question below (how do I get a lot of data out of cassandra?), and my question is always, why aren't you updating both places at once? For example, we use hadoop and cassandra in conjunction with each other, we use a message bus to store every event in both, aggregate in both, but only keep current data in cassandra (cassandra makes a very poor data warehouse or long term time series store) and then use services to process queries that merge data from hadoop and cassandra. Also, spark on hdfs gives more flexibility in terms of large datasets and performance. The very nature of cassandra's distributed nature vs partitioning data on hadoop makes spark on hdfs actually faster than on cassandra -- *Colin Clark* +1 612 859 6129 Skype colin.p.clark On Feb 11, 2015, at 4:49 AM, Jens Rantil jens.ran...@tink.se wrote: On Wed, Feb 11, 2015 at 11:40 AM, Marcelo Valle (BLOOMBERG/ LONDON) mvallemil...@bloomberg.net wrote: If you use Cassandra enterprise, you can use hive, AFAIK. Even better, you can use Spark/Shark with DSE.
Cheers, Jens -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se
Re: How to speed up SELECT * query in Cassandra
On Wed, Feb 11, 2015 at 11:40 AM, Marcelo Valle (BLOOMBERG/ LONDON) mvallemil...@bloomberg.net wrote: If you use Cassandra enterprise, you can use hive, AFAIK. Even better, you can use Spark/Shark with DSE. Cheers, Jens -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se
Re: Writing the same column frequently - anti pattern?
Hi, If the writes are coming from the same machine, you could potentially use request collapsing https://github.com/Netflix/Hystrix/wiki/How-To-Use#request-collapsing to avoid the duplicate writes. Just an idea, Jens

On Fri, Feb 6, 2015 at 1:15 AM, Andreas Finke andreas.fi...@solvians.com wrote:
Hi, we are currently writing the same column within a row multiple times (up to 10 times a second). I am familiar with the concept of tombstones in SSTables. My question is: I assume that in our case, most of the time when a column gets overwritten it still resides in the memtable. So I assume for that particular case no tombstone is set, but the column is replaced in memory and then the 'newest' version is flushed to disk. Is this assumption correct? Or is writing the same column an anti-pattern? I am thankful for any input. Regards Andi

-- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se
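The request-collapsing idea above can be sketched without Hystrix: buffer writes per key and only forward the latest value for each key at flush time, so ten overwrites per second become one write per flush interval. This is a hypothetical illustration, not Hystrix's real API -- `flush_fn` stands in for whatever actually talks to Cassandra (e.g. executing a prepared statement).

```python
from collections import OrderedDict

class WriteCollapser:
    """Buffer writes per key, keeping only the latest value, and hand
    the survivors to a flush callback (e.g. a real Cassandra
    session.execute) when flush() is called."""

    def __init__(self, flush_fn):
        self._pending = OrderedDict()
        self._flush_fn = flush_fn

    def write(self, key, value):
        # A newer value for the same key replaces the buffered one,
        # so only the last write per key ever reaches the database.
        self._pending[key] = value

    def flush(self):
        batch, self._pending = self._pending, OrderedDict()
        for key, value in batch.items():
            self._flush_fn(key, value)
        return len(batch)
```

In a real service you would call `flush()` on a timer (say, every second); the trade-off is a bounded window of write loss if the process dies before flushing.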
Re: how to batch the select query to reduce network communication
As an alternative, you could always execute the queries asynchronously against Cassandra and then iterate over the results as they come in. Cheers, Jens

On Fri, Feb 6, 2015 at 12:39 PM, Carlos Rolo r...@pythian.com wrote:
Hi, You can't. Batches are only available for INSERT, UPDATE and DELETE operations. Batches exist to give Cassandra some atomicity, as in, either all operations succeed or all fail. Regards, Carlos Juzarte Rolo Cassandra Consultant Pythian - Love your data rolo@pythian | Twitter: cjrolo | Linkedin: *linkedin.com/in/carlosjuzarterolo http://linkedin.com/in/carlosjuzarterolo* Tel: 1649 www.pythian.com

On Fri, Feb 6, 2015 at 12:21 PM, diwayou diwa...@vip.qq.com wrote:
create table t { a int, b int, c int } If I want to execute: select * from t where a = 1 and b = 2 limit 10; select * from t where a = 1 and b = 3 limit 10; how can I batch this, and only execute once to get the result?

-- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se
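A rough sketch of the "execute asynchronously and iterate" suggestion. With a real DataStax driver you would use the driver's own async facility (e.g. `execute_async` in the Python driver) rather than a thread pool; here a stand-in `query_fn` keeps the example self-contained and runnable without a cluster.

```python
from concurrent.futures import ThreadPoolExecutor

def fan_out(query_fn, parameter_sets, max_workers=4):
    """Issue one query per parameter set concurrently and gather the
    results.  `query_fn` stands in for something like executing a
    prepared SELECT with bound values against a Cassandra session."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(query_fn, params) for params in parameter_sets]
        # Collect in submission order; .result() blocks per future.
        return [f.result() for f in futures]
```

This gets you the latency benefit the poster is after (the two SELECTs overlap on the wire) without abusing BATCH, which -- as Carlos notes -- is for mutations, not reads.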
Re: Controlling the MAX SIZE of sstables after compaction
Parth,

"So are you saying that I should query Cassandra right away?" Well, don't take my word for it, but it definitely sounds like a simpler approach.

"If yes, like I mentioned, I have to run this during traffic hours. Isn't there a possibility then that my traffic to the db may get impacted?" Absolutely, it could. But so will converting your sstables to JSON. But a database is also made to be read from ;) I suggest you set up a test cluster and try the load impact before you try other ways (such as dumping the database etc.). If the load is too high you could also incorporate some kind of rate limiting and/or concurrency limit on your report generation. I also know that people have successfully used Spark or similar infrastructure for batch processing of Cassandra data. Not sure, but it could be useful for you to look into.

"Also, is it okay to use Hector for this?" I have no personal experience with Hector, but I suppose so.

Cheers, Jens ——— Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook Linkedin Twitter

On Mon, Jan 26, 2015 at 9:57 AM, Parth Setya setya.pa...@gmail.com wrote:
Hey Jens, Thank you so much for the advice and for reading through. So are you saying that I should query Cassandra right away? If yes, like I mentioned, I have to run this during traffic hours. Isn't there a possibility then that my traffic to the db may get impacted? Also, is it okay to use Hector for this? Best

On Mon, Jan 26, 2015 at 2:19 PM, Jens Rantil jens.ran...@tink.se wrote:
Hi Parth, I'll take your questions in order:
1. Have a look at the compaction subproperties for STCS: http://datastax.com/documentation/cql/3.1/cql/cql_reference/compactSubprop.html
2. Why not talk to Cassandra when generating the report? It will be waaay faster (and easier!); Cassandra will use bloom filters, handle shadowed (overwritten) columns, and handle tombstones for you, not to mention the fact that it uses sstables that are hot in the OS file cache.
3. See 2) above.
Also, your approach requires you to implement handling of shadowed columns as well as tombstone handling, which could be pretty messy. Cheers, Jens ——— Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook Linkedin Twitter

On Mon, Jan 26, 2015 at 7:40 AM, Parth Setya setya.pa...@gmail.com wrote:
Hi

*Setup* 3 Node Cluster, Api - *Hector*, CL - *QUORUM*, RF - *3*, Compaction Strategy - *Size Tiered Compaction*

*Use Case* I have about *320 million rows* (~12 to 15 columns each) worth of data stored in Cassandra. In order to generate a report containing ALL that data, I do the following:
1. Run compaction
2. Take a snapshot of the db
3. Run sstable2json on all the *Data.db files
4. Read those JSONs and write to a CSV.

*Problem*: The *sstable2json* utility takes about 350-400 hours (~85% of the total time), thereby lengthening the process. (I am running sstable2json sequentially on all the *Data.db files, but their sizes are inconsistent, so making it run concurrently doesn't help either; e.g. one file is 25 GB while another is 500 MB.)

*My Thought Process:* Is there a way to put a cap on the maximum size of the sstables that are generated after compaction, such that I have multiple sstables of uniform size? Then I could run the sstable2json utility on them concurrently.

*Questions:*
1. Is there a way to configure the size of sstables created after compaction?
2. Is there a better approach to generate the report?
3. What are the flaws with this approach?

Best Parth
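The "rate limiting on your report generation" advice above can be sketched as a token bucket: call `allow()` before fetching each page of the report query and back off when it returns False. This is a generic sketch, not tied to any driver; the rate and capacity numbers are placeholders you would tune against your cluster's spare capacity.

```python
import time

class TokenBucket:
    """Simple token-bucket rate limiter for paced report reads:
    tokens refill at `rate` per second up to a burst of `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)          # tokens added per second
        self.capacity = float(capacity)  # maximum burst size
        self.tokens = float(capacity)
        self.clock = clock               # injectable for testing
        self.last = clock()

    def allow(self, cost=1.0):
        # Refill based on elapsed time, then try to spend `cost` tokens.
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Usage: before each page fetch, loop on `allow()` and `time.sleep()` briefly when it denies; with, say, `rate=50` pages/second the report still finishes in hours instead of weeks, while capping the extra load on the live cluster.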
Re: Controlling the MAX SIZE of sstables after compaction
Hi Parth, I'll take your questions in order:
1. Have a look at the compaction subproperties for STCS: http://datastax.com/documentation/cql/3.1/cql/cql_reference/compactSubprop.html
2. Why not talk to Cassandra when generating the report? It will be waaay faster (and easier!); Cassandra will use bloom filters, handle shadowed (overwritten) columns, and handle tombstones for you, not to mention the fact that it uses sstables that are hot in the OS file cache.
3. See 2) above. Also, your approach requires you to implement handling of shadowed columns as well as tombstone handling, which could be pretty messy.
Cheers, Jens ——— Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook Linkedin Twitter

On Mon, Jan 26, 2015 at 7:40 AM, Parth Setya setya.pa...@gmail.com wrote:
Hi

*Setup* 3 Node Cluster, Api - *Hector*, CL - *QUORUM*, RF - *3*, Compaction Strategy - *Size Tiered Compaction*

*Use Case* I have about *320 million rows* (~12 to 15 columns each) worth of data stored in Cassandra. In order to generate a report containing ALL that data, I do the following:
1. Run compaction
2. Take a snapshot of the db
3. Run sstable2json on all the *Data.db files
4. Read those JSONs and write to a CSV.

*Problem*: The *sstable2json* utility takes about 350-400 hours (~85% of the total time), thereby lengthening the process. (I am running sstable2json sequentially on all the *Data.db files, but their sizes are inconsistent, so making it run concurrently doesn't help either; e.g. one file is 25 GB while another is 500 MB.)

*My Thought Process:* Is there a way to put a cap on the maximum size of the sstables that are generated after compaction, such that I have multiple sstables of uniform size? Then I could run the sstable2json utility on them concurrently.

*Questions:*
1. Is there a way to configure the size of sstables created after compaction?
2. Is there a better approach to generate the report?
3. What are the flaws with this approach?

Best Parth
Re: How to know disk utilization by each row on a node
Hi, Datastax comes with sstablekeys that does that. You could also use sstable2json script to find keys. Cheers, Jens On Tue, Jan 20, 2015 at 2:53 PM, Edson Marquezani Filho edsonmarquez...@gmail.com wrote: Hello, everybody. Does anyone know a way to list, for an arbitrary column family, all the rows owned (including replicas) by a given node and the data size (real size or disk occupation) of each one of them on that node? I would like to do that because I have data on one of my nodes growing faster than the others, although rows (and replicas) seem evenly distributed across the cluster. So, I would like to verify if I have some specific rows growing too much. Thank you.
Re: keyspace not exists?
Hi Jason, Have you checked the Cassandra log? Cheers, Jens

On Fri, Jan 16, 2015 at 10:59 AM, Jason Wee peich...@gmail.com wrote:
$ cqlsh 192.168.0.2 9042
Connected to just4fun at 192.168.0.2:9042.
[cqlsh 5.0.1 | Cassandra 2.1.1 | CQL spec 3.2.0 | Native protocol v3]
Use HELP for help.
cqlsh> DESCRIBE KEYSPACES
<empty>
cqlsh> create keyspace foobar with replication = {'class':'SimpleStrategy', 'replication_factor':3};
errors={}, last_host=192.168.0.2
cqlsh> DESCRIBE KEYSPACES;
<empty>
cqlsh> use foobar;
cqlsh:foobar> DESCRIBE TABLES;
Keyspace 'foobar' not found.

Just trying Cassandra 2.1 and encountered the above error; can anyone explain why this is and where to even begin troubleshooting? Jason
Script to count tombstones by partition key
Hi all I just recently put together a small script to count the number of tombstones grouped by partition id, for one or multiple sstables: https://gist.github.com/JensRantil/063b7c56ca4a8dfe1c50 I needed this for debugging purposes and thought I’d share it with you guys in case anyone is interested. Cheers, Jens ——— Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook Linkedin Twitter
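The linked gist isn't reproduced here, but the core of such a script is small: walk sstable2json output and count tombstoned cells per partition key. The JSON shape assumed below (rows with a `key` and a list of cells, where an optional fourth cell element flags `"d"`/`"t"` for deleted/expiring cells) roughly matches Cassandra 2.0-era sstable2json, but the format varies between versions, so treat this as a sketch rather than the gist's actual implementation.

```python
import json
from collections import Counter

def count_tombstones(sstable_json_text):
    """Count tombstoned cells per partition key in sstable2json-style
    output.  Assumes each row looks like
    {"key": ..., "columns": [[name, value, timestamp, flag?], ...]}
    where a 4th element of "d" or "t" marks a tombstoned cell."""
    counts = Counter()
    for row in json.loads(sstable_json_text):
        for cell in row.get("columns", []):
            if len(cell) >= 4 and cell[3] in ("d", "t"):
                counts[row["key"]] += 1
    return counts
```

Usage would be `count_tombstones(open("dump.json").read()).most_common(10)` to find the worst partitions first.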
TombstoneOverwhelmingException for few tombstones
Hi, I have a single partition key that has been nagging me because I am receiving org.apache.cassandra.db.filter.TombstoneOverwhelmingException. After filing https://issues.apache.org/jira/browse/CASSANDRA-8561 I managed to find the partition key in question and which machine it was located on (by looking in system.log). Since I wanted to see how many tombstones the partition key actually had, I did:

nodetool flush mykeyspace mytable

to make sure all changes were written to sstables (not sure this was necessary), then

nodetool getsstables mykeyspace mytable PARTITIONKEY

which listed two sstables. I then had a look at both sstables for the key in question using

sstable2json MYSSTABLE1 -k PARTITIONKEY | jq . > MYSSTABLE1.json
sstable2json MYSSTABLE2 -k PARTITIONKEY | jq . > MYSSTABLE2.json

(piping through jq to format the JSON). Both JSON files contain data (so I have selected the right key). Only one of the files contains any tombstones:

$ cat MYSSTABLE1.json | grep 't'|wc -l
4281
$ cat MYSSTABLE2.json | grep 't'|wc -l
0

But to my surprise, the number of tombstones is nowhere near tombstone_failure_threshold: 10 Can anyone explain why Cassandra is overwhelmed when I'm nowhere near the hard limit? Thanks, Jens ——— Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook Linkedin Twitter
Re: Implications of ramping up max_hint_window_in_ms
Thanks for input, Rob. Just making sure, is older version the same as less than version 2? On Mon, Jan 5, 2015 at 8:13 PM, Robert Coli rc...@eventbrite.com wrote: On Mon, Jan 5, 2015 at 2:52 AM, Jens Rantil jens.ran...@tink.se wrote: Since repair is a slow and daunting process*, I am considering increasing max_hint_window_in_ms from its default value of one (1) hour to something like 24-48 hours. ... Are there any other implications of making this change that I haven’t thought of? Not really, though 24-48 hours of hints could be an awful lot of hints. I personally run with at least a 6 hour max_h_w_i_m. In older versions of Cassandra, 24-48 hours of hints could hose your node via ineffective constant compaction. =Rob
Implications of ramping up max_hint_window_in_ms
Hi, Since repair is a slow and daunting process*, I am considering increasing max_hint_window_in_ms from its default value of one (1) hour to something like 24-48 hours. This will give me and my team more time to fix the underlying problem of a node. I understand that - repair is the only way to avoid hardware failure/bit rot scenarios. I will still be running repair on a weekly basis. - disk usage obviously will increase before data has been handed off. Disk usage shouldn’t be an issue in this case. Are there any other implications of making this change that I haven’t thought of? * I know incremental repair is coming up, but I don’t consider it stable enough. Thanks, Jens ——— Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook Linkedin Twitter
Re: is primary key( foo, bar) the same as primary key ( foo ) with a ‘set' of bars?
...they have somewhat different conflict/repair resolutions, too.

On Thu, Jan 1, 2015 at 8:06 PM, DuyHai Doan doanduy...@gmail.com wrote:
Storage-engine wise, they are almost equivalent, though there are some minor differences: 1) with the Set structure, you cannot store more than 64kb worth of data 2) collections and maps are loaded entirely by Cassandra for each query, whereas with clustering columns you can select a slice of columns

On Thu, Jan 1, 2015 at 7:46 PM, Kevin Burton bur...@spinn3r.com wrote:
I think the two tables are the same. Correct?

create table foo ( source text, target text, primary key( source, target ) )

vs

create table foo ( source text, target set<text>, primary key( source ) )

… meaning that the first one, under the covers, is represented the same as the second. As a slice. Am I correct? -- Founder/CEO Spinn3r.com Location: *San Francisco, CA* blog: http://burtonator.wordpress.com … or check out my Google+ profile https://plus.google.com/102718274791889610666/posts http://spinn3r.com
How many tombstones for deleted CQL row?
Hi, I am considering tuning the tombstone warn/error threshold. Just making sure: if I INSERT one (CQL) row populating all six columns and then DELETE the inserted row, will Cassandra write 1 range tombstone or seven tombstones (one per column plus the row marker)? Thanks, Jens
Re: How many tombstones for deleted CQL row?
Great. Also, if I issue DELETE FROM my_table WHERE partition_key=xxx AND compound_key=yyy I understand only a single tombstone will be created?

On Fri, Dec 26, 2014 at 10:59 AM, DuyHai Doan doanduy...@gmail.com wrote:
If you issue DELETE FROM my_table WHERE partition_key = xxx Cassandra will create a row tombstone and not one tombstone per column, fortunately

On Fri, Dec 26, 2014 at 10:50 AM, Jens Rantil jens.ran...@tink.se wrote:
Hi, I am considering tuning the tombstone warn/error threshold. Just making sure: if I INSERT one (CQL) row populating all six columns and then DELETE the inserted row, will Cassandra write 1 range tombstone or seven tombstones (one per column plus the row marker)? Thanks, Jens
Re: Sqoop Free Form Import Query Breaks off
Hi, Does this have anything to do with Cassandra? Also, please try to avoid cross-posting; it makes it hard for: - future readers to read the full thread. - anyone to follow the full thread. - anyone to respond. I assume there are few who are enrolled in both mailing lists at the same time. Thank you and merry Christmas, Jens

On Thu, Dec 25, 2014 at 2:24 PM, Vineet Mishra clearmido...@gmail.com wrote:
Hi All, I am facing an issue with Sqoop (Sqoop version: 1.4.3-cdh4.7.0) import. I have threaded Java code to import data from multiple databases running on different servers. Currently I execute the sqoop job as a Java process, something like:

Runtime.getRuntime().exec(/usr/bin/sqoop import --driver com.vertica.jdbc.Driver --connect jdbc:vertica://host:port/db --username user --password pwd --query 'select * from table WHERE $CONDITIONS' --split-by id --target-dir /user/hive/warehouse/data/db.db/table --fields-terminated-by '\t' --hive-drop-import-delims -m 1)

I am executing the above command as-is and running into an exception saying:

WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/12/25 18:38:29 ERROR tool.BaseSqoopTool: Error parsing arguments for import:
14/12/25 18:38:29 ERROR tool.BaseSqoopTool: Unrecognized argument: *
14/12/25 18:38:29 ERROR tool.BaseSqoopTool: Unrecognized argument: from
14/12/25 18:38:29 ERROR tool.BaseSqoopTool: Unrecognized argument: table
14/12/25 18:38:29 ERROR tool.BaseSqoopTool: Unrecognized argument: WHERE
14/12/25 18:38:29 ERROR tool.BaseSqoopTool: Unrecognized argument: $CONDITIONS'
14/12/25 18:38:29 ERROR tool.BaseSqoopTool: Unrecognized argument: --split-by
14/12/25 18:38:29 ERROR tool.BaseSqoopTool: Unrecognized argument: id
. . .
Although I can easily understand the error and the reason for it -- sqoop is internally splitting the command by spaces, which breaks up the free-form query (it runs fine with the --table parameter instead), and the same command works like a charm when run directly from the command line -- I wanted to know: is there something I am missing going this way? If not, why is this issue hitting and what's the workaround? Urgent call! Thanks!
Re: Multi DC informations (sync)
Alain, AFAIK, the DC replication is not linearizable. That is, writes are not replicated according to a binlog or similar, like in MySQL; they are replicated concurrently. To answer your questions:
1 - "Replication lag" in Cassandra terms is probably "hinted handoff". You'd want to check the status of that.
2 - `nodetool status` is your friend. It will tell you whether the cluster considers other nodes reachable or not. Run it on a node in the datacenter that you'd like to test connectivity from.
Cheers, Jens ——— Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook Linkedin Twitter

On Fri, Dec 19, 2014 at 11:16 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:
Hi guys, We expanded our cluster to a multiple-DC configuration. Now I am wondering if there is any way to know: 1 - The replication lag between these 2 DCs (OpsCenter, nodetool, other?) 2 - How to make sure that sync is OK at any time. I guess big companies running Cassandra are interested in this kind of info, so I think something exists, but I am not aware of it. Any other important information or advice you can give me about best practices or tricks while running a multi-DC (cross-region US - EU) setup is welcome of course! Cheers, Alain
Re: Understanding tombstone WARN log output
Hi again, A follow-up question (to my yet unanswered question): How come the first localDeletion is Integer.MAX_VALUE above? Should it be? Cheers, Jens ——— Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook Linkedin Twitter On Thu, Dec 18, 2014 at 2:48 PM, Jens Rantil jens.ran...@tink.se wrote: Hi, I am occasionally seeing: WARN [ReadStage:9576] 2014-12-18 11:16:19,042 SliceQueryFilter.java (line 225) Read 756 live and 17027 tombstoned cells in mykeyspace.mytable (see tombstone_warn_threshold). 5001 columns was requested, slices=[73c31274-f45c-4ba5-884a-6d08d20597e7:myfield-], delInfo={deletedAt=-9223372036854775808, localDeletion=2147483647, ranges=[73f0b59e-7525-4a18-a84f-d2a2f0505503-73f0b59e-7525-4a18-a84f-d2a2f0505503:!, deletedAt=141872018676, localDeletion=1418720186][74374d72-2688-4e64-bb0b-f51a956b0529-74374d72-2688-4e64-bb0b-f51a956b0529:!, deletedAt=1418720184675000, localDeletion=1418720184] ... in system.log. My primary key is ((userid uuid), id uuid). Is it possible for me to see from this output which partition key and/or ranges that has all of these tombstones? Thanks, Jens -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se
Understanding tombstone WARN log output
Hi, I am occasionally seeing: WARN [ReadStage:9576] 2014-12-18 11:16:19,042 SliceQueryFilter.java (line 225) Read 756 live and 17027 tombstoned cells in mykeyspace.mytable (see tombstone_warn_threshold). 5001 columns was requested, slices=[73c31274-f45c-4ba5-884a-6d08d20597e7:myfield-], delInfo={deletedAt=-9223372036854775808, localDeletion=2147483647, ranges=[73f0b59e-7525-4a18-a84f-d2a2f0505503-73f0b59e-7525-4a18-a84f-d2a2f0505503:!, deletedAt=141872018676, localDeletion=1418720186][74374d72-2688-4e64-bb0b-f51a956b0529-74374d72-2688-4e64-bb0b-f51a956b0529:!, deletedAt=1418720184675000, localDeletion=1418720184] ... in system.log. My primary key is ((userid uuid), id uuid). Is it possible for me to see from this output which partition key and/or ranges that has all of these tombstones? Thanks, Jens -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se
Re: Replacing nodes disks
Hi Or, You don't have another machine on the network that would temporarily be able to host your /var/lib/cassandra content? That way you would simply be scp:ing the files temporarily to another machine and copy them back when done. You obviously want to do a repair afterwards just in case, but this could save you some time. Just an idea, Jens On Thu, Dec 18, 2014 at 4:17 PM, Or Sher or.sh...@gmail.com wrote: Hi all, We have a situation where some of our nodes have smaller disks and we would like to align all nodes by replacing the smaller disks to bigger ones without replacing nodes. We don't have enough space to put data on / disk and copy it back to the bigger disks so we would like to rebuild the nodes data from other replicas. What do you think should be the procedure here? I'm guessing it should be something like this but I'm pretty sure it's not enough. 1. shutdown C* node and server. 2. replace disks + create the same vg lv etc. 3. start C* (Normally?) 4. nodetool repair/rebuild? *I think I might get some consistency issues for use cases relying on Quorum reads and writes for strong consistency. What do you say? Another question is (and I know it depends on many factors but I'd like to hear an experienced estimation): How much time would take to rebuild a 250G data node? Thanks in advance, Or. -- Or Sher -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se
Query strategy with respect to tombstones
Hi, I have a table with composite primary key ((userid), id). Some patterns about my table:
* Each user generally has 0-3000 rows, but there is currently no upper limit.
* Deleting rows for a user is extremely rare, but when done it can be thousands of rows at a time.
* The most common query by far is to select all rows for a user.
Recently I saw a user that had 65000 tombstones when querying for all of his rows; system.log was printing TombstoneOverwhelmingException. What are my options to avoid this overwhelming tombstone exception? I am willing to have slower queries rather than not being able to query at all. I see a couple of options:
* Use an anti-column to mark rows as deleted. I could then control the rate at which I am writing tombstones by occasionally deleting anti-columns/rows together with their equivalent rows.
* Simply raise tombstone_failure_threshold. AFAIK, this will eventually make me run into possible GC issues.
* Use fetchSize to limit the number of rows paged through. This would make every single query slower, and would not entirely avoid the possibility of getting TombstoneOverwhelmingException.
Have I missed any alternatives here? In the best of worlds, the fetchSize property would also honour the number of tombstones, but I don't think that would be possible, right? Thanks, Jens ——— Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook Linkedin Twitter
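The first option (an anti-column marking rows as deleted) implies a read-side merge like the following. The `deleted` marker column and the row shape are made up for illustration, not any real schema; the point is that reads drop marked rows client-side while a background job issues the real DELETEs at a controlled pace, so tombstones never pile up faster than compaction can purge them.

```python
def filter_anti_deleted(rows, marker="deleted"):
    """Anti-column pattern at read time: instead of issuing a CQL
    DELETE (which writes a tombstone), the application writes a
    boolean marker column; reads then drop marked rows client-side.
    `marker` is a hypothetical column name."""
    return [row for row in rows if not row.get(marker, False)]
```

The trade-off versus real deletes: you still read (and transfer) the dead rows on every query, so a periodic sweep that converts markers into actual DELETEs at a bounded rate is needed to reclaim space eventually.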
Re: Understanding what is key and partition key
For the first row, the key is (2014, N, 1, සියළුම, යුද්ධ) and the value part is (664). Cheers, Jens ——— Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook Linkedin Twitter

On Tue, Dec 16, 2014 at 2:25 PM, Chamila Wijayarathna cdwijayarat...@gmail.com wrote:
Hi Jack, So what will be the keys and values of the following CF instance?

year | category | frequency | word1 | word2 | id
-----+----------+-----------+-------+-------+------
2014 | N | 1 | සියළුම | යුද්ධ | 664
2014 | N | 1 | එච් | කාණ්ඩය | 12526
2014 | N | 1 | ගජබා | සුපර්ක්රොස් | 25779
2014 | N | 1 | බී | කාණ්ඩය | 12505

Thank You!

On Tue, Dec 16, 2014 at 6:45 PM, Jack Krupansky j...@basetechnology.com wrote:
Correction: year and category form a “composite partition key”. frequency, word1, and word2 are “clustering columns”. The combination of a partition key with clustering columns is a “compound primary key”. Every CQL row will have a partition key by definition, and may optionally have clustering columns. “The key” should just be a synonym for “primary key”, although sometimes people are loosely speaking about “the partition” (which should be “the partition key”) rather than the CQL “row”. -- Jack Krupansky

*From:* Chamila Wijayarathna cdwijayarat...@gmail.com *Sent:* Tuesday, December 16, 2014 8:03 AM *To:* user@cassandra.apache.org *Subject:* Understanding what is key and partition key
Hello all, I have read a lot about Cassandra, and I have read about key-value pairs, partition keys, clustering keys, etc. Is the key mentioned in "key-value pair" the same as the partition key, or are they different?

CREATE TABLE corpus.bigram_time_category_ordered_frequency ( id bigint, word1 varchar, word2 varchar, year int, category varchar, frequency int, PRIMARY KEY((year, category), frequency, word1, word2));

In this schema, I know (year, category) is the compound partition key and frequency is the clustering key. What is the key here? Thank You!
-- *Chamila Dilshan Wijayarathna,* SMIEEE, SMIESL, Undergraduate, Department of Computer Science and Engineering, University of Moratuwa.
Re: 100% CPU utilization, ParNew and never completing compactions
Maybe checking which thread(s) are busy would hint at what's going on? (see http://www.boxjar.com/using-top-and-jstack-to-find-the-java-thread-that-is-hogging-the-cpu/).

On Wed, Dec 17, 2014 at 1:51 AM, Arne Claassen a...@emotient.com wrote:
Cassandra 2.0.10 and Datastax Java Driver 2.1.1

On Dec 16, 2014, at 4:48 PM, Ryan Svihla rsvi...@datastax.com wrote:
What version of Cassandra?

On Dec 16, 2014 6:36 PM, Arne Claassen a...@emotient.com wrote:
That's just the thing. There is nothing in the logs except the constant ParNew collections like:

DEBUG [ScheduledTasks:1] 2014-12-16 19:03:35,042 GCInspector.java (line 118) GC for ParNew: 166 ms for 10 collections, 4400928736 used; max is 8000634888

But the load is staying continuously high. There's always some compaction on just that one table, media_tracks_raw, going on, and those values rarely change (certainly the remaining time is meaningless):

pending tasks: 17
compaction type | keyspace | table | completed | total | unit | progress
Compaction | media | media_tracks_raw | 444294932 | 1310653468 | bytes | 33.90%
Compaction | media | media_tracks_raw | 131931354 | 3411631999 | bytes | 3.87%
Compaction | media | media_tracks_raw | 30308970 | 23097672194 | bytes | 0.13%
Compaction | media | media_tracks_raw | 899216961 | 1815591081 | bytes | 49.53%
Active compaction remaining time : 0h27m56s

Here's a sample of a query trace:

activity | timestamp | source | source_elapsed
execute_cql3_query | 00:11:46,612 | 10.140.22.236 | 0
Parsing select * from media_tracks_raw where id =74fe9449-8ac4-accb-a723-4bad024101e3 limit 100; | 00:11:46,612 | 10.140.22.236 | 47
Preparing statement | 00:11:46,612 | 10.140.22.236 | 234
Sending message to /10.140.21.54 | 00:11:46,619 | 10.140.22.236 | 7190
Message received from /10.140.22.236 | 00:11:46,622 | 10.140.21.54 | 12
Executing single-partition query on media_tracks_raw | 00:11:46,644 | 10.140.21.54 | 21971
Acquiring sstable references | 00:11:46,644 | 10.140.21.54 | 22029
Merging memtable tombstones | 00:11:46,644 | 10.140.21.54 | 22131
Bloom filter allows skipping sstable 1395 | 00:11:46,644 | 10.140.21.54 | 22245
Bloom filter allows skipping sstable 1394 | 00:11:46,644 | 10.140.21.54 | 22279
Bloom filter allows skipping sstable 1391 | 00:11:46,644 | 10.140.21.54 | 22293
Bloom filter allows skipping sstable 1381 | 00:11:46,644 | 10.140.21.54 | 22304
Bloom filter allows skipping sstable 1376 | 00:11:46,644 | 10.140.21.54 | 22317
Bloom filter allows skipping sstable 1368 | 00:11:46,644 | 10.140.21.54 | 22328
Bloom filter allows skipping sstable 1365 | 00:11:46,644 | 10.140.21.54 | 22340
Bloom filter allows skipping sstable 1351 | 00:11:46,644 | 10.140.21.54 | 22352
Bloom filter allows skipping sstable 1367 | 00:11:46,644 | 10.140.21.54 | 22363
Bloom filter allows skipping sstable 1380 | 00:11:46,644 | 10.140.21.54 | 22374
Bloom filter allows skipping sstable 1343 | 00:11:46,644 | 10.140.21.54 | 22386
Bloom filter allows skipping sstable 1342 | 00:11:46,644 | 10.140.21.54 | 22397
Bloom filter allows skipping sstable 1334 | 00:11:46,644 |
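The top+jstack technique from the linked article boils down to matching thread ids: `top -H` reports them in decimal, while `jstack` prints each thread's native id as a hex `nid=` value. A minimal sketch (the thread id is just an example, not taken from this cluster):

```shell
# Pick the hottest thread id from `top -H -p <cassandra-pid>` (decimal),
# then convert it to the hex nid that jstack prints for each thread.
TID=21971                          # example: busiest thread id from top
NID=$(printf 'nid=0x%x' "$TID")
echo "$NID"                        # grep this in `jstack <cassandra-pid>` output
```

Grepping the jstack dump for that `nid` tells you whether the CPU is going to compaction threads, read stages, or GC-related work.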
Re: Hinted handoff not working
Hi Robert , Maybe you need to flush your memtables to actually see the disk usage increase? This applies to both hosts. Cheers, Jens On Sun, Dec 14, 2014 at 3:52 PM, Robert Wille rwi...@fold3.com wrote: I have a cluster with RF=3. If I shut down one node, add a bunch of data to the cluster, I don’t see a bunch of records added to system.hints. Also, du of /var/lib/cassandra/data/system/hints of the nodes that are up shows that hints aren’t being stored. When I start the down node, its data doesn’t grow until I run repair, which then takes a really long time because it is significantly out of date. Is there some magic setting I cannot find in the documentation to enable hinted handoff? I’m running 2.0.11. Any insights would be greatly appreciated. Thanks Robert
`nodetool cfhistogram` utility script
Hi, I just quickly put together a tiny utility script to estimate mean/min/max/percentiles for `nodetool cfhistogram` latency output. Maybe it could be useful to someone else, I don’t know. You can find it here: https://gist.github.com/JensRantil/3da67e39f50aaf4f5bce Future improvements would obviously be to not hardcode `us:` and to support the other histograms. Also, this logic should maybe even be moved into `nodetool cfhistogram` itself, since these are fairly common metrics for latency. Cheers, Jens ——— Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se
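The gist isn't reproduced here, but the idea is to treat each cfhistograms offset as a weighted bucket. A minimal sketch of that computation (the bucket format is assumed to be (latency_us, count) pairs, as in the per-offset output of `nodetool cfhistograms`):

```python
def histogram_stats(buckets):
    """Estimate min/max/mean and percentiles from (latency_us, count)
    buckets. Percentiles are approximated by the first bucket whose
    cumulative count reaches the requested fraction of the total."""
    buckets = [(v, c) for v, c in sorted(buckets) if c > 0]
    total = sum(c for _, c in buckets)
    mean = sum(v * c for v, c in buckets) / total

    def percentile(p):
        threshold = p * total
        seen = 0
        for v, c in buckets:
            seen += c
            if seen >= threshold:
                return v
        return buckets[-1][0]

    return {
        "min": buckets[0][0],
        "max": buckets[-1][0],
        "mean": mean,
        "p50": percentile(0.50),
        "p95": percentile(0.95),
        "p99": percentile(0.99),
    }
```

Note the estimate is only as fine-grained as the histogram's bucket boundaries.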
Re: batch_size_warn_threshold_in_kb
Maybe slightly off-topic, but what is a mutation? Is it equivalent to a CQL row? Or maybe a column in a row? Does it include tombstones within the selected range? Thanks, Jens

On Thu, Dec 11, 2014 at 9:56 PM, Ryan Svihla rsvi...@datastax.com wrote: Nothing magic, just put in there based on experience. You can find the story behind the original recommendation here: https://issues.apache.org/jira/browse/CASSANDRA-6487 Key reasoning for the desire comes from Patrick McFadden: "Yes that was in bytes. Just in my own experience, I don't recommend more than ~100 mutations per batch. Doing some quick math I came up with 5k as 100 x 50 byte mutations. Totally up for debate." It's totally changeable; however, it's there in no small part because so many people mistake the BATCH keyword for a performance optimization, and this helps flag those cases of misuse.

On Thu, Dec 11, 2014 at 2:43 PM, Mohammed Guller moham...@glassbeam.com wrote: Hi – The cassandra.yaml file has a property called *batch_size_warn_threshold_in_kb*. The default size is 5kb and, according to the comments in the yaml file, it is used to log a WARN on any batch size exceeding this value in kilobytes. It says caution should be taken when increasing the size of this threshold as it can lead to node instability. Does anybody know the significance of this magic number 5kb? Why would a higher number (say 10kb) lead to node instability? Mohammed -- http://www.datastax.com/ Ryan Svihla Solution Architect https://twitter.com/foundev http://www.linkedin.com/pub/ryan-svihla/12/621/727/ DataStax is the fastest, most scalable distributed database technology, delivering Apache Cassandra to the world’s most innovative enterprises. Datastax is built to be agile, always-on, and predictably scalable to any size. 
With more than 500 customers in 45 countries, DataStax is the database technology and transactional backbone of choice for the world's most innovative companies such as Netflix, Adobe, Intuit, and eBay.
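Patrick's back-of-the-envelope above (around 100 mutations at ~50 bytes each, giving roughly 5 KB) is easy to replicate. A hedged sketch; the real serialized mutation size depends on the encoding, so treat this as an estimate only:

```python
WARN_THRESHOLD_KB = 5  # default batch_size_warn_threshold_in_kb

def would_warn(n_mutations, avg_mutation_bytes, threshold_kb=WARN_THRESHOLD_KB):
    """Return (estimated_kb, trips_warning) for a batch of n_mutations,
    each roughly avg_mutation_bytes when serialized."""
    est_kb = n_mutations * avg_mutation_bytes / 1024
    return est_kb, est_kb > threshold_kb

# 100 mutations x 50 bytes is just under the 5 KB threshold;
# doubling the batch trips it.
print(would_warn(100, 50))
print(would_warn(200, 50))
```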
Re: Best practice for emulating a Cassandra timeout during unit tests?
Hi, I don’t know if this is “best practice”, but you could do this using mocking if nothing else. Cheers, Jens

On Tue, Dec 9, 2014 at 8:42 PM, Clint Kelly clint.ke...@gmail.com wrote: Hi all, I'd like to write some tests for code that uses the Cassandra Java driver, to see how it behaves if there is a read timeout while accessing Cassandra. Is there a best practice for getting this done? I was thinking about adjusting the settings in the cluster builder to make the timeout impossibly low (like 1 ms), but I'd rather do something to my test Cassandra instance (using the EmbeddedCassandraService) to temporarily slow it down. Any suggestions? Best regards, Clint
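Jens' mocking suggestion might look roughly like the following. It is shown in Python for brevity; the stub exception stands in for the driver's timeout type (in the OP's Java setup, Mockito plus the Java driver's read-timeout exception would play these roles):

```python
from unittest import mock

class ReadTimeoutStub(Exception):
    """Stand-in for the driver's read-timeout exception."""

def fetch_user(session, user_id):
    """Application code under test: falls back to None on a read timeout."""
    try:
        return session.execute("SELECT * FROM users WHERE id = %s", (user_id,))
    except ReadTimeoutStub:
        return None

# Emulate the timeout without touching a real cluster: the mocked session
# raises on execute(), exercising the timeout path deterministically.
session = mock.Mock()
session.execute.side_effect = ReadTimeoutStub("replica did not respond")
result = fetch_user(session, 42)
```

The advantage over an impossibly low timeout setting is determinism: the failure happens on every run, with no race against a real (embedded) Cassandra.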
Re: Cassandra backup via snapshots in production
On Mon, Dec 1, 2014 at 8:39 PM, Robert Coli rc...@eventbrite.com wrote: Why not use the much more robustly designed and maintained community-based project, tablesnap?

For two reasons: - Because I am tired of the deployment model of Python apps, which requires me to set up virtual environments. - Because, AFAIK, it did not support (asymmetric) encryption before uploading. -- Jens
Re: Cassandra add a node and remove a node
Hi Neha, Generally, best practice is to add the new node before removing the old one. This is especially important if the cluster’s resources (such as available disk space) are low. Adding first also lets you verify that the new node is functioning correctly (check the logs) before decommissioning the old one. See [1]. [1] http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_replace_live_node.html Cheers, Jens

On Mon, Dec 1, 2014 at 7:15 AM, Neha Trivedi nehajtriv...@gmail.com wrote: Hi, I need to add a new node and remove an existing node. Should I first remove the node and then add a new one, or add the new node and then remove the existing one? Which practice is better, and what do I need to take care of? regards Neha
Re: Cassandra backup via snapshots in production
Late answer; you can find my backup script here: https://gist.github.com/JensRantil/a8150e998250edfcd1a3 Basically you need to set S3_BUCKET and PGP_KEY_RECIPIENT, configure s3cmd (using `s3cmd --configure`) and then issue `./backup-keyspace.sh your-keyspace` to back it up to S3. The script is run periodically on every node. Regarding `s3cmd --configure`, I executed it once and then copied `~/.s3cfg` to all nodes. Like I said, there’s lots of love that can be put into a backup system. Note that the script has the following limitations:

* It does not checksum the files. However, the s3cmd website states that it by default compares MD5 and file size on upload.
* It does not purge old files on S3 (which you could configure using “Object Lifecycles”).
* It does not warn you if a backup fails. Check your logs periodically.
* It does not do any advanced logging. Make sure to pipe the output to a file or the `syslog` utility.
* It does not do continuous/point-in-time backup.

That said, it does its job for us for now. Feel free to propose improvements! Cheers, Jens

On Fri, Nov 21, 2014 at 7:36 PM, William Arbaugh w...@cs.umd.edu wrote: Jens, I'd be interested in seeing your script. We've been thinking of doing exactly that but uploading to Glacier instead. Thanks, Bill

On Nov 21, 2014, at 11:40 AM, Jens Rantil jens.ran...@tink.se wrote: The main purpose is to protect us from human errors (eg. unexpected manipulations: delete, drop tables, …). If that is the main purpose, having “auto_snapshot: true” in cassandra.yaml will be enough to protect you. Regarding backup, I have a small script that creates a named snapshot and, for each sstable, encrypts it, uploads it to S3 and deletes the snapshotted sstable. It took me an hour to write and roll out to all our nodes. 
The whole process is currently logged, but eventually I will also send an e-mail if backup fails. ——— Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook Linkedin Twitter On Tue, Nov 18, 2014 at 3:52 PM, Ngoc Minh VO ngocminh...@bnpparibas.com wrote: Hello all, We are looking for a solution to backup data in our C* cluster (v2.0.x, 16 nodes, 4 x 500GB SSD, RF = 6 over 2 datacenters). The main purpose is to protect us from human errors (eg. unexpected manipulations: delete, drop tables, …). We are thinking of: - Backup: add a 2TB HDD on each node for C* daily/weekly snapshots. - Restore: load the most recent snapshots or latest “non-corrupted” ones and replay missing data imports from other data source. We would like to know if somebody are using Cassandra’s backup feature in production and could share your experience with us. Your help would be greatly appreciated. Best regards, Minh This message and any attachments (the message) is intended solely for the intended addressees and is confidential. If you receive this message in error,or are not the intended recipient(s), please delete it and any copies from your systems and immediately notify the sender. Any unauthorized view, use that does not comply with its purpose, dissemination or disclosure, either whole or partial, is prohibited. Since the internet cannot guarantee the integrity of this message which may not be reliable, BNP PARIBAS (and its subsidiaries) shall not be liable for the message if modified, changed or falsified. Do not print this message unless it is necessary,consider the environment. -- Ce message et toutes les pieces jointes (ci-apres le message) sont etablis a l'intention exclusive de ses destinataires et sont confidentiels. Si vous recevez ce message par erreur ou s'il ne vous est pas destine, merci de le detruire ainsi que toute copie de votre systeme et d'en avertir immediatement l'expediteur. 
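The first limitation above (no client-side checksumming) is cheap to address. A sketch of a streaming MD5 helper whose output could be recorded alongside each uploaded sstable; note that S3's ETag equals the plain MD5 only for non-multipart uploads, so that comparison is an assumption to verify for your upload tool:

```python
import hashlib

def md5_hex(path, chunk_size=1 << 20):
    """Streaming MD5 of a file, read in 1 MiB chunks so even large
    sstables don't need to fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```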
Re: Cassandra backup via snapshots in production
“Truncate does trigger snapshot creation though” Doesn’t it? With “auto_snapshot: true” it should. Jens

On Tue, Nov 25, 2014 at 9:21 AM, DuyHai Doan doanduy...@gmail.com wrote: True. Delete in CQL just creates a tombstone, so from the storage engine's point of view it's just adding some physical columns. Truncate does trigger snapshot creation though.

On 21 Nov 2014 19:29, Robert Coli rc...@eventbrite.com wrote: On Fri, Nov 21, 2014 at 8:40 AM, Jens Rantil jens.ran...@tink.se wrote: The main purpose is to protect us from human errors (eg. unexpected manipulations: delete, drop tables, …). If that is the main purpose, having “auto_snapshot: true” in cassandra.yaml will be enough to protect you. OP includes delete in their list of unexpected manipulations, and auto_snapshot: true will not protect you in any way from DELETE. =Rob http://twitter.com/rcolidba
Cassandra schema migrator
Hi, Is anyone using, or could anyone recommend, a tool for versioning/migrating schemas in Cassandra? My list of requirements:

* Support for adding tables.
* Support for versioning of table properties. All our tables are to default to LeveledCompactionStrategy.
* Support for adding non-existing columns.
* Optional: support for removing columns.
* Optional: support for removing tables.

We are primarily a Java shop, but could potentially integrate something non-Java. I understand I could write a tool that makes these decisions using system.schema_columnfamilies and system.schema_columns, but as always, reusing a proven tool would be preferable. So far I only know of Spring Data Cassandra, which handles creating tables and adding columns. However, it does not handle table properties in any way. Thanks, Jens
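For reference, the core of the homegrown approach described above (diff a desired schema against what the cluster reports and emit ALTERs) fits in a few lines. All names here are hypothetical; in practice `actual` would be populated from system.schema_columns on C* 2.x:

```python
def missing_column_statements(table, desired, actual):
    """Emit ALTER TABLE ... ADD statements for columns present in the
    desired schema but absent from the live one. `desired` and `actual`
    map column name to CQL type."""
    stmts = []
    for name, cql_type in sorted(desired.items()):
        if name not in actual:
            stmts.append("ALTER TABLE %s ADD %s %s;" % (table, name, cql_type))
    return stmts
```

Handling table properties and column/table removal (the harder requirements in the list) would need the same diff pattern against system.schema_columnfamilies, plus explicit operator confirmation for destructive changes.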
Re: Problem with performance, memory consumption, and RLIMIT_MEMLOCK
Hi Dmitri, I have not used the CPP driver, but maybe you have forgotten to set the equivalent of the Java driver's fetch size to something sensible? Just an idea, Jens — Sent from Mailbox

On Sun, Nov 16, 2014 at 6:09 PM, Dmitri Dmitrienko ddmit...@gmail.com wrote: Hi, I have a very simple table in Cassandra that contains only three columns: id, time and a blob with data. I added 1M rows of data and now the database is about 12GB on disk. 1M is only part of the data I want to store in the database; it's necessary to synchronize this table with an external source. In order to do this, I have to read the id and time columns of all the rows, compare them with what I see in the external source, and insert/update/delete the rows where I see a difference. So I'm trying to fetch the id and time columns from Cassandra. In 100% of my attempts, the server hangs for ~1 minute while loading 100% CPU, then abnormally terminates with an error saying I have to run Cassandra as root or increase RLIMIT_MEMLOCK. I increased RLIMIT_MEMLOCK to 1GB and it seems that is still not sufficient. It seems Cassandra tries to read and lock the whole table in memory, ignoring the fact that I need only two tiny columns (~12MB of data). This is how it works when I use the latest cpp-driver. With cqlsh it works differently: it shows the first page of data almost immediately, without any sensible delay. Is there a way to have the cpp-driver work like cqlsh? I'd like to have data sent to the client immediately upon availability, without any attempts to lock huge chunks of virtual memory. My platform is 64-bit Linux (CentOS) with all necessary updates installed, OpenJDK. I also tried Mac OS X with Oracle JDK. In that case I don't get the RLIMIT_MEMLOCK error, but a regular out-of-memory error in system.log, although I gave the server a sufficiently large heap, as recommended: 8GB.
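cqlsh shows the first page quickly because it pages its queries rather than pulling the whole result set at once; the cpp-driver has a paging-size knob as well (cass_statement_set_paging_size, if memory serves). The general client-side pattern, sketched in Python against a stub session rather than any real driver API (the execute signature below is an assumption for illustration):

```python
def iter_rows(session, query, fetch_size=5000):
    """Page through a large result set fetch_size rows at a time.
    `session.execute(query, fetch_size, paging_state)` is assumed to
    return (rows, next_paging_state), with next_paging_state None on
    the last page -- a stand-in for the real driver's paging API."""
    paging_state = None
    while True:
        rows, paging_state = session.execute(query, fetch_size, paging_state)
        for row in rows:
            yield row
        if paging_state is None:
            return
```

Since `iter_rows` is a generator, the caller starts consuming the first page as soon as it arrives, instead of waiting for all 1M rows.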
Re: Cassandra backup via snapshots in production
The main purpose is to protect us from human errors (eg. unexpected manipulations: delete, drop tables, …). If that is the main purpose, having “auto_snapshot: true” in cassandra.yaml will be enough to protect you. Regarding backup, I have a small script that creates a named snapshot and, for each sstable, encrypts it, uploads it to S3 and deletes the snapshotted sstable. It took me an hour to write and roll out to all our nodes. The whole process is currently logged, but eventually I will also send an e-mail if a backup fails. Jens

On Tue, Nov 18, 2014 at 3:52 PM, Ngoc Minh VO ngocminh...@bnpparibas.com wrote: Hello all, We are looking for a solution to back up data in our C* cluster (v2.0.x, 16 nodes, 4 x 500GB SSD, RF = 6 over 2 datacenters). The main purpose is to protect us from human errors (eg. unexpected manipulations: delete, drop tables, …). We are thinking of: - Backup: add a 2TB HDD on each node for C* daily/weekly snapshots. - Restore: load the most recent snapshots or the latest “non-corrupted” ones and replay missing data imports from the other data source. We would like to know if somebody is using Cassandra’s backup feature in production and could share their experience with us. Your help would be greatly appreciated. Best regards, Minh