Re: Are Cassandra writes faster than reads?
Awesome! For a full explanation of what you are seeing (we call it micro batching), check out Adam Zegelin's talk on it: https://www.youtube.com/watch?v=wF3Ec1rdWgc

On Tue, 8 Nov 2016 at 02:21 Rajesh Radhakrishnan <rajesh.radhakrish...@phe.gov.uk> wrote:
> Hi,
>
> Just found that reducing the batch size below 20 also increases the writing speed and reduces memory usage (especially for the Python driver).
>
> Kind regards,
> Rajesh R
RE: Are Cassandra writes faster than reads?
Hi,

Just found that reducing the batch size below 20 also increases the writing speed and reduces memory usage (especially for the Python driver).

Kind regards,
Rajesh R

From: Ben Bromhead [b...@instaclustr.com]
Sent: 07 November 2016 05:44
To: user@cassandra.apache.org
Subject: Re: Are Cassandra writes faster than reads?

They can be and it depends on your compaction strategy :)
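A minimal sketch of the small-batch idea Rajesh describes, assuming the rows arrive as an iterable of tuples. The helper name `chunked`, the `Row` alias and the batch size of 20 are illustrative and not part of the Python driver. The driver calls themselves are left out because the session and table are not shown in this thread; each chunk would typically be bound to a prepared statement and sent as one small batch (for example via the driver's `BatchStatement`) rather than as a single huge one.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Row = Tuple  # hypothetical row type: one tuple of bind values per INSERT


def chunked(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield consecutive lists of at most `size` rows."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    rows = [(i, f"value-{i}") for i in range(45)]
    for batch_rows in chunked(rows, 20):
        # With the real driver, each chunk would become one small batch
        # instead of all 45 rows going into a single large batch.
        print(len(batch_rows))
```

Because `chunked` is a generator, only one chunk is held in memory at a time, which fits the reduced memory usage reported above.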
RE: Are Cassandra writes faster than reads?
Hi,

In my case writing is slower using the Python driver, with batch execution and prepared statements. I am looking at different ways to speed it up, as I am trying to write 100 * 200 million records.

Cheers,
Rajesh R

From: Vikas Jaiman [er.vikasjai...@gmail.com]
Sent: 07 November 2016 10:43
To: user@cassandra.apache.org
Subject: Re: Are Cassandra writes faster than reads?

Thanks Jeff and Ben for the info.
Re: Are Cassandra writes faster than reads?
Thanks Jeff and Ben for the info.

On Mon, Nov 7, 2016 at 6:44 AM, Ben Bromhead <b...@instaclustr.com> wrote:
> They can be and it depends on your compaction strategy :)
Re: Are Cassandra writes faster than reads?
They can be and it depends on your compaction strategy :)

On Sun, 6 Nov 2016 at 21:24 Ali Akhtar <ali.rac...@gmail.com> wrote:
> tl;dr? I just want to know if updates are bad for performance, and if so, for how long.

--
Ben Bromhead
CTO | Instaclustr <https://www.instaclustr.com/>
+1 650 284 9692
Managed Cassandra / Spark on AWS, Azure and Softlayer
Re: Are Cassandra writes faster than reads?
tl;dr? I just want to know if updates are bad for performance, and if so, for how long.

On Mon, Nov 7, 2016 at 10:23 AM, Ben Bromhead <b...@instaclustr.com> wrote:
> Check out https://wiki.apache.org/cassandra/WritePathForUsers for the full gory details.
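On the "for how long" part of this question: the extra read cost of an update lasts until the files holding the old and new versions are compacted together, and when that happens depends on the compaction strategy. Below is a toy sketch of what such a merge does, not Cassandra's actual implementation. The `SSTable` alias and `compact` function are hypothetical names, each table is modeled as a plain dict, and tombstones are dropped immediately, whereas real Cassandra keeps them for a grace period first.

```python
from typing import Dict, List, Optional, Tuple

# One toy "SSTable": key -> (write_timestamp, value).
# A value of None marks a delete (tombstone).
SSTable = Dict[str, Tuple[int, Optional[str]]]


def compact(sstables: List[SSTable]) -> SSTable:
    """Merge several immutable tables into one, keeping the newest version of each key."""
    merged: SSTable = {}
    for table in sstables:
        for key, (ts, value) in table.items():
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)
    # Simplification: purge tombstones right away.
    return {k: v for k, v in merged.items() if v[1] is not None}


if __name__ == "__main__":
    old = {"a": (1, "v1"), "b": (1, "v1")}
    update = {"a": (2, "v2")}   # overwrite of "a" in a later file
    delete = {"b": (3, None)}   # delete of "b" in a later file
    print(compact([old, update, delete]))
```

After the merge, a read of "a" has one file to consult instead of two, which is why the cost of an update on the read path is temporary.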
Re: Are Cassandra writes faster than reads?
Check out https://wiki.apache.org/cassandra/WritePathForUsers for the full gory details.

On Sun, 6 Nov 2016 at 21:09 Ali Akhtar <ali.rac...@gmail.com> wrote:
> How long does it take for updates to get merged / compacted into the main data file?

--
Ben Bromhead
CTO | Instaclustr <https://www.instaclustr.com/>
+1 650 284 9692
Managed Cassandra / Spark on AWS, Azure and Softlayer
Re: Are Cassandra writes faster than reads?
How long does it take for updates to get merged / compacted into the main data file?

On Mon, Nov 7, 2016 at 5:31 AM, Ben Bromhead <b...@instaclustr.com> wrote:
> To add some flavor as to how the commitlog implementation is so quick:
>
> It only flushes to disk every 10s by default. So writes are effectively done to memory and then to disk asynchronously later on. This is generally accepted to be OK, as the write is also going to other nodes.
>
> You can of course change this behavior to flush on each write or to skip the commitlog altogether (danger!). This, however, will change how "safe" things are from a durability perspective.
Re: Are Cassandra writes faster than reads?
To add some flavor as to how the commitlog implementation is so quick:

It only flushes to disk every 10s by default. So writes are effectively done to memory and then to disk asynchronously later on. This is generally accepted to be OK, as the write is also going to other nodes.

You can of course change this behavior to flush on each write or to skip the commitlog altogether (danger!). This, however, will change how "safe" things are from a durability perspective.

On Sun, Nov 6, 2016, 12:51 Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote:
> Cassandra writes are particularly fast, for a few reasons:
>
> 1) Most writes go to a commitlog (append-only file, written linearly, so particularly fast in terms of disk operations) and are then pushed to the memtable. The memtable is flushed in batches to the permanent data files, so it buffers many mutations and then does a sequential write to persist that data to disk.
>
> 2) Reads may have to merge data from many data files (SSTables) on disk. Because the writes (described very briefly in step 1) write to immutable files, updates/deletes have to be merged on read; this is extra effort for the read path.
>
> If you don’t do much in terms of overwrites/deletes, and your partitions are particularly small, and your data fits in RAM (probably mmap/page cache of data files, unless you’re using the row cache), reads may be very fast for you. Certainly individual reads on low-merge workloads can be < 0.1ms.
>
> - Jeff

--
Ben Bromhead
CTO | Instaclustr <https://www.instaclustr.com/>
+1 650 284 9692
Managed Cassandra / Spark on AWS, Azure and Softlayer
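To make the durability trade-off in this message concrete, here is a rough sketch that counts how many fsync calls an append-only log issues under two policies: sync on every write versus sync at most once per period. `ToyCommitLog` and `count_fsyncs` are hypothetical names, the clock is faked so the run is deterministic, and the sync is triggered from the append path, whereas a real implementation would more likely sync from a background thread. In Cassandra itself the policy is chosen in cassandra.yaml through the `commitlog_sync` setting, and skipping the commitlog is a per-keyspace option (`durable_writes = false`).

```python
import os
import tempfile


class ToyCommitLog:
    """Append-only log. A sync period of 0 means fsync on every append."""

    def __init__(self, path, sync_period_s, clock):
        self._file = open(path, "ab")
        self._period = sync_period_s
        self._clock = clock
        self._last_sync = clock()
        self.fsync_count = 0

    def append(self, record: bytes) -> None:
        self._file.write(record + b"\n")  # buffered, sequential append
        now = self._clock()
        if now - self._last_sync >= self._period:
            self._sync(now)

    def _sync(self, now: float) -> None:
        self._file.flush()
        os.fsync(self._file.fileno())  # force the data to stable storage
        self._last_sync = now
        self.fsync_count += 1

    def close(self) -> None:
        self._sync(self._clock())
        self._file.close()


def count_fsyncs(sync_period_s: float, writes: int, seconds_per_write: float) -> int:
    """Run `writes` appends against a fake clock and return the number of fsyncs."""
    now = [0.0]
    with tempfile.TemporaryDirectory() as tmp:
        log = ToyCommitLog(os.path.join(tmp, "commitlog"), sync_period_s, lambda: now[0])
        for i in range(writes):
            now[0] += seconds_per_write
            log.append(b"mutation-%d" % i)
        count = log.fsync_count
        log.close()
    return count


if __name__ == "__main__":
    print("sync on every write:", count_fsyncs(0.0, 100, 0.5))
    print("sync every 10 s:    ", count_fsyncs(10.0, 100, 0.5))
```

With the periodic policy, anything appended since the last sync exists only in memory on this node, which is the window of risk that replication to other nodes is meant to cover.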
Re: Are Cassandra writes faster than reads?
Cassandra writes are particularly fast, for a few reasons:

1) Most writes go to a commitlog (append-only file, written linearly, so particularly fast in terms of disk operations) and are then pushed to the memtable. The memtable is flushed in batches to the permanent data files, so it buffers many mutations and then does a sequential write to persist that data to disk.

2) Reads may have to merge data from many data files (SSTables) on disk. Because the writes (described very briefly in step 1) write to immutable files, updates/deletes have to be merged on read; this is extra effort for the read path.

If you don’t do much in terms of overwrites/deletes, and your partitions are particularly small, and your data fits in RAM (probably mmap/page cache of data files, unless you’re using the row cache), reads may be very fast for you. Certainly individual reads on low-merge workloads can be < 0.1ms.

- Jeff

From: Vikas Jaiman <er.vikasjai...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Sunday, November 6, 2016 at 12:42 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Are Cassandra writes faster than reads?

Hi all,

Are Cassandra writes faster than reads? If yes, why is this so? I am using consistency level ONE and data is in memory.

Vikas
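The two points in Jeff's reply can be sketched as a toy model, under heavy simplifying assumptions: `ToyTable` is a hypothetical class, the commitlog is a Python list, the memtable and each flushed file are dicts, and timestamps come from a counter. It only illustrates the shape of the write and read paths, not how Cassandra is implemented.

```python
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, Optional[str]]  # (write_timestamp, value)


class ToyTable:
    def __init__(self, memtable_limit: int = 2):
        self.commitlog: List[Tuple[str, Cell]] = []  # append-only
        self.memtable: Dict[str, Cell] = {}          # in-memory buffer
        self.sstables: List[Dict[str, Cell]] = []    # never modified after flush
        self._limit = memtable_limit
        self._clock = 0

    def write(self, key: str, value: str) -> None:
        self._clock += 1
        cell = (self._clock, value)
        self.commitlog.append((key, cell))  # step 1: sequential append
        self.memtable[key] = cell           # step 2: update in memory
        if len(self.memtable) >= self._limit:
            self.flush()

    def flush(self) -> None:
        """Persist the whole memtable as one new immutable file."""
        if self.memtable:
            self.sstables.append(dict(self.memtable))
            self.memtable = {}

    def read(self, key: str):
        """Return (value, number of places that held a version of the key)."""
        sources = [self.memtable, *self.sstables]
        candidates = [src[key] for src in sources if key in src]
        if not candidates:
            return None, 0
        _, value = max(candidates, key=lambda cell: cell[0])  # newest wins
        return value, len(candidates)


if __name__ == "__main__":
    t = ToyTable(memtable_limit=2)
    t.write("a", "v1")
    t.write("b", "v1")   # memtable is full, so it is flushed to a file
    t.write("a", "v2")   # overwrite lands in the new memtable
    print(t.read("a"))   # newest value, found in two places
    print(t.read("b"))   # never overwritten, found in one place
```

A write never looks at existing data, while a read of an overwritten key has to reconcile every copy it finds, which is the asymmetry described above.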
Are Cassandra writes faster than reads?
Hi all,

Are Cassandra writes faster than reads? If yes, why is this so? I am using consistency level ONE and data is in memory.

Vikas