[ https://issues.apache.org/jira/browse/BOOKKEEPER-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15600592#comment-15600592 ]
Venkateswararao Jujjuri (JV) commented on BOOKKEEPER-934:
---------------------------------------------------------

Email discussion:

Venkateswara Rao Jujjuri <jujj...@gmail.com>
Improve Write performance with Relax durability.

Jia Zhai <zhaiji...@gmail.com>    Thu, Aug 18, 2016 at 8:56 AM
To: dev@bookkeeper.apache.org
Cc: Enrico Olivelli <eolive...@gmail.com>, Venkateswara Rao Jujjuri <jujj...@gmail.com>, distributedlog-user <distributedlog-u...@googlegroups.com>

Thanks a lot for taking care and providing this use case.

On Wed, Aug 10, 2016 at 3:53 AM, Sijie Guo <si...@apache.org> wrote:

On Wed, Aug 3, 2016 at 12:51 PM, Enrico Olivelli <eolive...@gmail.com> wrote:

> Hi Jia,
> I have another similar use case for this feature.
> Let the ledger be a db transaction log.
> The client issues a sequence of data manipulation instructions inside the
> scope of the transaction; if everything goes well, a commit is finally added
> to the sequence. From the client's perspective it is important to wait for
> sync only for the last entry, that is, the 'commit'.
> In my case all the entries will be added with sync=false and then the last
> with sync=true. But it is important that the addEntry with sync returns
> only if all the previous entries of the same sequence or of the same ledger
> have been written to stable storage.

Yup, I think that's a common usage pattern.

> In this case I see the real challenge is that entries span multiple
> bookies and it will be very hard to coordinate such a sync.

Does making ensemble size equal to ack quorum size work here?

> At the moment this is not very urgent for my projects, but I think that it
> could be a useful feature.
>
> Enrico
>
> On Thu, Jun 9, 2016 at 16:07, Jia Zhai <zhaiji...@gmail.com> wrote:
>
>> Thanks a lot for all of your suggestions. I would like to have a try, and
>> will open a jira ticket, and make the proposal, discussion and testing
>> there.
>>
>> On Wed, Jun 8, 2016 at 1:40 PM, Sijie Guo <guosi...@gmail.com> wrote:
>>
>> > I think that's a fair consideration. However, I am thinking that if we
>> > allow non-durable ledgers, that means 1) the application needs to handle
>> > the missing entries; 2) re-replication should handle non-durable ledgers
>> > by ignoring the non-existing entries if they are missing.
>> >
>> > But let's see what Jia is proposing.
>> >
>> > - Sijie
>> >
>> > On Fri, Jun 3, 2016 at 8:57 AM, Venkateswara Rao Jujjuri <jujj...@gmail.com> wrote:
>> >
>> >> @sijie let me expand on what I mean by "this changes something
>> >> fundamental".
>> >>
>> >> Everything starts from the fact that we are not persisting. I also share
>> >> a lot of the points raised by @Matteo.
>> >>
>> >> - In theory, we could lose all copies of EntryId X but persist EntryId
>> >>   X+Y. How do reads, replication, and consistency cope with it?
>> >> - We could advance LAC, but lose the last set of entries. What do we
>> >>   do? Do we adjust LAC? At what boundaries?
>> >> - One of the core principles of a LOG is: if entry X is there, all the
>> >>   entries up until X are available too. With this we may need to deal
>> >>   with sparse / missing entries.
>> >>
>> >> I believe this is more of a direction towards making BookKeeper an
>> >> in-memory log, but I am afraid it is more of a core change.
>> >>
>> >> Thanks,
>> >> JV
>> >>
>> >> On Fri, Jun 3, 2016 at 12:05 AM, Matteo Merli <mme...@apache.org> wrote:
>> >>
>> >>> I was interested in trying something in this area, but never actually
>> >>> got to do it.
>> >>>
>> >>> A few random notes:
>> >>>
>> >>> 1. My suspicion, with no backing data at this point, is that simply
>> >>>    skipping the fsync for "non-durable" ledgers might not give a big
>> >>>    improvement, just a bit less latency for non-fsynced writes but
>> >>>    roughly the same throughput. Imagine a bookie receiving writes for
>> >>>    2 ledgers, 1 durable and the other non-durable.
>> >>>    Since the entries are appended to the journal as they come in, the
>> >>>    fsync() for the durable ledger write will also carry the data for
>> >>>    the previous non-durable ledger write, causing more IOPS if that was
>> >>>    spanning a different disk block. Given that the bookie throughput is
>> >>>    typically limited by the IOPS capacity of the journal device, having
>> >>>    non-durable writes might not help that much.
>> >>>
>> >>> 2. The other options I was thinking of were:
>> >>>    - Do not append the non-durable entries to the journal (redundancy
>> >>>      is anyway given by writing to multiple bookies). In this case
>> >>>      though, a single bookie could lose more entries depending on
>> >>>      flushTime, and could also lose entries even in case of a process
>> >>>      crash, not just kernel-panic or power-outage.
>> >>>
>> >>>    - Use a separate journal for non-durable writes which will not be
>> >>>      fsynced.
>> >>>
>> >>>    - Configure the durability at the bookie level and then use a
>> >>>      placement/isolation policy to choose the appropriate set of
>> >>>      bookies for a non-durable ledger.
>> >>>
>> >>> 3. How will bookie replication operate when getting read errors?
>> >>>
>> >>> Matteo
>> >>>
>> >>> On Thu, Jun 2, 2016 at 11:09 PM Sijie Guo <si...@apache.org> wrote:
>> >>>
>> >>> > I think if a ledger is configured to be non-durable, it is kind of
>> >>> > the application's responsibility to tolerate the data loss.
>> >>> > So I don't think it will actually have to change anything on the
>> >>> > bookkeeper client side.
>> >>> >
>> >>> > - Sijie
>> >>> >
>> >>> > On Thu, Jun 2, 2016 at 7:29 AM, Venkateswara Rao Jujjuri <jujj...@gmail.com> wrote:
>> >>> >
>> >>> > > I agree that we must make this a ledger property, not a per-entry
>> >>> > > write property.
>> >>> > >
>> >>> > > But the biggest doubt in my mind is: this changes something
>> >>> > > fundamental. LAC.
>> >>> > > Are we allowing sparse ledger?
>> >>> > > in failure scenario? Handling the read side may become more
>> >>> > > complex.
>> >>> > >
>> >>> > > On Thu, Jun 2, 2016 at 12:19 AM, Sijie Guo <guosi...@gmail.com> wrote:
>> >>> > >
>> >>> > >> This seems interesting to me. However, it might be safe to start
>> >>> > >> with a flag configured per ledger, rather than per entry. Also, it
>> >>> > >> would be good to hear the opinions of other people. JV, Matteo?
>> >>> > >> (If I remember correctly, Matteo mentioned that Yahoo might be
>> >>> > >> working on a similar thing)
>> >>> > >>
>> >>> > >> +1 for creating a BOOKKEEPER jira to track this.
>> >>> > >>
>> >>> > >> - Sijie
>> >>> > >>
>> >>> > >> On Wed, Jun 1, 2016 at 6:37 PM, Jia Zhai <zhaiji...@gmail.com> wrote:
>> >>> > >>
>> >>> > >> > + distributedlog-user
>> >>> > >> > For more input and comments. :)
>> >>> > >> >
>> >>> > >> > Thanks.
>> >>> > >> >
>> >>> > >> > On Thu, Jun 2, 2016 at 9:34 AM, Jia Zhai <zhaiji...@gmail.com> wrote:
>> >>> > >> >
>> >>> > >> >> Hello all,
>> >>> > >> >>
>> >>> > >> >> I am wondering whether you have any plans for supporting relaxed
>> >>> > >> >> durability. Is it a good feature to have in bookkeeper (also for
>> >>> > >> >> DistributedLog)?
>> >>> > >> >>
>> >>> > >> >> I am thinking of adding a new flag to bookkeeper#addEntry(...,
>> >>> > >> >> Boolean sync), so the application can control whether to sync or
>> >>> > >> >> not for individual entries.
>> >>> > >> >>
>> >>> > >> >> - On the write protocol, add a flag to indicate whether this
>> >>> > >> >>   write should sync to disk or not.
>> >>> > >> >> - On the bookie side, if the addEntry request is sync, go through
>> >>> > >> >>   the original pipeline. If the addEntry disables sync, complete
>> >>> > >> >>   the add callbacks after writing to the journal file and before
>> >>> > >> >>   flushing the journal.
>> >>> > >> >> - Those add entries (with sync disabled) will be flushed to disk
>> >>> > >> >>   with subsequent sync add entries.
>> >>> > >> >>
>> >>> > >> >> For my use cases on DistributedLog, this feature can be used for
>> >>> > >> >> supporting streams that don't have strong durability
>> >>> > >> >> requirements.
>> >>> > >> >>
>> >>> > >> >> What do you guys think? Shall I create a jira to implement this?
>> >>> > >> >>
>> >>> > >> >> Thanks a lot
>> >>> > >> >> -Jia
>> >>> > >> >>
>> >>> > >> >
>> >>> > >> > --
>> >>> > >> > You received this message because you are subscribed to the Google
>> >>> > >> > Groups "distributedlog-user" group.
>> >>> > >> > To unsubscribe from this group and stop receiving emails from it,
>> >>> > >> > send an email to distributedlog-user+unsubscr...@googlegroups.com.
>> >>> > >> > To post to this group, send email to
>> >>> > >> > distributedlog-u...@googlegroups.com.
>> >>> > >> > To view this discussion on the web visit
>> >>> > >> > https://groups.google.com/d/msgid/distributedlog-user/CALsc%2BXpJj3YT47bognhmEhHmahJkCgJUUY6Un4HVczfK_1MxPQ%40mail.gmail.com
>> >>> > >> > For more options, visit https://groups.google.com/d/optout.
>> >>> > >> >
>> >>> > >
>> >>> > > --
>> >>> > > Jvrao
>> >>> > > ---
>> >>> > > First they ignore you, then they laugh at you, then they fight you,
>> >>> > > then you win. - Mahatma Gandhi
>
> -- Enrico Olivelli

> Relax durability
> ----------------
>
>                 Key: BOOKKEEPER-934
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-934
>             Project: Bookkeeper
>          Issue Type: Improvement
>            Reporter: Jia Zhai
>            Assignee: Jia Zhai
>
> I am thinking of adding a new flag to bookkeeper#addEntry(..., Boolean sync),
> so the application can control whether to sync or not for individual entries.
> - On the write protocol, add a flag to indicate whether this write should
>   sync to disk or not.
> - On the bookie side, if the addEntry request is sync, go through the
>   original pipeline. If the addEntry disables sync, complete the add
>   callbacks after writing to the journal file and before flushing the journal.
> - Those add entries (with sync disabled) will be flushed to disk with
>   subsequent sync add entries.
> There is already a discussion in a mail thread; this ticket can gather
> ideas and provide the discussion materials.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
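
For readers of the ticket, the proposed bookie-side behaviour can be modelled with a small toy. This is only a sketch under stated assumptions: the class RelaxedJournalSketch and its methods are hypothetical names invented for illustration, not BookKeeper APIs, and it models a single in-memory journal on one bookie. It shows that a non-sync add is acknowledged before any flush, that a later sync add makes every earlier entry durable, and that a crash drops whatever was written after the last flush.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy single-bookie journal with a per-entry sync flag.
 * Hypothetical sketch for illustration; none of these names are BookKeeper APIs.
 */
class RelaxedJournalSketch {
    // Entries appended to the journal file, in arrival order.
    private final List<String> journal = new ArrayList<>();
    // Number of leading journal entries already flushed to stable storage.
    private int flushed = 0;

    /**
     * Appends an entry and returns how many entries are durable at the
     * moment the add is acknowledged.
     */
    int addEntry(String data, boolean sync) {
        journal.add(data);
        if (sync) {
            // A sync add flushes the journal, which also persists every
            // earlier non-sync entry that was written before it.
            flushed = journal.size();
        }
        // A non-sync add is acknowledged here, before any flush happens.
        return flushed;
    }

    /** Simulates a crash: entries written after the last flush are lost. */
    void crash() {
        journal.subList(flushed, journal.size()).clear();
    }

    int durableCount() {
        return flushed;
    }

    List<String> entries() {
        return new ArrayList<>(journal);
    }

    public static void main(String[] args) {
        // Transaction-log pattern from the thread: only the commit waits for sync.
        RelaxedJournalSketch j = new RelaxedJournalSketch();
        j.addEntry("insert row 1", false);
        j.addEntry("update row 2", false);
        System.out.println("durable before commit: " + j.durableCount());
        j.addEntry("commit", true);
        System.out.println("durable after commit: " + j.durableCount());
        j.addEntry("insert row 3", false);
        j.crash();
        System.out.println("entries after crash: " + j.entries());
    }
}
```

The sketch deliberately leaves out the harder questions raised in the thread: entries striped across several bookies, LAC advancement, and how re-replication should treat entries lost on one bookie.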