[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15600592#comment-15600592
 ] 

Venkateswararao Jujjuri (JV) commented on BOOKKEEPER-934:
---------------------------------------------------------

Email discussion:

Gmail   Venkateswara Rao Jujjuri <jujj...@gmail.com>
Improve Write performance with Relax durability.
Jia Zhai <zhaiji...@gmail.com>  Thu, Aug 18, 2016 at 8:56 AM
To: dev@bookkeeper.apache.org
Cc: Enrico Olivelli <eolive...@gmail.com>, Venkateswara Rao Jujjuri 
<jujj...@gmail.com>, distributedlog-user <distributedlog-u...@googlegroups.com>
Thanks a lot for taking care and providing this use case.

On Wed, Aug 10, 2016 at 3:53 AM, Sijie Guo <si...@apache.org> wrote:

    On Wed, Aug 3, 2016 at 12:51 PM, Enrico Olivelli <eolive...@gmail.com>
    wrote:

    > Hi Jia,
    > I have another similar use case for this feature.
    > Consider a ledger used as a DB transaction log.
    > The client issues a sequence of data manipulation instructions inside the
    > scope of the transaction; if everything goes well, a commit is finally
    > added to the sequence. From the client's perspective it is important to
    > wait for sync only for the last entry, that is, the 'commit'.
    > In my case all the entries will be added with sync=false and then the last
    > with sync=true. But it is important that the addEntry with sync returns
    > only if all the previous entries of the same sequence or of the same
    > ledger have been written to stable storage.
    >
    Yup, I think that's a common usage pattern.
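
A minimal sketch of the pattern described above, written as a toy single-bookie model rather than BookKeeper code. The class `RelaxedJournalSketch`, its `addEntry(data, sync)` signature, and the crash helper are hypothetical names for illustration; the thread only proposes a `sync` flag on addEntry. The sketch assumes that a sync add flushes every earlier unsynced entry of the same ledger along with it.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of one ledger on one bookie; not BookKeeper code. */
class RelaxedJournalSketch {
    // Entries appended to the journal file (possibly still only in the page cache).
    private final List<String> written = new ArrayList<>();
    // Number of entries covered by the most recent simulated fsync.
    private int durableCount = 0;

    /** Hypothetical addEntry(data, sync); returns the entry id. */
    long addEntry(String data, boolean sync) {
        written.add(data);
        if (sync) {
            // Assumption: one flush covers this entry and all earlier unsynced ones.
            durableCount = written.size();
        }
        return written.size() - 1;
    }

    /** Highest entry id known to be on stable storage, or -1 if none. */
    long lastDurableEntryId() {
        return durableCount - 1;
    }

    /** Simulates a power loss: everything after the last fsync is gone. */
    List<String> recoverAfterCrash() {
        return new ArrayList<>(written.subList(0, durableCount));
    }

    public static void main(String[] args) {
        RelaxedJournalSketch ledger = new RelaxedJournalSketch();
        ledger.addEntry("INSERT a", false);
        ledger.addEntry("UPDATE b", false);
        long commitId = ledger.addEntry("COMMIT", true);   // waits for sync
        ledger.addEntry("INSERT c", false);                // next, uncommitted transaction
        System.out.println("commit durable: " + (ledger.lastDurableEntryId() == commitId));
        System.out.println("after crash: " + ledger.recoverAfterCrash());
    }
}
```

In this model the committed transaction survives the simulated crash as a whole, while the uncommitted tail is lost, which is the behavior a transaction log needs.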



    > In this case I see that the real challenge is that entries span multiple
    > bookies, and it will be very hard to coordinate such a sync.
    >

    Does making ensemble size equal to ack quorum size work here?
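
A small sketch of why this question matters, under the assumption that entries are striped round-robin across the ensemble (entry e goes to bookies (e + i) mod ensembleSize, for i below the write quorum size). `StripingSketch` and its method names are hypothetical. Since the ack quorum cannot exceed the write quorum, making the ensemble size equal to the ack quorum size also makes the write quorum equal to the ensemble size, so every bookie receives every entry.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative model of round-robin striping; names are hypothetical. */
class StripingSketch {
    /** Bookie indexes that receive entryId, assuming round-robin striping. */
    static List<Integer> writeSet(long entryId, int ensembleSize, int writeQuorum) {
        List<Integer> bookies = new ArrayList<>();
        for (int i = 0; i < writeQuorum; i++) {
            bookies.add((int) ((entryId + i) % ensembleSize));
        }
        return bookies;
    }

    /**
     * True if every bookie that stores the commit entry also stores every
     * earlier entry, so a local journal sync on those bookies covers them all.
     */
    static boolean commitBookiesHoldAllEarlierEntries(long commitId, int ensembleSize, int writeQuorum) {
        List<Integer> commitSet = writeSet(commitId, ensembleSize, writeQuorum);
        for (long e = 0; e < commitId; e++) {
            if (!writeSet(e, ensembleSize, writeQuorum).containsAll(commitSet)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Ensemble 3, write quorum 3: every bookie sees every entry.
        System.out.println(commitBookiesHoldAllEarlierEntries(5, 3, 3));
        // Ensemble 3, write quorum 2: earlier entries live on other bookies too.
        System.out.println(commitBookiesHoldAllEarlierEntries(5, 3, 2));
    }
}
```

With striping, a sync on the bookies holding the commit entry says nothing about entries stored only elsewhere, which is the coordination problem raised above.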


    > At the moment this is not very urgent for my projects, but I think that it
    > could be a useful feature.
    >
    > Enrico
    >
    > On Thu, Jun 9, 2016 at 16:07, Jia Zhai <zhaiji...@gmail.com> wrote:
    >
    >> Thanks a lot for all of your suggestions. I would like to give it a try;
    >> I will open a JIRA ticket and do the proposal, discussion, and testing
    >> there.
    >>
    >> On Wed, Jun 8, 2016 at 1:40 PM, Sijie Guo <guosi...@gmail.com> wrote:
    >>
    >> > I think that's a fair consideration. However, I am thinking that if we
    >> > allow non-durable ledgers, that means 1) the application needs to handle
    >> > the missing entries; 2) re-replication should handle a non-durable
    >> > ledger by ignoring the entries that are missing.
    >> >
    >> > But let's see what Jia proposes.
    >> >
    >> > - Sijie
    >> >
    >> > On Fri, Jun 3, 2016 at 8:57 AM, Venkateswara Rao Jujjuri <jujj...@gmail.com> wrote:
    >> >
    >> >> @sijie let me expand on what I mean by "this changes something fundamental".
    >> >>
    >> >> Everything starts from the fact that we are not persisting. Also, I share
    >> >> a lot of the points raised by @Matteo.
    >> >>
    >> >> - In theory, we could lose all copies of EntryId X but persist EntryId
    >> >>   X+Y. How do reads, replication, and consistency cope with it?
    >> >> - We could advance LAC, but lose the last set of entries. What do we
    >> >>   do? Do we adjust LAC? At what boundaries?
    >> >> - One of the core principles of a LOG is that if entry X is there, all
    >> >>   the entries up until X are available too. With this we may need to deal
    >> >>   with sparse / missing entries.
    >> >>
    >> >> I believe this is more of a direction towards making BookKeeper an
    >> >> in-memory log, but I am afraid it is more of a core change.
    >> >>
    >> >> Thanks,
    >> >> JV
    >> >>
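
To make the prefix concern above concrete, here is a minimal sketch, assuming entry ids start at 0 and that we somehow know which ids survived a failure. `PrefixSketch` and `highestContiguousEntryId` are hypothetical names, and this is not how BookKeeper recovery is implemented; it only shows the gap between an advertised LAC and the longest surviving prefix.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Illustrative only: finds the longest gap-free prefix of surviving entries. */
class PrefixSketch {
    /** Highest id X such that all of 0..X survived; -1 if entry 0 is missing. */
    static long highestContiguousEntryId(Set<Long> surviving) {
        long id = -1;
        while (surviving.contains(id + 1)) {
            id++;
        }
        return id;
    }

    public static void main(String[] args) {
        // Entry 3 was lost on every replica, but 4 and 5 were persisted.
        Set<Long> surviving = new HashSet<>(Arrays.asList(0L, 1L, 2L, 4L, 5L));
        long advertisedLac = 5;
        long prefixEnd = highestContiguousEntryId(surviving);
        System.out.println("advertised LAC = " + advertisedLac
                + ", contiguous prefix ends at " + prefixEnd);
    }
}
```

Any reader that trusted the advertised LAC would hit a hole at the first missing entry, so either LAC is rolled back to the prefix end or readers must tolerate sparse ledgers.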
    >> >> On Fri, Jun 3, 2016 at 12:05 AM, Matteo Merli <mme...@apache.org> wrote:
    >> >>
    >> >>> I was interested in trying something in this area, but never actually
    >> >>> got to do it.
    >> >>>
    >> >>> A few random notes:
    >> >>>
    >> >>> 1. My suspicion, with no backing data at this point, is that simply
    >> >>>    skipping the fsync for "non-durable" ledgers might not give a big
    >> >>>    improvement, just a bit less latency for non-fsynced writes but
    >> >>>    roughly the same throughput. Imagine a bookie receiving writes for
    >> >>>    2 ledgers, 1 durable and the other non-durable. Since the entries
    >> >>>    are appended to the journal as they come in, the fsync() for the
    >> >>>    durable ledger write will also carry along the data for the previous
    >> >>>    non-durable ledger write, causing more IOPS if that was spanning a
    >> >>>    different disk block. Given that the bookie throughput is typically
    >> >>>    limited by the IOPS capacity of the journal device, having
    >> >>>    non-durable writes might not help that much.
    >> >>>
    >> >>> 2. The other options I was thinking of were:
    >> >>>    - Do not append the non-durable entries to the journal (redundancy
    >> >>>      is given anyway by writing to multiple bookies). In this case,
    >> >>>      though, a single bookie could lose more entries depending on
    >> >>>      flushTime, and could also lose entries even in case of a process
    >> >>>      crash, not just a kernel panic or power outage.
    >> >>>
    >> >>>    - Use a separate journal for non-durable writes, which will not be
    >> >>>      fsynced.
    >> >>>
    >> >>>    - Configure the durability at the bookie level and then use a
    >> >>>      placement/isolation policy to choose the appropriate set of
    >> >>>      bookies for a non-durable ledger.
    >> >>>
    >> >>> 3. How will bookie replication operate when it gets read errors?
    >> >>>
    >> >>> Matteo
    >> >>>
    >> >>> On Thu, Jun 2, 2016 at 11:09 PM Sijie Guo <si...@apache.org> wrote:
    >> >>>
    >> >>> > I think if a ledger is configured to be non-durable, it is kind of
    >> >>> > the application's responsibility to tolerate the data loss.
    >> >>> > So I don't think it will actually have to change anything on the
    >> >>> > BookKeeper client side.
    >> >>> >
    >> >>> > - Sijie
    >> >>> >
    >> >>> > On Thu, Jun 2, 2016 at 7:29 AM, Venkateswara Rao Jujjuri <jujj...@gmail.com> wrote:
    >> >>> >
    >> >>> > > I agree that we must make this a ledger property, not a per-entry
    >> >>> > > write property.
    >> >>> > >
    >> >>> > > But the biggest doubt in my mind is that this changes something
    >> >>> > > fundamental: LAC. Are we allowing a sparse ledger in a failure
    >> >>> > > scenario? Handling the read side may become more complex.
    >> >>> > >
    >> >>> > > On Thu, Jun 2, 2016 at 12:19 AM, Sijie Guo <guosi...@gmail.com> wrote:
    >> >>> > >
    >> >>> > >> This seems interesting to me. However, it might be safer to start
    >> >>> > >> with a flag configured per ledger, rather than per entry. Also, it
    >> >>> > >> would be good to hear the opinions of other people. JV, Matteo?
    >> >>> > >> (If I remember correctly, Matteo mentioned that Yahoo might be
    >> >>> > >> working on a similar thing.)
    >> >>> > >>
    >> >>> > >> +1 for creating a BOOKKEEPER jira to track this.
    >> >>> > >>
    >> >>> > >> - Sijie
    >> >>> > >>
    >> >>> > >> On Wed, Jun 1, 2016 at 6:37 PM, Jia Zhai <zhaiji...@gmail.com> wrote:
    >> >>> > >>
    >> >>> > >> > + distributedlog-user
    >> >>> > >> > For more input and comments. :)
    >> >>> > >> >
    >> >>> > >> > Thanks.
    >> >>> > >> >
    >> >>> > >> > On Thu, Jun 2, 2016 at 9:34 AM, Jia Zhai <zhaiji...@gmail.com> wrote:
    >> >>> > >> >
    >> >>> > >> >> Hello all,
    >> >>> > >> >>
    >> >>> > >> >> I am wondering whether you guys have any plans to support relaxed
    >> >>> > >> >> durability. Is it a good feature to have in BookKeeper (and also
    >> >>> > >> >> for DistributedLog)?
    >> >>> > >> >>
    >> >>> > >> >> I am thinking of adding a new flag to bookkeeper#addEntry(...,
    >> >>> > >> >> Boolean sync), so the application can control whether to sync or
    >> >>> > >> >> not for individual entries.
    >> >>> > >> >>
    >> >>> > >> >> - On the write protocol, add a flag to indicate whether this
    >> >>> > >> >>   write should sync to disk or not.
    >> >>> > >> >> - On the bookie side, if the addEntry request is sync, go through
    >> >>> > >> >>   the original pipeline. If the addEntry disables sync, complete
    >> >>> > >> >>   the add callbacks after writing to the journal file and before
    >> >>> > >> >>   flushing the journal.
    >> >>> > >> >> - Those add entries (with sync disabled) will be flushed to disk
    >> >>> > >> >>   with subsequent sync add entries.
    >> >>> > >> >>
    >> >>> > >> >> For my use cases on DistributedLog, this feature can be used to
    >> >>> > >> >> support streams that don't have strong durability requirements.
    >> >>> > >> >>
    >> >>> > >> >> What do you guys think? Shall I create a jira to implement this?
    >> >>> > >> >>
    >> >>> > >> >> Thanks a lot
    >> >>> > >> >> -Jia
    >> >>> > >> >>
    >> >>> > >> >
    >> >>> > >> > --
    >> >>> > >> > You received this message because you are subscribed to the Google Groups "distributedlog-user" group.
    >> >>> > >> > To unsubscribe from this group and stop receiving emails from it, send an email to distributedlog-user+unsubscr...@googlegroups.com.
    >> >>> > >> > To post to this group, send email to distributedlog-u...@googlegroups.com.
    >> >>> > >> > To view this discussion on the web visit
    >> >>> > >> > https://groups.google.com/d/msgid/distributedlog-user/CALsc%2BXpJj3YT47bognhmEhHmahJkCgJUUY6Un4HVczfK_1MxPQ%40mail.gmail.com
    >> >>> > >> > .
    >> >>> > >> > For more options, visit https://groups.google.com/d/optout.
    >> >>> > >> >
    >> >>> > >>
    >> >>> > >
    >> >>> > >
    >> >>> > >
    >> >>> > > --
    >> >>> > > Jvrao
    >> >>> > > ---
    >> >>> > > First they ignore you, then they laugh at you, then they fight you,
    >> >>> > > then you win. - Mahatma Gandhi
    >> >>> > >
    >> >>> >
    >> >>>
    >> >>
    >> >
    >> >
    >>
    > --
    >
    >
    > -- Enrico Olivelli
    >




> Relax durability
> ----------------
>
>                 Key: BOOKKEEPER-934
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-934
>             Project: Bookkeeper
>          Issue Type: Improvement
>            Reporter: Jia Zhai
>            Assignee: Jia Zhai
>
> I am thinking of adding a new flag to bookkeeper#addEntry(..., Boolean sync), so
> the application can control whether to sync or not for individual entries.
> - On the write protocol, add a flag to indicate whether this write should
>   sync to disk or not.
> - On the bookie side, if the addEntry request is sync, go through the original
>   pipeline. If the addEntry disables sync, complete the add callbacks after
>   writing to the journal file and before flushing the journal.
> - Those add entries (with sync disabled) will be flushed to disk with
>   subsequent sync add entries.
> There is already a discussion in the mail thread; this ticket can gather
> ideas and provide the discussion materials.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
