[
https://issues.apache.org/jira/browse/BOOKKEEPER-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15600592#comment-15600592
]
Venkateswararao Jujjuri (JV) commented on BOOKKEEPER-934:
---------------------------------------------------------
Email discussion:
Gmail Venkateswara Rao Jujjuri <[email protected]>
Improve Write performance with Relax durability.
Jia Zhai <[email protected]> Thu, Aug 18, 2016 at 8:56 AM
To: [email protected]
Cc: Enrico Olivelli <[email protected]>, Venkateswara Rao Jujjuri
<[email protected]>, distributedlog-user <[email protected]>
Thanks a lot for taking care and providing this use case.
On Wed, Aug 10, 2016 at 3:53 AM, Sijie Guo <[email protected]> wrote:
On Wed, Aug 3, 2016 at 12:51 PM, Enrico Olivelli <[email protected]>
wrote:
> Hi Jia,
> I have another similar use case for this feature.
> Consider a ledger used as a db transaction log.
> The client issues a sequence of data manipulation instructions inside the
> scope of the transaction; if everything goes well, a commit is finally
> added to the sequence. From the client's perspective it is important to
> wait for sync only for the last entry, that is, the 'commit'.
> In my case all the entries will be added with sync=false and then the last
> with sync=true. But it is important that the addEntry with sync returns
> only if all the previous entries of the same sequence or of the same
> ledger have been written to stable storage.
>
Yup, I think that's a common usage pattern.
> In this case I see the real challenge is that entries span multiple
> bookies and it will be very hard to coordinate such a sync
>
Does making ensemble size equal to ack quorum size work here?
> At the moment this is not very urgent for my projects, but I think that it
> could be a useful feature
>
> Enrico
>
> On Thu, Jun 9, 2016 at 16:07, Jia Zhai <[email protected]> wrote:
>
>> Thanks a lot for all of your suggestions. I would like to give it a try,
>> and will open a jira ticket and put the proposal, discussion and testing
>> there.
>>
>> On Wed, Jun 8, 2016 at 1:40 PM, Sijie Guo <[email protected]> wrote:
>>
>> > I think that's a fair consideration. However, I am thinking that if we
>> > allow non-durable ledgers, that means 1) the application needs to handle
>> > the missing entries; 2) re-replication should handle non-durable ledgers
>> > by ignoring entries that are missing.
>> >
>> > But let's see what Jia proposes.
>> >
>> > - Sijie
>> >
>> > On Fri, Jun 3, 2016 at 8:57 AM, Venkateswara Rao Jujjuri <
>> > [email protected]> wrote:
>> >
>> >> @sijie let me expand on what I mean by "this changes something
>> >> fundamental".
>> >>
>> >> Everything starts from the fact that we are not persisting. Also, I
>> >> share a lot of the points raised by @Matteo.
>> >>
>> >> - In theory, we could lose all copies of EntryId X but persist EntryId
>> >> X+Y. How do reads, replication and consistency cope with that?
>> >> - We could advance LAC, but lose the last set of entries. What do we
>> >> do? Do we adjust LAC? At what boundaries?
>> >> - One of the core principles of a LOG is that if entry X is there, all
>> >> the entries up until X are available too; with this we may need to deal
>> >> with sparse / missing entries.
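
The prefix property in the last bullet can be made concrete with a small sketch. This is only an illustration in Python, not BookKeeper code: the helper names `contiguous_prefix_end` and `missing_up_to` are hypothetical, and entry ids are assumed to start at 0.

```python
def contiguous_prefix_end(present_ids):
    """Return the largest X such that entries 0..X are all present, or -1."""
    present = set(present_ids)
    x = -1
    while x + 1 in present:
        x += 1
    return x


def missing_up_to(present_ids, lac):
    """Return the entry ids in 0..lac (inclusive) that are not present."""
    present = set(present_ids)
    return [entry_id for entry_id in range(lac + 1) if entry_id not in present]


# Hypothetical failure: entry 3 was acknowledged but never reached stable
# storage, while later entries 4 and 5 did.
surviving = [0, 1, 2, 4, 5]
advertised_lac = 5

safe_prefix = contiguous_prefix_end(surviving)   # last id with no gap before it
holes = missing_up_to(surviving, advertised_lac)  # gaps a reader would hit
```

In this hypothetical case the advertised LAC covers an entry that no longer exists, which is the situation the questions above are asking about: either readers tolerate holes, or the LAC has to be pulled back to the end of the contiguous prefix.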
>> >>
>> >> I believe this is more of a direction towards making BookKeeper an
>> >> in-memory log, but I am afraid it is more of a core change.
>> >>
>> >> Thanks,
>> >> JV
>> >>
>> >> On Fri, Jun 3, 2016 at 12:05 AM, Matteo Merli <[email protected]>
>> wrote:
>> >>
>> >>> I was interested in trying something in this area, but never actually
>> >>> got to do it.
>> >>>
>> >>> A few random notes:
>> >>>
>> >>> 1. My suspicion, with no backing data at this point, is that simply
>> >>> skipping the fsync for "non-durable" ledgers might not give a big
>> >>> improvement, just a bit less latency for non-fsynced writes but
>> >>> roughly the same throughput. Imagine a bookie receiving writes for 2
>> >>> ledgers, 1 durable and the other non-durable. Since the entries are
>> >>> appended to the journal as they come in, the fsync() for the durable
>> >>> ledger write will also carry the data for the previous non-durable
>> >>> ledger write, causing more IOPS if that was spanning a different disk
>> >>> block. Given that the bookie throughput is typically limited by the
>> >>> IOPS capacity of the journal device, having non-durable writes might
>> >>> not help that much.
>> >>>
>> >>> 2. The other options I was thinking of were:
>> >>>    - Do not append the non-durable entries to the journal (redundancy
>> >>>      is anyway given by writing to multiple bookies). In this case
>> >>>      though, a single bookie could lose more entries depending on
>> >>>      flushTime, and could also lose entries even in case of a process
>> >>>      crash, not just kernel panic or power outage.
>> >>>
>> >>>    - Use a separate journal for non-durable writes which will not be
>> >>>      fsynced.
>> >>>
>> >>>    - Configure the durability at the bookie level and then use a
>> >>>      placement/isolation policy to choose the appropriate set of
>> >>>      bookies for a non-durable ledger.
>> >>>
>> >>> 3. How will bookie replication operate when getting read errors?
>> >>>
>> >>> Matteo
>> >>>
>> >>> On Thu, Jun 2, 2016 at 11:09 PM Sijie Guo <[email protected]> wrote:
>> >>>
>> >>> > I think if a ledger is configured to be non-durable, it is kind of the
>> >>> > application's responsibility to tolerate the data loss.
>> >>> > So I don't think anything will actually have to change on the
>> >>> > bookkeeper client side.
>> >>> >
>> >>> > - Sijie
>> >>> >
>> >>> > On Thu, Jun 2, 2016 at 7:29 AM, Venkateswara Rao Jujjuri <
>> >>> > [email protected]>
>> >>> > wrote:
>> >>> >
>> >>> > > I agree that we must make this a ledger property, not a per-entry
>> >>> > > write property.
>> >>> > >
>> >>> > > But the biggest doubt in my mind is that this changes something
>> >>> > > fundamental: LAC. Are we allowing sparse ledgers in failure
>> >>> > > scenarios? Handling the read side may become more complex.
>> >>> > >
>> >>> > > On Thu, Jun 2, 2016 at 12:19 AM, Sijie Guo <[email protected]>
>> >>> wrote:
>> >>> > >
>> >>> > >> This seems interesting to me. However, it might be safer to start
>> >>> > >> with a flag configured per ledger, rather than per entry. Also, it
>> >>> > >> would be good to hear opinions from other people. JV, Matteo? (If I
>> >>> > >> remember correctly, Matteo mentioned that Yahoo might be working on
>> >>> > >> a similar thing.)
>> >>> > >>
>> >>> > >> +1 for creating a BOOKKEEPER jira to track this.
>> >>> > >>
>> >>> > >> - Sijie
>> >>> > >>
>> >>> > >> On Wed, Jun 1, 2016 at 6:37 PM, Jia Zhai <[email protected]>
>> >>> wrote:
>> >>> > >>
>> >>> > >> > + distributedlog-user
>> >>> > >> > For more input and comments. :)
>> >>> > >> >
>> >>> > >> > Thanks.
>> >>> > >> >
>> >>> > >> > On Thu, Jun 2, 2016 at 9:34 AM, Jia Zhai <[email protected]>
>> >>> wrote:
>> >>> > >> >
>> >>> > >> >> Hello all,
>> >>> > >> >>
>> >>> > >> >> I am wondering whether you guys have any plans to support
>> >>> > >> >> relaxed durability. Is it a good feature to have in bookkeeper
>> >>> > >> >> (also for DistributedLog)?
>> >>> > >> >>
>> >>> > >> >> I am thinking of adding a new flag to bookkeeper#addEntry(...,
>> >>> > >> >> Boolean sync), so the application can control whether to sync or
>> >>> > >> >> not for individual entries.
>> >>> > >> >>
>> >>> > >> >> - On the write protocol, add a flag to indicate whether this
>> >>> > >> >> write should sync to disk or not.
>> >>> > >> >> - On the bookie side, if the addEntry request is sync, go through
>> >>> > >> >> the original pipeline. If the addEntry disables sync, complete
>> >>> > >> >> the add callbacks after writing to the journal file and before
>> >>> > >> >> flushing the journal.
>> >>> > >> >> - Those add entries (with sync disabled) will be flushed to disk
>> >>> > >> >> with subsequent sync add entries.
>> >>> > >> >>
>> >>> > >> >> For my use cases on DistributedLog, this feature can be used to
>> >>> > >> >> support streams that don't have strong durability requirements.
>> >>> > >> >>
>> >>> > >> >> What do you guys think? Shall I create a jira to implement this?
>> >>> > >> >>
>> >>> > >> >> Thanks a lot
>> >>> > >> >> -Jia
>> >>> > >> >>
>> >>> > >> >
>> >>> > >> > --
>> >>> > >> > You received this message because you are subscribed to the
>> Google
>> >>> > >> Groups
>> >>> > >> > "distributedlog-user" group.
>> >>> > >> > To unsubscribe from this group and stop receiving emails from
>> it,
>> >>> send
>> >>> > >> an
>> >>> > >> > email to [email protected].
>> >>> > >> > To post to this group, send email to
>> >>> > >> [email protected].
>> >>> > >> > To view this discussion on the web visit
>> >>> > >> >
>> >>> > >>
>> >>> >
>> >>> https://groups.google.com/d/msgid/distributedlog-user/CALsc%
>> 2BXpJj3YT47bognhmEhHmahJkCgJUUY6Un4HVczfK_1MxPQ%40mail.gmail.com
>> >>> > >> > <
>> >>> > >>
>> >>> >
>> >>> https://groups.google.com/d/msgid/distributedlog-user/CALsc%
>> 2BXpJj3YT47bognhmEhHmahJkCgJUUY6Un4HVczfK_1MxPQ%40mail.
>> gmail.com?utm_medium=email&utm_source=footer
>> >>> > >> >
>> >>> > >> > .
>> >>> > >> > For more options, visit https://groups.google.com/d/optout.
>> >>> > >> >
>> >>> > >>
>> >>> > >
>> >>> > >
>> >>> > >
>> >>> > > --
>> >>> > > Jvrao
>> >>> > > ---
>> >>> > > First they ignore you, then they laugh at you, then they fight
>> you,
>> >>> then
>> >>> > > you win. - Mahatma Gandhi
>> >>> > >
>> >>> > >
>> >>> >
>> >>>
>> >>
>> >>
>> >>
>> >>
>> >
>> >
>>
> --
>
>
> -- Enrico Olivelli
>
> Relax durability
> ----------------
>
> Key: BOOKKEEPER-934
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-934
> Project: Bookkeeper
> Issue Type: Improvement
> Reporter: Jia Zhai
> Assignee: Jia Zhai
>
> I am thinking of adding a new flag to bookkeeper#addEntry(..., Boolean sync),
> so the application can control whether to sync or not for individual entries.
> - On the write protocol, add a flag to indicate whether this write should
> sync to disk or not.
> - On the bookie side, if the addEntry request is sync, go through the original
> pipeline. If the addEntry disables sync, complete the add callbacks after
> writing to the journal file and before flushing the journal.
> - Those add entries (with sync disabled) will be flushed to disk with
> subsequent sync add entries.
> There is already a discussion in the mail thread; this ticket can gather
> ideas and provide the discussion materials.
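
The bookie-side behaviour described in the bullets above can be sketched as a toy in-memory model. This is a minimal illustration in Python under stated assumptions, not the BookKeeper implementation: `ToyJournal` and all of its members are hypothetical names, a single journal on a single bookie is assumed, and a real journal would batch writes and fsync asynchronously.

```python
class ToyJournal:
    """Toy model of one bookie journal (hypothetical, not BookKeeper API)."""

    def __init__(self):
        self.buffered = []    # appended to the journal file, not yet fsynced
        self.durable = []     # covered by an fsync
        self.acked = []       # entries whose add callback has completed
        self.fsync_count = 0  # number of simulated fsync() calls

    def add_entry(self, entry_id, sync):
        # Every add is appended to the journal file first.
        self.buffered.append(entry_id)
        if sync:
            # Original pipeline: flush before completing the callback. The
            # flush also carries all earlier non-sync entries still buffered.
            self.flush()
        # With sync disabled the callback completes before any flush.
        self.acked.append(entry_id)

    def flush(self):
        # One simulated fsync makes everything buffered so far durable.
        self.durable.extend(self.buffered)
        self.buffered.clear()
        self.fsync_count += 1

    def crash(self):
        """Simulate a power loss; return acked entries that were lost."""
        lost = [e for e in self.acked if e not in self.durable]
        self.buffered.clear()
        return lost


journal = ToyJournal()
for entry_id in range(3):
    journal.add_entry(entry_id, sync=False)  # acked, nothing durable yet
journal.add_entry(3, sync=True)              # e.g. a transaction 'commit'
journal.add_entry(4, sync=False)             # acked but only buffered
lost_entries = journal.crash()               # entry 4 is lost in this model
```

In this model the sync add returns only after every earlier entry in the same journal is durable, which matches the transaction-log use case raised in the thread. It does not model entries striped across multiple bookies, which is where the coordination problem mentioned in the discussion arises.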
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)