Ping

2017-09-07 9:32 GMT+02:00 Enrico Olivelli <eolive...@gmail.com>:

> Hi all,
>
>
> You can find the revised proposal here
> https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP-14+Relax+durability
>
> The link to the document open for comments is this:
> https://docs.google.com/document/d/1yNi9t2_deOOMXDaGzrnmaHTQeB3B3Fnym82DUERH7LM/edit?usp=sharing
>
> Please check it out
> We are going to review this Proposal at the meeting
>
> -- Enrico
>
>
> 2017-08-30 8:56 GMT+02:00 Enrico Olivelli <eolive...@gmail.com>:
>
>> Thank you Sijie for summarizing and thanks to the community for helping
>> in this important enhancement to BookKeeper
>>
>> I am convinced that as JV pointed out we need to declare at ledger
>> creation time that the ledger is going to perform no-sync writes.
>>
>> I think we currently need an explicit declaration to make things "clear"
>> to the developer who is using the LedgerHandle API, even at ledger
>> creation time.
>>
>> The case is that we are going to forbid "striping" ledgers (ensemble
>> size > quorum size) for no-sync writes in the first implementation:
>> - one option is to fail at the first no-sync addEntry, but this will be
>> really uncomfortable because usually the ack/write/ensemble sizes are
>> configured by the admin, and there will be configurations in which errors
>> will come out only after starting the system.
>> - the second option is to make the developer explicitly enable no-sync
>> writes at creation time and fail the creation of the ledger if the
>> requested combination of options is not possible
>>
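
A minimal sketch of the second option above (fail at ledger creation rather than at the first no-sync addEntry), written in Python for brevity and not against the real BookKeeper Java client. The function `create_ledger`, the exception `LedgerCreationError`, and the returned dictionary are hypothetical; only the `allowNoSyncWrites` flag and the "no striping" rule come from the thread, and "quorum size" is read here as the write quorum size.

```python
class LedgerCreationError(Exception):
    """Raised when the requested ledger options cannot be combined."""


def create_ledger(ensemble_size, write_quorum, ack_quorum,
                  allow_no_sync_writes=False):
    """Validate the options up front and return a toy ledger descriptor.

    Hypothetical helper: it models only the creation-time check and does
    not talk to any bookie or to ZooKeeper.
    """
    if not (ack_quorum <= write_quorum <= ensemble_size):
        raise LedgerCreationError(
            "expected ack quorum <= write quorum <= ensemble size")
    # Striping means ensemble size > write quorum size; the thread proposes
    # forbidding it for no-sync writes in the first implementation.
    if allow_no_sync_writes and ensemble_size > write_quorum:
        raise LedgerCreationError(
            "no-sync writes are not supported on striped ledgers")
    return {
        "ensemble_size": ensemble_size,
        "write_quorum": write_quorum,
        "ack_quorum": ack_quorum,
        "allow_no_sync_writes": allow_no_sync_writes,
    }
```

With this shape, a misconfigured ensemble/quorum combination is reported when the ledger is created, not when the first no-sync write happens on a running system.
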
>> I am not sure that the changes to the bookie internals are a Client-API
>> matter; maybe we can leverage custom metadata (as JV said) in order to
>> make the bookie handle ledgers in a different manner. That path will
>> always remain open, as custom metadata is already available.
>>
>> JV preferred the ledger-type approach; the dual solution is to introduce
>> a list of "capabilities" or "ledger options".
>> I think that this ability to perform no-sync writes is so important that
>> "custom metadata" is not the right place to declare it, same for "ledger
>> type".
>>
>> So I am proposing to add a boolean 'allowNoSyncWrites' at ledger creation
>> time, without writing it to the ledger metadata on ZK.
>> I think that if further improvements need ledger metadata changes, we
>> will make them then.
>>
>> I have updated the BP-14 document, I have added an "Open issues" footer
>> with the open points,
>> please add comments and I will correct the document as soon as possible.
>>
>>
>> Enrico
>>
>>
>>
>>
>> 2017-08-30 1:24 GMT+02:00 Sijie Guo <guosi...@gmail.com>:
>>
>>> Thank you, Enrico, JV.
>>>
>>> These are great discussions.
>>>
>>> After reading these two proposals, I have a few very high-level comments,
>>> dividing into three categories.
>>>
>>>
>>> *API*
>>>
>>> - I think there are no fundamental differences between these two
>>> proposals.
>>> They are trying to achieve similar goals by exposing durability levels in
>>> different ways.
>>> So this will be a discussion on what the API/interface should look like
>>> from the user/admin perspective.
>>> I would suggest focusing on what the API itself would be, putting the
>>> implementation design aside when talking about this.
>>>
>>> *Core*
>>>
>>> - Both proposals need to deal with a core function: what happens to LAC
>>> and what semantic BookKeeper provides.
>>> JV did a good summary in his proposal. However, I am not a fan of
>>> maintaining two different semantics. So I am looking for
>>> a solution in which BookKeeper maintains only one semantic. The semantic
>>> is basically:
>>>
>>> 1) LAC is only advanced when entries before LAC are committed to
>>> persistent storage
>>> 2) All the entries until LAC are successfully committed to persistent
>>> storage
>>> 3) Entries until LAC: all the entries must be readable all the time.
>>>
>>> If we maintain such a semantic, there is no need to change the auto
>>> recovery protocol in BookKeeper. All we guarantee is that the entries
>>> are durably persisted.
>>>
>>> In order to maintain such a semantic, I think both JV and I proposed a
>>> similar solution in our proposals. I am trying to finalize one here:
>>>
>>> * bookie maintains a LAS (Last Add Synced) point for each entry.
>>> * LAS can be piggybacked on AddResponses
>>> * Client uses the LAS to advance LAC.
>>>
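
One way to read the three bullets above, sketched in Python under stated assumptions: each bookie reports a LAS meaning "every entry up to and including this id is synced here", the ledger is not striped (so every bookie stores every entry), and the client may advance LAC to the highest entry id that at least `ack_quorum` bookies report as synced. The function name `advance_lac`, the per-bookie dictionary, and the quorum rule itself are hypothetical; the thread does not specify them.

```python
def advance_lac(current_lac, las_by_bookie, ack_quorum):
    """Return the new LAC from the latest LAS piggybacked by each bookie.

    las_by_bookie maps a bookie id to the last LAS seen in its AddResponses.
    LAC never moves backwards.
    """
    if len(las_by_bookie) < ack_quorum:
        # Not enough bookies have reported yet; keep the current LAC.
        return current_lac
    # Sort LAS values from highest to lowest. The value at position
    # ack_quorum - 1 is the highest entry id that at least ack_quorum
    # bookies have already synced.
    synced = sorted(las_by_bookie.values(), reverse=True)
    candidate = synced[ack_quorum - 1]
    return max(current_lac, candidate)
```

Under these assumptions the reader-visible semantic stays the same for durable and non-durable writes: nothing beyond LAC is exposed, and everything up to LAC is synced on a quorum.
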
>>> If we can agree on the core semantic we are going to provide, the other
>>> things are just logistics.
>>>
>>> *Others*
>>>
>>> - Regarding separating the journal or bypassing the journal, there is no
>>> difference when we talk about the core semantic. They are all non-durable
>>> writes (acknowledging before fsyncing).
>>> We can start with the same-journal approach (but just acknowledge before
>>> fsyncing), implement the core and add other options later on.
>>>
>>>
>>> From my point of view, I'd be more interested in providing a single
>>> consistent durable semantic that applications can rely on for both durable
>>> writes and non-durable writes. The other stuff seems to be more about
>>> logistics.
>>>
>>>
>>> - Sijie
>>>
>>>
>>> On Mon, Aug 28, 2017 at 11:27 PM, Enrico Olivelli <eolive...@gmail.com>
>>> wrote:
>>>
>>> > 2017-08-29 8:01 GMT+02:00 Venkateswara Rao Jujjuri <jujj...@gmail.com
>>> >:
>>> >
>>> > > I don't believe I fully followed your second case. But even in this
>>> > > case, your major concern is about the additional 'sync' RPC?
>>> > >
>>> >
>>> > yes apart from that I am fine with your proposal too, that is to have a
>>> > LedgerType which drives durability
>>> > and I think we need to add per-entry durability options
>>> >
>>> > I think that at least for the 'simple' no-sync addEntry we do not need
>>> > to change many things, I am drafting a prototype, I will share it as
>>> > soon as we all agree on the roadmap
>>> >
>>> > The first implementation can cover the first cases (no-sync addEntry)
>>> > and change the way the writer advances the LAC in order to support
>>> > 'relaxed durability writes'.
>>> > This change will be compatible with future improvements and it will
>>> > open the door for big changes on the bookie side like bypassing the
>>> > journal or leveraging multiple journals.
>>> >
>>> > -- Enrico
>>> >
>>> > or something else that the LedgerType proposal won't work?
>>> > >
>>> >
>>> > >
>>> > >
>>> > > On Mon, Aug 28, 2017 at 7:35 AM, Enrico Olivelli <
>>> eolive...@gmail.com>
>>> > > wrote:
>>> > >
>>> > > > I think that having a set of options on the ledger metadata will be
>>> > > > a good enhancement and I am sure we will do it as soon as it is
>>> > > > needed; maybe we do not need it now.
>>> > > >
>>> > > > Actually I think we will need to declare this durability level at
>>> > > > entry level to support some use cases in the BP-14 document. Let me
>>> > > > explain two of my use cases for which I need it:
>>> > > >
>>> > > > At a higher level we have two choices:
>>> > > >
>>> > > > A) per-ledger durability options (JV proposal)
>>> > > > all addEntry operations are durable or non-durable and there is an
>>> > > > explicit 'sync' API (+ forced sync at close)
>>> > > >
>>> > > > B) per-entry durability options (original BP-14 proposal)
>>> > > > every addEntry has its own durable/non-durable option (sync/no-sync),
>>> > > > with the ability to call 'sync' without addEntry (+ forced sync at
>>> > > > close)
>>> > > >
>>> > > > I am speaking about the database WAL case: I am using the ledger as
>>> > > > a segment for the WAL of a database and I am writing all data
>>> > > > changes in the scope of a 'transaction' with the relaxed-durability
>>> > > > flag, then I am writing the 'transaction committed' entry with a
>>> > > > "strict durability" requirement. This will in fact require that all
>>> > > > previous entries are persisted durably, so that the transaction will
>>> > > > never be lost.
>>> > > >
>>> > > > In this scenario we would in fact need an addEntry + sync API:
>>> > > >
>>> > > > using option A) the WAL will look like:
>>> > > > - open ledger no-sync = true
>>> > > > - addEntry (set foo=bar)  (this will be no-sync)
>>> > > > - addEntry (set foo=bar2) (this will be no-sync)
>>> > > > - addEntry (commit)
>>> > > > - sync
>>> > > >
>>> > > > using option B) the WAL will look like
>>> > > > - open ledger
>>> > > > - addEntry (set foo=bar), no-sync
>>> > > > - addEntry (set foo=bar2), no-sync
>>> > > > - addEntry (commit), sync
>>> > > >
>>> > > > in case B) we are "saving" one RPC call to every bookie (the 'sync'
>>> > > > one).
>>> > > > The same holds for single data change entries, like updating a
>>> > > > single record in the database: with BK 4.5 this "costs" only a
>>> > > > single RPC to every bookie
>>> > > >
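
A toy model of the two WAL sequences above, in Python, counting the RPCs sent to a single bookie. `ToyBookie` and its methods are hypothetical stand-ins, not the BookKeeper client or bookie API; the model assumes one RPC per addEntry, one RPC per explicit sync, and that a sync makes all previously buffered entries durable.

```python
class ToyBookie:
    """Hypothetical stand-in for one bookie: counts RPCs, tracks durability."""

    def __init__(self):
        self.rpcs = 0
        self.buffered = []  # acknowledged but not yet fsynced
        self.durable = []   # fsynced entries

    def add_entry(self, data, sync):
        self.rpcs += 1
        self.buffered.append(data)
        if sync:
            # A sync add also persists every earlier no-sync entry.
            self._flush()

    def sync(self):
        self.rpcs += 1
        self._flush()

    def _flush(self):
        self.durable.extend(self.buffered)
        self.buffered.clear()


WAL = ["set foo=bar", "set foo=bar2", "commit"]


def option_a():
    """Per-ledger no-sync: every add is no-sync, then one explicit sync."""
    bookie = ToyBookie()
    for record in WAL:
        bookie.add_entry(record, sync=False)
    bookie.sync()
    return bookie


def option_b():
    """Per-entry flag: only the commit entry asks for sync."""
    bookie = ToyBookie()
    for record in WAL[:-1]:
        bookie.add_entry(record, sync=False)
    bookie.add_entry(WAL[-1], sync=True)
    return bookie
```

In this model both sequences leave the same three entries durable; option A uses four RPCs per bookie and option B uses three, which is the saving described above.
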
>>> > > > Second case:
>>> > > > I am using BookKeeper to store binary objects, so I am packing
>>> > > > multiple 'objects' (named sequences of bytes) into a single ledger,
>>> > > > like you do when you write many records to a file in a streaming
>>> > > > fashion and keep track of the offsets of the beginning of every
>>> > > > record (LedgerHandleAdv is perfect for this case).
>>> > > > I am not using a single ledger per 'file' because creating many
>>> > > > ledgers very fast kills ZooKeeper. In my systems I have big bursts
>>> > > > of writes, which need to be really "fast", so I am writing multiple
>>> > > > 'files' to every single ledger. So the close-to-open consistency at
>>> > > > ledger level is not suitable for this case.
>>> > > > I have to write as fast as possible to this 'ledger-backed' stream,
>>> > > > and as with a 'traditional' filesystem I am writing parts of each
>>> > > > file and then requiring 'sync' at the end of each file.
>>> > > > Using BookKeeper you need to split big 'files' into "little" parts;
>>> > > > you cannot transmit the contents as a "real" stream over the network.
>>> > > >
>>> > > > I am not talking about bookie-level implementation details. I would
>>> > > > like to define the high-level API in order to support all the
>>> > > > relevant known use cases and keep space for the future.
>>> > > > At this moment adding a per-entry 'durability option' seems to be
>>> > > > very flexible and simple to implement, and it does not prevent us
>>> > > > from doing further improvements, such as skipping the journal.
>>> > > >
>>> > > > Enrico
>>> > > >
>>> > > >
>>> > > >
>>> > > > 2017-08-26 19:55 GMT+02:00 Enrico Olivelli <eolive...@gmail.com>:
>>> > > >
>>> > > > >
>>> > > > >
>>> > > > > On Sat, 26 Aug 2017, 19:19 Venkateswara Rao Jujjuri <jujj...@gmail.com>
>>> > > > > wrote:
>>> > > > >
>>> > > > >> Hi all,
>>> > > > >>
>>> > > > >> As promised during Thursday call, here is my proposal.
>>> > > > >>
>>> > > > >> *NOTE*: Major difference in this proposal compared to Enrico’s
>>> > > > >> <https://docs.google.com/document/d/1JLYO3K3tZ5PJGmyS0YK_-NW8VOUUgUWVBmswCUOG158/edit#heading=h.q2rewiqndr5v>
>>> > > > >> is making the durability a property of the ledger (type) as opposed
>>> > > > >> to addEntry(). The rest of the technical details have a lot of
>>> > > > >> similarities.
>>> > > > >>
>>> > > > >
>>> > > > > Thank you JV. I have just quickly read the doc and your view is
>>> > > > > certainly broader.
>>> > > > > I will dig into the doc as soon as possible on Monday.
>>> > > > > For me it is OK to have a ledger-wide configuration. I think that
>>> > > > > the most important decision is about the API we will provide, as
>>> > > > > in the future it will be difficult to change it.
>>> > > > >
>>> > > > >
>>> > > > > Cheers
>>> > > > > Enrico
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > >> https://docs.google.com/document/d/1g1eBcVVCZrTG8YZliZP0LVqvWpq432ODEghrGVQ4d4Q/edit?usp=sharing
>>> > > > >>
>>> > > > >> On Thu, Aug 24, 2017 at 1:14 AM, Enrico Olivelli <
>>> > eolive...@gmail.com
>>> > > >
>>> > > > >> wrote:
>>> > > > >>
>>> > > > >> > Thank you all for the comments and for taking a look at the
>>> > > > >> > document so soon.
>>> > > > >> > I have updated the doc, we will discuss the document at the
>>> > > > >> > meeting,
>>> > > > >> >
>>> > > > >> >
>>> > > > >> > Enrico
>>> > > > >> >
>>> > > > >> > 2017-08-24 2:27 GMT+02:00 Sijie Guo <guosi...@gmail.com>:
>>> > > > >> >
>>> > > > >> > > Enrico,
>>> > > > >> > >
>>> > > > >> > > Thank you so much! It is a great effort for putting this up.
>>> > > > >> > > Overall looks good. I made some comments, we can discuss at
>>> > > > >> > > tomorrow's community meeting.
>>> > > > >> > >
>>> > > > >> > > - Sijie
>>> > > > >> > >
>>> > > > >> > > On Wed, Aug 23, 2017 at 8:25 AM, Enrico Olivelli <
>>> > > > eolive...@gmail.com
>>> > > > >> >
>>> > > > >> > > wrote:
>>> > > > >> > >
>>> > > > >> > > > Hi all,
>>> > > > >> > > > I have drafted a first proposal for BP-14 - Relax
>>> Durability
>>> > > > >> > > >
>>> > > > >> > > > We are talking about limiting the number of fsyncs to the
>>> > > > >> > > > journal while preserving the correctness of the LAC protocol.
>>> > > > >> > > >
>>> > > > >> > > > This is the link to the wiki page, but as the issue is huge
>>> > > > >> > > > we prefer to use Google Documents for sharing comments
>>> > > > >> > > > https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP+-+14+Relax+durability
>>> > > > >> > > >
>>> > > > >> > > > This is the document
>>> > > > >> > > > https://docs.google.com/document/d/1JLYO3K3tZ5PJGmyS0YK_-NW8VOUUgUWVBmswCUOG158/edit?usp=sharing
>>> > > > >> > > >
>>> > > > >> > > > All comments are welcome
>>> > > > >> > > >
>>> > > > >> > > > I have added DL dev list in cc as the discussion is
>>> > > > >> > > > interesting for both groups
>>> > > > >> > > >
>>> > > > >> > > > Enrico Olivelli
>>> > > > >> > > >
>>> > > > >> > >
>>> > > > >> >
>>> > > > >>
>>> > > > >>
>>> > > > >>
>>> > > > >> --
>>> > > > >> Jvrao
>>> > > > >> ---
>>> > > > >> First they ignore you, then they laugh at you, then they fight
>>> you,
>>> > > then
>>> > > > >> you win. - Mahatma Gandhi
>>> > > > >>
>>> > > > > --
>>> > > > >
>>> > > > >
>>> > > > > -- Enrico Olivelli
>>> > > > >
>>> > > >
>>> > >
>>> > >
>>> > >
>>> > > --
>>> > > Jvrao
>>> > > ---
>>> > > First they ignore you, then they laugh at you, then they fight you,
>>> then
>>> > > you win. - Mahatma Gandhi
>>> > >
>>> >
>>>
>>
>>
>
