Re: [Discuss] Transaction Support

Sijie Guo Sat, 08 Oct 2016 06:59:14 -0700

Xi,

I just granted you the edit permission.


- Sijie

On Fri, Oct 7, 2016 at 10:34 AM, Xi Liu <[email protected]> wrote:

> I still can not edit the wiki. Can any of the pmc members grant me the
> permissions?
>
> - Xi
>
> On Sat, Sep 17, 2016 at 10:35 PM, Xi Liu <[email protected]> wrote:
>
> > Sijie,
> >
> > I attempted to create a wiki page under that space. I found that I am not
> > authorized with edit permission.
> >
> > Can any of the committers grant me the wiki edit permission? My account
> is
> > "xi.liu.ant".
> >
> > - Xi
> >
> >
> > On Tue, Sep 13, 2016 at 9:26 AM, Sijie Guo <[email protected]> wrote:
> >
> >> This sounds interesting ... I will take a closer look and give my
> comments
> >> later.
> >>
> >> At the same time, do you mind creating a wiki page to put your idea
> there?
> >> You can add your wiki page under
> >> https://cwiki.apache.org/confluence/display/DL/Project+Proposals
> >>
> >> You might need to ask in the dev list to grant the wiki edit permissions
> >> to
> >> you once you have a wiki account.
> >>
> >> - Sijie
> >>
> >>
> >> On Mon, Sep 12, 2016 at 2:20 AM, Xi Liu <[email protected]> wrote:
> >>
> >> > Hello,
> >> >
> >> > I asked the transaction support in distributedlog user group two
> months
> >> > ago. I want to raise this up again, as we are looking for using
> >> > distributedlog for building a transactional data service. It is a
> major
> >> > feature that is missing in distributedlog. We have some ideas to add
> >> this
> >> > to distributedlog and want to know if they make sense or not. If they
> >> are
> >> > good, we'd like to contribute and develop with the community.
> >> >
> >> > Here are the thoughts:
> >> >
> >> > -------------------------------------------------
> >> >
> >> > From our understanding, DL can provide "at-least-once" delivery
> semantic
> >> > (if not, please correct me) but not "exactly-once" delivery semantic.
> >> That
> >> > means that a message can be delivered one or more times if the reader
> >> > doesn't handle duplicates.
> >> >
> >> > The duplicates come from two places, one is at writer side (this
> assumes
> >> > using write proxy not the core library), while the other one is at
> >> reader
> >> > side.
> >> >
> >> > - writer side: if the client attempts to write a record to the write
> >> > proxies and gets a network error (e.g timeouts) then retries, the
> >> retrying
> >> > will potentially result in duplicates.
> >> > - reader side:if the reader reads a message from a stream and then
> >> crashes,
> >> > when the reader restarts it would restart from last known position
> >> (DLSN).
> >> > If the reader fails after processing a record and before recording the
> >> > position, the processed record will be delivered again.
> >> >
> >> > The reader problem can be properly addressed by making use of the
> >> sequence
> >> > numbers of records and doing proper checkpointing. For example, in
> >> > database, it can checkpoint the indexed data with the sequence number
> of
> >> > records; in flink, it can checkpoint the state with the sequence
> >> numbers.
> >> >
> >> > The writer problem can be addressed by implementing an idempotent
> >> writer.
> >> > However, an alternative and more powerful approach is to support
> >> > transactions.
> >> >
> >> > *What does transaction mean?*
> >> >
> >> > A transaction means a collection of records can be written
> >> transactionally
> >> > within a stream or across multiple streams. They will be consumed by
> the
> >> > reader together when a transaction is committed, or will never be
> >> consumed
> >> > by the reader when the transaction is aborted.
> >> >
> >> > The transaction will expose following guarantees:
> >> >
> >> > - The reader should not be exposed to records written from uncommitted
> >> > transactions (mandatory)
> >> > - The reader should consume the records in the transaction commit
> order
> >> > rather than the record written order (mandatory)
> >> > - No duplicated records within a transaction (mandatory)
> >> > - Allow interleaving transactional writes and non-transactional writes
> >> > (optional)
> >> >
> >> > *Stream Transaction & Namespace Transaction*
> >> >
> >> > There will be two types of transaction, one is Stream level
> transaction
> >> > (local transaction), while the other one is Namespace level
> transaction
> >> > (global transaction).
> >> >
> >> > The stream level transaction is a transactional operation on writing
> >> > records to one stream; the namespace level transaction is a
> >> transactional
> >> > operation on writing records to multiple streams.
> >> >
> >> > *Implementation Thoughts*
> >> >
> >> > - A transaction is consist of begin control record, a series of data
> >> > records and commit/abort control record.
> >> > - The begin/commit/abort control record is written to a `commit` log
> >> > stream, while the data records will be written to normal data log
> >> streams.
> >> > - The `commit` log stream will be the same log stream for stream-level
> >> > transaction,  while it will be a *system* stream (or multiple system
> >> > streams) for namespace-level transactions.
> >> > - The transaction code looks like as below:
> >> >
> >> > <code>
> >> >
> >> > Transaction txn = client.transaction();
> >> > Future<DLSN> result1 = txn.write(stream-0, record);
> >> > Future<DLSN> result2 = txn.write(stream-1, record);
> >> > Future<DLSN> result3 = txn.write(stream-2, record);
> >> > Future<Pair<DLSN, DLSN>> result = txn.commit();
> >> >
> >> > </code>
> >> >
> >> > if the txn is committed, all the write futures will be satisfied with
> >> their
> >> > written DLSNs. if the txn is aborted, all the write futures will be
> >> failed
> >> > together. there is no partial failure state.
> >> >
> >> > - The actually data flow will be:
> >> >
> >> > 1. writer get a transaction id from the owner of the `commit' log
> stream
> >> > 1. write the begin control record (synchronously) with the transaction
> >> id
> >> > 2. for each write within the same txn, it will be assigned a local
> >> sequence
> >> > number starting from 0. the combination of transaction id and local
> >> > sequence number will be used later on by the readers to de-duplicate
> >> > records.
> >> > 3. the commit/abort control record will be written based on the
> results
> >> > from 2.
> >> >
> >> > - Application can supply a timeout for the transaction when #begin() a
> >> > transaction. The owner of the `commit` log stream can abort
> transactions
> >> > that never be committed/aborted within their timeout.
> >> >
> >> > - Failures:
> >> >
> >> > * all the log records can be simply retried as they will be
> >> de-duplicated
> >> > probably at the reader side.
> >> >
> >> > - Reader:
> >> >
> >> > * Reader can be configured to read uncommitted records or committed
> >> records
> >> > only (by default read uncommitted records)
> >> > * If reader is configured to read committed records only, the read
> ahead
> >> > cache will be changed to maintain one additional pending committed
> >> records.
> >> > the pending committed records map is bounded and records will be
> dropped
> >> > when read ahead is moving.
> >> > * when the reader hits a commit record, it will rewind to the begin
> >> record
> >> > and start reading from there. leveraging the proper read ahead cache
> and
> >> > pending commit records cache, it would be good for both short
> >> transactions
> >> > and long transactions.
> >> >
> >> > - DLSN, SequenceId:
> >> >
> >> > * We will add a fourth field to DLSN. It is `local sequence number`
> >> within
> >> > a transaction session. So the new DLSN of records in a transaction
> will
> >> be
> >> > the DLSN of commit control record plus its local sequence number.
> >> > * The sequence id will be still the position of the commit record plus
> >> its
> >> > local sequence number. The position will be advanced with total number
> >> of
> >> > written records on writing the commit control record.
> >> >
> >> > - Transaction Group & Namespace Transaction
> >> >
> >> > using one single log stream for namespace transaction can cause the
> >> > bottleneck problem since all the begin/commit/end control records will
> >> have
> >> > to go through one log stream.
> >> >
> >> > the idea of 'transaction group' is to allow partitioning the writers
> >> into
> >> > different transaction groups.
> >> >
> >> > clients can specify the `group-name` when starting the transaction. if
> >> > there is no `group-name` specified, it will use the default `commit`
> >> log in
> >> > the namespace for creating transactions.
> >> >
> >> > -------------------------------------------------
> >> >
> >> > I'd like to collect feedbacks on this idea. Appreciate any comments
> and
> >> if
> >> > anyone is also interested in this idea, we'd like to collaborate with
> >> the
> >> > community.
> >> >
> >> >
> >> > - Xi
> >> >
> >>
> >
> >
>

Re: [Discuss] Transaction Support

Reply via email to