I still cannot edit the wiki. Can any of the PMC members grant me the
permissions?

- Xi

On Sat, Sep 17, 2016 at 10:35 PM, Xi Liu <xi.liu....@gmail.com> wrote:

> Sijie,
>
> I attempted to create a wiki page under that space, but found that I do
> not have edit permission.
>
> Can any of the committers grant me the wiki edit permission? My account is
> "xi.liu.ant".
>
> - Xi
>
>
> On Tue, Sep 13, 2016 at 9:26 AM, Sijie Guo <si...@apache.org> wrote:
>
>> This sounds interesting ... I will take a closer look and give my comments
>> later.
>>
>> At the same time, do you mind creating a wiki page to put your idea there?
>> You can add your wiki page under
>> https://cwiki.apache.org/confluence/display/DL/Project+Proposals
>>
>> You might need to ask on the dev list to be granted the wiki edit
>> permissions once you have a wiki account.
>>
>> - Sijie
>>
>>
>> On Mon, Sep 12, 2016 at 2:20 AM, Xi Liu <xi.liu....@gmail.com> wrote:
>>
>> > Hello,
>> >
>> > I asked about transaction support in the distributedlog user group two
>> > months ago. I want to raise this again, as we are looking into using
>> > distributedlog to build a transactional data service. It is a major
>> > feature that is missing in distributedlog. We have some ideas for adding
>> > this to distributedlog and want to know whether they make sense. If they
>> > do, we'd like to contribute and develop them with the community.
>> >
>> > Here are the thoughts:
>> >
>> > -------------------------------------------------
>> >
>> > From our understanding, DL provides "at-least-once" delivery semantics
>> > (if not, please correct me) but not "exactly-once" delivery semantics.
>> > That means a message can be delivered one or more times if the reader
>> > doesn't handle duplicates.
>> >
>> > The duplicates come from two places: one is the writer side (this
>> > assumes using the write proxy, not the core library), and the other is
>> > the reader side.
>> >
>> > - writer side: if the client attempts to write a record to the write
>> > proxies, gets a network error (e.g. a timeout), and then retries, the
>> > retry can potentially result in duplicates.
>> > - reader side: if the reader reads a message from a stream and then
>> > crashes, it will restart from the last known position (DLSN) when it
>> > comes back. If the reader fails after processing a record but before
>> > recording the position, the processed record will be delivered again.
>> >
>> > The reader problem can be addressed by making use of the sequence
>> > numbers of records and doing proper checkpointing. For example, a
>> > database can checkpoint the indexed data together with the sequence
>> > number of the records; Flink can checkpoint its state with the sequence
>> > numbers.
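>> >
>> > Here is a minimal sketch of that reader-side de-duplication /
>> > checkpointing idea. The `checkpointStore`, `reader.readNext()` and
>> > `record.getSequenceNumber()` names are illustrative assumptions, not
>> > the actual DL API:
>> >
>> > <code>
>> >
>> > // Resume from the last checkpointed sequence number and skip any
>> > // record that was already processed before the crash.
>> > long lastSeq = checkpointStore.readLastSequenceNumber(); // -1 if none
>> > while (running) {
>> >     LogRecord record = reader.readNext();   // blocking read (illustrative)
>> >     long seq = record.getSequenceNumber();
>> >     if (seq <= lastSeq) {
>> >         continue;                           // duplicate after restart, skip it
>> >     }
>> >     process(record);                        // application logic
>> >     checkpointStore.save(seq);              // persist position with the result
>> >     lastSeq = seq;
>> > }
>> >
>> > </code>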
>> >
>> > The writer problem can be addressed by implementing an idempotent
>> > writer. However, an alternative and more powerful approach is to
>> > support transactions.
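>> >
>> > As a rough illustration of the idempotent-writer alternative, the
>> > writer could attach a client-assigned (writerId, sequence) pair to
>> > every record so the write proxy can discard retried duplicates. The
>> > names below are purely hypothetical, not an existing DL interface:
>> >
>> > <code>
>> >
>> > // Client side: retry with the SAME (writerId, seq) so the proxy can
>> > // recognize the retry as a duplicate instead of a new record.
>> > long seq = nextSequence();          // monotonically increasing per writer
>> > while (true) {
>> >     try {
>> >         proxy.write(stream, writerId, seq, payload);
>> >         break;                      // acknowledged exactly once
>> >     } catch (NetworkTimeoutException e) {
>> >         // retry; the proxy de-duplicates on (writerId, seq)
>> >     }
>> > }
>> >
>> > </code>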
>> >
>> > *What does a transaction mean?*
>> >
>> > A transaction means a collection of records can be written
>> > transactionally within a stream or across multiple streams. They will
>> > be consumed by the reader together when the transaction is committed,
>> > or never be consumed by the reader when the transaction is aborted.
>> >
>> > The transaction will expose the following guarantees:
>> >
>> > - The reader should not be exposed to records written from uncommitted
>> > transactions (mandatory)
>> > - The reader should consume the records in transaction commit order
>> > rather than in the order the records were written (mandatory)
>> > - No duplicated records within a transaction (mandatory)
>> > - Allow interleaving transactional writes and non-transactional writes
>> > (optional)
>> >
>> > *Stream Transaction & Namespace Transaction*
>> >
>> > There will be two types of transactions: a stream-level transaction
>> > (local transaction) and a namespace-level transaction (global
>> > transaction).
>> >
>> > A stream-level transaction is a transactional operation writing records
>> > to one stream; a namespace-level transaction is a transactional
>> > operation writing records to multiple streams.
>> >
>> > *Implementation Thoughts*
>> >
>> > - A transaction consists of a begin control record, a series of data
>> > records, and a commit/abort control record.
>> > - The begin/commit/abort control records are written to a `commit` log
>> > stream, while the data records are written to normal data log streams.
>> > - The `commit` log stream will be the same log stream for a
>> > stream-level transaction, while it will be a *system* stream (or
>> > multiple system streams) for namespace-level transactions.
>> > - The transaction code looks like the following:
>> >
>> > <code>
>> >
>> > Transaction txn = client.transaction();
>> > Future<DLSN> result1 = txn.write("stream-0", record);
>> > Future<DLSN> result2 = txn.write("stream-1", record);
>> > Future<DLSN> result3 = txn.write("stream-2", record);
>> > Future<Pair<DLSN, DLSN>> result = txn.commit();
>> >
>> > </code>
>> >
>> > If the txn is committed, all the write futures will be satisfied with
>> > their written DLSNs. If the txn is aborted, all the write futures will
>> > fail together. There is no partial failure state.
>> >
>> > - The actual data flow will be (see the sketch after this list):
>> >
>> > 1. The writer gets a transaction id from the owner of the `commit` log
>> > stream.
>> > 2. It writes the begin control record (synchronously) with the
>> > transaction id.
>> > 3. Each write within the same txn is assigned a local sequence number
>> > starting from 0. The combination of transaction id and local sequence
>> > number will be used later by the readers to de-duplicate records.
>> > 4. The commit/abort control record is written based on the results from
>> > step 3.
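>> >
>> > A rough end-to-end sketch of that flow from the writer's point of view
>> > (the `txnLog`, `ControlRecord` and `Record` names are assumptions for
>> > illustration, not an existing API):
>> >
>> > <code>
>> >
>> > // 1. obtain a transaction id from the owner of the `commit` log stream
>> > long txnId = txnLog.allocateTransactionId();
>> >
>> > // 2. synchronously write the begin control record
>> > txnLog.writeControlRecord(ControlRecord.begin(txnId));
>> >
>> > // 3. write data records, each tagged with (txnId, localSeq)
>> > long localSeq = 0;
>> > List<Future<DLSN>> pending = new ArrayList<>();
>> > for (ByteBuffer payload : payloads) {
>> >     pending.add(dataStream.write(Record.of(txnId, localSeq++, payload)));
>> > }
>> >
>> > // 4. commit or abort based on the outcome of the data writes
>> > if (allSucceeded(pending)) {
>> >     txnLog.writeControlRecord(ControlRecord.commit(txnId));
>> > } else {
>> >     txnLog.writeControlRecord(ControlRecord.abort(txnId));
>> > }
>> >
>> > </code>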
>> >
>> > - The application can supply a timeout for the transaction when
>> > calling #begin(). The owner of the `commit` log stream can abort
>> > transactions that are never committed/aborted within their timeout.
>> >
>> > - Failures:
>> >
>> > * All the log records can simply be retried, as they will be
>> > de-duplicated (probably at the reader side).
>> >
>> > - Reader:
>> >
>> > * The reader can be configured to read uncommitted records or committed
>> > records only (by default it reads uncommitted records).
>> > * If the reader is configured to read committed records only, the
>> > read-ahead cache will be changed to maintain an additional map of
>> > pending committed records. The pending committed records map is bounded
>> > and records will be dropped as the read-ahead moves forward.
>> > * When the reader hits a commit record, it will rewind to the begin
>> > record and start reading from there. Leveraging the proper read-ahead
>> > cache and the pending committed records cache should work well for both
>> > short transactions and long transactions. (A small sketch of this
>> > read-committed mode follows below.)
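>> >
>> > A simplified sketch of how a read-committed reader could buffer
>> > records per transaction and only release them once the commit control
>> > record is seen (all names here are illustrative, not the real
>> > read-ahead implementation):
>> >
>> > <code>
>> >
>> > // pendingTxns buffers uncommitted records keyed by transaction id;
>> > // in practice it is bounded and entries are re-read on overflow.
>> > Map<Long, List<LogRecord>> pendingTxns = new HashMap<>();
>> >
>> > void onRecord(LogRecord record) {
>> >     if (record.isBeginControl()) {
>> >         pendingTxns.put(record.getTxnId(), new ArrayList<>());
>> >     } else if (record.isCommitControl()) {
>> >         // deliver the whole transaction to the application in order
>> >         List<LogRecord> buffered = pendingTxns.remove(record.getTxnId());
>> >         if (buffered != null) {
>> >             buffered.forEach(this::deliver);
>> >         }
>> >     } else if (record.isAbortControl()) {
>> >         pendingTxns.remove(record.getTxnId());   // never delivered
>> >     } else if (record.isTransactional()) {
>> >         pendingTxns.computeIfAbsent(record.getTxnId(),
>> >                 k -> new ArrayList<>()).add(record);
>> >     } else {
>> >         deliver(record);                         // non-transactional write
>> >     }
>> > }
>> >
>> > </code>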
>> >
>> > - DLSN, SequenceId:
>> >
>> > * We will add a fourth field to DLSN: the `local sequence number`
>> > within a transaction session. So the new DLSN of a record in a
>> > transaction will be the DLSN of the commit control record plus its
>> > local sequence number.
>> > * The sequence id will still be the position of the commit record plus
>> > the record's local sequence number. The position will be advanced by
>> > the total number of written records when the commit control record is
>> > written. (The sketch below shows the intended arithmetic.)
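>> >
>> > To make the intended arithmetic concrete, here is an illustrative
>> > sketch. The class and method names are assumptions for this proposal,
>> > not the current DLSN class:
>> >
>> > <code>
>> >
>> > // Proposed 4-field DLSN: the first three fields come from the commit
>> > // control record, the fourth is the record's position inside the txn.
>> > class TxnDLSN {
>> >     long logSegmentSequenceNo;   // from the commit control record
>> >     long entryId;                // from the commit control record
>> >     long slotId;                 // from the commit control record
>> >     long localSequenceNumber;    // 0, 1, 2, ... within the transaction
>> > }
>> >
>> > // Sequence id of a record in a committed transaction:
>> > //   seqId(record) = position(commit control record) + localSeq(record)
>> > // After writing the commit control record, the stream position is
>> > // advanced by the total number of records in the transaction.
>> > long sequenceIdOf(long commitPosition, long localSeq) {
>> >     return commitPosition + localSeq;
>> > }
>> >
>> > </code>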
>> >
>> > - Transaction Group & Namespace Transaction
>> >
>> > Using one single log stream for namespace transactions can become a
>> > bottleneck, since all the begin/commit/abort control records have to go
>> > through one log stream.
>> >
>> > The idea of a 'transaction group' is to allow partitioning the writers
>> > into different transaction groups.
>> >
>> > Clients can specify a `group-name` when starting the transaction. If no
>> > `group-name` is specified, the default `commit` log in the namespace is
>> > used for creating transactions. (See the sketch below.)
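>> >
>> > For illustration, starting a transaction in a specific group (and with
>> > the timeout mentioned earlier) might look like the following; the
>> > overloads shown are hypothetical:
>> >
>> > <code>
>> >
>> > // default group: uses the namespace's default `commit` log stream
>> > Transaction txn1 = client.transaction();
>> >
>> > // named group: control records go through this group's `commit` log
>> > // stream, spreading the control-record load across groups
>> > Transaction txn2 = client.transaction("orders-group", Duration.ofSeconds(30));
>> >
>> > Future<DLSN> w = txn2.write("orders-stream", record);
>> > Future<Pair<DLSN, DLSN>> committed = txn2.commit();
>> >
>> > </code>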
>> >
>> > -------------------------------------------------
>> >
>> > I'd like to collect feedback on this idea. I appreciate any comments,
>> > and if anyone else is interested in this idea, we'd like to collaborate
>> > with the community.
>> >
>> >
>> > - Xi
>> >
>>
>
>
