Assigned to you, and also added you to the role for future tickets.

On Thu, Aug 29, 2019 at 11:57 PM Pratyaksh Sharma <[email protected]>
wrote:

> Hi Vinoth,
>
> The jira is HUDI-207 <https://issues.apache.org/jira/browse/HUDI-207>.
>
> On Thu, Aug 29, 2019 at 10:17 PM Vinoth Chandar <[email protected]> wrote:
>
> > Hi,
> >
> > What's your JIRA ID? If you could share that, I will add you to the
> > contributors role.
> >
> > On Thu, Aug 29, 2019 at 12:02 AM Pratyaksh Sharma <[email protected]> wrote:
> >
> > > Sure Balaji,
> > >
> > > Please give me permissions so I can assign this JIRA
> > > <https://issues.apache.org/jira/browse/HUDI-207> to myself and start
> > > working on it.
> > >
> > > On Wed, Aug 28, 2019 at 7:23 PM [email protected] <[email protected]> wrote:
> > >
> > > > Sure Pratyaksh, whatever field works for your use case is good enough.
> > > > You have the flexibility to generate a derived field or to use one of
> > > > the source fields.
> > > >
> > > > Balaji.V
> > > >
> > > > On Wednesday, August 28, 2019, 06:48:44 AM PDT, Pratyaksh
> > > > Sharma <[email protected]> wrote:
> > > >
> > > > Hi Balaji,
> > > >
> > > > Sure, I can do that. However, after a considerable amount of time, the
> > > > binlog position will get exhausted. To handle this, we can use
> > > > ingestion_timestamp (the time at which I push the event to Kafka to be
> > > > consumed by DeltaStreamer) as the secondary ordering field, which will
> > > > always work.
> > > >
> > > > Please suggest.
> > > >
> > > > On Thu, Aug 22, 2019 at 9:49 PM [email protected] <[email protected]> wrote:
> > > >
> > > > > Hi Pratyaksh,
> > > > > The usual way we support this is to make use of the
> > > > > com.uber.hoodie.utilities.transform.Transformer plugin in
> > > > > HoodieDeltaStreamer. You can implement your own Transformer to add a
> > > > > new derived field, which could be a combination of the timestamp and
> > > > > the binlog position. You can then configure this new field to be used
> > > > > as the source ordering field.
> > > > > Balaji.V
> > > > >
> > > > > On Wednesday, August 21, 2019, 07:35:40 AM PDT, Pratyaksh Sharma
> > > > > <[email protected]> wrote:
> > > > >
> > > > >  Hi,
> > > > >
> > > > > While building a CDC pipeline for capturing data changes in SQL using
> > > > > HoodieDeltaStreamer, I came across the following problem. We need to
> > > > > read SQL's binlog file to fetch all the modifications made to a
> > > > > particular table. However, in a production environment where we handle
> > > > > hundreds of transactions per second (TPS), the same table row can be
> > > > > modified multiple times within a second.
> > > > >
> > > > > The problem is that the MySQL binlog has a 32-bit timestamp with only
> > > > > second-level resolution. If we build a CDC pipeline on top of such a
> > > > > table with high TPS, then breaking ties between records with the same
> > > > > Hoodie key will not be possible with a single source-ordering-field
> > > > > (mentioned in HoodieDeltaStreamer.Config), which is the binlog
> > > > > timestamp in this case.
> > > > >
> > > > > Example -  https://github.com/zendesk/maxwell/issues/925.
> > > > >
> > > > > Hence, as a Hudi improvement, the proposal is to add a
> > > > > secondary-source-ordering-field for breaking ties among incoming
> > > > > records in such cases. For example, we could have ingestion_timestamp
> > > > > or binlog_position as the secondary field.
> > > > >
> > > > > Please suggest. I have raised the issue here
> > > > > <https://issues.apache.org/jira/browse/HUDI-207>.
> > > > >
> > > >
> > >
> >
>
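
The derived-field idea discussed in the thread (combining the binlog timestamp and the binlog position into one source ordering field) could look roughly like the plain-Java sketch below. It is only an illustration under stated assumptions: the class OrderingKey and the method derive are hypothetical names and are not part of Hudi or its Transformer interface; the timestamp is assumed to be in whole seconds and to fit in 31 bits, and the position is assumed to fit in 32 bits and not to reset within the same second (for example on binlog file rotation).

```java
// Hypothetical helper, not part of Hudi: packs a seconds-resolution binlog
// timestamp and a binlog position into one long that can serve as a single
// ordering value.
final class OrderingKey {
    // Assumption: timestamp fits in 31 bits so the packed value stays non-negative.
    private static final long MAX_TS = (1L << 31) - 1;
    // Assumption: binlog position fits in 32 bits.
    private static final long MAX_POS = (1L << 32) - 1;

    private OrderingKey() {}

    static long derive(long tsSeconds, long binlogPosition) {
        if (tsSeconds < 0 || tsSeconds > MAX_TS) {
            throw new IllegalArgumentException("timestamp out of range: " + tsSeconds);
        }
        if (binlogPosition < 0 || binlogPosition > MAX_POS) {
            throw new IllegalArgumentException("position out of range: " + binlogPosition);
        }
        // Timestamp goes in the high bits, position in the low 32 bits, so
        // records within the same second are ordered by position.
        return (tsSeconds << 32) | binlogPosition;
    }

    public static void main(String[] args) {
        long first = derive(1566398140L, 4L);
        long second = derive(1566398140L, 120L);      // same second, later position
        long nextSecond = derive(1566398141L, 4L);    // later second, earlier position
        System.out.println(first < second);
        System.out.println(second < nextSecond);
    }
}
```

In a real pipeline the equivalent computation would live inside a custom Transformer and the resulting column would be configured as the source ordering field, as described in the thread; if the position can reset within a second, a different tie-breaker such as ingestion_timestamp would be needed.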
