Sure Balaji,

Please grant me permission to assign this JIRA
<https://issues.apache.org/jira/browse/HUDI-207> to myself so I can start
working on it.

On Wed, Aug 28, 2019 at 7:23 PM [email protected] <[email protected]>
wrote:

> Sure Pratyaksh, whatever field works for your use case is good enough.
> You do have the flexibility to generate a derived field or use one of the
> source fields.
> Balaji.V
>
> On Wednesday, August 28, 2019, 06:48:44 AM PDT, Pratyaksh
> Sharma <[email protected]> wrote:
>
>  Hi Balaji,
>
> Sure, I can do that. However, after a considerable amount of time the
> binlog position will get exhausted. To handle this, we can use
> ingestion_timestamp (the time at which I push the event to Kafka to be
> consumed by DeltaStreamer) as the secondary ordering field, which will
> always work.
>
> Please suggest.
>
> On Thu, Aug 22, 2019 at 9:49 PM [email protected] <[email protected]>
> wrote:
>
> >  Hi Pratyaksh,
> > The usual way we support this is to make use of the
> > com.uber.hoodie.utilities.transform.Transformer plugin in
> > HoodieDeltaStreamer. You can implement your own Transformer to add a new
> > derived field which could be a combination of timestamp and
> > binlog-position. You can then configure this new field to be used as the
> > source ordering field.
> > Balaji.V
> >
> >    On Wednesday, August 21, 2019, 07:35:40 AM PDT, Pratyaksh Sharma <
> > [email protected]> wrote:
> >
> >  Hi,
> >
> > While building a CDC pipeline for capturing data changes in SQL using
> > HoodieDeltaStreamer, I came across the following problem. We need to read
> > SQL's binlog file to fetch all the modifications made to a particular
> > table. However, in a production environment where we are handling
> > hundreds of transactions per second (TPS), the same table row can be
> > modified multiple times within a second.
> >
> > The problem is that the MySQL binlog has a 32-bit timestamp with only
> > one-second resolution. If we build a CDC pipeline on top of such a table
> > with huge TPS, then breaking ties between records with the same Hoodie
> > key will not be possible with a single source-ordering-field (mentioned
> > in HoodieDeltaStreamer.Config), which is the binlog timestamp in this
> > case.
> >
> > Example -  https://github.com/zendesk/maxwell/issues/925.
> >
> > Hence, as a Hudi improvement, the proposal is to add one
> > secondary-source-ordering-field for breaking ties among incoming records
> > in such cases. For example, we could have ingestion_timestamp or
> > binlog_position as the secondary field.
> >
> > Please suggest. I have raised the issue here
> > <https://issues.apache.org/jira/browse/HUDI-207>.
> >
>
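
For reference, here is a minimal sketch of the tie-breaking idea discussed in
the thread above: order by the binlog timestamp first, then by the binlog
position, and also expose the pair as one derived, sortable field. It is plain
Java and does not use Hudi's Transformer API. The names OrderingSketch,
ChangeEvent, derivedOrderingField and latestPerKey are hypothetical, chosen
only for illustration. It assumes both values are non-negative and that the
position keeps increasing within a given second, so it does not address the
position getting exhausted.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OrderingSketch {

    /** Hypothetical change event; the field names are illustrative, not Hudi's. */
    static final class ChangeEvent {
        final String key;            // record key (e.g. primary key of the row)
        final long binlogTsSeconds;  // binlog timestamp, one-second resolution
        final long binlogPosition;   // assumed to increase within the same second
        final String payload;

        ChangeEvent(String key, long binlogTsSeconds, long binlogPosition, String payload) {
            this.key = key;
            this.binlogTsSeconds = binlogTsSeconds;
            this.binlogPosition = binlogPosition;
            this.payload = payload;
        }
    }

    /** Primary ordering by timestamp, ties broken by binlog position. */
    static final Comparator<ChangeEvent> ORDERING =
        Comparator.<ChangeEvent>comparingLong(e -> e.binlogTsSeconds)
                  .thenComparingLong(e -> e.binlogPosition);

    /**
     * Single derived ordering field: zero-padded so that plain string
     * comparison agrees with ORDERING for non-negative inputs.
     */
    static String derivedOrderingField(ChangeEvent e) {
        return String.format("%010d-%020d", e.binlogTsSeconds, e.binlogPosition);
    }

    /** Keeps, for each key, the event that sorts last under ORDERING. */
    static Map<String, ChangeEvent> latestPerKey(List<ChangeEvent> events) {
        Map<String, ChangeEvent> latest = new HashMap<>();
        for (ChangeEvent e : events) {
            latest.merge(e.key, e,
                (current, incoming) -> ORDERING.compare(current, incoming) >= 0 ? current : incoming);
        }
        return latest;
    }

    public static void main(String[] args) {
        List<ChangeEvent> events = List.of(
            new ChangeEvent("row-1", 1566398140L, 100L, "v1"),
            new ChangeEvent("row-1", 1566398140L, 250L, "v2"),  // same second, later position
            new ChangeEvent("row-1", 1566398139L, 900L, "v0")); // earlier second
        System.out.println(latestPerKey(events).get("row-1").payload);
    }
}
```

With only the timestamp as the ordering field, the first two events above would
tie; the position (or an ingestion timestamp used the same way) decides between
them.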
