Hi Balaji,

Sure, I can do that. However, after a considerable amount of time the binlog position will get exhausted. To handle this, we can use ingestion_timestamp (the time at which I push the event to Kafka for DeltaStreamer to consume) as the secondary ordering field, which will always work.
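
To make the idea concrete, here is a rough sketch of the tie-breaking I have in mind. It is plain Python and not Hudi or DeltaStreamer code; the names ChangeEvent, ordering_value and latest_per_key are hypothetical and only illustrate comparing records on (binlog timestamp, ingestion_timestamp) instead of the binlog timestamp alone.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical change record; field names are illustrative only."""
    key: str              # record key of the modified row
    binlog_ts: int        # binlog timestamp, seconds resolution
    ingestion_ts_ms: int  # time the event was pushed to Kafka, in milliseconds
    payload: str


def ordering_value(event: ChangeEvent) -> Tuple[int, int]:
    # Tuples compare lexicographically, so ingestion_ts_ms is consulted
    # only when two events carry the same binlog_ts.
    return (event.binlog_ts, event.ingestion_ts_ms)


def latest_per_key(events: Iterable[ChangeEvent]) -> Dict[str, ChangeEvent]:
    """Keep, for every key, the event with the greatest ordering value."""
    latest: Dict[str, ChangeEvent] = {}
    for event in events:
        current = latest.get(event.key)
        # On a full tie the event seen later in the input wins.
        if current is None or ordering_value(event) >= ordering_value(current):
            latest[event.key] = event
    return latest
```

With only binlog_ts as the ordering field, two updates to the same row within one second would compare as equal; with the pair above, the one pushed to Kafka later wins.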
Please suggest.

On Thu, Aug 22, 2019 at 9:49 PM vbal...@apache.org <vbal...@apache.org> wrote:

> Hi Pratyaksh,
> The usual way we support this is to make use of the
> com.uber.hoodie.utilities.transform.Transformer plugin in
> HoodieDeltaStreamer. You can implement your own Transformer to add a new
> derived field which could be a combination of timestamp and
> binlog-position. You can then configure this new field to be used as the
> source ordering field.
> Balaji.V
>
> On Wednesday, August 21, 2019, 07:35:40 AM PDT, Pratyaksh Sharma <
> pratyaks...@gmail.com> wrote:
>
> Hi,
>
> While building a CDC pipeline for capturing data changes in SQL using
> HoodieDeltaStreamer, I came across the following problem. We need to read
> SQL's binlog file to fetch all the modifications made to a particular
> table. However, in a production environment where we are handling hundreds
> of transactions per second (TPS), it is possible for the same table row
> to be modified multiple times within a second.
>
> Here comes the problem with the MySQL binlog, as it has a 32-bit timestamp
> with only seconds resolution. If we build a CDC pipeline on top of such a
> table with huge TPS, then breaking ties between records with the same
> Hoodie key will not be possible with a single source-ordering-field
> (mentioned in HoodieDeltaStreamer.Config), which is the binlog timestamp
> in this case.
>
> Example - https://github.com/zendesk/maxwell/issues/925.
>
> Hence, as a part of Hudi improvement, the proposal is to add one
> secondary-source-ordering-field for breaking ties among incoming records in
> such cases. For example, we could have ingestion_timestamp or
> binlog_position as the secondary field.
>
> Please suggest. I have raised the issue here
> <https://issues.apache.org/jira/browse/HUDI-207>.