Hi Syed,

Apologies for the delay. If you are using copy-on-write, you can look into savepoints (although I realize they are only exposed at the RDD API level). We also have a tool called HoodieSnapshotCopier in hudi-utilities that takes a copy/snapshot of a table, as of a given commit, for backup purposes. Raymond (if you are here) has an RFC to enhance that further. Running the copier periodically, say once a day, should achieve your goal, I believe (please test it first, since it is not used that much in OSS, IIUC).
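To make that concrete, here is a rough, untested sketch of a daily wrapper around the copier. The class is normally launched via spark-submit; the flag names and paths below are assumptions from memory, so please double-check them against the source linked below:

import java.time.LocalDate;
import org.apache.hudi.utilities.HoodieSnapshotCopier;

public class DailyHudiBackup {
  public static void main(String[] args) throws Exception {
    // Source Hudi table and a dated backup location (both paths are placeholders).
    String basePath = "s3://my-bucket/hudi/my_table";
    String outputPath = "s3://my-bucket/backups/my_table/" + LocalDate.now();

    // Delegate to the copier's own entry point; it copies the table's data files,
    // as of the latest completed commit, into outputPath.
    // NOTE: the flag names are assumptions -- verify them against the linked class.
    HoodieSnapshotCopier.main(new String[] {
        "--base-path", basePath,
        "--output-path", outputPath
    });
  }
}

Scheduling that class (or the equivalent spark-submit command) from cron or your workflow scheduler once a day would give you the rolling backups.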
https://github.com/apache/incubator-hudi/blob/c2c0f6b13d5b72b3098ed1b343b0a89679f854b3/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieSnapshotCopier.java

Any issues in the tool should be simple to fix; the tool itself is only a couple hundred lines.

Thanks
Vinoth

On Mon, Feb 10, 2020 at 3:56 AM Syed Abdul Kather <[email protected]> wrote:

> Yes. Also for restoring the data from cold storage.
>
> Use case here:
> We stream data using Debezium and push it to Kafka, where we have a 7-day
> retention. If the destination Hudi table gets corrupted, or we need to
> repopulate it, we need a way to restore the data.
>
> Thanks and Regards,
> S SYED ABDUL KATHER
> *Data platform Lead @ Tathastu.ai*
> *+91 - 7411011661*
>
> On Mon, Jan 13, 2020 at 10:17 PM Vinoth Chandar <[email protected]> wrote:
>
> > Hi Syed,
> >
> > If I follow correctly, are you asking how to do a bulk load first and
> > then use delta streamer on top of that dataset to apply binlogs from
> > Kafka?
> >
> > Thanks
> > Vinoth
> >
> > On Mon, Jan 13, 2020 at 12:39 AM Syed Abdul Kather <[email protected]>
> > wrote:
> >
> > > Hi Team,
> > >
> > > We have onboarded a few tables that have a really huge number of
> > > records (100M records). The plan is to enable the binlog for the
> > > database; that is no issue, since the stream can handle the load. But
> > > for loading the snapshot, we have used sqoop to import the whole table
> > > to S3.
> > >
> > > What we need here:
> > > Can we load the whole sqooped dump into a Hudi table, and then apply
> > > the stream (binlog data coming via Kafka) on top of it?
> > >
> > > Thanks and Regards,
> > > S SYED ABDUL KATHER
> > > *Bigdata [email protected]*
> > > * +91-7411011661*
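Coming back to the bulk-load question quoted above: a rough, untested sketch of that flow, using the Spark datasource in bulk_insert mode for the initial load (the table name, record key, precombine field and paths below are just placeholders, not anything prescribed):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class BulkLoadSqoopedTable {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().appName("hudi-bulk-load").getOrCreate();

    // 1) One-time bulk load of the sqooped snapshot into the Hudi table.
    Dataset<Row> snapshot = spark.read().parquet("s3://my-bucket/sqoop-dump/my_table/");
    snapshot.write().format("org.apache.hudi")
        .option("hoodie.table.name", "my_table")
        .option("hoodie.datasource.write.operation", "bulk_insert")
        .option("hoodie.datasource.write.recordkey.field", "id")
        .option("hoodie.datasource.write.precombine.field", "updated_at")
        .mode(SaveMode.Overwrite)
        .save("s3://my-bucket/hudi/my_table");

    // 2) From then on, keep the same table current by running HoodieDeltaStreamer
    //    (org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer) in upsert mode
    //    against the Kafka topic that carries the Debezium binlog events.
    spark.stop();
  }
}

The key point is that the bulk_insert and the delta streamer write to the same base path with the same record key and precombine fields, so the upserts coming from Kafka line up with the initially loaded rows.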
