vinothchandar commented on issue #1480: [SUPPORT] Backwards Incompatible Schema Evolution
URL: https://github.com/apache/incubator-hudi/issues/1480#issuecomment-608529410

> we would like the instant timestamps to be the same in the new target tables after the transformation so that downstream clients can continue to use their existing instant values while performing incremental pull queries.

IIUC, the current initialization process hands you a single commit for the first ingest. What you actually want is a physical copy of the old data as the new data, with just the renamed fields/new schema.

In general, this may be worth supporting in the new exporter tool. cc @xushiyan, wdyt? Essentially, something that preserves file names and just transforms the data.

For now, even if you create those commit timeline files yourself in `.hoodie`, it may not work, since the metadata inside them will point to files that no longer exist in the new table.

Here's an approach that could work. Write a small program that will:

- First, copy the `.hoodie` folder to the new table location.
- Then list all data files (directly using `fs.listStatus()`) and filter them so that their commit time is less than or equal to the latest commit time in the `.hoodie` folder you copied above.
- Read all those files using `AvroParquetReader` to get an `RDD[GenericRecord]` (if it's MOR, we need more work), and do your schema adjustments to derive a new `RDD[GenericRecord]`.
- Write this out using `HoodieAvroParquetWriter` back into the same file names.

Essentially, you will have the same file names and the same timeline (`.hoodie`) metadata, just with a different schema.

Let's also wait to hear from @xushiyan; maybe the exporter tool could be reused here.
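The listing/filtering step could be sketched roughly as follows. This is a minimal, self-contained sketch, not Hudi's own code. It operates on plain file names instead of a real Hadoop `FileSystem` listing, and it leaves out the `AvroParquetReader` / `HoodieAvroParquetWriter` part entirely.

The sketch makes several assumptions about naming and scope:

- **Completed commits:** these are taken to appear in `.hoodie` as `<instantTime>.commit` (or `<instantTime>.deltacommit` for MOR). `.requested`/`.inflight` files and anything else in that folder are ignored.
- **Base files:** these are assumed to be named `<fileId>_<writeToken>_<instantTime>.parquet`.
- **Instant times:** these are fixed-width `yyyyMMddHHmmss` strings, so a plain string comparison orders them chronologically.
- **Scope:** only COW base files are handled. MOR log files would need more work.

The class and method names (`HudiCopyPlanner`, `latestCompletedInstant`, `commitTimeOf`, `filesToRewrite`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical helper: decides which base files to rewrite, given the copied .hoodie timeline.
public class HudiCopyPlanner {

    // ASSUMPTION: completed instants are "<instantTime>.commit" (COW) or "<instantTime>.deltacommit" (MOR).
    // Requested/inflight markers, hoodie.properties, hidden dirs, etc. are skipped.
    public static Optional<String> latestCompletedInstant(List<String> timelineFileNames) {
        String latest = null;
        for (String name : timelineFileNames) {
            int dot = name.indexOf('.');
            if (dot <= 0) {
                continue; // no extension, or a hidden entry such as ".aux"
            }
            String instant = name.substring(0, dot);
            String action = name.substring(dot + 1);
            if (!action.equals("commit") && !action.equals("deltacommit")) {
                continue; // e.g. "commit.requested", "inflight", "properties"
            }
            // Fixed-width yyyyMMddHHmmss strings compare chronologically as plain strings.
            if (latest == null || instant.compareTo(latest) > 0) {
                latest = instant;
            }
        }
        return Optional.ofNullable(latest);
    }

    // ASSUMPTION: base file name is "<fileId>_<writeToken>_<instantTime>.parquet".
    public static String commitTimeOf(String baseFileName) {
        String stem = baseFileName.substring(0, baseFileName.lastIndexOf('.'));
        return stem.substring(stem.lastIndexOf('_') + 1);
    }

    // Keep parquet base files written at or before the latest completed instant,
    // so files produced by that latest commit are included too.
    public static List<String> filesToRewrite(List<String> dataFileNames, String latestInstant) {
        List<String> result = new ArrayList<>();
        for (String name : dataFileNames) {
            if (!name.endsWith(".parquet")) {
                continue; // skips .hoodie_partition_metadata, MOR log files, etc.
            }
            if (commitTimeOf(name).compareTo(latestInstant) <= 0) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> timeline = Arrays.asList(
                "hoodie.properties",
                "20200401000000.commit",
                "20200402000000.commit",
                "20200403000000.commit.requested",
                "20200403000000.inflight");
        List<String> dataFiles = Arrays.asList(
                "a1b2_0-10-20_20200401000000.parquet",
                "c3d4_1-11-21_20200402000000.parquet",
                "e5f6_0-12-22_20200403000000.parquet", // from the still-inflight commit
                ".hoodie_partition_metadata");

        String latest = latestCompletedInstant(timeline)
                .orElseThrow(() -> new IllegalStateException("no completed commit in timeline"));
        System.out.println("latest completed instant: " + latest);
        // prints "files to rewrite: [a1b2_0-10-20_20200401000000.parquet, c3d4_1-11-21_20200402000000.parquet]"
        System.out.println("files to rewrite: " + filesToRewrite(dataFiles, latest));
    }
}
```

The comparison is inclusive (`<= 0`) on purpose. The latest completed commit's metadata references the files it wrote, so those files have to be rewritten as well. Files from commits that were still in flight when `.hoodie` was copied are left out.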
