Litianye commented on pull request #1719:
URL: https://github.com/apache/hudi/pull/1719#issuecomment-642497057
> @Litianye no worry we can work through this together. I believe all the
sources treat empty string and `Option.empty()` the same. If not then it's a
bug. If we don't fix it now, the empty string will haunt us later.
Got it! I made a small change in `DeltaSync`.
In the Kafka sources (`JsonKafkaSource` and `AvroKafkaSource`),
`Option.empty()` and `Option.of("")` are both handled by the reset strategy branch:
https://github.com/apache/hudi/blob/df2e0c760e7df0bd1b200867b3f0d2ca3a3f1fce/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/KafkaOffsetGen.java#L184
In the DFS sources (`AvroDFSSource`, `JsonDFSSource`, `CsvDFSSource`, and
`ParquetDFSSource`), I think `Option.of("")` will never occur in the method at
https://github.com/apache/hudi/blob/df2e0c760e7df0bd1b200867b3f0d2ca3a3f1fce/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/DFSPathSelector.java#L64
because `DFSPathSelector` generates the checkpoint from
`FileStatus.getModificationTime()`.
In `HoodieIncrSource`, this case is handled at line
https://github.com/apache/hudi/blob/df2e0c760e7df0bd1b200867b3f0d2ca3a3f1fce/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/HoodieIncrSource.java#L106
In `HiveIncrPullSource`, the code at line
https://github.com/apache/hudi/blob/df2e0c760e7df0bd1b200867b3f0d2ca3a3f1fce/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/HiveIncrPullSource.java#L123
will return an empty string. Can you assess this and suggest how we can fix it?
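
To illustrate the "treat empty string and `Option.empty()` the same" rule discussed above, here is a minimal sketch of normalizing a checkpoint in one place before it reaches any source. It is not the change made in this PR: `CheckpointNormalizer` and `normalize` are hypothetical names, and `java.util.Optional` stands in for Hudi's own `Option` type so that the snippet compiles without Hudi on the classpath.

```java
import java.util.Optional;

// Hypothetical helper, not part of Hudi. java.util.Optional is used here
// as a stand-in for org.apache.hudi.common.util.Option.
final class CheckpointNormalizer {

  private CheckpointNormalizer() {
  }

  // Collapses an empty-string checkpoint into Optional.empty(), so callers
  // only ever have to handle "no checkpoint" in a single form.
  static Optional<String> normalize(Optional<String> lastCheckpoint) {
    return lastCheckpoint.filter(checkpoint -> !checkpoint.isEmpty());
  }

  public static void main(String[] args) {
    // Both "no checkpoint" forms end up as an absent value.
    System.out.println(normalize(Optional.empty()).isPresent());  // false
    System.out.println(normalize(Optional.of("")).isPresent());   // false
    // A real checkpoint value passes through unchanged.
    System.out.println(normalize(Optional.of("20200611")).get()); // 20200611
  }
}
```

If something like this ran once before the checkpoint is handed to a source, a source that only checks for an absent value would no longer see `""`. Whether that fits `HiveIncrPullSource` is the open question above.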