kazdy edited a comment on issue #3724:
URL: https://github.com/apache/hudi/issues/3724#issuecomment-954863193


   I think `hoodie.datasource.read.begin.instanttime` in the unit test is only used 
to assert that new data has been read from the source Hudi table at `sourcePath` and then 
written to `destPath` by the streaming query initialized by 
`initStreamingWriteFuture`.
   
   I tried these settings and the stream didn't start from the time I specified.
   
   I think the starting offset is not configurable at all:
   `    metadataLog.get(0).getOrElse {
         metadataLog.add(0, INIT_OFFSET)
         INIT_OFFSET
       }`
   
https://github.com/apache/hudi/blob/47ed91799943271f219419cf209793a98b3f09b5/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/streaming/HoodieStreamSource.scala
   
   Where INIT_OFFSET is declared as:
   `val INIT_OFFSET = HoodieSourceOffset(HoodieTimeline.INIT_INSTANT_TS)`
   
https://github.com/apache/hudi/blob/47ed91799943271f219419cf209793a98b3f09b5/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/streaming/HoodieSourceOffset.scala
   
   Which is:
   `  // Instant corresponding to pristine state of the table after its creation
     String INIT_INSTANT_TS = "00000000000000";`
     
https://github.com/apache/hudi/blob/0223c442ec9a746834d1b2f2582c5267b692823a/hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieTimeline.java
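
A minimal sketch of why the start position looks non-configurable, assuming the lookup behaves as the quoted snippet suggests. `SourceOffset`, `InMemoryMetadataLog`, and `InitialOffsetSketch` are hypothetical stand-ins written for illustration, not the real Hudi classes; only the `"00000000000000"` value and the `getOrElse` shape come from the code quoted above.

```scala
// Simplified stand-in for HoodieSourceOffset (hypothetical, not the Hudi class).
final case class SourceOffset(commitTime: String)

// Hypothetical in-memory replacement for the streaming checkpoint metadata log.
final class InMemoryMetadataLog {
  private val entries = scala.collection.mutable.Map.empty[Long, SourceOffset]

  def get(batchId: Long): Option[SourceOffset] = entries.get(batchId)

  // Returns false if an entry for this batch already exists.
  def add(batchId: Long, offset: SourceOffset): Boolean =
    if (entries.contains(batchId)) false
    else { entries.put(batchId, offset); true }
}

object InitialOffsetSketch {
  // Same value as HoodieTimeline.INIT_INSTANT_TS quoted above.
  val InitInstantTs = "00000000000000"
  val InitOffset: SourceOffset = SourceOffset(InitInstantTs)

  // Mirrors the quoted lookup: the reader options are accepted here only to
  // show that nothing in this code path consults them.
  def initialOffset(log: InMemoryMetadataLog, options: Map[String, String]): SourceOffset =
    log.get(0).getOrElse {
      log.add(0, InitOffset)
      InitOffset
    }

  def main(args: Array[String]): Unit = {
    val options = Map("hoodie.datasource.read.begin.instanttime" -> "20211029000000")
    val offset = initialOffset(new InMemoryMetadataLog, options)
    println(offset.commitTime) // prints 00000000000000
  }
}
```

On a fresh checkpoint the log has no entry for batch 0, so the fallback branch runs and the stream starts from the initial instant regardless of the option value.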
     
   
   > Yes that's correct. On EMR, you need to place it under `/usr/lib/hudi`
   
   So when running EMR on EKS I'd need to provide a custom container image to do 
this.
   



