[GitHub] [hudi] n3nash commented on issue #3048: [SUPPORT]delta streamer run bootstrap from one hudi table to another error .parquet is not a Parquet file. expected magic number at tail

GitBox Mon, 14 Jun 2021 00:31:39 -0700


n3nash commented on issue #3048:
URL: https://github.com/apache/hudi/issues/3048#issuecomment-859308186



   @fengjian428 Since your source table is itself a Hudi table, you cannot use 
`SparkParquetBootstrapDataProvider` to read it. That provider is meant for 
reading a plain parquet table. Because new data may still be arriving in your 
source Hudi table, `SparkParquetBootstrapDataProvider` may end up reading a 
parquet file that is still being written, so it does not give you snapshot 
isolation. A partially written file has no footer yet, which is consistent 
with the "expected magic number at tail" error. 
   
   One way to go about this is to implement a `HoodieBootstrapDataProvider` 
that can read Hudi tables internally using the Spark Datasource, something like 
   ```
   source = spark.read.format("hudi")...
   ```
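
To make the one-liner above a bit more concrete, here is a minimal PySpark sketch of the read that such a custom provider could do internally. It is only an illustration under assumptions: `read_hudi_snapshot` and its parameters are hypothetical names, `spark` is assumed to be an existing SparkSession with the Hudi Spark bundle on its classpath, and wiring the result into an actual `HoodieBootstrapDataProvider` implementation is not shown.

```python
def read_hudi_snapshot(spark, base_path):
    """Load the latest committed snapshot of the Hudi table at base_path.

    Hypothetical helper: `spark` is assumed to be a SparkSession that has
    the Hudi Spark bundle available; `base_path` is the table's base path.
    """
    return (
        spark.read.format("hudi")
        # "snapshot" is the default query type; it is set explicitly here so
        # the intent (read only committed data) is visible.
        .option("hoodie.datasource.query.type", "snapshot")
        .load(base_path)
    )
```

Usage would look like `source = read_hudi_snapshot(spark, "/path/to/source_hudi_table")`, where the path is a placeholder. Depending on the Hudi version, the load path may need a partition glob appended rather than the bare base path. Going through the Hudi datasource rather than raw parquet means in-flight files are not picked up.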
   
   


