WTa-hash opened a new issue #4241:
URL: https://github.com/apache/hudi/issues/4241


   Are there any tips or guidance on setting up a Disaster Recovery (DR) 
environment with Apache Hudi?
   
   We are building our data lake, stored on AWS S3, by running a Spark 
structured streaming application on AWS EMR. The Spark application processes 
incoming data from an AWS Kinesis stream, saves it as Hudi tables on S3, and 
syncs them with the AWS Glue catalog. All of this happens in a single AWS 
region (us-east-1).
   
   In the event that we need to fail over to a different region, or our main 
region (us-east-1) goes down, what is the suggested approach for starting up 
again in another AWS region with our existing data lake data? We can set up S3 
replication to replicate the parquet files (and `.hoodie` files) to another S3 
bucket residing in a different AWS region, but S3 replication happens 
asynchronously, which means files may get replicated out of order and cause 
issues when querying (due to files possibly being missing on the replica).
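   To make the out-of-order concern concrete, here is a minimal sketch of one 
possible mitigation: compare the completed-commit instants in the primary and 
replica `.hoodie` timeline directories and only query the replica up to the 
latest instant that (along with all earlier instants) has fully arrived. The 
function names and the listing format are illustrative assumptions, not a Hudi 
or AWS API.

   ```python
   # Hypothetical helper, not a Hudi API: given filename listings of the
   # .hoodie timeline directory from the primary and replica buckets, find
   # the latest commit instant that is safe to query on the replica.

   def completed_commits(timeline_files):
       """Sorted instant times that have a completed commit file."""
       return sorted(
           f.split(".")[0]
           for f in timeline_files
           if f.endswith((".commit", ".deltacommit", ".replacecommit"))
       )

   def latest_consistent_instant(primary_files, replica_files):
       """Latest instant on the replica with no earlier instant missing."""
       replica = set(completed_commits(replica_files))
       last_ok = None
       for instant in completed_commits(primary_files):
           if instant in replica:
               last_ok = instant
           else:
               break  # a gap: later replicated commits are not yet trustworthy
       return last_ok
   ```

   For example, if the primary timeline has commits `001`, `002`, and `003` 
but only `001` and `003` have replicated (`002` is still in flight), the 
sketch would report `001` as the latest consistent instant, since querying 
`003` could read data files that reference the missing `002` commit.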
   
   **Environment Description**
   
   * Hudi version : 0.7.0-amzn-1
   
   * Spark version : 2.4.7
   
   * Hive version : 2.3.7
   
   * Hadoop version : 2.10.1
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : no
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

