Hi Everyone,

I took the existing Docker demo of XTable and moved the examples and
configuration to S3 + HMS.

You can see the completed example at
https://github.com/alberttwong/incubator-xtable/tree/main/demo-s3 while
it's going through PR review at
https://github.com/apache/incubator-xtable/pull/459

More details:
What is the purpose of the pull request

Using the XTable Docker demo as the base, modify it so it works with S3.
This is an end-to-end example with a README.
Brief change log

   1. added MinIO container images to provide an object store
   2. changed the HMS image to the Starburst HMS image because Starburst
   already has the S3 libraries built into the image.
   3. built a custom Spark 3.4 container image based on JDK 11 with Hadoop
   2.10.2 and Hive 2.3.10 (can't use 2.3.1 due to a Hive 2.3.1 bug) installed.
   Available at
   https://hub.docker.com/r/atwong/openjdk-11-spark-3.4-hive-2.3.10-hadoop-2.10.2
   if you don't want to build it.
   4. cloned Hudi and built it with Maven on JDK 8 to get the
   hudi-hive-sync jars (you can skip this by downloading
   hudi-hive-sync-bundle from mvnrepository.com)
   5. added the libraries that were missing for running run_sync_tool.sh:
   https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-s3,
   https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-aws,
   https://mvnrepository.com/artifact/com.esotericsoftware/kryo-shaded/4.0.2,
   https://mvnrepository.com/artifact/org.apache.parquet/parquet-avro,
   https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client
   6. modifications to iceberg, hudi and delta Trino catalog configurations
   to support S3 bucket lookups
   7. added core-site.xml to inject parameters into XTable and modified
   /etc/hadoop/core-site.xml to inject parameters into the hudi-hive-sync tool
   8. Modified pyspark demo script to include S3 configs
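
To give a rough idea of what the S3 parameters in items 7 and 8 look like,
here is a minimal sketch (not the actual files from the PR). It takes one set
of Hadoop S3A properties and renders it both as Spark settings and as a
core-site.xml document. The endpoint, access key and secret key are
hypothetical placeholders for a local MinIO container, and the helper names
are mine; only the fs.s3a.* property names and the spark.hadoop. prefix are
standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical MinIO endpoint and credentials; replace with your own values.
S3A_SETTINGS = {
    "fs.s3a.endpoint": "http://minio:9000",
    "fs.s3a.access.key": "admin",
    "fs.s3a.secret.key": "password",
    # MinIO is usually addressed path-style rather than by virtual host.
    "fs.s3a.path.style.access": "true",
    "fs.s3a.connection.ssl.enabled": "false",
    "fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
}


def spark_conf(settings):
    """Prefix Hadoop keys with 'spark.hadoop.' so Spark forwards them to Hadoop."""
    return {f"spark.hadoop.{key}": value for key, value in settings.items()}


def core_site_xml(settings):
    """Render the same settings as a Hadoop core-site.xml document."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Print the settings as --conf flags for pyspark, then as core-site.xml.
    for key, value in spark_conf(S3A_SETTINGS).items():
        print(f"--conf {key}={value}")
    print(core_site_xml(S3A_SETTINGS))
```

The printed --conf flags could be passed to pyspark, and the XML could be
saved as core-site.xml for tools that read the Hadoop configuration directly.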


Regards,



http://alberttwong.com <http://bit.ly/1H6mpmA> - +1-949-870-9664 - GPG:
9D0F 6E75 5363 0F39 F64A 447E 2A2E 6721 C637 845A
<https://pgp.mit.edu/pks/lookup?op=get&search=0x2A2E6721C637845A>
