Hi Kabeer,

I was able to build a simple script in Python and submit it with:

    spark-submit \
      --jars $HUDI_SRC/packaging/hoodie-spark-bundle/target/hoodie-spark-bundle-0.4.7.jar \
      --packages com.databricks:spark-avro_2.11:4.0.0 \
      --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
      ./test.py

Yes, the idea is to use upsert. I’ll take a look at the project.

Thank you

Rodrigo Dominguez
www.rorra.com.ar

> On Sep 10, 2019, at 10:03 PM, Kabeer Ahmed <[email protected]> wrote:
>
> Hi Rodrigo,
>
> Welcome to the HUDI users group. The entire Hudi code base is Java and Scala
> based, but there is nothing stopping you from using it through Python
> (pyspark). You should be able to copy all the packaging jars into your Spark
> installation and use them. Please note, however, that you wouldn't be able to
> define your own CombineAndUpdate logic (as far as I know). For example, if you
> wanted to write your own logic to compare the records being ingested with the
> ones already persisted, I am not aware of a way to write it when using PySpark.
>
> If you only want to use Python to run upsert use cases with HUDI, then I would
> highly recommend that you look into the Metorikku project at:
> https://github.com/YotpoLtd/metorikku
> The project lets you do quite a lot without writing any code at all. It is
> based on HUDI.
>
> If you are still after a Python example, then I can try to write one and
> share it with you.
>
> Hope this helps,
> Kabeer.
>
> On Sep 10 2019, at 4:07 pm, Rodrigo Dominguez <[email protected]> wrote:
>> I’m new to Hudi, and I’m wondering whether I can use it with Python
>> (pyspark) and, if so, how.
>>
>> I was able to download the source code, compile the project, and run the
>> Scala and Java samples, but I didn’t see any Python source code, so I’m
>> wondering whether this is possible.
>>
>> Thank you
>> Rodrigo Dominguez
>> [email protected]
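
For anyone finding this thread later: the contents of test.py are not shown above, so the following is only a minimal sketch of what a PySpark upsert script submitted this way might look like. It assumes the 0.4.x bundle registers its Spark data source under the name "com.uber.hoodie" and accepts the "hoodie.datasource.write.*" option keys. The helper names (hudi_upsert_options, upsert) and the column names in the note below are hypothetical.

```python
# Sketch only. Assumptions: the data source name "com.uber.hoodie" and the
# option keys below match the hoodie-spark-bundle 0.4.x release in use.
# The helper names are hypothetical and not part of Hudi.

HUDI_FORMAT = "com.uber.hoodie"  # assumed data source name for 0.4.x


def hudi_upsert_options(table_name, record_key, precombine_field, partition_field):
    """Build the writer options for an upsert into a Hudi dataset."""
    return {
        "hoodie.table.name": table_name,
        # "upsert" updates rows whose record key already exists, inserts the rest.
        "hoodie.datasource.write.operation": "upsert",
        # Column that uniquely identifies a record.
        "hoodie.datasource.write.recordkey.field": record_key,
        # Column used to choose between incoming records that share a key.
        "hoodie.datasource.write.precombine.field": precombine_field,
        # Column used to derive the partition path.
        "hoodie.datasource.write.partitionpath.field": partition_field,
    }


def upsert(df, base_path, options):
    """Write a Spark DataFrame to the Hudi dataset at base_path."""
    (df.write
       .format(HUDI_FORMAT)
       .options(**options)
       .mode("append")  # append mode, so an existing dataset is not overwritten
       .save(base_path))
```

Inside test.py one would create a SparkSession, build a DataFrame that has the key, precombine, and partition columns (say "uuid", "ts", and "partition"), and call upsert(df, base_path, hudi_upsert_options("my_table", "uuid", "ts", "partition")). This relies on Hudi's default record-merging behaviour, since, as Kabeer notes, custom combine logic does not appear to be definable from PySpark.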
