Awesome. Also, you could try building off master (0.5.0-snapshot) if you are having trouble with the bundles.
I would greatly appreciate it if you could share your progress/feedback.

On Wed, Sep 11, 2019 at 1:55 AM Rodrigo Dominguez <[email protected]> wrote:

> Hi Kabeer
>
> I was able to build a simple script in Python and submit it with:
>
> spark-submit \
>   --jars $HUDI_SRC/packaging/hoodie-spark-bundle/target/hoodie-spark-bundle-0.4.7.jar \
>   --packages com.databricks:spark-avro_2.11:4.0.0 \
>   --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
>   ./test.py
>
> Yes, the idea is to use upsert. I’ll take a look at the project.
>
> Thank you
>
> Rodrigo Dominguez
> www.rorra.com.ar
>
> > On Sep 10, 2019, at 10:03 PM, Kabeer Ahmed <[email protected]> wrote:
> >
> > Hi Rodrigo,
> >
> > Welcome to the HUDI users group. The entire Hudi code base is Java and
> > Scala based, but there is nothing stopping you from using it through
> > Python (pyspark). You should be able to copy all the packaging jars
> > into your Spark installation and use them. Please note, however, that
> > you wouldn't be able to define your own CombineAndUpdate logic (as far
> > as I know). For example, if you wanted to write your own logic to
> > compare the records being ingested with the ones already persisted, I
> > am not aware of a way to write it when using PySpark.
> >
> > If you only want to use HUDI from Python to run upsert use cases, then
> > I would highly recommend that you look into the Metorikku project at:
> > https://github.com/YotpoLtd/metorikku
> > The project does quite a lot without writing any code at all. It is
> > based on HUDI.
> >
> > If you are still after a Python example, then I can try to write one
> > and share it with you.
> >
> > Hope this helps,
> > Kabeer.
> >
> > On Sep 10 2019, at 4:07 pm, Rodrigo Dominguez <[email protected]> wrote:
> >> I’m new to Hudi, and I’m wondering whether I can use it with Python
> >> (pyspark), and if so, how.
> >>
> >> I was able to download the source code, compile the project, and run
> >> the Scala and Java samples, but I didn’t see any Python source code,
> >> so I’m wondering whether this is possible.
> >>
> >> Thank you
> >> Rodrigo Dominguez
> >> [email protected]
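
For anyone following the thread, here is a minimal sketch of what a test.py doing an upsert through the Spark datasource might look like. It is not the script Rodrigo wrote. The helper names (hudi_upsert_options, upsert) and the column names in the usage comment are hypothetical, and the format string "com.uber.hoodie" is an assumption for the 0.4.7 bundle (builds off master 0.5.0-snapshot are expected to use "org.apache.hudi" instead), so please check it against the bundle you actually built.

```python
# Assumed datasource name for the 0.4.7 bundle; adjust for your build.
HUDI_FORMAT = "com.uber.hoodie"


def hudi_upsert_options(table_name, record_key, partition_field, precombine_field):
    """Build the datasource write options for an upsert (hypothetical helper)."""
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.operation": "upsert",
        # Column that uniquely identifies a record.
        "hoodie.datasource.write.recordkey.field": record_key,
        # Column used to derive the partition path.
        "hoodie.datasource.write.partitionpath.field": partition_field,
        # Column used to pick the latest record when keys collide.
        "hoodie.datasource.write.precombine.field": precombine_field,
    }


def upsert(df, base_path, options, fmt=HUDI_FORMAT):
    """Write a Spark DataFrame to base_path using the given options."""
    df.write.format(fmt).options(**options).mode("append").save(base_path)


# Usage inside a script submitted with spark-submit (column names are made up):
#
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.appName("hudi-upsert-test").getOrCreate()
#   df = spark.createDataFrame(
#       [("id-1", "2019/09/11", 1568160000, "a")],
#       ["uuid", "partition", "ts", "value"],
#   )
#   opts = hudi_upsert_options("test_table", "uuid", "partition", "ts")
#   upsert(df, "file:///tmp/hudi/test_table", opts)
```

The write logic stays on the JVM side, so this only covers the default merge behaviour; custom record-merging logic, as Kabeer notes, would still need to be written in Java or Scala.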
