Hi Rodrigo,

Welcome to the Hudi users group.

The entire Hudi code base is written in Java and Scala, but there is nothing stopping you from using it from Python (PySpark). You should be able to copy all the packaging jars into your Spark installation and use them.

Please note, however, that you wouldn't be able to define your own CombineAndUpdate logic (as far as I know). For example, if you wanted to write your own logic to compare the records being ingested with the ones already persisted, I am not aware of a way to write it when using PySpark.

If you only want to use Python to run Hudi upsert use cases, then I would highly recommend that you look into the Metorikku project at https://github.com/YotpoLtd/metorikku. The project does quite a lot without you writing any code at all, and it is based on Hudi.

If you are still after a Python example, then I can try to write one and share it with you.

Hope this helps,
Kabeer.
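For reference, here is a minimal sketch of what a Hudi upsert from PySpark could look like. It makes several assumptions. It targets Hudi 0.5.x or later, with the Spark bundle jar on Spark's classpath, and uses the `org.apache.hudi` datasource format name. Older `com.uber.hoodie` releases registered the source under `com.uber.hoodie` instead. The helper names `hudi_upsert_options` and `upsert`, along with the table and field names, are hypothetical. The optional `hoodie.datasource.write.payload.class` setting relates to the point about merge logic: from Python you can select an existing JVM payload class by its name, but you cannot implement one.

```python
"""Minimal sketch of a Hudi upsert from PySpark.

Assumptions: Hudi 0.5.x+ with its Spark bundle jar on Spark's classpath
(e.g. passed via --jars); the helper names, table name, field names and
path used with them are hypothetical placeholders.
"""


def hudi_upsert_options(table_name, record_key, precombine_field,
                        partition_field, payload_class=None):
    """Build the Hudi datasource options for an upsert write."""
    options = {
        "hoodie.table.name": table_name,
        # Column that uniquely identifies a record within a partition.
        "hoodie.datasource.write.recordkey.field": record_key,
        # Used to pick one record when several incoming records share a key.
        "hoodie.datasource.write.precombine.field": precombine_field,
        # Column whose value determines the partition path on storage.
        "hoodie.datasource.write.partitionpath.field": partition_field,
        "hoodie.datasource.write.operation": "upsert",
    }
    if payload_class is not None:
        # Merge logic lives in a JVM class; from Python you can only select
        # one by its fully qualified class name, not write it yourself.
        options["hoodie.datasource.write.payload.class"] = payload_class
    return options


def upsert(df, base_path, options):
    """Write a PySpark DataFrame to a Hudi table as an upsert."""
    (df.write
       .format("org.apache.hudi")  # assumption: Hudi 0.5.x+ format name
       .options(**options)
       .mode("append")             # append mode; the operation option makes it an upsert
       .save(base_path))
```

With a SparkSession started with the bundle jar (and, as the Hudi quickstart suggests, `spark.serializer` set to `org.apache.spark.serializer.KryoSerializer`), a call might look like `upsert(df, "/tmp/hudi/trips", hudi_upsert_options("trips", "uuid", "ts", "partitionpath"))`.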
On Sep 10 2019, at 4:07 pm, Rodrigo Dominguez <[email protected]> wrote:
> I’m new to Hudi, and I’m wondering whether I can use it with python (pyspark)
> and the way to use it.
>
> I was able to download the source code, compile the project, run the Scala
> and java samples, but didn’t see any single Python source code and I’m
> wondering whether this is possible.
> Thank you
> Rodrigo Dominguez
> [email protected]
>
