Re: Using hudi with pyspark

Kabeer Ahmed Tue, 10 Sep 2019 18:04:48 -0700

Hi Rodrigo,

Welcome to the HUDI users group. The entire Hudi code base is Java and Scala 
based, but there is nothing stopping you from using it through Python 
(pyspark). You should be able to copy all the packaging jars into your Spark 
installation and use them. Please note, however, that you wouldn't be able to 
define your own CombineAndUpdate logic (as far as I know). For example, if you 
wanted to write your own logic to compare the records being ingested with the 
ones already persisted, I am not aware of a way to write it when using PySpark.
If you only want Python as a way to run upsert use cases with HUDI, then I 
would highly recommend that you look into the Metorikku project at: 
https://github.com/YotpoLtd/metorikku 
The project does quite a lot without writing any code at all. It is based on 
HUDI.
If you are still after a Python example, then I can try to write one and share 
it with you.
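In the meantime, here is a rough, untested sketch of what an upsert from 
PySpark might look like. It assumes the Hudi Spark bundle jar is already on 
Spark's classpath and that the datasource is registered as "org.apache.hudi" 
in your build. The helper names, table name, column names and path are made up 
for illustration, and the option keys are the Hudi datasource write configs as 
I understand them, so please check them against the version you compiled.

```python
# Hypothetical helpers for illustration; they are not part of Hudi itself.

def hudi_upsert_options(table_name, record_key, partition_field, precombine_field):
    """Build the write options for a Hudi upsert through the Spark DataSource API."""
    return {
        "hoodie.table.name": table_name,
        # Ask Hudi to upsert rather than plain insert.
        "hoodie.datasource.write.operation": "upsert",
        # Column that uniquely identifies a record.
        "hoodie.datasource.write.recordkey.field": record_key,
        # Column used to lay records out into partitions.
        "hoodie.datasource.write.partitionpath.field": partition_field,
        # Column used to pick a winner when two incoming records share a key.
        "hoodie.datasource.write.precombine.field": precombine_field,
    }


def upsert(df, base_path, options):
    """Write a PySpark DataFrame to the Hudi table stored at base_path."""
    (df.write
       .format("org.apache.hudi")   # assumed datasource name, see note above
       .options(**options)
       .mode("append")              # append so existing data is kept and merged
       .save(base_path))


# Example usage inside a pyspark session (all names below are placeholders):
#   opts = hudi_upsert_options("trips", "uuid", "region", "ts")
#   upsert(df, "/tmp/hudi/trips", opts)
```

Note that this only uses the default merge behaviour; it does not get around 
the custom CombineAndUpdate limitation mentioned above.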
Hope this helps,
Kabeer.


On Sep 10 2019, at 4:07 pm, Rodrigo Dominguez <[email protected]> wrote:
> I’m new to Hudi, and I’m wondering whether I can use it with python (pyspark) 
> and the way to use it.
>
> I was able to download the source code, compile the project, run the Scala 
> and java samples, but didn’t see any single Python source code and I’m 
> wondering whether this is possible.
> Thank you
> Rodrigo Dominguez
> [email protected]
>
