Awesome. You could also try building off master (0.5.0-snapshot) if you are
having trouble with the bundles.

We would greatly appreciate it if you could share your progress and feedback.
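
For anyone following the thread, a test.py of the kind submitted in the quoted
message below might look roughly like this. It is only a sketch under
assumptions: the helper names, table name, and column names are hypothetical,
and the datasource name "com.uber.hoodie" is assumed for the 0.4.x bundles
(builds off master may need "org.apache.hudi" instead).

```python
# Sketch of a PySpark upsert into a Hudi dataset.
# Helper names, table name, and column names are hypothetical placeholders.


def hudi_upsert_options(table_name, record_key, partition_field, precombine_field):
    """Build the Hudi datasource write options for an upsert."""
    return {
        "hoodie.table.name": table_name,
        # "upsert" updates records whose key already exists and inserts the rest.
        "hoodie.datasource.write.operation": "upsert",
        # Column that uniquely identifies a record.
        "hoodie.datasource.write.recordkey.field": record_key,
        # Column used to derive the partition path.
        "hoodie.datasource.write.partitionpath.field": partition_field,
        # Column used to pick the latest record when keys collide in a batch.
        "hoodie.datasource.write.precombine.field": precombine_field,
    }


def upsert(df, base_path, options, source_format="com.uber.hoodie"):
    """Write a Spark DataFrame to the Hudi dataset at base_path.

    source_format is an assumption: "com.uber.hoodie" for the 0.4.x bundles.
    """
    (
        df.write.format(source_format)
        .options(**options)
        .mode("append")  # append mode is what lets Hudi apply the upsert
        .save(base_path)
    )
```

With a SparkSession and a DataFrame `df` in hand, the call would be something
like `upsert(df, "/tmp/hudi/trips", hudi_upsert_options("trips", "uuid",
"region", "ts"))`, where the path and column names are again placeholders.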

On Wed, Sep 11, 2019 at 1:55 AM Rodrigo Dominguez <[email protected]>
wrote:

> Hi Kabeer
>
> I was able to write a simple script in Python and submit it with:
>
> spark-submit --jars
> $HUDI_SRC/packaging/hoodie-spark-bundle/target/hoodie-spark-bundle-0.4.7.jar
> --packages com.databricks:spark-avro_2.11:4.0.0 --conf
> spark.serializer=org.apache.spark.serializer.KryoSerializer ./test.py
>
> Yes, the idea is to use upsert. I’ll take a look at the project.
>
> Thank you
>
> Rodrigo Dominguez
> www.rorra.com.ar
>
>
> > On Sep 10, 2019, at 10:03 PM, Kabeer Ahmed <[email protected]> wrote:
> >
> > Hi Rodrigo,
> >
> > Welcome to the HUDI users group. The entire Hudi code base is Java and
> Scala based, but nothing stops you from using it through Python
> (pyspark). You should be able to copy all the packaging jars into your
> Spark installation and use them. Please note, however, that you wouldn't be
> able to define your own CombineAndUpdate logic (as far as I know). For
> example, if you wanted to write your own logic to compare the records being
> ingested with the ones already persisted, I am not aware of a way to do
> that when using PySpark.
> > If you only want to use Python to run upsert use cases on HUDI, then I
> would highly recommend that you look into the Metorikku project at:
> https://github.com/YotpoLtd/metorikku
> The project does quite a lot without writing any code at all. It is based
> on HUDI.
> > If you are still after a Python example, then I can try to write one and
> share it with you.
> > Hope this helps,
> > Kabeer.
> >
> > On Sep 10 2019, at 4:07 pm, Rodrigo Dominguez <[email protected]>
> wrote:
> >> I’m new to Hudi, and I’m wondering whether I can use it with Python
> (pyspark) and, if so, how.
> >>
> >> I was able to download the source code, compile the project, and run the
> Scala and Java samples, but I didn’t see any Python source code, so
> I’m wondering whether this is possible.
> >> Thank you
> >> Rodrigo Dominguez
> >> [email protected]
> >>
> >
>
>
