Use foreachPartition(), get a connection from a JDBC connection pool,
and insert the data the same way you would in a non-Spark program.
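
Roughly like this (an untested sketch in Scala; stream, ConnectionPool and the
events table/columns are placeholders, and the pool needs to live in a lazily
initialized singleton on each executor so it isn't serialized with the closure):

    stream.foreachRDD { rdd =>
      rdd.foreachPartition { rows =>
        // borrow one connection per partition from the pool on this executor
        val conn = ConnectionPool.getConnection()
        try {
          conn.setAutoCommit(false)
          // Postgres 9.5+ upsert syntax; a plain INSERT works the same way
          val stmt = conn.prepareStatement(
            "INSERT INTO events (id, payload) VALUES (?, ?) " +
            "ON CONFLICT (id) DO UPDATE SET payload = EXCLUDED.payload")
          rows.foreach { case (id, payload) =>
            stmt.setLong(1, id)
            stmt.setString(2, payload)
            stmt.addBatch()        // batch instead of one round trip per row
          }
          stmt.executeBatch()
          conn.commit()
          stmt.close()
        } finally {
          conn.close()             // returns the connection to the pool
        }
      }
    }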

If you're only doing inserts, Postgres COPY will be faster (e.g.
https://discuss.pivotal.io/hc/en-us/articles/204237003), but if you're
doing updates that's not an option.
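
If you do go the COPY route, the pgjdbc driver exposes it through CopyManager,
so you can still drive it from inside foreachPartition. Rough sketch (assumes
rows is the partition iterator and conn is a Postgres-backed JDBC connection as
above, the table and columns are made up, and building the whole CSV string in
memory is only sensible for modest partition sizes):

    import java.io.StringReader
    import org.postgresql.copy.CopyManager
    import org.postgresql.core.BaseConnection

    // COPY ... FROM STDIN streams the rows in one shot, faster than INSERT batches
    val copyApi = new CopyManager(conn.unwrap(classOf[BaseConnection]))
    val csv = rows.map { case (id, payload) => s"$id,$payload" }.mkString("\n")
    copyApi.copyIn("COPY events (id, payload) FROM STDIN WITH (FORMAT csv)",
      new StringReader(csv))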

Depending on how many Spark partitions you have, calling coalesce() to
decrease the number of partitions may help avoid database contention
and speed things up, but you'll need to experiment.
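
For example (the 8 is an arbitrary starting point; tune it against how many
concurrent writers your database tolerates):

    stream.foreachRDD { rdd =>
      // fewer partitions = fewer simultaneous connections/transactions against Postgres
      rdd.coalesce(8).foreachPartition { rows =>
        // ... pooled JDBC batch write or COPY, as above ...
      }
    }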

On Wed, Dec 13, 2017 at 11:52 PM, salemi <alireza.sal...@udo.edu> wrote:
> Hi All,
>
> we are consuming messages from Kafka using a Spark DStream. Once the processing
> is done, we would like to update/insert the data in bulk into the
> database.
>
> I was wondering what the best solution for this might be. Our Postgres
> database table is not partitioned.
>
>
> Thank you,
>
> Ali
