Re: JdbcIO read needs to fit in memory

Jean-Baptiste Onofré Thu, 24 Oct 2019 07:27:28 -0700

JdbcIO is basically a DoFn, so it may load the whole result set on a single executor (there is no obvious way to split the query).

Is that what you mean?

Regards

On 24 Oct 2019 at 15:26, Jozef Vilcek <jozo.vil...@gmail.com> wrote:

Hi,

I need to read a big-ish data set via JdbcIO. This forced me to increase the memory for my executor (I am currently using SparkRunner). It seems that JdbcIO requires all data to fit in memory, as it uses a DoFn to unfold the query into a list of elements.

A BoundedSource would not need to fit the result in memory, but JdbcIO uses a DoFn. Also, in a recent discussion [1] it was suggested that BoundedSource should not be used, as it is obsolete.

Has anyone faced this issue? What would be the best way to solve it? If the DoFn should be kept, then I can only think of splitting the query into ranges and trying to find the best-fitting number of rows to read at once.
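
To make the range-splitting idea concrete, here is a minimal sketch in plain Java. It is only an illustration under assumptions: the table has a numeric key column whose minimum and maximum are known, and the names RangeSplitter, KeyRange, my_table and id are hypothetical, not part of JdbcIO. Each resulting range could then be bound to a parameterized query and read separately, for example with something like JdbcIO.readAll(), so that no single read has to hold the full result.

```java
import java.util.ArrayList;
import java.util.List;

public class RangeSplitter {

    /** Half-open key range [start, end). Hypothetical helper type. */
    public static final class KeyRange {
        public final long start;
        public final long end;

        KeyRange(long start, long end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + ")";
        }
    }

    /**
     * Splits the inclusive key interval [minKey, maxKey] into consecutive
     * half-open ranges spanning at most keysPerRange key values each.
     * Assumes maxKey + keysPerRange does not overflow a long.
     */
    public static List<KeyRange> split(long minKey, long maxKey, long keysPerRange) {
        if (keysPerRange <= 0) {
            throw new IllegalArgumentException("keysPerRange must be positive");
        }
        List<KeyRange> ranges = new ArrayList<>();
        for (long start = minKey; start <= maxKey; start += keysPerRange) {
            // Clamp the last range so it ends just past maxKey.
            long end = Math.min(start + keysPerRange, maxKey + 1);
            ranges.add(new KeyRange(start, end));
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Table and column names are placeholders for illustration only.
        for (KeyRange r : split(1, 10, 4)) {
            System.out.println(
                "SELECT * FROM my_table WHERE id >= " + r.start + " AND id < " + r.end);
        }
    }
}
```

If the keys have gaps, a range may return fewer rows than keysPerRange, so the value bounds the rows read at once rather than fixing them; it would have to be tuned against the executor memory.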

I appreciate any thoughts.

[1] https://lists.apache.org/list.html?dev@beam.apache.org:lte=1M:Reading%20from%20RDB%2C%20ParDo%20or%20BoundedSource
