Hi Pablo. Thanks for that. That is exactly what I needed, and it is much simpler than I thought, hehe.

On Sat, Sep 28, 2019 at 00:31, Pablo Estrada <pabl...@google.com> wrote:

> Hi Lucas!
> That makes sense. I saw a question about this on StackOverflow recently.
> Perhaps that was you? [1] If not, then you're not the only one trying to
> do this.
>
> I do not know a lot about connecting to RDBs from Python. It seemed to me
> that you'd also need to install ODBC / JDBC drivers, and that's not easy
> to do on Dataflow. So you would need to code a special transform
> depending on the database you're reading from.
>
> As far as I know, Postgres also does not have an easy way to read data in
> multiple threads in parallel, so consuming the results of your query would
> be done in a single thread, and you can do it with a relatively simple DoFn.
> Check my answer to the question [2], which has a DoFn for reading from
> Postgres and one for MySQL.
>
> LMK if that helps!
>
> [1]
> https://stackoverflow.com/questions/46528343/how-to-use-gcp-cloud-sql-as-dataflow-source-and-or-sink-with-python/58106722#58106722
> [2] https://stackoverflow.com/a/58106722/1255356
>
> On Fri, Sep 27, 2019 at 4:43 PM Eugene Kirpichov <kirpic...@google.com>
> wrote:
>
>> I'm actually very surprised that to this day nobody has written a Python
>> connector for the Python Database API, like JdbcIO.
>> Do we maybe have a way to use JdbcIO from Python via the cross-language
>> connectors stuff?
>>
>> On Fri, Sep 27, 2019 at 4:28 PM Lucas Magalhães <
>> lucas.magalh...@paralelocs.com.br> wrote:
>>
>>> Hi guys.
>>>
>>> Sorry, I forgot to mention that I'm using the Python SDK. It seems that
>>> the Java SDK is more mature, but I have no skill in that language.
>>>
>>> I'm trying to extract data from Postgres (Cloud SQL), do some
>>> aggregations, and save the result into BigQuery.
>>>
>>> On Fri, Sep 27, 2019 at 19:21, Pablo Estrada <pabl...@google.com>
>>> wrote:
>>>
>>>> Hi Lucas!
>>>> Can you share more information about your use case? Java has JdbcIO.
>>>> Maybe that's all you need? Or perhaps you're using the Python SDK?
>>>> Best
>>>> -P.
>>>>
>>>> On Fri, Sep 27, 2019 at 3:08 PM Eugene Kirpichov <kirpic...@google.com>
>>>> wrote:
>>>>
>>>>> Hi Lucas,
>>>>> Any reason why you can't use JdbcIO?
>>>>> You almost certainly should *not* use BoundedSource, nor Splittable
>>>>> DoFn for this. BoundedSource is obsolete in favor of assembling your
>>>>> connector from regular transforms and/or using an SDF, and SDF is an
>>>>> extremely advanced feature whose primary audience is Beam SDK authors.
>>>>>
>>>>> On Fri, Sep 27, 2019 at 2:52 PM Lucas Magalhães <
>>>>> lucas.magalh...@paralelocs.com.br> wrote:
>>>>>
>>>>>> Hi guys.
>>>>>>
>>>>>> I'm new to Apache Beam and I would like some help to understand some
>>>>>> behaviours.
>>>>>>
>>>>>> 1. Is there some performance issue when I'm reading data from a
>>>>>> relational database using a ParDo instead of a BoundedSource?
>>>>>>
>>>>>> 2. If I'm going to implement a BoundedSource, how does Beam manage
>>>>>> the connection? Do I need to open and close it in every method, like
>>>>>> split, read, estimate size and so on?
>>>>>>
>>>>>> 3. I read something about splittable DoFn but I didn't find
>>>>>> instructions on how to implement one. Does anyone have something
>>>>>> about it?
>>>>>>
>>>>>> Thanks
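
For anyone who finds this thread later: the "relatively simple DoFn" idea discussed above could look roughly like the sketch below. It is not the code from the linked StackOverflow answer, only a minimal illustration under some assumptions. The class name `ReadFromDbApi` and the `connect_fn` parameter are hypothetical names made up for this example. The query arrives as the input element, a connection is opened through any Python Database API (PEP 249) driver, and the rows are emitted one by one from a single thread. The fallback to `object` when Beam is not installed is there only so the class can be tried outside a pipeline.

```python
try:
    import apache_beam as beam
    _DoFnBase = beam.DoFn
except ImportError:
    # Assumption for this sketch: let the class be exercised without Beam.
    _DoFnBase = object


class ReadFromDbApi(_DoFnBase):
    """Hypothetical DoFn: runs each input query and yields its rows."""

    def __init__(self, connect_fn, batch_size=1000):
        super().__init__()
        # connect_fn is a zero-argument callable that returns a DB-API 2.0
        # connection, for example a lambda wrapping psycopg2.connect(...).
        self._connect_fn = connect_fn
        self._batch_size = batch_size

    def process(self, query):
        # One connection per query; the result set is read sequentially.
        conn = self._connect_fn()
        try:
            cursor = conn.cursor()
            cursor.execute(query)
            while True:
                rows = cursor.fetchmany(self._batch_size)
                if not rows:
                    break
                for row in rows:
                    yield row
        finally:
            conn.close()
```

In a pipeline this would presumably be wired up as `beam.Create([query])` followed by `beam.ParDo(ReadFromDbApi(connect_fn))`, with the aggregations and the BigQuery write applied to the resulting rows. The database driver has to be installed on the workers, and `connect_fn` has to be picklable so that it can be shipped to them.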