Hi Pablo.

Thanks for that. That is exactly what I needed, and it is much simpler
than I thought, hehe.


On Sat, Sep 28, 2019 at 00:31, Pablo Estrada <pabl...@google.com>
wrote:

> Hi Lucas!
> That makes sense. I saw a question about this on StackOverflow recently.
> Perhaps that was you? [1] Perhaps not, but then you're not the only one
> trying to do this.
>
> I do not know a lot about connecting to RDBs from Python. It seemed to me
> that you'd also need to install ODBC / JDBC drivers, which is not that
> easy to do on Dataflow, so you would need to code a special transform
> depending on the database you're reading from.
>
> As far as I know, Postgres also does not have an easy way to read data in
> multiple threads in parallel, so the results of your query would be
> consumed in a single thread. You can do that with a relatively simple DoFn.
> Check my answer to the question [2], which has a DoFn for reading from
> Postgres and one for MySQL.
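A minimal sketch of the kind of logic such a reading DoFn could run, written against the generic Python DB-API (PEP 249) so that it runs with the standard-library sqlite3 module standing in for Postgres. The function name read_query and its parameters are hypothetical, not a Beam or driver API. The assumed design is that the caller owns the connection: in a pipeline, a beam.DoFn could open it in setup(), close it in teardown(), and have process() yield from read_query(...) for each incoming query.

```python
import sqlite3
from typing import Any, Iterator, Sequence


def read_query(conn: Any, query: str, params: Sequence[Any] = (),
               batch_size: int = 1000) -> Iterator[tuple]:
    """Yield the rows of one query from a DB-API (PEP 249) connection.

    Hypothetical helper: the caller manages the connection's lifetime
    (in Beam, typically a DoFn's setup()/teardown()).
    """
    cur = conn.cursor()
    try:
        cur.execute(query, params)
        while True:
            # Pull at most batch_size rows at a time instead of fetchall().
            rows = cur.fetchmany(batch_size)
            if not rows:
                break
            yield from rows
    finally:
        cur.close()


if __name__ == "__main__":
    # Stand-in database: an in-memory SQLite table with hypothetical columns.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)",
                     [("north", 10), ("south", 5), ("north", 7)])
    for row in read_query(conn, "SELECT region, amount FROM sales WHERE amount > ?",
                          (6,), batch_size=2):
        print(row)
    conn.close()
```

Two caveats if this is adapted to psycopg2: its placeholder style is %s rather than sqlite3's ?, and a default (client-side) cursor loads the whole result set on execute(), so for large results a named, server-side cursor (conn.cursor(name="...")) is what actually streams rows in batches.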
>
> LMK if that helps!
>
> [1] https://stackoverflow.com/questions/46528343/how-to-use-gcp-cloud-sql-as-dataflow-source-and-or-sink-with-python/58106722#58106722
> [2] https://stackoverflow.com/a/58106722/1255356
>
> On Fri, Sep 27, 2019 at 4:43 PM Eugene Kirpichov <kirpic...@google.com>
> wrote:
>
>> I'm actually very surprised that, to this day, nobody has written a Python
>> connector for the Python Database API, like JdbcIO.
>> Do we maybe have a way to use JdbcIO from Python via the cross-language
>> connectors stuff?
>>
>> On Fri, Sep 27, 2019 at 4:28 PM Lucas Magalhães <
>> lucas.magalh...@paralelocs.com.br> wrote:
>>
>>> Hi guys.
>>>
>>> Sorry, I forgot to mention that I'm using the Python SDK. It seems that
>>> the Java SDK is more mature, but I have no skills in that language.
>>>
>>> I'm trying to extract data from Postgres (Cloud SQL), do some
>>> aggregations, and save the results into BigQuery.
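The extract, aggregate, and load steps described here map onto ordinary Beam transforms: a reading DoFn, then something like beam.CombinePerKey(sum) for the aggregation, then beam.io.WriteToBigQuery, which accepts one dict per output row. The plain-Python sketch below is not Beam code; it only illustrates what that aggregation computes and the dict-per-row shape, assuming a hypothetical source table of (region, amount) pairs and hypothetical output columns region and total.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def aggregate_to_bq_rows(rows: Iterable[Tuple[str, int]]) -> List[Dict[str, object]]:
    """Sum amounts per region and emit one dict per region (hypothetical schema).

    Mirrors the result of CombinePerKey(sum) followed by a Map that turns each
    (key, total) pair into a BigQuery-style row dict.
    """
    totals: Dict[str, int] = defaultdict(int)
    for region, amount in rows:
        totals[region] += amount
    # Sorted only to make the output deterministic; Beam gives no ordering.
    return [{"region": region, "total": total}
            for region, total in sorted(totals.items())]


if __name__ == "__main__":
    print(aggregate_to_bq_rows([("north", 10), ("south", 5), ("north", 7)]))
```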
>>>
>>> On Fri, Sep 27, 2019 at 19:21, Pablo Estrada <pabl...@google.com>
>>> wrote:
>>>
>>>> Hi Lucas!
>>>> Can you share more information about your use case? Java has JdbcIO.
>>>> Maybe that's all you need? Or perhaps you're using the Python SDK?
>>>> Best
>>>> -P.
>>>>
>>>> On Fri, Sep 27, 2019 at 3:08 PM Eugene Kirpichov <kirpic...@google.com>
>>>> wrote:
>>>>
>>>>> Hi Lucas,
>>>>> Is there any reason why you can't use JdbcIO?
>>>>> You almost certainly should *not* use BoundedSource or Splittable
>>>>> DoFn for this. BoundedSource is obsolete in favor of assembling your
>>>>> connector from regular transforms and/or using an SDF, and SDF is an
>>>>> extremely advanced feature whose primary audience is Beam SDK authors.
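One possible reading of "assembling your connector from regular transforms" for a relational source: split a numeric key range into sub-ranges, turn each sub-range into its own parameterized query, and let a ParDo read each query on its own connection, for example after beam.Create(...) and beam.Reshuffle() so the reads can spread across workers. The sketch below covers only the splitting step. The function names, table, and column are hypothetical; it assumes an integer key whose bounds are already known (hi is exclusive) and uses sqlite3-style ? placeholders (psycopg2 uses %s). Table and column names are interpolated into the SQL text, so they must come from trusted configuration, never from user input.

```python
from typing import List, Tuple


def split_key_range(lo: int, hi: int, num_splits: int) -> List[Tuple[int, int]]:
    """Split the half-open range [lo, hi) into at most num_splits contiguous pieces."""
    if hi <= lo or num_splits < 1:
        raise ValueError("need hi > lo and num_splits >= 1")
    step = -(-(hi - lo) // num_splits)  # ceiling division
    return [(start, min(start + step, hi)) for start in range(lo, hi, step)]


def range_queries(table: str, key_column: str, lo: int, hi: int,
                  num_splits: int) -> List[Tuple[str, Tuple[int, int]]]:
    """Build one (query, params) pair per sub-range; each can be read independently."""
    # Identifiers are interpolated (must be trusted); bounds are bound as parameters.
    sql = f"SELECT * FROM {table} WHERE {key_column} >= ? AND {key_column} < ?"
    return [(sql, bounds) for bounds in split_key_range(lo, hi, num_splits)]


if __name__ == "__main__":
    for query, params in range_queries("sales", "id", 0, 10, 3):
        print(query, params)
```

Equal-width ranges only balance the work if keys are roughly evenly distributed; skewed keys would call for boundaries chosen from the data instead.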
>>>>>
>>>>> On Fri, Sep 27, 2019 at 2:52 PM Lucas Magalhães <
>>>>> lucas.magalh...@paralelocs.com.br> wrote:
>>>>>
>>>>>> Hi guys.
>>>>>>
>>>>>> I'm new to Apache Beam and I would like some help understanding some
>>>>>> behaviours.
>>>>>>
>>>>>> 1. Is there a performance issue when I read data from a relational
>>>>>> database using a ParDo instead of a BoundedSource?
>>>>>>
>>>>>> 2. If I implement a BoundedSource, how does Beam manage the
>>>>>> connection? Do I need to open and close it in every method, like split,
>>>>>> read, estimate size and so on?
>>>>>>
>>>>>> 3. I read something about Splittable DoFn, but I didn't find
>>>>>> instructions on how to implement one. Does anyone have something about it?
>>>>>>
>>>>>> Thanks
