Apache Airflow went with the DB API approach as well, and it seems to have worked well for them. We will likely need to add extras_require entries for each database engine's Python package though, which adds some complexity, but not a lot.
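For illustration, the per-engine extras could be declared in setup.py roughly as below. This is only a sketch: the extra names and the specific driver packages are hypothetical choices, though the setuptools keyword itself is extras_require.

```python
# Hypothetical sketch: optional per-database extras for a setup.py.
# The extra names and driver packages below are illustrative only.
EXTRAS_REQUIRE = {
    "postgres": ["psycopg2-binary>=2.8"],
    "mysql": ["mysqlclient>=1.4"],
    "mssql": ["pyodbc>=4.0"],
}

# In setup.py this would be passed as:
#   setup(..., extras_require=EXTRAS_REQUIRE)
# so users can opt in to one engine, e.g. `pip install <package>[postgres]`,
# without pulling in every database driver.
```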
> On Jan 6, 2020, at 6:12 PM, Eugene Kirpichov <[email protected]> wrote:
>
> Agreed with above, it seems prudent to develop a pure-Python connector for something as common as interacting with a database. It's likely easier to achieve an idiomatic API, familiar to non-Beam Python SQL users, within pure Python.
>
> Developing a cross-language connector here might be plain impossible, because rows read from a database are (at least in JDBC) not encodable - they require a user's callback to translate to an encodable user type, and the callback can't be in Python because then you'd have to encode its input before giving it to Python. The same holds for the write transform.
>
> Not sure about sqlalchemy though; maybe use plain DB-API https://www.python.org/dev/peps/pep-0249/ instead? The Python interface seems friendlier than JDBC in the sense that it actually returns rows as tuples of simple data types.
>
>> On Mon, Jan 6, 2020 at 1:42 PM Robert Bradshaw <[email protected]> wrote:
>>
>>> On Mon, Jan 6, 2020 at 1:39 PM Chamikara Jayalath <[email protected]> wrote:
>>>
>>> Regarding cross-language transforms, we need to add better documentation, but for now you'll have to go with existing examples and tests. For example:
>>>
>>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/external/gcp/pubsub.py
>>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/external/kafka.py
>>>
>>> Note that the cross-language transforms feature is currently only available for the Flink runner. Dataflow support is in development.
>>
>> I think it works with all non-Dataflow runners, with the exception of the Java and Go direct runners. (It does work with the Python direct runner.)
>>
>>> I'm fine with developing this natively for Python as well. AFAIK the Java JDBC IO connector is not a super-complicated connector, and it should be fine to make relatively easy-to-maintain and widely usable connectors available in multiple SDKs.
>>
>> Yes, a case can certainly be made for having native connectors for particular common/simple sources. (We certainly don't call cross-language to read text files, for example.)
>>
>>> Thanks,
>>> Cham
>>>
>>>> On Mon, Jan 6, 2020 at 10:56 AM Luke Cwik <[email protected]> wrote:
>>>>
>>>> +Chamikara Jayalath +Heejong Lee
>>>>
>>>>> On Mon, Jan 6, 2020 at 10:20 AM <[email protected]> wrote:
>>>>>
>>>>> How do I go about doing that? From the docs, it appears cross-language transforms are currently undocumented. https://beam.apache.org/roadmap/connectors-multi-sdk/
>>>>>
>>>>>> On Jan 6, 2020, at 12:55 PM, Luke Cwik <[email protected]> wrote:
>>>>>>
>>>>>> What about using a cross-language transform between Python and the already existing Java JdbcIO transform?
>>>>>>
>>>>>>> On Sun, Jan 5, 2020 at 5:18 AM Peter Dannemann <[email protected]> wrote:
>>>>>>>
>>>>>>> I'd like to develop the Python SDK's SQL IO connector. I was thinking it would be easiest to use sqlalchemy to achieve maximum database engine support, but I suppose I could also create an ABC for databases that follow the DB API and create subclasses for each database engine that override a connect method. What are your thoughts on the best way to do this?
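Eugene's point that DB-API drivers hand back rows as tuples of simple types (unlike JDBC, where a user callback must decode each row) can be seen with the stdlib's sqlite3 module, which is itself PEP 249 compliant. The helper below is a hypothetical sketch of the connector-agnostic shape being discussed, not Beam code; read_rows and connect_fn are names invented for illustration.

```python
import sqlite3

def read_rows(connect_fn, query, params=()):
    """Yield query results as plain tuples via any PEP 249 (DB-API) driver.

    connect_fn is a zero-argument callable returning a DB-API connection,
    so the same code could work with sqlite3, psycopg2, mysqlclient, etc.
    """
    conn = connect_fn()
    try:
        cur = conn.cursor()
        cur.execute(query, params)
        # fetchmany() is mandated by PEP 249, so batched reads are portable.
        while True:
            batch = cur.fetchmany(1000)
            if not batch:
                break
            for row in batch:
                yield tuple(row)
    finally:
        conn.close()

# Demo with an in-memory SQLite database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER, name TEXT)")
db.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ada"), (2, "grace")])
db.commit()

rows = list(read_rows(lambda: db, "SELECT id, name FROM users ORDER BY id"))
# Each row comes back as a tuple of simple Python values (ints, strings),
# with no per-row decoding callback needed.
```

Note the connection is supplied as a callable rather than an open handle: in a pipeline setting, each worker would need to open its own connection, since connections generally cannot be serialized and shipped across processes.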
