+1 to Jarek On Sun, Dec 29, 2019, 21:58 Jarek Potiuk <[email protected]> wrote:
> I think it's a rare thing for anyone to use a generic DB operations (as > operator). You usually know which database you have as source/target. > Even if in one company you have more than one database, then it is not a > problem to use one hook/operator in Postgres and another in MySQL etc. > > IMHO it's better to have a specialized DB hook rather than an SQLAlchemy > one as you can use specialized SQL for that database. > Also those specialized hooks already use the DbApiHook with all the common > code extracted so there is not a lot of duplicated code. > > SQLAlchemy is good for an application that has a lot of complex logic and > OO architecture and should rely on several different DBs as backend. > But it is mostly only limitation in case of simple ETL scripts (which is > the case for pretty much all Airflow tasks). > > J. > > On Fri, Dec 27, 2019 at 1:31 PM Bjorn Olsen <[email protected]> > wrote: > > > Hi community > > > > > > > > I have a question/suggestion about the different SQL Database hooks + > > operators. > > > > > > > > Examples: > > > > - MsSqlHook (Implements DbApiHook using pymssql) > > > > - MySqlHook (Implements DbApiHook using MySQLdb) > > > > - PostgresHook (Implements DbApiHook using psycopg2) > > > > > > > > These hooks tend to have convenience methods for doing more complex DB > > functions (eg bulk loading), which is useful. > > > > > > > > However there is also a downside to them. > > > > First I need to figure out which operator and hook to use and what driver > > it is using. Then I have to review the source code to understand what it > is > > doing and what it isn't doing. Finally (at my unique client site) I > request > > the driver lib to be whitelisted and installed, and I test it for my use > > case. > > > > > > > > This can be quite time consuming if working with several different DBs. > > > > > > > > Does Airflow have a general-purpose SQL hook + operator? > > > > I didn't see one but perhaps I'm missing it :) > > > > > > > > I was considering implementing something based on SQL Alchemy. > > > > > > > > Advantages: > > > > - Range of database dialects > > > > - Simple to use esp. for prototyping > > > > - Consistent - no DB-specific code > > > > - No extra dependencies - Airflow already requires SQL Alchemy > > > > - Maybe SQL Alchemy features would be useful eg function-based querying > > > > > > > > Disadvantages: > > > > - Another way to do the same thing (some existing DB hooks) > > > > - Maybe less efficient than using DB specific hooks > > > > > > > > What are your thoughts? > > > > > > > > Kind regards > > > > > -- > > Jarek Potiuk > Polidea <https://www.polidea.com/> | Principal Software Engineer > > M: +48 660 796 129 <+48660796129> > [image: Polidea] <https://www.polidea.com/> >
