[ 
https://issues.apache.org/jira/browse/AIRFLOW-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15372416#comment-15372416
 ] 

Bence Nagy commented on AIRFLOW-139:
------------------------------------

Oh, sorry, I'm subscribed to all Jira notifications but I think that's a bad 
idea since I didn't see me specifically being mentioned here. So to answer a 
month late:

[~mattland]'s suggestion does not sound right to me at all. Special cases are 
already bad, but in this case they wouldn't be even correct; Postgres 8.0.2 
does not support autocommit and even though it's unlikely to be running for 
people, it is just plain wrong in my opinion to include an exception for it 
just because Redshift happened to fork that version. Not only that, but then 
Redshift would break as well if Amazon ever decided to change their server 
version (even if just rebasing onto 8.0.3 or something.) So, yeah, my stance 
here is a big 'no, don't do that'.

As I mentioned in the GitHub commit comments as well, the entire fact that 
Redshift worked with the PostgresOperator is a coincidence, and as far as I 
know, was never officially said to be supported by it. Due to nuances like this 
one, most other products working with databases that I've seen (for instance 
DataGrip and Metabase) consider Redshift to be separate from vanilla 
PostgreSQL, with the former requiring a special DB driver for it, and the 
latter not supporting Redshift at all even though they support Postgres. In 
light of all this, I think the correct and logical solution here is to add a 
RedshiftOperator, with different behavior around the value of 
{{supports_autocommit}} and possibly some other differences as well down the 
road if the operators get more complex. I don't have a strong opinion on 
whether it should be a subclass of PostgresOperator, but I think I would 
implement this like that myself, since after all, Redshift is a derivative of 
Postgres, sharing quite a lot of behavior.

> Executing VACUUM with PostgresOperator
> --------------------------------------
>
>                 Key: AIRFLOW-139
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-139
>             Project: Apache Airflow
>          Issue Type: Bug
>    Affects Versions: Airflow 1.7.0
>            Reporter: Rafael
>
> Dear Airflow Maintainers,
> h1. Environment
> * Airflow version: *v1.7.0*
> * Airflow components: *PostgresOperator*
> * Python Version: *Python 3.5.1*
> * Operating System: *15.4.0 Darwin*
> h1. Description of Issue
> I am trying to execute a `VACUUM` command as part of DAG with the 
> `PostgresOperator`, which fails with the following error:
> {quote}
> [2016-05-14 16:14:01,849] {__init__.py:36} INFO - Using executor 
> SequentialExecutor
> Traceback (most recent call last):
>   File "/usr/local/bin/airflow", line 15, in <module>
>     args.func(args)
>   File 
> "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/airflow/bin/cli.py",
>  line 203, in run
>     pool=args.pool,
>   File 
> "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/airflow/models.py",
>  line 1067, in run
>     result = task_copy.execute(context=context)
>   File 
> "/usr/local/lib/python3.5/site-packages/airflow/operators/postgres_operator.py",
>  line 39, in execute
>     self.hook.run(self.sql, self.autocommit, parameters=self.parameters)
>   File 
> "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/airflow/hooks/dbapi_hook.py",
>  line 109, in run
>     cur.execute(s)
> psycopg2.InternalError: VACUUM cannot run inside a transaction block
> {quote}
> I could create a small python script that performs the operation, as 
> explained in [this stackoverflow 
> entry](http://stackoverflow.com/questions/1017463/postgresql-how-to-run-vacuum-from-code-outside-transaction-block).
>  However, I would like to know first if the `VACUUM` command should be 
> supported by the `PostgresOperator`.
> h1. Reproducing the Issue
> The operator can be declared as follows:
> {quote}
> conn = ('postgres_default')
> t4 = PostgresOperator(
>     task_id='vacuum',
>     postgres_conn_id=conn,
>     sql=("VACUUM public.table"),
>     dag=dag
>     )
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to