[
https://issues.apache.org/jira/browse/AIRFLOW-4961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896037#comment-16896037
]
ASF subversion and git services commented on AIRFLOW-4961:
----------------------------------------------------------
Commit 7d02a9db0035e253b7438219ab3e669f79e6b2cd in airflow's branch
refs/heads/master from Fahran Wallace
[ https://gitbox.apache.org/repos/asf?p=airflow.git;h=7d02a9d ]
[AIRFLOW-4961] Insert TaskFail.duration as int match DB schema column type
(#5593)
When writing a task failure record, convert 'duration' decimal
value -> an int before persistence, to remove reliance on the
database doing this automatically and gracefully.
> SQL Error when writing Task Failure (using CockroachDB)
> -------------------------------------------------------
>
> Key: AIRFLOW-4961
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4961
> Project: Apache Airflow
> Issue Type: Bug
> Components: database
> Affects Versions: 1.10.3
> Reporter: Fahran Wallace
> Priority: Minor
>
> We're running Airflow against CockroachDB (which is designed to be
> "compatible" with PostgresSQL
> [https://www.cockroachlabs.com/blog/why-postgres/]), but with a few subtle
> differences
> [https://www.cockroachlabs.com/docs/stable/porting-postgres.html]).
>
> It's been fairly painless so far (we've got a few minor tweaks of SQL grammar
> in our fork, especially around DB migrations), but we've come across
> something we think is worth merging back to Airflow trunk. It's both not
> quite correct behaviour in normal operation anyway, and it's a 4 character
> fix.
> When a task has failed, we get this error writing to the task_fail table in
> (taskfail.py):
> {code:java}
> value type decimal doesn't match type INT8 of column "duration"{code}
> This is because total_seconds() returns a decimal, but it's assigned to the
> variable duration, bound to an integer database column here:
>
> {code:java}
> self.duration = (self.end_date - self.start_date).total_seconds()
> {code}
>
> [https://github.com/apache/airflow/blob/b33b9898677e70ef3fcb4bf03cd28b4077681224/airflow/models/taskfail.py#L53]
> A fix looks like:
> {code:java}
> self.duration = int((self.end_date - self.start_date).total_seconds()){code}
> which means we're no longer reliant on the database to truncate and cast this
> correctly,
> The flipside of this argument is documented here:
> [https://www.postgresql.org/message-id/[email protected]] -
> postgres made a conscious decision to not throw errors in this situation.
> I'll ping a copy of this ticket over to Cockroach labs too, to see if they
> have any thoughts (it's not listed as a current incompatibility, but probably
> should be).
>
> From the stacktrace:
> {code:java}
> [2019-07-03 09:06:33,402] {{base_task_runner.py:101}} INFO - Job
> 465548343190323206: Subtask update_snapshot cursor.execute(statement,
> parameters)
> [2019-07-03 09:06:33,402] {{base_task_runner.py:101}} INFO - Job
> 465548343190323206: Subtask update_snapshot sqlalchemy.exc.ProgrammingError:
> (psycopg2.errors.DatatypeMismatch) value type decimal doesn't match type INT8
> of column "duration"
> [2019-07-03 09:06:33,402] {{base_task_runner.py:101}} INFO - Job
> 465548343190323206: Subtask update_snapshot [SQL: 'INSERT INTO task_fail
> (task_id, dag_id, execution_date, start_date, end_date, duration) VALUES
> (%(task_id)s, %(dag_id)s, %(execution_date)s, %(start_date)s, %(end_date)s,
> %(duration)s) RETURNING task_fail.id'] [parameters: {'task_id':
> 'update_snapshot', 'dag_id': 'snapshot_table_update', 'execution_date':
> <Pendulum [2019-07-03T08:30:46.221467+00:00]>, 'start_date':
> datetime.datetime(2019, 7, 3, 9, 1, 19, 436714, tzinfo=<TimezoneInfo [UTC,
> GMT, +00:00:00, STD]>), 'end_date': datetime.datetime(2019, 7, 3, 9, 6, 33,
> 337921, tzinfo=<Timezone [UTC]>), 'duration': 313.901207}] (Background on
> this error at: http://sqlalche.me/e/f405)
> {code}
> I'll attach a PR to this ticket in a sec.
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)