[
https://issues.apache.org/jira/browse/AIRFLOW-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16759744#comment-16759744
]
Ash Berlin-Taylor edited comment on AIRFLOW-1632 at 2/4/19 10:18 AM:
---------------------------------------------------------------------
[~ashb] [~mikeghen]
This was not fixed.
First, this is not a clone of https://issues.apache.org/jira/browse/AIRFLOW-1586:
AIRFLOW-1586 asks to add support for the DATE type, while this issue concerns
DATETIME values such as 0001-01-01 00:00:00, which Python's mktime function
cannot convert.
It generates:
{code}
Subtask: Traceback (most recent call last):
Subtask:   File "/usr/local/bin/airflow", line 27, in <module>
Subtask:     args.func(args)
Subtask:   File "/usr/local/lib/python2.7/dist-packages/airflow/bin/cli.py", line 392, in run
Subtask:     pool=args.pool,
Subtask:   File "/usr/local/lib/python2.7/dist-packages/airflow/utils/db.py", line 50, in wrapper
Subtask:     result = func(*args, **kwargs)
Subtask:   File "/usr/local/lib/python2.7/dist-packages/airflow/models.py", line 1493, in _run_raw_task
Subtask:     result = task_copy.execute(context=context)
Subtask:   File "/usr/local/lib/python2.7/dist-packages/airflow/contrib/operators/mysql_to_gcs.py", line 89, in execute
Subtask:     files_to_upload = self._write_local_data_files(cursor)
Subtask:   File "/usr/local/lib/python2.7/dist-packages/airflow/contrib/operators/mysql_to_gcs.py", line 130, in _write_local_data_files
Subtask:     row = map(self.convert_types, row)
Subtask:   File "/usr/local/lib/python2.7/dist-packages/airflow/contrib/operators/mysql_to_gcs.py", line 194, in convert_types
Subtask:     return time.mktime(value.timetuple())
Subtask: ValueError: year out of range
{code}
The value 0001-01-01 00:00:00 is valid for a MySQL DATETIME column but is not
handled by mktime.
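For reference, a minimal sketch of the difference (assuming CPython, where
calendar.timegm is pure Python while time.mktime delegates to the platform's
C library):

```python
import calendar
import datetime
import time

# MySQL DATETIME '0001-01-01 00:00:00' as a Python datetime
value = datetime.datetime(1, 1, 1, 0, 0, 0)

# time.mktime delegates to the platform's C mktime, which rejects years
# outside its supported range, raising ValueError or OverflowError:
try:
    time.mktime(value.timetuple())
    print("mktime accepted the value on this platform")
except (ValueError, OverflowError) as exc:
    print("mktime failed:", exc)

# calendar.timegm is implemented in pure Python and accepts the full
# 0001..9999 range, returning seconds relative to the 1970 UTC epoch:
print(calendar.timegm(value.timetuple()))
```

Note that timegm interprets the tuple as UTC while mktime uses local time, so
switching conversion functions also changes the timezone semantics slightly.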
was (Author: jackjack10):
> MySQL to GCS fails for date/datetime before ~1850
> -------------------------------------------------
>
> Key: AIRFLOW-1632
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1632
> Project: Apache Airflow
> Issue Type: Bug
> Components: gcp
> Environment: Google Cloud Platform
> Reporter: Michael Ghen
> Assignee: Michael Ghen
> Priority: Minor
>
> For tables in MySQL that use a "date" or "datetime" type, a DAG that exports
> from MySQL to Google Cloud Storage and then loads from GCS to BigQuery will
> fail when the dates are before 1970.
> When the table is exported as JSON to a GCS bucket, dates and datetimes are
> converted to timestamps using:
> {code}
> time.mktime(value.timetuple())
> {code}
> This creates a problem when you try to parse a date that can't be converted
> to a UNIX timestamp. For example:
> {code}
> >>> value = datetime.date(1850,1,1)
> >>> time.mktime(value.timetuple())
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> ValueError: year out of range
> {code}
> *Steps to reproduce*
> 0. Set up a MySQL connection and GCP connection in Airflow.
> 1. Create a MySQL table with a "date" field and put some data into the table.
> {code}
> CREATE TABLE table_with_date (
>     date_field date,
>     datetime_field datetime
> );
> INSERT INTO table_with_date (date_field, datetime_field) VALUES
> ('1850-01-01',NOW());
> {code}
> 2. Create a DAG that will export the data from the MySQL to GCS and then load
> from GCS to BigQuery (use the schema file). For example:
> {code}
> extract = MySqlToGoogleCloudStorageOperator(
>     task_id="extract_table",
>     mysql_conn_id='mysql_connection',
>     google_cloud_storage_conn_id='gcp_connection',
>     sql="SELECT * FROM table_with_date",
>     bucket='gcs-bucket',
>     filename='table_with_date.json',
>     schema_filename='schemas/table_with_date.json',
>     dag=dag)
> load = GoogleCloudStorageToBigQueryOperator(
>     task_id="load_table",
>     bigquery_conn_id='gcp_connection',
>     google_cloud_storage_conn_id='gcp_connection',
>     bucket='gcs-bucket',
>     destination_project_dataset_table="dataset.table_with_date",
>     source_objects=['table_with_date.json'],
>     schema_object='schemas/table_with_date.json',
>     source_format='NEWLINE_DELIMITED_JSON',
>     create_disposition='CREATE_IF_NEEDED',
>     write_disposition='WRITE_TRUNCATE',
>     dag=dag)
> load.set_upstream(extract)
> 3. Run the DAG
> Expected: The DAG runs successfully.
> Actual: The `extract_table` task fails with error:
> {code}
> ...
> ERROR - year out of range
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/airflow/models.py", line 1374, in run
>     result = task_copy.execute(context=context)
>   File "/usr/lib/python2.7/site-packages/airflow/contrib/operators/mysql_to_gcs.py", line 91, in execute
>     files_to_upload = self._write_local_data_files(cursor)
>   File "/usr/lib/python2.7/site-packages/airflow/contrib/operators/mysql_to_gcs.py", line 132, in _write_local_data_files
>     row = map(self.convert_types, row)
>   File "/usr/lib/python2.7/site-packages/airflow/contrib/operators/mysql_to_gcs.py", line 196, in convert_types
>     return time.mktime(value.timetuple())
> ValueError: year out of range
> ...
> {code}
> *Comments:*
> This is really a limitation of Python's time.mktime, which relies on the
> platform's C library and typically cannot handle years before 1900 (on some
> platforms, not before the 1970 epoch). BigQuery's TIMESTAMP type, by
> contrast, accepts years all the way back to 0001. Per
> https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#timestamp-type,
> the TIMESTAMP range is:
> {quote}
> 0001-01-01 00:00:00 to 9999-12-31 23:59:59.999999 UTC.
> {quote}
> I think the fix is probably to keep converting date/datetime values to
> timestamps, but to use `calendar.timegm` instead of `time.mktime`.
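A minimal sketch of what such a converter could look like (this paraphrases
the operator's `convert_types` method; it is not the actual Airflow patch):

```python
import calendar
import datetime
import decimal


def convert_types(value):
    """Convert MySQL values to JSON-serializable types for GCS export.

    Uses calendar.timegm instead of time.mktime, so DATE/DATETIME values
    from year 1 through 9999 convert without "year out of range" errors.
    Note: timegm treats the time tuple as UTC, while mktime used local time.
    """
    if isinstance(value, (datetime.datetime, datetime.date)):
        return calendar.timegm(value.timetuple())
    if isinstance(value, decimal.Decimal):
        return float(value)
    return value
```

With this change, `convert_types(datetime.date(1850, 1, 1))` returns a
(negative) epoch timestamp instead of raising ValueError.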
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)