[ https://issues.apache.org/jira/browse/AIRFLOW-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16481681#comment-16481681 ]
ASF subversion and git services commented on AIRFLOW-2448:
----------------------------------------------------------

Commit 67b351183b0f85e9484f1f7f70e0b46300753b60 in incubator-airflow's branch refs/heads/master from [~sekikn]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=67b3511 ]

[AIRFLOW-2448] Enhance HiveCliHook.load_df to work with datetime

HiveCliHook.load_df cannot currently handle a DataFrame that contains datetime values. This PR enhances it to work with datetime, fixes some bug introduced by AIRFLOW-2441, and addresses some flake8 issues.

Closes #3364 from sekikn/AIRFLOW-2448

> Enhance HiveCliHook.load_df to work with datetime
> -------------------------------------------------
>
>                 Key: AIRFLOW-2448
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2448
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: hive_hooks, hooks
>            Reporter: Kengo Seki
>            Assignee: Kengo Seki
>            Priority: Major
>             Fix For: 2.0.0
>
>
> I tried to load a DataFrame containing time-series data into Hive via HiveCliHook.load_df, but it failed:
> {code}
> In [1]: import pandas as pd
> In [2]: from datetime import datetime, timedelta
> In [3]: df = pd.DataFrame({"t": [datetime(2018, 1, 1) + timedelta(i) for i in range(0, 10)], "v": range(0, 10)})
> In [4]: df
> Out[4]:
>            t  v
> 0 2018-01-01  0
> 1 2018-01-02  1
> 2 2018-01-03  2
> 3 2018-01-04  3
> 4 2018-01-05  4
> 5 2018-01-06  5
> 6 2018-01-07  6
> 7 2018-01-08  7
> 8 2018-01-09  8
> 9 2018-01-10  9
> In [5]: from airflow.hooks.hive_hooks import HiveCliHook
> In [6]: hook = HiveCliHook()
> [2018-05-10 10:29:40,600] {base_hook.py:85} INFO - Using connection to: localhost
> In [7]: hook.load_df(df, "ts")
> ---------------------------------------------------------------------------
> KeyError                                  Traceback (most recent call last)
> <ipython-input-7-7a7e58740159> in <module>()
> ----> 1 hook.load_df(df, "ts")
> /home/sekikn/dev/incubator-airflow/airflow/hooks/hive_hooks.pyc in load_df(self, df, table, create, recreate, field_dict, delimiter, encoding, pandas_kwargs, **kwargs)
>     335
>     336                 if field_dict is None and (create or recreate):
> --> 337                     field_dict = _infer_field_types_from_df(df)
>     338
>     339                 df.to_csv(path_or_buf=f,
> /home/sekikn/dev/incubator-airflow/airflow/hooks/hive_hooks.pyc in _infer_field_types_from_df(df)
>     326             }
>     327
> --> 328             return dict((col, DTYPE_KIND_HIVE_TYPE[dtype.kind]) for col, dtype in df.dtypes.iteritems())
>     329
>     330         if pandas_kwargs is None:
> /home/sekikn/dev/incubator-airflow/airflow/hooks/hive_hooks.pyc in <genexpr>((col, dtype))
>     326             }
>     327
> --> 328             return dict((col, DTYPE_KIND_HIVE_TYPE[dtype.kind]) for col, dtype in df.dtypes.iteritems())
>     329
>     330         if pandas_kwargs is None:
> KeyError: 'M'
> {code}
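
The KeyError: 'M' in the traceback comes from the datetime64 column: its dtype.kind is 'M', and the lookup table DTYPE_KIND_HIVE_TYPE had no entry for that kind. The following is a minimal sketch of one way to cover that case. It is not taken from the commit itself. The mapping values, the standalone function name infer_field_types_from_df, and the date_format string are assumptions chosen for illustration, and the sketch uses Series.items() because iteritems() was removed in pandas 2.0.

```python
from collections import OrderedDict
from datetime import datetime, timedelta

import pandas as pd

# Assumed mapping from NumPy dtype.kind codes to Hive column types.
# The values here are illustrative, not copied from Airflow.
DTYPE_KIND_HIVE_TYPE = {
    "b": "BOOLEAN",    # boolean
    "i": "BIGINT",     # signed integer
    "u": "BIGINT",     # unsigned integer
    "f": "DOUBLE",     # floating point
    "c": "STRING",     # complex
    "M": "TIMESTAMP",  # datetime64, the kind that raised KeyError above
    "O": "STRING",     # object
    "S": "STRING",     # byte string
    "U": "STRING",     # unicode string
    "V": "STRING",     # void
}


def infer_field_types_from_df(df):
    """Map each column name to a Hive type, preserving column order."""
    return OrderedDict(
        (col, DTYPE_KIND_HIVE_TYPE[dtype.kind])
        for col, dtype in df.dtypes.items()
    )


# Same DataFrame as in the report.
df = pd.DataFrame({
    "t": [datetime(2018, 1, 1) + timedelta(i) for i in range(0, 10)],
    "v": range(0, 10),
})

field_dict = infer_field_types_from_df(df)

# Write timestamps as "yyyy-MM-dd HH:mm:ss" so the text file always
# carries a full date and time (assumed format for this sketch).
csv_text = df.to_csv(index=False, header=False,
                     date_format="%Y-%m-%d %H:%M:%S")

print(field_dict)
print(csv_text.splitlines()[0])
```

With the 'M' entry present, the type inference no longer fails on the "t" column, and the explicit date_format keeps the serialized values in a single, predictable layout.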