[ https://issues.apache.org/jira/browse/AIRFLOW-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16727950#comment-16727950 ]
jack commented on AIRFLOW-3047:
-------------------------------

[~vladglinskiy] can you submit a PR for this?

> HiveCliHook does not work properly with Beeline
> -----------------------------------------------
>
>                 Key: AIRFLOW-3047
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3047
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: hive_hooks, hooks
>    Affects Versions: 1.10.0
>            Reporter: Vladislav Glinskiy
>            Priority: Major
>
> A simple _HiveOperator_ does not work properly when the
> _hive_cli_default_ connection is configured to use _Beeline_.
>
> *Steps to reproduce:*
> 1. Set up a Hive/HiveServer2 and Airflow environment with _beeline_ in _PATH_
> 2. Create a test _datetimes_ table
> As an example:
> {code:java}
> CREATE EXTERNAL TABLE datetimes (
>   datetimes STRING)
> STORED AS PARQUET
> LOCATION '/opt/apps/datetimes';{code}
>
> 3. Edit the _hive_cli_default_ connection:
> {code:java}
> airflow connections --delete --conn_id hive_cli_default
> airflow connections --add --conn_id hive_cli_default --conn_type hive_cli
> --conn_host $HOST --conn_port 10000 --conn_schema default --conn_login
> $CONN_LOGIN --conn_password $CONN_PASSWORD --conn_extra "{\"use_beeline\":
> true, \"auth\": \"null;user=$HS_USER;password=$HS_PASSWORD\"}"
> {code}
> Set the variables according to your environment.
>
> 4. Create a simple DAG:
> {code:java}
> """
> ###
> Sample DAG, which declares single Hive task.
> """
> import datetime
> import airflow
> from airflow import DAG
> from airflow.operators.hive_operator import HiveOperator
> from datetime import timedelta
> default_args = {
>     'owner': 'airflow',
>     'depends_on_past': False,
>     'start_date': airflow.utils.dates.days_ago(0, hour=0, minute=0, second=1),
>     'email': ['airf...@example.com'],
>     'email_on_failure': False,
>     'email_on_retry': False,
>     'retries': 1,
>     'retry_delay': timedelta(minutes=5),
>     'provide_context': True
> }
> dag = DAG(
>     'hive_task_dag',
>     default_args=default_args,
>     description='Single task DAG',
>     schedule_interval=timedelta(minutes=15))
> insert_current_datetime = HiveOperator(
>     task_id='insert_current_datetime_task',
>     hql="insert into table datetimes values ('" +
>     datetime.datetime.now().strftime("%I:%M%p on %B %d, %Y") + "');",
>     dag=dag)
> dag.doc_md = __doc__
> {code}
>
> 5. Trigger DAG execution. Ensure that the DAG completes successfully.
> 6. Check the _datetimes_ table. It will be empty.
>
> As it turned out, the issue is caused by an invalid temporary script file. The
> problem is fixed by adding a new-line character at the end of the script.
> So, a possible fix is to change:
> *hive_hooks.py:182*
> {code:java}
> if schema:
>     hql = "USE {schema};\n{hql}".format(**locals())
> {code}
> to
> {code:java}
> if schema:
>     hql = "USE {schema};\n{hql}\n".format(**locals())
> {code}
> I don't know how this affects _hive shell_ queries, since the change was
> tested only against _beeline_.
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
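For reference, the fix proposed in the quoted issue (ending the generated script with a new-line character) can be sketched in isolation as below. This is only an illustration, not the actual Airflow code: `build_hql_script` is a hypothetical helper name, and unlike the one-line change quoted above it also appends the newline when no schema is given, which is an assumption on my side and not something the report tested.

```python
def build_hql_script(hql, schema=None):
    """Return the text that would be written to the temporary script file.

    Hypothetical helper, not part of Airflow. It mirrors the quoted
    hive_hooks.py snippet and guarantees a trailing newline.
    """
    if schema:
        # Same prefixing as in the quoted snippet, with explicit arguments
        # instead of **locals().
        hql = "USE {schema};\n{hql}".format(schema=schema, hql=hql)
    # Assumption: always terminate the script with a newline, since the
    # report says the script without one was not executed properly via beeline.
    if not hql.endswith("\n"):
        hql += "\n"
    return hql
```

With a schema of `default` and the statement `SELECT 1;`, the helper returns the `USE default;` line followed by the statement and a final newline. A statement that already ends with a newline is returned unchanged.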