[ 
https://issues.apache.org/jira/browse/AIRFLOW-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16727950#comment-16727950
 ] 

jack commented on AIRFLOW-3047:
-------------------------------

[~vladglinskiy] can you submit PR for this?

> HiveCliHook does not work properly with Beeline
> -----------------------------------------------
>
>                 Key: AIRFLOW-3047
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3047
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: hive_hooks, hooks
>    Affects Versions: 1.10.0
>            Reporter: Vladislav Glinskiy
>            Priority: Major
>
> A simple _HiveOperator_ does not work properly when the _hive_cli_default_
> connection is configured to use _Beeline_.
>  
> *Steps to reproduce:* 
> 1. Setup Hive/HiveServer2 and Airflow environment with _beeline_ in _PATH_
> 2. Create test _datetimes_ table
> For example:
> {code:java}
> CREATE EXTERNAL TABLE datetimes (
> datetimes STRING)
> STORED AS PARQUET
> LOCATION '/opt/apps/datetimes';{code}
>  
> 3. Edit _hive_cli_default_ connection:
> {code:java}
> airflow connections --delete --conn_id hive_cli_default
> airflow connections --add --conn_id hive_cli_default --conn_type hive_cli \
>     --conn_host $HOST --conn_port 10000 --conn_schema default \
>     --conn_login $CONN_LOGIN --conn_password $CONN_PASSWORD \
>     --conn_extra "{\"use_beeline\": true, \"auth\": \"null;user=$HS_USER;password=$HS_PASSWORD\"}"
> {code}
> Set variables according to your environment.
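>  
> For a quick sanity check, the beeline settings can also be inspected from
> Python. This is a minimal sketch (assuming Airflow 1.10.x import paths and the
> default _hive_cli_default_ connection id); it is not part of the reproduction
> itself:
> {code:python}
> # Minimal check that HiveCliHook picks up the beeline settings
> # from the hive_cli_default connection (Airflow 1.10.x import path).
> from airflow.hooks.hive_hooks import HiveCliHook
> 
> hook = HiveCliHook(hive_cli_conn_id='hive_cli_default')
> print(hook.use_beeline)  # expected: True
> print(hook.auth)         # expected: the "auth" value from conn_extra
> {code}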
>  
> 4. Create simple DAG:
> {code:java}
> """
> ###
> Sample DAG, which declares single Hive task.
> """
> import datetime
> import airflow
> from airflow import DAG
> from airflow.operators.hive_operator import HiveOperator
> from datetime import timedelta
> default_args = {
>   'owner': 'airflow',
>   'depends_on_past': False,
>   'start_date': airflow.utils.dates.days_ago(0, hour=0, minute=0, second=1),
>   'email': ['airf...@example.com'],
>   'email_on_failure': False,
>   'email_on_retry': False,
>   'retries': 1,
>   'retry_delay': timedelta(minutes=5),
>   'provide_context': True
> }
> dag = DAG(
>     'hive_task_dag',
>     default_args=default_args,
>     description='Single task DAG',
>     schedule_interval=timedelta(minutes=15))
> insert_current_datetime = HiveOperator(
>     task_id='insert_current_datetime_task',
>     hql="insert into table datetimes values ('" + 
> datetime.datetime.now().strftime("%I:%M%p on %B %d, %Y") + "');",
>     dag=dag)
> dag.doc_md = __doc__
> {code}
>  
> 5. Trigger DAG execution. Ensure that DAG completes successfully.
> 6. Check the _datetimes_ table. It will be empty.
>  
> As it turned out, the issue is caused by an invalid temporary script file: the
> generated script does not end with a newline character, so _beeline_ appears to
> skip the last statement. The problem is fixed by adding a newline at the end of
> the script.
> So, a possible fix is to change:
> *hive_hooks.py:182*
> {code:java}
> if schema:
>     hql = "USE {schema};\n{hql}".format(**locals())
> {code}
> to
> {code:java}
> if schema:
>     hql = "USE {schema};\n{hql}\n".format(**locals())
> {code}
> I don't know how this change affects _hive shell_ queries, since it has been
> tested only against _beeline_.
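>  
> To illustrate why the trailing newline matters, here is a simplified,
> standalone sketch of what the hook does with the temporary script file (the
> variable names and the exact _beeline_ invocation are illustrative, not the
> hook's actual code):
> {code:python}
> # Simplified sketch: the HQL is written to a temp script which is then
> # passed to beeline. Without the trailing "\n" the last statement is not
> # newline-terminated.
> import os
> import tempfile
> 
> schema = 'default'
> hql = "insert into table datetimes values ('now');"
> hql = "USE {schema};\n{hql}\n".format(schema=schema, hql=hql)  # trailing \n
> 
> with tempfile.NamedTemporaryFile('w', suffix='.hql', delete=False) as f:
>     f.write(hql)
>     script_path = f.name
> 
> # The hook then runs something along the lines of:
> #   beeline -u <jdbc-url> -n <user> -p <password> -f <script_path>
> print(open(script_path).read())
> os.unlink(script_path)
> {code}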
>  


