liwei li created AIRFLOW-3180:
---------------------------------

             Summary: Chinese characters all become gibberish when using BashOperator beeline command insert data into hive table
                 Key: AIRFLOW-3180
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3180
             Project: Apache Airflow
          Issue Type: Bug
    Affects Versions: 1.10.0
            Reporter: liwei li


With BashOperator, I use beeline to insert data into Hive. The HQL contains 
Chinese characters. The DAG run succeeds, but the data written to Hive 
contains unreadable characters.

Python DAG:
{code:java}
# -*- coding: utf-8 -*-
import airflow
from airflow.models import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import BranchPythonOperator
from datetime import datetime
import time
from datetime import timedelta
import sys

import pendulum
local_tz = pendulum.timezone("Asia/Shanghai")

reload(sys)
sys.setdefaultencoding('utf-8')

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2018, 10, 9, 19, 22, 20, tzinfo=local_tz),
    'retries': 0,
}

dag = DAG(
    'inserthiveutf8',
    default_args=default_args,
    description='null',
    catchup=False,
    schedule_interval=None,
)


adf37 = r"""
beeline -u "jdbc:hive2://10.138.***.***:30010/di_zz" -n "*****" -p "*****" -e "insert into   di_zz.tt_wms_inout_detail_new(fac_id) values ('中')"
"""
abcd8491539084126613 = BashOperator(
    task_id='abcd8491539084126613',
    bash_command=adf37,
    dag=dag){code}
I have tried this:
{code:java}
abcd8491539084126613 = BashOperator(
    task_id='abcd8491539084126613',
    bash_command="sh ~/insert.sh ",
    dag=dag){code}
I also tried this:
{code:java}
export LANG=en_US.UTF-8 
beeline -u "jdbc:hive2://10.138.***.***:30010/di_zz" -n "*****" -p "*****" -e 'insert into di_zz.tt_wms_inout_detail_new(fac_id) values ("中")'{code}
I also tried this:
{code:java}
beeline -u "jdbc:hive2://10.138.***.***:30010/di_zz" -n "*****" -p "*****" -f ~/hql.sql{code}
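Since one of the attempts above exports LANG inside the script, it may help to confirm which locale variables a child bash process actually receives. The following is only a diagnostic sketch using the Python standard library, not Airflow code. The helper name child_locale is hypothetical, and it assumes bash is on the PATH of the machine running the task.

```python
import os
import subprocess


def child_locale(extra_env=None):
    """Return the LANG and LC_ALL values seen by a child bash process.

    Hypothetical helper for diagnosis only: it copies the current
    environment, applies optional overrides, and asks bash to print
    the two locale variables.
    """
    env = dict(os.environ)
    env.update(extra_env or {})
    out = subprocess.check_output(
        ["bash", "-c", 'printf "%s\\n%s" "$LANG" "$LC_ALL"'],
        env=env,
    )
    lang, lc_all = out.decode("utf-8").split("\n")
    return {"LANG": lang, "LC_ALL": lc_all}


# What the child sees with the environment as inherited.
print(child_locale())
# What the child sees when UTF-8 locale variables are passed explicitly.
print(child_locale({"LANG": "en_US.UTF-8", "LC_ALL": "en_US.UTF-8"}))
```

If the inherited values are empty or name a non-UTF-8 locale, that would be consistent with the symptom, although it does not prove the cause. BashOperator accepts an env argument that could be used to pass such variables to the task.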
Log:
{code:java}
[2018-10-09 21:00:58,485] {bash_operator.py:110} INFO - INFO  : Compiling command(queryId=hive_20181009210000_89390a92-c4de-413f-9958-4d7da1065ef9): insert into   di_zz.tt_wms_inout_detail_new(fac_id) values ("???")
[2018-10-09 21:00:58,485] {bash_operator.py:110} INFO - INFO  : Semantic Analysis Completed
[2018-10-09 21:00:58,486] {bash_operator.py:110} INFO - INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_col0, type:timestamp, comment:null), FieldSchema(name:_col1, type:string, comment:null), FieldSchema(name:_col2, type:void, comment:null), FieldSchema(name:_col3, type:void, comment:null), FieldSchema(name:_col4, type:void, comment:null), FieldSchema(name:_col5, type:void, comment:null), FieldSchema(name:_col6, type:void, comment:null), FieldSchema(name:_col7, type:bigint, comment:null), FieldSchema(name:_col8, type:bigint, comment:null), FieldSchema(name:_col9, type:bigint, comment:null), FieldSchema(name:_col10, type:void, comment:null), FieldSchema(name:_col11, type:timestamp, comment:null)], properties:null)
[2018-10-09 21:00:58,486] {bash_operator.py:110} INFO - INFO  : Completed compiling command(queryId=hive_20181009210000_89390a92-c4de-413f-9958-4d7da1065ef9); Time taken: 0.291 seconds
[2018-10-09 21:00:58,486] {bash_operator.py:110} INFO - INFO  : Executing command(queryId=hive_20181009210000_89390a92-c4de-413f-9958-4d7da1065ef9): insert into   di_zz.tt_wms_inout_detail_new(fac_id) values ("???")
[2018-10-09 21:00:58,486] {bash_operator.py:110} INFO - INFO  : Query ID = hive_20181009210000_89390a92-c4de-413f-9958-4d7da1065ef9
{code}
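The log shows the single character 中 arriving as exactly three question marks. That is what would happen if the three UTF-8 bytes of 中 passed through an ASCII-only charset somewhere between the shell and HiveServer2 (for example, a default charset derived from the worker's locale). This is an assumption about the cause, not a confirmed diagnosis. The helper name simulate_ascii_roundtrip is hypothetical and only reproduces the symptom in isolation:

```python
# -*- coding: utf-8 -*-


def simulate_ascii_roundtrip(text):
    """Encode text as UTF-8, then push the bytes through an ASCII-only charset.

    Returns the intermediate decoded string and the final ASCII-safe string.
    """
    utf8_bytes = text.encode("utf-8")
    # Bytes outside the ASCII range cannot be decoded; each becomes U+FFFD.
    decoded = utf8_bytes.decode("ascii", "replace")
    # Re-encoding to ASCII turns every U+FFFD into a literal '?'.
    garbled = decoded.encode("ascii", "replace").decode("ascii")
    return decoded, garbled


decoded, garbled = simulate_ascii_roundtrip(u"\u4e2d")  # u"\u4e2d" is 中
print(len(u"\u4e2d".encode("utf-8")))  # 3
print(garbled)  # ???
```

The intermediate value is three U+FFFD replacement characters, and the final value is three question marks, so one character turning into three is consistent with a byte-level charset mismatch rather than with data loss inside Hive.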
Data:
{code:java}
+------------------------------------+---------------------------------+-----------------------------------+----------------------------------+------------------------------------+------------------------------------+---------------------------------------+-----------------------------------+----------------------------------+-----------------------------------+-------------------------------------------+--------------------------------------+--+
| tt_wms_inout_detail_new.stat_date  | tt_wms_inout_detail_new.fac_id  | tt_wms_inout_detail_new.fac_name  | tt_wms_inout_detail_new.ware_id  | tt_wms_inout_detail_new.ware_name  | tt_wms_inout_detail_new.ware_type  | tt_wms_inout_detail_new.product_code  | tt_wms_inout_detail_new.ware_cnt  | tt_wms_inout_detail_new.ware_in  | tt_wms_inout_detail_new.ware_out  | tt_wms_inout_detail_new.sap_factory_name  | tt_wms_inout_detail_new.di_etl_date  |
+------------------------------------+---------------------------------+-----------------------------------+----------------------------------+------------------------------------+------------------------------------+---------------------------------------+-----------------------------------+----------------------------------+-----------------------------------+-------------------------------------------+--------------------------------------+--+
| NULL                               | ���                             | NULL                              | NULL                             | NULL                               | NULL                               | NULL                                  | NULL                              | NULL                             | NULL                              | NULL                                      | NULL                                 |
{code}
Thanks in advance.



