liwei li created AIRFLOW-3180:
---------------------------------
Summary: Chinese characters become garbled when a BashOperator runs a beeline command that inserts data into a Hive table
Key: AIRFLOW-3180
URL: https://issues.apache.org/jira/browse/AIRFLOW-3180
Project: Apache Airflow
Issue Type: Bug
Affects Versions: 1.10.0
Reporter: liwei li
Using a BashOperator, I run beeline to insert data into Hive. The HQL contains Chinese characters. The DAG run succeeds, but the inserted data in Hive is garbled (unreadable characters).
Python DAG:
{code:java}
# -*- coding: utf-8 -*-
import airflow
from airflow.models import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import BranchPythonOperator
from datetime import datetime
import time
from datetime import timedelta
import sys
import pendulum
local_tz = pendulum.timezone("Asia/Shanghai")
# Python 2 only: restore sys.setdefaultencoding and set the implicit str/unicode encoding to UTF-8
reload(sys)
sys.setdefaultencoding('utf-8')
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2018, 10, 9, 19, 22, 20, tzinfo=local_tz),
    'retries': 0
}
dag = DAG(
    'inserthiveutf8',
    default_args=default_args,
    description='null',
    catchup=False,
    schedule_interval=None
)
adf37 = r"""
beeline -u "jdbc:hive2://10.138.***.***:30010/di_zz" -n "*****" -p "*****" -e "insert into di_zz.tt_wms_inout_detail_new(fac_id) values ('中')"
"""
abcd8491539084126613 = BashOperator(
    task_id='abcd8491539084126613',
    bash_command=adf37,
    dag=dag){code}
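When the HQL is embedded directly in bash_command, bash decides what beeline receives as the -e argument, so quoting matters once the statement itself contains quotes or non-ASCII text. Below is a minimal Python 3 sketch, not part of Airflow or of this report: the helper build_beeline_command and the placeholder URL and credentials are hypothetical. It quotes every argument with shlex.quote so the whole statement, '中' included, reaches beeline as one argument.

```python
import shlex


def build_beeline_command(jdbc_url, user, password, hql):
    """Return a one-line beeline invocation with every argument shell-quoted.

    Hypothetical helper: shlex.quote keeps the HQL as a single argv entry
    for -e, so embedded quotes or a stray newline cannot split the command.
    """
    args = ["beeline", "-u", jdbc_url, "-n", user, "-p", password, "-e", hql]
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    cmd = build_beeline_command(
        "jdbc:hive2://hive-host:10000/di_zz",  # placeholder URL, not the real one
        "user",                                 # placeholder credentials
        "secret",
        "insert into di_zz.tt_wms_inout_detail_new(fac_id) values ('中')",
    )
    print(cmd)
```

The returned string could be used as bash_command. Quoting only controls how the statement is split into arguments; it does not change the character encoding beeline uses, so on its own it is not expected to fix the garbled output.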
I have also tried calling a shell script from the BashOperator:
{code:java}
abcd8491539084126613 = BashOperator(
    task_id='abcd8491539084126613',
    # trailing space keeps Airflow from treating a command ending in ".sh" as a Jinja template file
    bash_command="sh ~/insert.sh ",
    dag=dag){code}
Exporting a UTF-8 locale before calling beeline:
{code:java}
export LANG=en_US.UTF-8
beeline -u "jdbc:hive2://10.138.***.***:30010/di_zz" -n "*****" -p "*****" -e 'insert into di_zz.tt_wms_inout_detail_new(fac_id) values ("中")'{code}
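A related variation is to set the locale and the JVM default charset on the task environment itself rather than inside the command. This is a minimal sketch, assuming that (a) the en_US.UTF-8 locale is installed on the worker and (b) beeline runs on a JVM that honours the standard JAVA_TOOL_OPTIONS variable; the helper utf8_env is hypothetical. In Airflow 1.10 a dict passed to BashOperator's env parameter replaces the inherited environment instead of extending it, so the sketch copies the current variables first.

```python
import os


def utf8_env(base=None):
    """Return a copy of `base` (default: os.environ) with UTF-8 locale settings.

    Hypothetical helper. The input mapping is never modified; PATH, JAVA_HOME
    and any other existing variables are carried over unchanged.
    """
    env = dict(os.environ if base is None else base)
    env["LANG"] = "en_US.UTF-8"    # assumes this locale exists on the worker
    env["LC_ALL"] = "en_US.UTF-8"
    # Assumption: beeline's JVM reads JAVA_TOOL_OPTIONS, so this requests
    # UTF-8 as its default charset; avoid adding the flag twice.
    flag = "-Dfile.encoding=UTF-8"
    existing = env.get("JAVA_TOOL_OPTIONS", "")
    if flag not in existing.split():
        env["JAVA_TOOL_OPTIONS"] = (existing + " " + flag).strip()
    return env
```

For example: BashOperator(task_id='abcd8491539084126613', bash_command=adf37, env=utf8_env(), dag=dag). Whether this is enough depends on the beeline and Hive versions in use; it has not been tested against the setup in this report.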
Reading the HQL from a file with -f:
{code:java}
beeline -u "jdbc:hive2://10.138.***.***:30010/di_zz" -n "*****" -p "*****" -f ~/hql.sql{code}
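With -f the statement comes from the bytes in ~/hql.sql, so a quick check is whether the file really stores '中' as UTF-8 (bytes e4 b8 ad) rather than in another encoding such as GBK. A minimal Python 3 sketch; the helper name file_has_utf8_text is hypothetical.

```python
def file_has_utf8_text(path, text):
    """Return True if the raw bytes of `path` contain `text` encoded as UTF-8.

    Hypothetical helper: reads the file in binary mode so no decoding step
    can hide an encoding mismatch.
    """
    with open(path, "rb") as f:
        return text.encode("utf-8") in f.read()
```

If this returns True for the HQL file but the inserted value is still garbled, that points away from the file and toward how the bytes are decoded later on.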
Airflow task log:
{code:java}
[2018-10-09 21:00:58,485] {bash_operator.py:110} INFO - INFO : Compiling command(queryId=hive_20181009210000_89390a92-c4de-413f-9958-4d7da1065ef9): insert into di_zz.tt_wms_inout_detail_new(fac_id) values ("???")
[2018-10-09 21:00:58,485] {bash_operator.py:110} INFO - INFO : Semantic Analysis Completed
[2018-10-09 21:00:58,486] {bash_operator.py:110} INFO - INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_col0, type:timestamp, comment:null), FieldSchema(name:_col1, type:string, comment:null), FieldSchema(name:_col2, type:void, comment:null), FieldSchema(name:_col3, type:void, comment:null), FieldSchema(name:_col4, type:void, comment:null), FieldSchema(name:_col5, type:void, comment:null), FieldSchema(name:_col6, type:void, comment:null), FieldSchema(name:_col7, type:bigint, comment:null), FieldSchema(name:_col8, type:bigint, comment:null), FieldSchema(name:_col9, type:bigint, comment:null), FieldSchema(name:_col10, type:void, comment:null), FieldSchema(name:_col11, type:timestamp, comment:null)], properties:null)
[2018-10-09 21:00:58,486] {bash_operator.py:110} INFO - INFO : Completed compiling command(queryId=hive_20181009210000_89390a92-c4de-413f-9958-4d7da1065ef9); Time taken: 0.291 seconds
[2018-10-09 21:00:58,486] {bash_operator.py:110} INFO - INFO : Executing command(queryId=hive_20181009210000_89390a92-c4de-413f-9958-4d7da1065ef9): insert into di_zz.tt_wms_inout_detail_new(fac_id) values ("???")
[2018-10-09 21:00:58,486] {bash_operator.py:110} INFO - INFO : Query ID = hive_20181009210000_89390a92-c4de-413f-9958-4d7da1065ef9
{code}
Resulting data in the table:
{code:java}
+-----------------------------------+--------------------------------+----------------------------------+---------------------------------+-----------------------------------+-----------------------------------+--------------------------------------+----------------------------------+---------------------------------+----------------------------------+------------------------------------------+-------------------------------------+--+
| tt_wms_inout_detail_new.stat_date | tt_wms_inout_detail_new.fac_id | tt_wms_inout_detail_new.fac_name | tt_wms_inout_detail_new.ware_id | tt_wms_inout_detail_new.ware_name | tt_wms_inout_detail_new.ware_type | tt_wms_inout_detail_new.product_code | tt_wms_inout_detail_new.ware_cnt | tt_wms_inout_detail_new.ware_in | tt_wms_inout_detail_new.ware_out | tt_wms_inout_detail_new.sap_factory_name | tt_wms_inout_detail_new.di_etl_date |
+-----------------------------------+--------------------------------+----------------------------------+---------------------------------+-----------------------------------+-----------------------------------+--------------------------------------+----------------------------------+---------------------------------+----------------------------------+------------------------------------------+-------------------------------------+--+
| NULL                              | ���                            | NULL                             | NULL                            | NULL                              | NULL                              | NULL                                 | NULL                             | NULL                            | NULL                             | NULL                                     | NULL                                |
+-----------------------------------+--------------------------------+----------------------------------+---------------------------------+-----------------------------------+-----------------------------------+--------------------------------------+----------------------------------+---------------------------------+----------------------------------+------------------------------------------+-------------------------------------+--+
{code}
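The two symptoms line up: the log shows ??? and the table shows ���, i.e. one character per UTF-8 byte of '中'. One plausible mechanism, which is an assumption and not confirmed by this report, is that some component (for example a JVM whose default charset is not UTF-8) decodes the three UTF-8 bytes as ASCII, producing three U+FFFD replacement characters, which become ? when written back out as ASCII. A minimal Python 3 sketch of that round trip; the function name simulate_non_utf8_decode is hypothetical.

```python
def simulate_non_utf8_decode(text):
    """Mimic a UTF-8 string being read by a process that assumes ASCII.

    Returns (decoded, re_encoded): the UTF-8 bytes decoded as ASCII with
    replacement, then written back out as ASCII with replacement.
    """
    raw = text.encode("utf-8")                              # '中' -> b'\xe4\xb8\xad'
    decoded = raw.decode("ascii", errors="replace")         # each non-ASCII byte -> U+FFFD
    re_encoded = decoded.encode("ascii", errors="replace")  # each U+FFFD -> b'?'
    return decoded, re_encoded


if __name__ == "__main__":
    decoded, re_encoded = simulate_non_utf8_decode("中")
    print(ascii(decoded))  # prints '\ufffd\ufffd\ufffd' (displayed as ��� in the table)
    print(re_encoded)      # prints b'???' (as in the log)
```

A single '中' encoded straight to ASCII would give one ?, not three, so the three-character output suggests the text was already split into bytes before the lossy conversion.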
Thanks in advance.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)