[ https://issues.apache.org/jira/browse/AIRFLOW-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
liwei li closed AIRFLOW-3180.
-----------------------------
Resolution: Not A Bug
> Chinese characters become garbled when a BashOperator runs a beeline
> command to insert data into a Hive table
> --------------------------------------------------------------------------------------------------------------
>
> Key: AIRFLOW-3180
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3180
> Project: Apache Airflow
> Issue Type: Bug
> Affects Versions: 1.10.0
> Reporter: liwei li
> Priority: Blocker
>
> With BashOperator, I use beeline to insert data into Hive. The HQL contains
> Chinese characters; after the DAG run succeeds, the Hive data contains
> unreadable characters.
> Python:
> {code:java}
> # -*- coding: utf-8 -*-
> import airflow
> from airflow.models import DAG
> from airflow.operators.bash_operator import BashOperator
> from airflow.operators.python_operator import BranchPythonOperator
> from datetime import datetime
> import time
> from datetime import timedelta
> import sys
> import pendulum
> local_tz = pendulum.timezone("Asia/Shanghai")
> reload(sys)
> sys.setdefaultencoding('utf-8')
> default_args = {
>     'owner': 'airflow',
>     'depends_on_past': False,
>     'start_date': datetime(2018, 10, 9, 19, 22, 20, tzinfo=local_tz),
>     'retries': 0
> }
> dag = DAG(
>     'inserthiveutf8',
>     default_args=default_args,
>     description='null',
>     catchup=False,
>     schedule_interval=None
> )
> adf37 = r"""
> beeline -u "jdbc:hive2://10.138.***.***:30010/di_zz" -n "*****" -p "*****" \
>   -e "insert into di_zz.tt_wms_inout_detail_new(fac_id) values ('中')"
> """
> abcd8491539084126613 = BashOperator(
>     task_id='abcd8491539084126613',
>     bash_command=adf37,
>     dag=dag){code}
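One thing worth ruling out is the environment the task inherits. BashOperator accepts an `env` mapping, and when it is given it replaces the worker's environment instead of extending it. The Python 3 sketch below is a hypothetical helper (`utf8_env` is not part of Airflow) that copies the current environment and forces UTF-8 settings. It assumes that the `en_US.UTF-8` locale is installed on the worker and that beeline forwards `HADOOP_CLIENT_OPTS` to its JVM. Whether either setting helps here depends on how beeline decodes the query (command-line argument vs. file), so treat it as something to try, not a confirmed fix.

```python
import os


def utf8_env(base=None, locale_name="en_US.UTF-8"):
    """Return a copy of `base` (default: os.environ) with UTF-8 settings.

    Hypothetical helper, not part of Airflow. Assumes `locale_name` is
    installed on the worker and that beeline forwards HADOOP_CLIENT_OPTS
    to its JVM.
    """
    env = dict(os.environ if base is None else base)  # never mutate the input
    env["LANG"] = locale_name
    env["LC_ALL"] = locale_name
    # Drop any existing -Dfile.encoding=... option, then request UTF-8.
    # Simple whitespace split: options containing quoted spaces are not handled.
    opts = [
        opt
        for opt in env.get("HADOOP_CLIENT_OPTS", "").split()
        if not opt.startswith("-Dfile.encoding=")
    ]
    opts.append("-Dfile.encoding=UTF-8")
    env["HADOOP_CLIENT_OPTS"] = " ".join(opts)
    return env


# Usage in the DAG file above (requires Airflow, so not executed here):
#   BashOperator(task_id="abcd8491539084126613", bash_command=adf37,
#                env=utf8_env(), dag=dag)
```

Because `utf8_env()` runs when the DAG file is parsed, the copied environment is that of the process parsing the file.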
> I have also tried this, running a shell script instead:
> {code:java}
> abcd8491539084126613 = BashOperator(
>     task_id='abcd8491539084126613',
>     bash_command="sh ~/insert.sh ",
>     dag=dag){code}
> And this, exporting a UTF-8 locale first:
> {code:java}
> export LANG=en_US.UTF-8
> beeline -u "jdbc:hive2://10.138.***.***:30010/di_zz" -n "*****" -p "*****" \
>   -e 'insert into di_zz.tt_wms_inout_detail_new(fac_id) values ("中")'{code}
> And this, running the HQL from a file:
> {code:java}
> beeline -u "jdbc:hive2://10.138.***.***:30010/di_zz" -n "*****" -p "*****" \
>   -f ~/hql.sql{code}
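The `-f ~/hql.sql` and `sh ~/insert.sh` attempts also depend on the bytes actually saved in those files. A quick way to rule out a file saved in another encoding (GBK, say) is to check that it decodes cleanly as UTF-8. This Python 3 sketch uses a hypothetical `is_utf8_file` helper and demonstrates it on temporary files rather than the reporter's real paths:

```python
import os
import tempfile


def is_utf8_file(path):
    """Return True if the file's raw bytes decode as UTF-8 (hypothetical helper)."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Demonstration on temporary files; in practice, point it at ~/hql.sql.
sql = "insert into di_zz.tt_wms_inout_detail_new(fac_id) values ('中')\n"
with tempfile.TemporaryDirectory() as tmp:
    utf8_path = os.path.join(tmp, "hql_utf8.sql")
    gbk_path = os.path.join(tmp, "hql_gbk.sql")
    with open(utf8_path, "w", encoding="utf-8") as f:
        f.write(sql)
    with open(gbk_path, "w", encoding="gbk") as f:  # 中 -> D6 D0, invalid UTF-8
        f.write(sql)
    utf8_ok = is_utf8_file(utf8_path)
    gbk_ok = is_utf8_file(gbk_path)

print("UTF-8 file:", utf8_ok, "| GBK file:", gbk_ok)
```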
> Log:
> {code:java}
> [2018-10-09 21:00:58,485] {bash_operator.py:110} INFO - INFO : Compiling command(queryId=hive_20181009210000_89390a92-c4de-413f-9958-4d7da1065ef9): insert into di_zz.tt_wms_inout_detail_new(fac_id) values ("???")
> [2018-10-09 21:00:58,485] {bash_operator.py:110} INFO - INFO : Semantic Analysis Completed
> [2018-10-09 21:00:58,486] {bash_operator.py:110} INFO - INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_col0, type:timestamp, comment:null), FieldSchema(name:_col1, type:string, comment:null), FieldSchema(name:_col2, type:void, comment:null), FieldSchema(name:_col3, type:void, comment:null), FieldSchema(name:_col4, type:void, comment:null), FieldSchema(name:_col5, type:void, comment:null), FieldSchema(name:_col6, type:void, comment:null), FieldSchema(name:_col7, type:bigint, comment:null), FieldSchema(name:_col8, type:bigint, comment:null), FieldSchema(name:_col9, type:bigint, comment:null), FieldSchema(name:_col10, type:void, comment:null), FieldSchema(name:_col11, type:timestamp, comment:null)], properties:null)
> [2018-10-09 21:00:58,486] {bash_operator.py:110} INFO - INFO : Completed compiling command(queryId=hive_20181009210000_89390a92-c4de-413f-9958-4d7da1065ef9); Time taken: 0.291 seconds
> [2018-10-09 21:00:58,486] {bash_operator.py:110} INFO - INFO : Executing command(queryId=hive_20181009210000_89390a92-c4de-413f-9958-4d7da1065ef9): insert into di_zz.tt_wms_inout_detail_new(fac_id) values ("???")
> [2018-10-09 21:00:58,486] {bash_operator.py:110} INFO - INFO : Query ID = hive_20181009210000_89390a92-c4de-413f-9958-4d7da1065ef9
> {code}
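HiveServer2 already compiles `values ("???")`, which is consistent with the character being lost on the client side before the query reaches the server. The log text itself also passes through beeline's output encoding and Airflow's decoding, though, so this is a hint rather than proof. A small Python 3 diagnostic, using only the standard library, reports the encoding settings of the process it runs in; running it in the same environment as the failing task shows whether that environment has a UTF-8 locale:

```python
import locale
import os
import sys


def encoding_report():
    """Collect encoding-related settings of the current process."""
    return {
        "preferred_encoding": locale.getpreferredencoding(False),
        "filesystem_encoding": sys.getfilesystemencoding(),
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        # Unset locale variables usually mean the POSIX/C locale applies.
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "LC_CTYPE": os.environ.get("LC_CTYPE"),
    }


def looks_utf8(report):
    """True if the preferred encoding is a UTF-8 spelling ("UTF-8", "utf8", "utf_8")."""
    name = report["preferred_encoding"].replace("-", "").replace("_", "")
    return name.lower() == "utf8"


if __name__ == "__main__":
    report = encoding_report()
    for key, value in sorted(report.items()):
        print("{}: {}".format(key, value))
    print("UTF-8 locale:", looks_utf8(report))
```

On Python 3.7 and later, C-locale coercion (PEP 538) and UTF-8 mode (PEP 540) can make Python report UTF-8 under a C locale, so for a JVM client such as beeline it is worth cross-checking with the `locale` command in a plain shell on the worker.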
> Data:
> {code:java}
> +-----------------------------------+--------------------------------+----------------------------------+---------------------------------+-----------------------------------+-----------------------------------+--------------------------------------+----------------------------------+---------------------------------+----------------------------------+------------------------------------------+-------------------------------------+
> | tt_wms_inout_detail_new.stat_date | tt_wms_inout_detail_new.fac_id | tt_wms_inout_detail_new.fac_name | tt_wms_inout_detail_new.ware_id | tt_wms_inout_detail_new.ware_name | tt_wms_inout_detail_new.ware_type | tt_wms_inout_detail_new.product_code | tt_wms_inout_detail_new.ware_cnt | tt_wms_inout_detail_new.ware_in | tt_wms_inout_detail_new.ware_out | tt_wms_inout_detail_new.sap_factory_name | tt_wms_inout_detail_new.di_etl_date |
> +-----------------------------------+--------------------------------+----------------------------------+---------------------------------+-----------------------------------+-----------------------------------+--------------------------------------+----------------------------------+---------------------------------+----------------------------------+------------------------------------------+-------------------------------------+
> | NULL                              | ���                            | NULL                             | NULL                            | NULL                              | NULL                              | NULL                                 | NULL                             | NULL                            | NULL                             | NULL                                     | NULL                                |
> {code}
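One plausible mechanism for what the log and table above show, offered as an assumption rather than the resolution recorded in this issue: `中` (U+4E2D) is three bytes in UTF-8. If those bytes are decoded with a non-UTF-8 charset such as ASCII, each byte becomes a U+FFFD replacement character, and re-encoding those to ASCII yields one `?` per byte. That lines up with one character turning into `???` in the log and `���` in the table. A minimal Python 3 sketch:

```python
# Minimal sketch (Python 3) of one way a single character becomes "???".
# Assumption for illustration: the UTF-8 bytes are decoded as ASCII somewhere
# between the Airflow worker and HiveServer2; the real culprit is not confirmed.

text = "中"

# U+4E2D takes three bytes in UTF-8: E4 B8 AD.
utf8_bytes = text.encode("utf-8")

# None of those bytes is valid ASCII, so each one becomes U+FFFD.
misdecoded = utf8_bytes.decode("ascii", errors="replace")

# Encoding the replacement characters back to ASCII turns each one into "?".
as_ascii = misdecoded.encode("ascii", errors="replace").decode("ascii")

# ascii() keeps the output printable even on a non-UTF-8 terminal.
print(utf8_bytes.hex(), ascii(misdecoded), as_ascii)
# prints: e4b8ad '\ufffd\ufffd\ufffd' ???
```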
> This is a simplified example; the production script is much more complex.
> Thanks in advance.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)