[
https://issues.apache.org/jira/browse/IMPALA-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17273245#comment-17273245
]
Quanlong Huang commented on IMPALA-3777:
----------------------------------------
I can't reproduce the issue now. I'm not sure why the description uses a
multi-byte delimiter, since that statement fails with an AnalysisException:
{code:java}
$ bin/impala-shell.sh
Starting Impala Shell with no authentication using Python 2.7.16
Opened TCP connection to localhost:21050
Connected to localhost:21050
Server version: impalad version 4.0.0-SNAPSHOT DEBUG (build 08367e91f04508b54f77b56e0d211dd167b0116f)
***********************************************************************************
Welcome to the Impala shell.
(impala shell build version not available)
To see live updates on a query's progress, run 'set LIVE_SUMMARY=1;'.
***********************************************************************************
[localhost:21050] default> create table unicode_parse_error(id int) row format delimited fields terminated by '\u0023##';
Query: create table unicode_parse_error(id int) row format delimited fields terminated by '\u0023##'
ERROR: AnalysisException: ESCAPED BY values and LINE/FIELD terminators must be specified as a single character or as a decimal value in the range [-128:127]: ###
{code}
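The error message echoes the delimiter as "###", which suggests the '\u0023' escape was already turned into '#' before the length check ran. As a rough illustration of the rule stated in that message, here is a minimal Python sketch. It is not Impala's actual code: the names `unescape_unicode` and `parse_terminator` are hypothetical, and trying the decimal form before the single-character form is an assumption.

```python
import re

# Matches a backslash-u escape with four hex digits, as typed in the SQL text.
_UNICODE_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def unescape_unicode(literal: str) -> str:
    """Replace each backslash-u escape with the character it denotes."""
    return _UNICODE_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), literal)


def parse_terminator(literal: str) -> int:
    """Return the terminator as an integer code, or raise ValueError.

    Hypothetical helper for illustration only, not Impala's implementation.
    """
    value = unescape_unicode(literal)
    # Assumption: a decimal value in [-128, 127] is accepted first.
    if re.fullmatch(r"-?[0-9]+", value) and -128 <= int(value) <= 127:
        return int(value)
    # Otherwise the value must be exactly one character that fits in a byte.
    if len(value) == 1 and ord(value) <= 255:
        return ord(value)
    raise ValueError(
        "terminator must be a single character or a decimal value "
        "in the range [-128:127]: " + value
    )


print(parse_terminator(r"\u0023"))  # prints 35, the code point of '#'
```

Under these assumptions, `parse_terminator(r"\u0023##")` raises `ValueError`, because the unescaped value "###" is three characters long and is not a decimal number.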
Using '\u0023' alone as the delimiter is OK and works as expected:
{code:java}
[localhost:21050] default> create table unicode_parse_error(id int) row format delimited fields terminated by '\u0023';
Query: create table unicode_parse_error(id int) row format delimited fields terminated by '\u0023'
+-------------------------+
| summary |
+-------------------------+
| Table has been created. |
+-------------------------+
Fetched 1 row(s) in 0.14s
[localhost:21050] default> describe extended unicode_parse_error;
Query: describe extended unicode_parse_error
+------------------------------+------------------------------------------------------------+----------------------+
| name                         | type                                                       | comment              |
+------------------------------+------------------------------------------------------------+----------------------+
| # col_name                   | data_type                                                  | comment              |
|                              | NULL                                                       | NULL                 |
| id                           | int                                                        | NULL                 |
|                              | NULL                                                       | NULL                 |
| # Detailed Table Information | NULL                                                       | NULL                 |
| Database:                    | default                                                    | NULL                 |
| OwnerType:                   | USER                                                       | NULL                 |
| Owner:                       | quanlong                                                   | NULL                 |
| CreateTime:                  | Thu Jan 28 09:18:52 CST 2021                               | NULL                 |
| LastAccessTime:              | UNKNOWN                                                    | NULL                 |
| Retention:                   | 0                                                          | NULL                 |
| Location:                    | hdfs://localhost:20500/test-warehouse/unicode_parse_error  | NULL                 |
| Table Type:                  | EXTERNAL_TABLE                                             | NULL                 |
| Table Parameters:            | NULL                                                       | NULL                 |
|                              | EXTERNAL                                                   | TRUE                 |
|                              | OBJCAPABILITIES                                            | EXTREAD,EXTWRITE     |
|                              | TRANSLATED_TO_EXTERNAL                                     | TRUE                 |
|                              | external.table.purge                                       | TRUE                 |
|                              | transient_lastDdlTime                                      | 1611796732           |
|                              | NULL                                                       | NULL                 |
| # Storage Information        | NULL                                                       | NULL                 |
| SerDe Library:               | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe         | NULL                 |
| InputFormat:                 | org.apache.hadoop.mapred.TextInputFormat                   | NULL                 |
| OutputFormat:                | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | NULL                 |
| Compressed:                  | No                                                         | NULL                 |
| Num Buckets:                 | 0                                                          | NULL                 |
| Bucket Columns:              | []                                                         | NULL                 |
| Sort Columns:                | []                                                         | NULL                 |
| Storage Desc Params:         | NULL                                                       | NULL                 |
|                              | field.delim                                                | #                    |
|                              | serialization.format                                       | #                    |
|                              | NULL                                                       | NULL                 |
| # Constraints                | NULL                                                       | NULL                 |
+------------------------------+------------------------------------------------------------+----------------------+
Fetched 33 row(s) in 4.54s
{code}
> SqlParser parsed error for unicode
> ----------------------------------
>
> Key: IMPALA-3777
> URL: https://issues.apache.org/jira/browse/IMPALA-3777
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 2.2.4
> Environment: CentOS 6.7 64 bit. impalad version 2.7.0-cdh5-INTERNAL
> DEBUG
> Reporter: Yuanhao Luo
> Priority: Minor
> Labels: correctness, downgraded
> Attachments: After calling SqlParser.parse.JPG, Before calling
> SqlParser.parse.JPG
>
>
> When I run the query: create table unicode_parse_error(id int) row format
> delimited fields terminated by '\u0023##'; the field delimiter becomes
> '\u0017##'.
> Logs:
> {noformat}
> [nobida147:21000] > create table unicode_parse_error(id int) row format delimited fields terminated by '\u0023##';
> Query: create table unicode_parse_error(id int) row format delimited fields terminated by '\u0023##'
> Fetched 0 row(s) in 242.44s
> [nobida147:21000] > describe extended unicode_parse_error;
> Query: describe extended unicode_parse_error
> +------------------------------+------------------------------------------------------------------+----------------------+
> | name                         | type                                                             | comment              |
> +------------------------------+------------------------------------------------------------------+----------------------+
> | # col_name                   | data_type                                                        | comment              |
> |                              | NULL                                                             | NULL                 |
> | id                           | int                                                              | NULL                 |
> |                              | NULL                                                             | NULL                 |
> | # Detailed Table Information | NULL                                                             | NULL                 |
> | Database:                    | db1                                                              | NULL                 |
> | Owner:                       | root                                                             | NULL                 |
> | CreateTime:                  | Thu Jun 23 15:54:20 CST 2016                                     | NULL                 |
> | LastAccessTime:              | UNKNOWN                                                          | NULL                 |
> | Protect Mode:                | None                                                             | NULL                 |
> | Retention:                   | 0                                                                | NULL                 |
> | Location:                    | hdfs://localhost:20500/test-warehouse/db1.db/unicode_parse_error | NULL                 |
> | Table Type:                  | MANAGED_TABLE                                                    | NULL                 |
> | Table Parameters:            | NULL                                                             | NULL                 |
> |                              | transient_lastDdlTime                                            | 1466668460           |
> |                              | NULL                                                             | NULL                 |
> | # Storage Information        | NULL                                                             | NULL                 |
> | SerDe Library:               | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe               | NULL                 |
> | InputFormat:                 | org.apache.hadoop.mapred.TextInputFormat                         | NULL                 |
> | OutputFormat:                | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat       | NULL                 |
> | Compressed:                  | No                                                               | NULL                 |
> | Num Buckets:                 | 0                                                                | NULL                 |
> | Bucket Columns:              | []                                                               | NULL                 |
> | Sort Columns:                | []                                                               | NULL                 |
> | Storage Desc Params:         | NULL                                                             | NULL                 |
> |                              | field.delim                                                      | \u0017##             |
> |                              | serialization.format                                             | \u0017##             |
> +------------------------------+------------------------------------------------------------------+----------------------+
> Fetched 27 row(s) in 4.77s
> {noformat}
> After debugging, it seems that SqlParser.parse() goes wrong. As the attachments
> show, before calling SqlParser.parse() the statement is: fields terminated by
> '\u0023##', but after parsing, it becomes '\u0017##'.