On Mon, Jul 25, 2016 at 7:30 AM, Yuanhao Luo <[email protected]>
wrote:

>
> Hello, Jim Apple:
> I can't find any tests for the case *escape character is the same value as
> field delimiter* or the case *escape character is the same value as tuple
> delimiter* in
> testdata/workloads/functional-query/queries/QueryTest/delimited-text.test.
> I ran some tests on branch cdh5-trunk (commit id: 50a7ba059), and the logs
> below show that even though we have already added the warnings "*WARNINGS: Field
> delimiter and escape character have same value. Escape character will be
> ignored*" and "*WARNINGS: Line delimiter and escape character have same
> value: . Escape character will be ignored*" for these two corner cases,
> the code doesn't work as expected.
>

Can you describe what behavior you expected?


> It's a little difficult for me to fix these corner cases, so in my next
> patch, I'm going to tighten the restrictions as follows:
>
> 1. Delimiters can't be an empty string.
> 2. Tuple delimiter can't be the first byte of field delimiter.
> 3. Escape char can't be the first byte of field delimiter.
> 4. Escape char and tuple delimiter can't be the same.
> 5. Delimiters can't contain '\0'.
>
Whenever you change your planned design, we need to re-evaluate it against
the current code to see how it is different, especially any changes that
break currently working code.
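
To make that comparison concrete, the five proposed restrictions can be
written down as a small check. This is only a minimal sketch in Python, not
Impala code: the function name violated_rules is hypothetical, and it assumes
the field delimiter may be multi-byte while the tuple delimiter and the
optional escape character are single bytes.

```python
from typing import List, Optional


def violated_rules(field_delim: bytes, tuple_delim: bytes,
                   escape_char: Optional[bytes] = None) -> List[int]:
    """Return the numbers (1-5) of the proposed restrictions that are violated.

    Hypothetical helper for illustration only; it is not part of Impala.
    Assumes tuple_delim and escape_char are at most one byte each.
    """
    violated = []
    # 1. Delimiters can't be an empty string.
    if len(field_delim) == 0 or len(tuple_delim) == 0:
        violated.append(1)
    # 2. Tuple delimiter can't be the first byte of field delimiter.
    if tuple_delim and field_delim and tuple_delim == field_delim[:1]:
        violated.append(2)
    # 3. Escape char can't be the first byte of field delimiter.
    if escape_char and field_delim and escape_char == field_delim[:1]:
        violated.append(3)
    # 4. Escape char and tuple delimiter can't be the same.
    if escape_char and escape_char == tuple_delim:
        violated.append(4)
    # 5. Delimiters can't contain '\0'.
    if b"\0" in field_delim or b"\0" in tuple_delim:
        violated.append(5)
    return violated
```

Running such a check over the delimiter combinations that existing tables and
tests already use would show which currently accepted settings the new
restrictions would start rejecting.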


> What's more, in my tests, I found that *sql-parser.cup can't parse
> Unicode and octal escapes of extended ASCII characters (with decimal values
> from 128 to 255)* correctly. For example, if we want to set "#@#" as the
> fields terminator, we can use *fields terminated by '\u0023\100\043'*, which
> refers to ASCII *#@#*. The parse result is right. However,
> when I want to set a double *thorn (extended ASCII character with decimal
> value 254)* as the field terminator, for example *fields terminated by
> '\u00fe\376'*, it turns out to be *'\u00A4376'* when I run *'describe
> extended table'*. I have already reported this issue in IMPALA-3777
> <https://issues.cloudera.org/projects/IMPALA/issues/IMPALA-3777?filter=allissues>.
>
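
For reference, the escape arithmetic in the quoted examples can be checked
with a short script. This is a sketch of what the two escape forms denote,
not a reproduction of how sql-parser.cup handles them; the function name
decode_escapes is hypothetical, and it only understands \uXXXX and octal
\NNN escapes.

```python
import re

# Matches either \uXXXX (four hex digits) or \NNN (one to three octal digits).
_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})|\\([0-7]{1,3})")


def decode_escapes(text: str) -> str:
    """Replace \\uXXXX and octal \\NNN escapes with the characters they denote.

    Hypothetical helper for illustration only; it is not Impala's parser.
    """
    def replace(match):
        if match.group(1) is not None:
            # Unicode escape: the digits are hexadecimal.
            return chr(int(match.group(1), 16))
        # Octal escape: the digits are base 8.
        return chr(int(match.group(2), 8))

    return _ESCAPE.sub(replace, text)


# \u0023 is 0x23 = 35 ('#'), \100 is octal for 64 ('@'), \043 is octal for 35 ('#').
print(decode_escapes(r"\u0023\100\043"))

# \u00fe and \376 both denote code point 254 (thorn).
print([ord(c) for c in decode_escapes(r"\u00fe\376")])
```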

I don't see a reason to bring that bug into this discussion. Do you?
