On Mon, Jul 25, 2016 at 7:30 AM, Yuanhao Luo <[email protected]> wrote:
> Hello, Jim Apple:
>
> I can't find any tests for the case *escape character is the same value
> as field delimiter* or the case *escape character is the same value as
> tuple delimiter* in
> testdata/workloads/functional-query/queries/QueryTest/delimited-text.test.
> I ran some tests on branch cdh5-trunk (commit id: 50a7ba059), and the
> logs below show that even though we have already added the warnings
> "*WARNINGS: Field delimiter and escape character have same value. Escape
> character will be ignored*" and "*WARNINGS: Line delimiter and escape
> character have same value: . Escape character will be ignored*" for
> these two corner cases, the code doesn't work as expected.

Can you describe what behavior you expected?

> It's a little difficult for me to fix these corner cases, so in my next
> patch, I'm going to tighten the restrictions as below:
>
> 1. Delimiters can't be an empty string.
> 2. Tuple delimiter can't be the first byte of field delimiter.
> 3. Escape char can't be the first byte of field delimiter.
> 4. Escape char and tuple delimiter can't be the same.
> 5. Delimiters can't contain '\0'.

Whenever you change your planned design, we need to re-evaluate it against the current code to see how it is different, especially any changes that break currently working code.

> What's more, in my tests, I found that *sql-parser.cup can't parse
> unicode and octal escapes of extended ASCII characters (with decimal
> values from 128 to 255)* correctly. For example, if we want to set "#@#"
> as the fields terminator, we can use *fields terminated by
> '\u0023\100\043'*, whose three escapes refer to the ASCII characters
> *#@#* respectively. The parse result is right. However, when I want to
> set a double *thorn (extended ASCII character with decimal value 254)*
> as the field terminator, for example *fields terminated by
> '\u00fe\376'*, it turns out to be *'\u00A4376'* when I run *'describe
> extended table'*.
> I have already reported this issue in IMPALA-3777
> <https://issues.cloudera.org/projects/IMPALA/issues/IMPALA-3777?filter=allissues>.

I don't see a reason to bring that bug into this discussion. Do you?
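
To make it easier to compare the five proposed restrictions against the current behavior, here is a rough sketch of them as a standalone check. This is not Impala code: the function name `validate_delimiters`, the use of Python byte strings, and the assumption that the tuple delimiter and escape character are single bytes are all illustrative guesses, not taken from the patch.

```python
from typing import List, Optional


def validate_delimiters(field_delim: bytes,
                        tuple_delim: bytes,
                        escape_char: Optional[bytes] = None) -> List[str]:
    """Return the violated restrictions; an empty list means the combination is accepted.

    Hypothetical helper: it mirrors the five rules from the quoted proposal only.
    """
    errors: List[str] = []

    # Rule 1: delimiters can't be an empty string.
    if not field_delim or not tuple_delim:
        errors.append("rule 1: delimiter is empty")
        return errors  # the remaining rules need non-empty delimiters

    first_field_byte = field_delim[:1]

    # Rule 2: tuple delimiter can't be the first byte of the field delimiter.
    if tuple_delim == first_field_byte:
        errors.append("rule 2: tuple delimiter is first byte of field delimiter")

    # Rule 3: escape char can't be the first byte of the field delimiter.
    if escape_char is not None and escape_char == first_field_byte:
        errors.append("rule 3: escape char is first byte of field delimiter")

    # Rule 4: escape char and tuple delimiter can't be the same.
    if escape_char is not None and escape_char == tuple_delim:
        errors.append("rule 4: escape char equals tuple delimiter")

    # Rule 5: delimiters can't contain '\0'.
    if b"\0" in field_delim or b"\0" in tuple_delim:
        errors.append("rule 5: delimiter contains NUL byte")

    return errors
```

Under these assumptions, `validate_delimiters(b"#@#", b"\n", b"\\")` reports no violations, while `validate_delimiters(b"\\,", b"\n", b"\\")` trips rule 3. Any combination that the current code accepts but this check rejects is a case where the proposal would break existing tables.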
