paul-rogers commented on issue #2054: DRILL-6168: Revise format plugin table functions
URL: https://github.com/apache/drill/pull/2054#issuecomment-615470669
 
 
   A similar analysis applies to the other queries. For `drill-3149_1`:
   
   ```
   select columns[0] from table(`table_function/cr_lf.csv`(type=>'text', lineDelimiter=>'\r\n'))
   ```
   
   Expected (confusingly, each whole line is a single field, `columns[0]`):
   
   ```
   1,aaa,bbb
   2,ccc,ddd
   3,eee,
   4,fff,ggg
   ```
   
   New results, with the comma as the field delimiter and the newline as only the line delimiter:
   
   ```
   1
   2
   3
   4
   ```
   
   Next query, `drill-3149_4`:
   
   ```
   select * from table(`table_function/lf_cr.tsv`(type=>'text', lineDelimiter=>'\n\r'))
   ```
   
   Expected:
   
   ```
   ["1\taaa\tbbb"]
   ["2\tccc\tddd"]
   ["3\teee\t"]
   ["4\tfff\tggg"]
   ```
   
   Actual:
   
   ```
   ["1","aaa","bbb"]
   ["2","ccc","ddd"]
   ["3","eee",""]
   ["4","fff","ggg"]
   ```
   
   Notice that the expected results are wrong. The input is a TSV (tab-separated values) file, yet the expected results treat the tab as an ordinary character. With this PR, the `lineDelimiter` option changes only the line delimiter, not the field delimiter, which seems more accurate.
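
To make the difference concrete, here is a minimal Python sketch of the two interpretations. It is not Drill's reader: `parse_text` and its parameters are hypothetical names used only for illustration, and the input string is modeled on the `lf_cr.tsv` results quoted above rather than copied from the actual test file.

```python
def parse_text(data, line_delimiter, field_delimiter=None):
    """Split raw text into records, then optionally split records into fields.

    Hypothetical helper for illustration only; not Drill's text reader.
    """
    # Drop the empty record that follows a trailing line delimiter.
    records = [r for r in data.split(line_delimiter) if r]
    if field_delimiter is None:
        # Old expected results: the whole record is one field.
        return [[r] for r in records]
    # Behavior described for this PR: the field delimiter still applies.
    return [r.split(field_delimiter) for r in records]


# Assumed file contents: tab-separated fields, "\n\r" between records.
tsv = "1\taaa\tbbb\n\r2\tccc\tddd\n\r3\teee\t\n\r4\tfff\tggg\n\r"

old_expected = parse_text(tsv, "\n\r")
with_pr = parse_text(tsv, "\n\r", field_delimiter="\t")
```

In this sketch, `old_expected` keeps each record as a single field with embedded tabs, while `with_pr` splits each record into three fields, matching the "Expected" and "Actual" listings above.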
   
   Finally, `drill-3149_13`:
   
   ```
   select * from table(`table_function/chinese.txt`(type=>'text', lineDelimiter=>'甡脑坏了'))
   ```
   
   Expected (all columns in a single field):
   
   ```
   ["1,aaa,bbb"]
   ["2,ccc,ddd"]
   ["3,eee,"]
   ["4,fff,ggg"]
   ```
   
   With this PR, each column is its own field, which is the correct result:
   
   ```
   ["1","aaa","bbb"]
   ["2","ccc","ddd"]
   ["3","eee",""]
   ["4","fff","ggg"]
   ```
