paul-rogers commented on issue #2054: DRILL-6168: Revise format plugin table functions
URL: https://github.com/apache/drill/pull/2054#issuecomment-615470669

A similar analysis applies to the other queries. For `drill-3149_1`:

```
select columns[0] from table(`table_function/cr_lf.csv`(type=>'text', lineDelimiter=>'\r\n'))
```

Expected (confusing: each line is one field, `columns[0]`):

```
1,aaa,bbb
2,ccc,ddd
3,eee,
4,fff,ggg
```

New results, with comma rather than newline as the field delimiter:

```
1
2
3
4
```

Next query, `drill-3149_4`:

```
select * from table(`table_function/lf_cr.tsv`(type=>'text', lineDelimiter=>'\n\r'))
```

Expected:

```
["1\taaa\tbbb"]
["2\tccc\tddd"]
["3\teee\t"]
["4\tfff\tggg"]
```

Actual:

```
["1","aaa","bbb"]
["2","ccc","ddd"]
["3","eee",""]
["4","fff","ggg"]
```

Notice how the expected results are wrong. We are reading a TSV (tab-separated) file, yet the expected results treat the tab as an ordinary character. With this PR, we change only the line delimiter, not the field delimiter, which seems more accurate.

Finally, `drill-3149_13`:

```
select * from table(`table_function/chinese.txt`(type=>'text',lineDelimiter=>'η΅θεδΊ'))
```

Expected (all columns in a single field):

```
["1,aaa,bbb"]
["2,ccc,ddd"]
["3,eee,"]
["4,fff,ggg"]
```

With this PR we get the correct results:

```
["1","aaa","bbb"]
["2","ccc","ddd"]
["3","eee",""]
["4","fff","ggg"]
```
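The difference between the old expected results and the new ones comes down to whether overriding the line delimiter also discards the field delimiter. The following is a minimal Python sketch of that distinction only; it is not Drill's text reader, and `parse_text` and its parameters are hypothetical names chosen for illustration. The sample strings are assumed to mirror the contents of the test files, based on the results shown above.

```python
def parse_text(raw, line_delimiter="\n", field_delimiter=","):
    """Split raw text into records, then each record into fields.

    Hypothetical helper for illustration; passing field_delimiter=None
    mimics the old expected behavior, where each line is a single field.
    """
    # Drop the empty trailing record left by a final line delimiter.
    records = [r for r in raw.split(line_delimiter) if r]
    if field_delimiter is None:
        return [[r] for r in records]
    return [r.split(field_delimiter) for r in records]


# Assumed contents of cr_lf.csv: comma-separated fields, CR LF line endings.
crlf_csv = "1,aaa,bbb\r\n2,ccc,ddd\r\n3,eee,\r\n4,fff,ggg\r\n"

# Only the line delimiter is overridden; the comma still separates fields.
new_rows = parse_text(crlf_csv, line_delimiter="\r\n")

# Old expectation: the override also removes the field delimiter.
old_rows = parse_text(crlf_csv, line_delimiter="\r\n", field_delimiter=None)

# Assumed contents of lf_cr.tsv: tab-separated fields, LF CR line endings.
lfcr_tsv = "1\taaa\tbbb\n\r2\tccc\tddd\n\r3\teee\t\n\r4\tfff\tggg\n\r"
tsv_rows = parse_text(lfcr_tsv, line_delimiter="\n\r", field_delimiter="\t")

print([row[0] for row in new_rows])  # first field of each record
print([row[0] for row in old_rows])  # the whole line, as one field
print(tsv_rows)
```

In this sketch, `columns[0]` corresponds to `row[0]`: with the field delimiter kept, it is just the leading number; with the field delimiter dropped, it is the entire line.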
