[ https://issues.apache.org/jira/browse/TAJO-1339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14350324#comment-14350324 ]
ASF GitHub Bot commented on TAJO-1339:
--------------------------------------
Github user jihoonson commented on the pull request:
https://github.com/apache/tajo/pull/377#issuecomment-77556909
@sirpkt thanks for your work.
I have a couple of comments.
First, ```CSVFILE_DELIMITER``` is deprecated. If there are any bugs related
to ```TEXT_DELIMITER```, we must fix them instead of falling back to
```CSVFILE_DELIMITER```.
In addition, I have a question about the following part of your description.
```
Without this, when two tables, one is 'csvfile.delimiter'=';' and the other
has no option, are used in the same query, one is set as TEXT_DELIMITER = ';'
and the other is set as TEXT_DELIMITER = '|' and compete for the meta option of
the Execution block.
```
If only one of the tables is configured with an explicit, non-default
delimiter, the two tables must be parsed with different delimiters. If I have
misunderstood anything, please let me know.
Second, it is hard for me to understand why ```CSVFILE_DELIMITER``` or
```TEXT_DELIMITER``` is required to store intermediate results. As you know,
the default file type of intermediate results is ```RAW```, not ```CSV``` or
```TEXT```.
Would you mind explaining the background in more detail?
> Incorrect handling of tables with custom delimiter when their data contain '|'
> ------------------------------------------------------------------------------
>
> Key: TAJO-1339
> URL: https://issues.apache.org/jira/browse/TAJO-1339
> Project: Tajo
> Issue Type: Bug
> Reporter: Keuntae Park
> Assignee: Keuntae Park
>
> With the table data
> {code}
> 1;a;1.1
> 2;a|b;2.4
> 3;b|c|d;3.2
> {code}
> and external table declaration
> {code}
> create external table delimiter (id int, name text, score float) using csv
> with ('csvfile.delimiter'=';') location 'xxx';
> {code}
> , I got the following incorrect query result for query 'select name, score
> from delimiter'
> {code}
> name,score
> -------------------------------
> a,1.1
> a,null
> b,null
> {code}
> It looks like the '|' in the name column is recognized as a delimiter.
> Inspecting the code, I found that
> table meta information such as 'csvfile.delimiter' is only honored by the leaf
> scan operation; all other operations (including writing intermediate data
> and materializing the query result) assume that the delimiter is
> DEFAULT_FIELD_DELIMITER, which is '|'.
> Hence, if the plan includes a step that writes intermediate data,
> '|' in the data is handled as a delimiter even though it is not one.
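
For reference, the behavior described in the issue can be reproduced with a small, self-contained Python sketch. This is only a simplified model and not Tajo code: the function names (`scan`, `to_float`, `run_query`) are hypothetical, and it assumes that intermediate data is written as delimited text, which is what the issue description suggests but is not confirmed in this thread.

```python
# Simplified model of the reported behavior; not Tajo's implementation.
# All function names are hypothetical and exist only for this illustration.

DEFAULT_FIELD_DELIMITER = "|"  # default delimiter named in the issue description


def scan(lines, delimiter):
    """Leaf scan: split each line with the table's own delimiter."""
    return [line.split(delimiter) for line in lines]


def to_float(text):
    """Parse a float, returning None (SQL null) when the text is not numeric."""
    try:
        return float(text)
    except ValueError:
        return None


def run_query(lines, table_delimiter, intermediate_delimiter):
    """Model 'select name, score from delimiter' with one intermediate step."""
    rows = scan(lines, table_delimiter)
    projected = [(name, score) for _id, name, score in rows]
    # ASSUMPTION: the intermediate result is stored as delimited text.
    intermediate = [intermediate_delimiter.join(p) for p in projected]
    result = []
    for line in intermediate:
        fields = line.split(intermediate_delimiter)
        # Only the first two fields map to (name, score); extras are dropped.
        result.append((fields[0], to_float(fields[1])))
    return result


data = ["1;a;1.1", "2;a|b;2.4", "3;b|c|d;3.2"]

# Intermediate data written with the default '|' loses the name values.
buggy = run_query(data, ";", DEFAULT_FIELD_DELIMITER)
print(buggy)  # [('a', 1.1), ('a', None), ('b', None)]

# Using the table's delimiter for the intermediate step keeps them intact.
consistent = run_query(data, ";", ";")
print(consistent)  # [('a', 1.1), ('a|b', 2.4), ('b|c|d', 3.2)]
```

Under these assumptions, the first result matches the incorrect output quoted in the issue, while the second shows what the query would return if every step used the same delimiter.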