[jira] [Commented] (TAJO-1339) Incorrect handling of tables with custom delimiter when their data contain '|'

ASF GitHub Bot (JIRA) Fri, 06 Mar 2015 05:19:09 -0800

    [ 
https://issues.apache.org/jira/browse/TAJO-1339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14350324#comment-14350324
 ]


ASF GitHub Bot commented on TAJO-1339:
--------------------------------------

Github user jihoonson commented on the pull request:

    https://github.com/apache/tajo/pull/377#issuecomment-77556909
  
    @sirpkt thanks for your work.
    I have a couple of comments.
    First, ```CSVFILE_DELIMITER``` is deprecated. If there are any bugs related 
to ```TEXT_DELIMITER```, we must fix them instead of falling back to 
```CSVFILE_DELIMITER```.
    In addition, I have a question about the following part of your description.
    ```
    Without this, when two tables, one is 'csvfile.delimiter'=';' and the other 
has no option, are used in the same query, one is set as TEXT_DELIMITER = ';' 
and the other is set as TEXT_DELIMITER = '|' and compete for the meta option of 
the Execution block.
    ```
    If only one table is configured with an explicitly different delimiter, the 
two tables must be parsed with different delimiters. If I have misunderstood 
anything, please let me know.
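
    A minimal sketch of that expectation, not Tajo code: each table resolves its own delimiter from its own options and falls back to the default only when none is set. The option key `text.delimiter` is assumed here to be the string behind ```TEXT_DELIMITER```, and `delimiter_for` and `scan` are hypothetical helpers.

```python
# Illustrative only: per-table delimiter resolution, not Tajo's implementation.
DEFAULT_FIELD_DELIMITER = "|"  # default named in the issue description


def delimiter_for(table_options):
    """Return the table's own delimiter, or the default when none is set."""
    # "text.delimiter" is an assumed option key for this sketch.
    return table_options.get("text.delimiter", DEFAULT_FIELD_DELIMITER)


def scan(lines, table_options):
    """Split every line using the delimiter that belongs to this table."""
    delim = delimiter_for(table_options)
    return [line.split(delim) for line in lines]


# One table sets ';' explicitly, the other has no option at all.
t1 = scan(["2;a|b;2.4"], {"text.delimiter": ";"})
t2 = scan(["1|x|0.5"], {})
print(t1)  # [['2', 'a|b', '2.4']]
print(t2)  # [['1', 'x', '0.5']]
```

    Because the delimiter is looked up per table, the two scans in the same query do not compete for a single shared setting.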
    
    Second, it is hard for me to understand why ```CSVFILE_DELIMITER``` or 
```TEXT_DELIMITER``` is required to store intermediate results. As you know, 
the default file type of intermediate results is ```RAW```, not ```CSV``` or 
```TEXT```.
    Would you mind explaining the background in more detail?


> Incorrect handling of tables with custom delimiter when their data contain '|'
> ------------------------------------------------------------------------------
>
>                 Key: TAJO-1339
>                 URL: https://issues.apache.org/jira/browse/TAJO-1339
>             Project: Tajo
>          Issue Type: Bug
>            Reporter: Keuntae Park
>            Assignee: Keuntae Park
>
> With the table data
> {code}
> 1;a;1.1
> 2;a|b;2.4
> 3;b|c|d;3.2
> {code}
> and external table declaration
> {code}
> create external table delimiter (id int, name text, score float) using csv
> with ('csvfile.delimiter'=';') location 'xxx';
> {code}
> , I got the following incorrect query result for query 'select name, score 
> from delimiter'
> {code}
> name,score
> -------------------------------
> a,1.1
> a,null
> b,null
> {code}
> It looks like '|' in the name column is recognized as a delimiter.
> Inspecting the code, I found that
> table meta information like 'csvfile.delimiter' is only applied in the leaf 
> scan operation, and all the other operations (including writing intermediate 
> data and materializing the query result) assume that the delimiter is 
> DEFAULT_FIELD_DELIMITER, which is '|'.
> Hence, if the plan includes a step that writes intermediate data, 
> '|' in the data is handled as a delimiter even though it is not one.
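
The behavior described in the issue can be reproduced in miniature outside Tajo. The following is a sketch under stated assumptions, not the engine's code: it assumes the leaf scan splits on ';', the projected columns are then written as text joined with the default '|', and a later stage splits that text on '|' again. The helper `parse_float` is hypothetical and stands in for a cast that yields null on failure.

```python
# Illustrative only: mimics the delimiter mix-up described in the issue.
DEFAULT_FIELD_DELIMITER = "|"

table_data = ["1;a;1.1", "2;a|b;2.4", "3;b|c|d;3.2"]


def parse_float(text):
    """Return a float, or None when the text is not numeric."""
    try:
        return float(text)
    except ValueError:
        return None


# Leaf scan: honours the table's own delimiter ';'.
scanned = [line.split(";") for line in table_data]

# Intermediate data: 'name' and 'score' are written with the default '|'.
intermediate = [DEFAULT_FIELD_DELIMITER.join(row[1:3]) for row in scanned]

# Later stage: reads the intermediate text back, splitting on '|' again.
result = []
for line in intermediate:
    fields = line.split(DEFAULT_FIELD_DELIMITER)
    result.append((fields[0], parse_float(fields[1])))

print(result)  # [('a', 1.1), ('a', None), ('b', None)]
```

The three rows match the incorrect result quoted in the issue: any '|' inside the name value shifts the fields, so the score column is read from the wrong position and becomes null.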



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
