[jira] [Commented] (TAJO-1685) Query fails when using table data which located on local file system occasionally on fully distributed mode.

ASF GitHub Bot (JIRA) Wed, 15 Jul 2015 22:46:47 -0700

    [ 
https://issues.apache.org/jira/browse/TAJO-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629233#comment-14629233
 ]


ASF GitHub Bot commented on TAJO-1685:
--------------------------------------

GitHub user blrunner opened a pull request:

    https://github.com/apache/tajo/pull/631

    TAJO-1685: Query fails when using table data which located on local file system occasionally on fully distributed mode.

    I added an error message to avoid confusing users, as shown below.
    
    ```
    default> \admin -cluster
    Query Master
    ============
    
    Live  Dead  Tasks
    ----- ----- -----
    3     0     0    
    
    Live QueryMasters
    =================
    
    QueryMaster               Port  Query Heap       Status    
    ------------------------- ----- ----- ---------- ----------
    tajo-04:28093             28092 0     16000 MB   RUNNING   
    tajo-03:28093             28092 0     16000 MB   RUNNING   
    tajo-05:28093             28092 0     16000 MB   RUNNING   
    
    
    Worker
    ======
    
    Live  Dead 
    ----- -----
    3     0    
    
    Live Workers
    ============
    
    Worker                    Port  Tasks Mem        Disk       Heap         Status    
    ------------------------- ----- ----- ---------- ---------- ------------ ----------
    tajo-04:28091             53960 0     0/1024     0.00/2.00  427/16000 MB RUNNING   
    tajo-03:28091             34833 0     0/1024     0.00/2.00  418/16000 MB RUNNING   
    tajo-05:28091             40578 0     0/1024     0.00/2.00  412/16000 MB RUNNING   
    
    
    Dead Workers
    ============
    
    No Dead Workers
    
    default> \d table1;
    
    table name: default.table1
    table uri: file:/home/tajo/hadoop/data.csv
    store type: text
    number of rows: unknown
    volume: 60 B
    Options: 
        'text.delimiter'='|'
    
    schema: 
    id  INT4
    name        TEXT
    score       FLOAT4
    type        TEXT
    
    
    default> select * from table1;
    id,  name,  score,  type
    -------------------------------
    1,  abc,  1.1,  a
    2,  def,  2.3,  b
    3,  ghi,  3.4,  c
    4,  jkl,  4.5,  d
    5,  mno,  5.6,  e
    (5 rows, 0.029 sec, 60 B selected)
    
    default> select count(*) from table1;
    ERROR: The table data should be on all hosts to run TajoWorker or be on distributed file system. : file:/home/tajo/hadoop/data.csv
    
    ```
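
The condition behind the new message can be illustrated with a minimal sketch. This is not the code from the pull request: the function names, the `tajo-01` master host name, and the rule "local `file` URI plus at least one worker on a host other than the master" are assumptions made for illustration only. The message text is modeled on the transcript above.

```python
from urllib.parse import urlparse

# Message text modeled on the error shown in the transcript above.
LOCAL_DATA_MESSAGE = (
    "The table data should be on all hosts to run TajoWorker "
    "or be on distributed file system. : {uri}"
)


def is_local_file_uri(table_uri):
    """Return True when the table URI uses the local `file` scheme."""
    return urlparse(table_uri).scheme == "file"


def check_table_location(table_uri, master_host, worker_hosts):
    """Return an error message when a local-file table may be unreadable
    by some worker, otherwise None.

    Assumed rule (hypothetical, not taken from the patch): a `file` URI is
    only safe when every worker runs on the same host as the master, which
    is the pseudo-distributed case described in the issue.
    """
    if not is_local_file_uri(table_uri):
        return None  # e.g. a distributed file system URI
    if set(worker_hosts) <= {master_host}:
        return None  # pseudo-distributed: master and workers share one host
    return LOCAL_DATA_MESSAGE.format(uri=table_uri)


if __name__ == "__main__":
    # Worker host names come from the cluster listing; "tajo-01" is assumed.
    print(check_table_location(
        "file:/home/tajo/hadoop/data.csv",
        "tajo-01",
        ["tajo-03", "tajo-04", "tajo-05"],
    ))
```

A check of this kind only reports the likely cause; whether the file really exists on each worker host would still have to be verified on those hosts.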

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/blrunner/tajo TAJO-1685

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tajo/pull/631.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #631
    
----
commit 351c5aa59650eda0ef4c6f04e13bd6fe3d3590e5
Author: JaeHwa Jung <[email protected]>
Date:   2015-07-16T05:37:22Z

    TAJO-1685: Query fails when using table data which located on local file system occasionally on fully distributed mode.

----


> Query fails when using table data which located on local file system 
> occasionally on fully distributed mode.
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: TAJO-1685
>                 URL: https://issues.apache.org/jira/browse/TAJO-1685
>             Project: Tajo
>          Issue Type: Improvement
>          Components: Java Client, SQL Shell
>            Reporter: Jaehwa Jung
>            Assignee: Jaehwa Jung
>
> Tajo allows a table's location to be set to a path on the local file
> system, for example “file:///home/tajo/xyz”. When such a table is queried in
> pseudo-distributed mode, the query finishes successfully. Pseudo-distributed
> mode for Tajo means that TajoMaster and TajoWorker run on the same host. But
> when the data is queried in fully distributed mode, the query fails because
> the data isn’t located on every host that runs a TajoWorker. In this case,
> users see an ambiguous error message, as follows.
> {code:xml}
> default> create external table table1 (
> >       id int,
> >       name text,
> >       score float,
> >       type text)
> >       using text with ('text.delimiter'='|') location 'file:///home/tajo/data.csv'
> > ;
> OK
> default> \d table1;
> table name: default.table1
> table uri: file:///home/tajo/data.csv
> store type: text
> number of rows: unknown
> volume: 60 B
> Options: 
>       'text.delimiter'='|'
> schema: 
> id    INT4
> name  TEXT
> score FLOAT4
> type  TEXT
> default> select * from table1;
> id,  name,  score,  type
> -------------------------------
> 1,  abc,  1.1,  a
> 2,  def,  2.3,  b
> 3,  ghi,  3.4,  c
> 4,  jkl,  4.5,  d
> 5,  mno,  5.6,  e
> (5 rows, 0.081 sec, 60 B selected)
> default> select count(*) from table1;
> ERROR: No error message
> {code}
> It isn’t easy for users to determine the cause of the error. We need to
> print a well-defined message to avoid confusing users.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
