[ https://issues.apache.org/jira/browse/TAJO-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629233#comment-14629233 ]
ASF GitHub Bot commented on TAJO-1685:
--------------------------------------
GitHub user blrunner opened a pull request:
https://github.com/apache/tajo/pull/631
TAJO-1685: Query fails when using table data which located on local file
system occasionally on fully distributed mode.
I added an error message to avoid user confusion, as shown below.
```
default> \admin -cluster
Query Master
============
Live Dead Tasks
----- ----- -----
3 0 0
Live QueryMasters
=================
QueryMaster Port Query Heap Status
------------------------- ----- ----- ---------- ----------
tajo-04:28093 28092 0 16000 MB RUNNING
tajo-03:28093 28092 0 16000 MB RUNNING
tajo-05:28093 28092 0 16000 MB RUNNING
Worker
======
Live Dead
----- -----
3 0
Live Workers
============
Worker Port Tasks Mem Disk Heap Status
------------------------- ----- ----- ---------- ---------- ------------ ----------
tajo-04:28091 53960 0 0/1024 0.00/2.00 427/16000 MB RUNNING
tajo-03:28091 34833 0 0/1024 0.00/2.00 418/16000 MB RUNNING
tajo-05:28091 40578 0 0/1024 0.00/2.00 412/16000 MB RUNNING
Dead Workers
============
No Dead Workers
default> \d table1;
table name: default.table1
table uri: file:/home/tajo/hadoop/data.csv
store type: text
number of rows: unknown
volume: 60 B
Options:
'text.delimiter'='|'
schema:
id INT4
name TEXT
score FLOAT4
type TEXT
default> select * from table1;
id, name, score, type
-------------------------------
1, abc, 1.1, a
2, def, 2.3, b
3, ghi, 3.4, c
4, jkl, 4.5, d
5, mno, 5.6, e
(5 rows, 0.029 sec, 60 B selected)
default> select count(*) from table1;
ERROR: The table data should be on all hosts to run TajoWorker or be on distributed file system. : file:/home/tajo/hadoop/data.csv
```
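For illustration only, the condition behind the new message could be sketched roughly as follows. This is not the code from the pull request: the class `LocalTableCheck`, its method names, and the rule "report a problem when the table URI uses the `file` scheme and more than one worker host is live" are assumptions. Only the message text is taken from the output above.

```java
import java.net.URI;
import java.util.Optional;

/** Hypothetical helper for illustration; not part of Tajo's API. */
final class LocalTableCheck {

    /** True when the URI explicitly uses the local "file" scheme. */
    static boolean isLocalFile(URI tableUri) {
        // getScheme() returns null for scheme-less URIs, which yields false here.
        return "file".equalsIgnoreCase(tableUri.getScheme());
    }

    /**
     * Returns the descriptive error text when a local-file table is queried
     * on a cluster with several worker hosts, or empty when nothing applies.
     * The "more than one worker host" rule is an assumption of this sketch.
     */
    static Optional<String> describeProblem(URI tableUri, int workerHostCount) {
        if (isLocalFile(tableUri) && workerHostCount > 1) {
            return Optional.of(
                "ERROR: The table data should be on all hosts to run TajoWorker"
                    + " or be on distributed file system. : " + tableUri);
        }
        return Optional.empty();
    }
}
```

With the three live workers listed above, `LocalTableCheck.describeProblem(URI.create("file:/home/tajo/hadoop/data.csv"), 3)` would return the message, while a single-host setup would return an empty result.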
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/blrunner/tajo TAJO-1685
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/tajo/pull/631.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #631
----
commit 351c5aa59650eda0ef4c6f04e13bd6fe3d3590e5
Author: JaeHwa Jung <[email protected]>
Date: 2015-07-16T05:37:22Z
TAJO-1685: Query fails when using table data which located on local file
system occasionally on fully distributed mode.
----
> Query fails when using table data which located on local file system
> occasionally on fully distributed mode.
> ------------------------------------------------------------------------------------------------------------
>
> Key: TAJO-1685
> URL: https://issues.apache.org/jira/browse/TAJO-1685
> Project: Tajo
> Issue Type: Improvement
> Components: Java Client, SQL Shell
> Reporter: Jaehwa Jung
> Assignee: Jaehwa Jung
>
> Tajo allows a table's location to be set to a path on the local file
> system, for example, “file:///home/tajo/xyz”. When such a table is queried
> in pseudo-distributed mode, the query finishes successfully, because in
> that mode TajoMaster and TajoWorker run on the same host. In fully
> distributed mode, however, the query fails because the data is not located
> on every host that runs a TajoWorker. In this case, users see an ambiguous
> error message, as follows.
> {code:xml}
> default> create external table table1 (
> > id int,
> > name text,
> > score float,
> > type text)
> > using text with ('text.delimiter'='|') location
> > 'file:///home/tajo/data.csv'
> > ;
> OK
> default> \d table1;
> table name: default.table1
> table uri: file:///home/tajo/data.csv
> store type: text
> number of rows: unknown
> volume: 60 B
> Options:
> 'text.delimiter'='|'
> schema:
> id INT4
> name TEXT
> score FLOAT4
> type TEXT
> default> select * from table1;
> id, name, score, type
> -------------------------------
> 1, abc, 1.1, a
> 2, def, 2.3, b
> 3, ghi, 3.4, c
> 4, jkl, 4.5, d
> 5, mno, 5.6, e
> (5 rows, 0.081 sec, 60 B selected)
> default> select count(*) from table1;
> ERROR: No error message
> {code}
> It is not easy for users to identify the cause of this error. We need to
> print a well-defined message to avoid user confusion.
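The failure mode described in the issue can be modeled with a small sketch. The class `DeploymentModel`, its method names, and the idea of tracking which hosts hold the file are hypothetical and do not come from Tajo's code base. It only restates the description: pseudo-distributed mode means every worker runs on the master's host, and a `file:` table is readable only on hosts that actually have the file.

```java
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model of the two deployment modes; not Tajo code. */
final class DeploymentModel {

    /** Pseudo-distributed: every TajoWorker runs on the TajoMaster's host. */
    static boolean isPseudoDistributed(String masterHost,
                                       Collection<String> workerHosts) {
        return !workerHosts.isEmpty()
            && workerHosts.stream().allMatch(masterHost::equals);
    }

    /**
     * Worker hosts that could not read a local-file table, given the set of
     * hosts on which the file actually exists. A task scheduled on any of
     * these hosts would fail.
     */
    static Set<String> workersMissingData(Set<String> hostsWithFile,
                                          Collection<String> workerHosts) {
        Set<String> missing = new TreeSet<>(workerHosts);
        missing.removeAll(hostsWithFile);
        return missing;
    }
}
```

Under this model, a file that exists on only one of several worker hosts leaves the remaining workers unable to read it, which is the situation the new error message is meant to explain.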
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)