[ https://issues.apache.org/jira/browse/TAJO-367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13899063#comment-13899063 ]

Jihoon Son commented on TAJO-367:
---------------------------------

Sorry for the late response.
In my opinion, this issue is still meaningful because we need to support various 
kinds of storage, such as HDFS and HBase, in a cluster. 
For this purpose, I planned to use Fragment as an abstracted storage unit. 
Since location information is provided by a generic storage handler 
(TAJO-337), we don't need to maintain it in each fragment. 
I also expect this refactoring to reduce the code complexity of task 
scheduling. 
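A minimal Java sketch of the idea in this comment follows. It is not the actual TAJO-337 API: it assumes a hypothetical `StorageHandler` interface with a `getHosts(Fragment)` method. Fragments carry no host list, and the scheduler asks the handler where each fragment lives. `SimpleFragment`, `InMemoryStorageHandler` and `groupByHost` are illustrative names only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class StorageHandlerSketch {

    // Hypothetical Fragment: describes a portion of the input only, with no hosts.
    interface Fragment {
        String getId();
    }

    // Hypothetical storage handler: the single place that knows where data lives.
    interface StorageHandler {
        List<String> getHosts(Fragment fragment);
    }

    static final class SimpleFragment implements Fragment {
        private final String id;

        SimpleFragment(String id) { this.id = id; }

        @Override
        public String getId() { return id; }
    }

    // In-memory stand-in for an HDFS- or HBase-backed handler (assumption).
    static final class InMemoryStorageHandler implements StorageHandler {
        private final Map<String, List<String>> hostsById = new HashMap<>();

        void register(String fragmentId, List<String> hosts) {
            hostsById.put(fragmentId, new ArrayList<>(hosts));
        }

        @Override
        public List<String> getHosts(Fragment fragment) {
            // Unknown fragments simply have no preferred hosts.
            return hostsById.getOrDefault(fragment.getId(), new ArrayList<>());
        }
    }

    // Groups fragments by host by asking the handler, not the fragment itself.
    static Map<String, List<Fragment>> groupByHost(List<Fragment> fragments, StorageHandler handler) {
        Map<String, List<Fragment>> byHost = new TreeMap<>();
        for (Fragment fragment : fragments) {
            for (String host : handler.getHosts(fragment)) {
                byHost.computeIfAbsent(host, h -> new ArrayList<>()).add(fragment);
            }
        }
        return byHost;
    }

    public static void main(String[] args) {
        InMemoryStorageHandler handler = new InMemoryStorageHandler();
        handler.register("f1", Arrays.asList("node1", "node2"));
        handler.register("f2", Arrays.asList("node2"));
        List<Fragment> fragments = Arrays.asList(new SimpleFragment("f1"), new SimpleFragment("f2"));

        // Prints "node1: f1" and then "node2: f1 f2".
        for (Map.Entry<String, List<Fragment>> e : groupByHost(fragments, handler).entrySet()) {
            System.out.print(e.getKey() + ":");
            for (Fragment f : e.getValue()) {
                System.out.print(" " + f.getId());
            }
            System.out.println();
        }
    }
}
```

Because only the handler knows about hosts, adding a new storage type would mean adding a handler rather than changing every fragment class.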

> Separate the locality information from Fragment
> -----------------------------------------------
>
>                 Key: TAJO-367
>                 URL: https://issues.apache.org/jira/browse/TAJO-367
>             Project: Tajo
>          Issue Type: Improvement
>          Components: query master, storage, worker
>            Reporter: Jihoon Son
>             Fix For: 0.8-incubating
>
>
> Fragment is designed to represent a portion of the abstracted input source.
> However, since it is currently used for task scheduling and task 
> allocation, it includes the locality information as well as the abstraction 
> of the input data.
> The locality information is used only in task scheduling, so it should be 
> separated from Fragment.
> The task scheduler uses the locality information to assign tasks to 
> workers close to the data, regardless of the kind of storage layer.
> To consider both the input data and their locality in the task scheduler, we 
> need to design a new class, such as FragmentWithHost, that includes a Fragment 
> and its locality information.
> In this issue, the following work should be done:
> * Removing the host information from FileFragment
> * Creating a new class FragmentWithHost that contains an instance of the 
> Fragment interface and the locality information consisting of hosts and disk 
> ids
> * Refactoring SubQuery, StorageManager and TaskScheduler to use 
> FragmentWithHost
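The first two work items might look roughly like the following Java sketch. `Fragment`, `FileFragment` and `FragmentWithHost` are named in the issue, but every field and method here (`isLocalTo`, `diskIdOn`, `pickFor`) is a hypothetical illustration, not Tajo's code. Hosts and disk ids are kept as parallel arrays, and the table name, paths and host names are made-up example values.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class FragmentWithHostSketch {

    // Hypothetical Fragment: only describes a portion of the input data.
    interface Fragment {
        String getTableName();
    }

    // FileFragment without any host information (first work item).
    static final class FileFragment implements Fragment {
        private final String tableName;
        private final String path;
        private final long startOffset;
        private final long length;

        FileFragment(String tableName, String path, long startOffset, long length) {
            this.tableName = tableName;
            this.path = path;
            this.startOffset = startOffset;
            this.length = length;
        }

        @Override
        public String getTableName() { return tableName; }

        String getPath() { return path; }
    }

    // Pairs any Fragment with its locality: hosts[i] stores the data on disk diskIds[i].
    static final class FragmentWithHost {
        private final Fragment fragment;
        private final String[] hosts;
        private final int[] diskIds;

        FragmentWithHost(Fragment fragment, String[] hosts, int[] diskIds) {
            if (hosts.length != diskIds.length) {
                throw new IllegalArgumentException("hosts and diskIds must have the same length");
            }
            this.fragment = fragment;
            this.hosts = hosts.clone();
            this.diskIds = diskIds.clone();
        }

        Fragment getFragment() { return fragment; }

        boolean isLocalTo(String host) {
            for (String h : hosts) {
                if (h.equals(host)) return true;
            }
            return false;
        }

        // Returns the disk id on the given host, or -1 if that host holds no replica.
        int diskIdOn(String host) {
            for (int i = 0; i < hosts.length; i++) {
                if (hosts[i].equals(host)) return diskIds[i];
            }
            return -1;
        }
    }

    // Hypothetical scheduling step: take a fragment local to the worker if one exists,
    // otherwise the first pending fragment; returns null when none remain.
    static FragmentWithHost pickFor(String workerHost, List<FragmentWithHost> pending) {
        for (Iterator<FragmentWithHost> it = pending.iterator(); it.hasNext(); ) {
            FragmentWithHost candidate = it.next();
            if (candidate.isLocalTo(workerHost)) {
                it.remove();
                return candidate;
            }
        }
        return pending.isEmpty() ? null : pending.remove(0);
    }

    public static void main(String[] args) {
        List<FragmentWithHost> pending = new ArrayList<>();
        pending.add(new FragmentWithHost(new FileFragment("lineitem", "/data/lineitem/part-0", 0, 64),
                new String[] {"node1", "node2"}, new int[] {0, 1}));
        pending.add(new FragmentWithHost(new FileFragment("lineitem", "/data/lineitem/part-1", 0, 64),
                new String[] {"node3"}, new int[] {2}));

        FragmentWithHost local = pickFor("node3", pending);    // local match
        FragmentWithHost fallback = pickFor("node9", pending); // nothing local left
        System.out.println(((FileFragment) local.getFragment()).getPath());    // /data/lineitem/part-1
        System.out.println(((FileFragment) fallback.getFragment()).getPath()); // /data/lineitem/part-0
    }
}
```

Falling back to any pending fragment when nothing is local keeps the worker busy at the cost of a remote read. The scheduler only ever needs `FragmentWithHost`, while readers can keep working with plain `Fragment` instances.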



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
