[ 
https://issues.apache.org/jira/browse/TAJO-983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14104953#comment-14104953
 ] 

Mai Hai Thanh commented on TAJO-983:
------------------------------------

Thanks, [~hyunsik]!
I investigated your interesting approach. However, I think the two approaches 
have the same effect. When there are multiple file chunks (which is almost 
always the case with not-so-small data), we have to merge them into one file, 
which can be either the file returned by Fetcher::get() or a FileFragment. 
Merging multiple chunks unavoidably requires copying data. When there is only 
one file chunk to be fetched and this chunk does not represent a complete file, 
we also have to copy the data. When there is only one file chunk to be fetched 
and this chunk represents a complete file (which happens only with very small 
data), we could theoretically avoid copying and use the file directly. 
Nevertheless, we would then have to treat this file as an exceptional case in 
later processing code, because it is not stored in the conventional default 
folder or named in the conventional file name format. Besides, because the 
chunk size is limited to only 8 KB by default, copying the data is not a 
problem. So, to keep the code clean and easy to maintain, and because the cost 
is low (and the case is rare), I prefer to keep the current approach.
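
To make the three cases concrete, here is a minimal sketch of the decision 
described above. It is not taken from the patch: the class ChunkMergeSketch, 
the Chunk type, and its fields are hypothetical names used only for 
illustration, and I assume a chunk is "complete" when it starts at offset 0 
and spans the whole file.

```java
import java.util.List;

public class ChunkMergeSketch {

    /** Hypothetical description of one local file chunk (not a Tajo class). */
    public static final class Chunk {
        final long offset;      // start position of the chunk within its file
        final long length;      // number of bytes in the chunk
        final long fileLength;  // total size of the file holding the chunk

        public Chunk(long offset, long length, long fileLength) {
            this.offset = offset;
            this.length = length;
            this.fileLength = fileLength;
        }

        /** Assumption: a chunk is complete if it spans the entire file. */
        boolean coversWholeFile() {
            return offset == 0 && length == fileLength;
        }
    }

    /**
     * Returns true when the chunks must be copied into a new file.
     * Copying is avoidable only for a single chunk that is a complete file.
     */
    public static boolean needsCopy(List<Chunk> chunks) {
        if (chunks.isEmpty()) {
            return false;  // nothing to read, so nothing to copy
        }
        if (chunks.size() > 1) {
            return true;   // multiple chunks must be merged into one file
        }
        // Single chunk: copy unless it already is a complete file.
        return !chunks.get(0).coversWholeFile();
    }
}
```

Only the last branch can return false for non-empty input, and that is the 
rare small-data case that would need special handling in later processing 
code.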

> Worker should directly read Intermediate data stored in localhost rather than 
> fetching
> --------------------------------------------------------------------------------------
>
>                 Key: TAJO-983
>                 URL: https://issues.apache.org/jira/browse/TAJO-983
>             Project: Tajo
>          Issue Type: Bug
>          Components: data shuffle
>            Reporter: Hyunsik Choi
>            Assignee: Mai Hai Thanh
>         Attachments: TAJO-983.140820.0.patch.txt
>
>
> Currently, a worker always fetches all intermediate data via Fetcher and then 
> stores them in the local file system, even though some intermediate data are 
> already stored in the local file system. This is inefficient and causes 
> unnecessary I/O and extra storage usage. We should improve it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
