[ https://issues.apache.org/jira/browse/TAJO-983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14105047#comment-14105047 ]
Mai Hai Thanh commented on TAJO-983:
------------------------------------
Hi Hyunsik,
I noticed the commit of TAJO-992 about 2 hours ago and have already looked at it.
If intermediate data can be up to hundreds of megabytes, it is worth
implementing a non-copy approach. I will try it.
Besides, I'd like to ask about the intermediate files. As I understand it, an
intermediate file can be stored on a remote host, so Tajo's pull server uses
HTTP to transfer intermediate files from the remote host to the local host. Is
that right? If so, reading data on the local host whenever possible has the
added benefit of avoiding the HTTP transfer. I would guess that a local file
copy is faster than an HTTP transfer, even when the HTTP transfer stays on the
local host. This is another benefit besides reducing I/O and extra storage.
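The local-versus-remote decision discussed above (read intermediate data in place when it already sits on this host, and go through the pull server over HTTP only otherwise) can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not Tajo's actual (Java) implementation: `FetchTarget`, `local_host_names` and `plan_fetches` are hypothetical names, and matching a target's host against a fixed set of local names is an assumed rule.

```python
import os
import socket
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class FetchTarget:
    """Hypothetical descriptor of one intermediate file produced by an upstream task."""
    host: str  # host whose pull server serves this file
    path: str  # path of the file on that host's local file system


def local_host_names() -> Set[str]:
    """Names under which this worker might appear in a FetchTarget (an assumption)."""
    return {"localhost", "127.0.0.1", socket.gethostname(), socket.getfqdn()}


def plan_fetches(
    targets: Iterable[FetchTarget],
    local_names: Set[str],
    exists: Callable[[str], bool] = os.path.exists,
) -> Tuple[List[str], List[FetchTarget]]:
    """Split targets into files to read in place and files that still need an HTTP fetch."""
    read_in_place: List[str] = []
    fetch_over_http: List[FetchTarget] = []
    for target in targets:
        if target.host in local_names and exists(target.path):
            # Data is already on this host: read it directly, no copy and no HTTP.
            read_in_place.append(target.path)
        else:
            # Remote data, or a local path that cannot be found: use the pull server.
            fetch_over_http.append(target)
    return read_in_place, fetch_over_http
```

In this sketch only the second list would be handed to Fetcher; paths in the first list could be opened directly by the scanner without copying them into the fetch directory. The `exists` check is a defensive fallback, so a file reported as local but missing on disk is still fetched over HTTP.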
> Worker should directly read Intermediate data stored in localhost rather than fetching
> --------------------------------------------------------------------------------------
>
> Key: TAJO-983
> URL: https://issues.apache.org/jira/browse/TAJO-983
> Project: Tajo
> Issue Type: Bug
> Components: data shuffle
> Reporter: Hyunsik Choi
> Assignee: Mai Hai Thanh
> Attachments: TAJO-983.140820.0.patch.txt
>
>
> Currently, the worker always fetches all intermediate data via Fetcher and then
> stores it in the local file system, even though some of the intermediate data
> are already stored in the local file system. This is inefficient and causes
> unnecessary I/O and extra storage usage. We should improve it.
--
This message was sent by Atlassian JIRA
(v6.2#6252)