[ 
https://issues.apache.org/jira/browse/CRUNCH-505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14389886#comment-14389886
 ] 

Micah Whitacre commented on CRUNCH-505:
---------------------------------------

>>Would it be ok if I were to start working on it?

Totally feel free.  

>> If so, do you maybe have some tips on where to start? 

To me, the first step would be to validate that everything in Crunch works when 
someone uses Tachyon as the default FS: not just the intermediate state, but 
also persistence at the beginning and end of a pipeline.  Then make sure we can 
tweak the Tachyon write type for targets so that final output goes to HDFS.
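
A rough sketch of that split, in plain Java with no Crunch or Tachyon 
dependency.  The class, enums and methods are hypothetical names made up for 
illustration.  MUST_CACHE comes from the issue description; CACHE_THROUGH is 
the Tachyon write type that also writes to the under filesystem.  The 
"tachyon.user.file.writetype.default" key, the "fs.tachyon.impl" mapping and 
the master URI are assumptions that should be checked against the Tachyon 
version in use.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: picks a Tachyon write type per kind of Crunch output.
public class TachyonWriteTypeSketch {

    // Only the two write types relevant to this discussion.
    enum WriteType { MUST_CACHE, CACHE_THROUGH }

    // Intermediate data between MR jobs vs. the final pipeline targets.
    enum OutputKind { INTERMEDIATE, TARGET }

    // Intermediate data stays in memory; targets are also written through
    // to the under filesystem (HDFS in the setup discussed here).
    static WriteType writeTypeFor(OutputKind kind) {
        return kind == OutputKind.TARGET
                ? WriteType.CACHE_THROUGH
                : WriteType.MUST_CACHE;
    }

    // Builds the settings one might apply before running a job.
    // Property keys other than fs.defaultFS are assumptions (see above).
    static Map<String, String> settingsFor(OutputKind kind, String masterUri) {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("fs.defaultFS", masterUri);
        settings.put("fs.tachyon.impl", "tachyon.hadoop.TFS");
        settings.put("tachyon.user.file.writetype.default",
                writeTypeFor(kind).name());
        return settings;
    }

    public static void main(String[] args) {
        // Placeholder master address; replace with a real Tachyon master.
        String masterUri = "tachyon://localhost:19998";
        for (OutputKind kind : OutputKind.values()) {
            System.out.println(kind + " -> " + settingsFor(kind, masterUri));
        }
    }
}
```

Keeping the choice in one place like this would also make it easy to skip 
entirely when Tachyon is not configured.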

Like I said, the goal would be for Tachyon to be optional and not a required 
part of Crunch.  I haven't dug in, so I'm not sure how much that will help or 
hinder your original vision for this task.

> Store intermediate data in memory only using Tachyon
> ----------------------------------------------------
>
>                 Key: CRUNCH-505
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-505
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.12.0
>            Reporter: Ioannis Kerkinos
>            Assignee: Josh Wills
>
> Tachyon is a memory-centric distributed storage system that enables reliable 
> data sharing at memory-speed. If used as the storage for intermediate data 
> (between MR jobs) it should improve performance as you won't have to go to 
> HDFS. In order to do so, the MUST_CACHE write type of Tachyon can be used. 
> This will enable data to be persisted in memory only without going to HDFS. 
> So the intermediate data will be read/written at memory-speed and only the 
> final result will be written to HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
