[ https://issues.apache.org/jira/browse/TINKERPOP3-374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marko A. Rodriguez closed TINKERPOP3-374.
-----------------------------------------
    Resolution: Fixed

> MapReduce implementations should specify what vertex data they need
> -------------------------------------------------------------------
>
>                 Key: TINKERPOP3-374
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP3-374
>             Project: TinkerPop 3
>          Issue Type: Improvement
>          Components: process
>            Reporter: Matthias Broecheler
>            Assignee: Marko A. Rodriguez
>             Fix For: 3.0.0.GA
>
>
> The map stage takes a Vertex object as its argument, but the execution
> framework of the MapReduce job does not know which data from the Vertex is
> needed during the map phase. For execution frameworks that already hold all
> of that data in memory (e.g. Giraph, TinkerGraph), this makes no difference.
> For execution frameworks that must pull, load, or read the data from
> elsewhere (e.g. Hadoop, Fulgora), it can waste a lot of time.
> For instance, consider a MapReduce job that runs after a PageRank (PR) vertex
> program to compute some aggregate of the computed PR values. In that case, no
> edges or other vertex properties are needed, just the single PR property.
> Yet a Hadoop-based implementation would have to read the entire graph from
> HDFS instead of just that single value.
> As with VertexPrograms, MapReduce should have a method that returns the
> incident traversals specifying the data needed for the map stage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
