[ https://issues.apache.org/jira/browse/TINKERPOP3-374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marko A. Rodriguez closed TINKERPOP3-374.
-----------------------------------------
Resolution: Fixed
> MapReduce implementations should specify what vertex data they need
> -------------------------------------------------------------------
>
> Key: TINKERPOP3-374
> URL: https://issues.apache.org/jira/browse/TINKERPOP3-374
> Project: TinkerPop 3
> Issue Type: Improvement
> Components: process
> Reporter: Matthias Broecheler
> Assignee: Marko A. Rodriguez
> Fix For: 3.0.0.GA
>
>
> The map stage takes a Vertex object as its argument, but the execution
> framework of the MapReduce job does not know which data from the Vertex is
> needed during the map phase. For execution frameworks (like Giraph,
> TinkerGraph) where all of that data is already held in memory, this makes no
> difference, but for execution frameworks that need to load the data from
> somewhere (e.g. Hadoop, Fulgora) it could lead to a lot of wasted time.
> For instance, consider a MapReduce job that follows a PageRank (PR) vertex
> program and computes some aggregate of the computed PR values. In that case,
> no edges or other vertex properties are needed, just the single PR property.
> A Hadoop-based implementation would nevertheless have to read the entire
> graph from HDFS instead of just that single value.
> Similar to VertexPrograms, MapReduce should have a method that returns the
> incident traversals specifying the data needed for the map job.
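
To make the request above concrete, here is a minimal Java sketch of the idea, not the TinkerPop API. The interface MapJob, its method requiredPropertyKeys(), the load() helper, and the property key "pageRank" are all hypothetical names chosen for illustration, and a vertex is modeled as a plain property map. The point is only that a job which declares its inputs lets a storage-backed framework skip everything else.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class RequiredVertexDataSketch {

    /** Hypothetical stand-in for a MapReduce job; NOT the TinkerPop interface. */
    interface MapJob {
        /** Property keys the map stage reads; a loader may skip all others. */
        Set<String> requiredPropertyKeys();

        /** Map stage, applied to the (possibly reduced) view of one vertex. */
        double map(Map<String, Object> vertexProperties);
    }

    /** Example job: reads only the assumed "pageRank" property of each vertex. */
    static final MapJob PAGE_RANK_SUM = new MapJob() {
        @Override
        public Set<String> requiredPropertyKeys() {
            return Collections.singleton("pageRank");
        }

        @Override
        public double map(Map<String, Object> vertexProperties) {
            Object value = vertexProperties.getOrDefault("pageRank", 0.0);
            return ((Number) value).doubleValue();
        }
    };

    /** Simulates a storage-backed loader that materializes only declared keys. */
    static Map<String, Object> load(Map<String, Object> stored, MapJob job) {
        Map<String, Object> view = new HashMap<>();
        for (String key : job.requiredPropertyKeys()) {
            if (stored.containsKey(key)) {
                view.put(key, stored.get(key));
            }
        }
        return view;
    }

    /** Runs the map stage over every vertex and sums the results. */
    static double sumPageRank(List<Map<String, Object>> storedVertices) {
        double total = 0.0;
        for (Map<String, Object> stored : storedVertices) {
            total += PAGE_RANK_SUM.map(load(stored, PAGE_RANK_SUM));
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Object> a = new HashMap<>();
        a.put("pageRank", 0.25);
        a.put("name", "a"); // never loaded: the job did not declare it

        Map<String, Object> b = new HashMap<>();
        b.put("pageRank", 0.75);
        b.put("name", "b");

        List<Map<String, Object>> vertices = new ArrayList<>();
        vertices.add(a);
        vertices.add(b);

        System.out.println(sumPageRank(vertices));
    }
}
```

In this sketch the loader consults the job before touching any data, so an in-memory framework can ignore the declaration while a Hadoop-style framework can use it to read one value per vertex. Edges are left out entirely; a fuller version would also need a way to declare which incident edges, if any, the map stage traverses.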
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)