[jira] [Updated] (GIRAPH-214) GiraphJob should have configuration split out of it to be cleaner (GiraphConf)

Eli Reisman (JIRA) Sun, 12 Aug 2012 15:15:40 -0700

     [ https://issues.apache.org/jira/browse/GIRAPH-214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Eli Reisman updated GIRAPH-214:
-------------------------------

    Attachment: GIRAPH-214-5-option1.patch

Hi. I am working on two options for this patch. The attached option 1 compiles
but won't run, and here's why:

When we create a GiraphJob, we place a Configuration into it. It is passed to
Hadoop, which COPIES that Configuration (in a JobConf) and stores it. When we
receive this Configuration back on the Giraph side, it is passed to us from the
Mapper.Context or JobContext that Hadoop holds. The Configuration instances
these two Hadoop objects return to us via getConfiguration() are
org.apache.hadoop.mapred.JobConf in all the Giraph profiles. Even when I make
GiraphConf inherit from JobConf, the cast to GiraphConf fails at runtime
(ClassCastException) when these JobConfs arrive at our doorstep, because they
are JobConf copies and no longer GiraphConf instances at all.
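The failure can be reproduced without Hadoop. The sketch below uses simplified stand-in classes named after Hadoop's `Configuration` and `JobConf` (they are not the real Hadoop classes) and a hypothetical `GiraphConf extends JobConf`. The only behavior it models is the one described above: the framework builds a fresh `JobConf` from the submitted configuration with a copy constructor, so the values survive but the runtime type does not, and the downcast throws.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: why a GiraphConf subclass cannot be recovered from a copied JobConf. */
class CastFailureDemo {
  /** Simplified stand-in for org.apache.hadoop.conf.Configuration (not the real class). */
  static class Configuration {
    protected final Map<String, String> props = new HashMap<>();

    Configuration() { }

    /** Copy constructor: copies the key/value pairs, not the runtime type. */
    Configuration(Configuration other) {
      props.putAll(other.props);
    }

    void set(String key, String value) { props.put(key, value); }

    String get(String key) { return props.get(key); }
  }

  /** Simplified stand-in for org.apache.hadoop.mapred.JobConf (not the real class). */
  static class JobConf extends Configuration {
    JobConf() { }

    JobConf(Configuration other) { super(other); }
  }

  /** Hypothetical GiraphConf that inherits from JobConf (option 1). */
  static class GiraphConf extends JobConf {
    void setVertexClassName(String name) { set("giraph.vertexClass", name); }

    String getVertexClassName() { return get("giraph.vertexClass"); }
  }

  /** Models the framework copying the submitted configuration into a new JobConf. */
  static Configuration submitAndReturn(Configuration submitted) {
    return new JobConf(submitted);
  }

  /** Returns true only if the configuration handed back can be cast to GiraphConf. */
  static boolean canRecoverGiraphConf() {
    GiraphConf conf = new GiraphConf();
    conf.setVertexClassName("MyVertex");
    Configuration returned = submitAndReturn(conf);
    try {
      GiraphConf recovered = (GiraphConf) returned; // runtime type is JobConf, so this throws
      return recovered != null;
    } catch (ClassCastException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    Configuration returned = submitAndReturn(new GiraphConf());
    System.out.println(returned.getClass().getSimpleName()); // JobConf
    System.out.println(canRecoverGiraphConf());              // false
  }
}
```

Making `GiraphConf` a subclass of `JobConf` does not help, because the object that comes back was constructed as a `JobConf`; only the data was carried over.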

Option 2 (not finished yet) has GiraphConf own the Configuration object rather
than inherit from it, and route calls to it. It's a better approach on the
surface, but it requires many instantiations of GiraphConf to wrap all the
different Configuration instances we receive from the Hadoop side at each entry
point back into Giraph code. Given our memory issues, this seems like a step in
the wrong direction just to add some convenience accessor/mutator methods to
Giraph.
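A minimal sketch of the option 2 shape, assuming a delegating wrapper: `GiraphConf` holds a reference to whatever Configuration Hadoop hands back and routes typed accessors to it. `Configuration` here is a simplified stand-in (Hadoop's real class does have `set(String, String)` and `getInt(String, int)`), and the key name, the default of 1 and `entryPoint` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of option 2: GiraphConf owns a reference to a Configuration and delegates. */
class WrapperDemo {
  /** Simplified stand-in for Hadoop's Configuration (not the real class). */
  static class Configuration {
    private final Map<String, String> props = new HashMap<>();

    void set(String key, String value) { props.put(key, value); }

    String get(String key) { return props.get(key); }

    int getInt(String key, int defaultValue) {
      String v = props.get(key);
      return v == null ? defaultValue : Integer.parseInt(v.trim());
    }
  }

  /** Hypothetical wrapper: typed accessors delegate to the wrapped Configuration. */
  static final class GiraphConf {
    static final String MIN_WORKERS = "giraph.minWorkers"; // hypothetical key for this sketch
    private final Configuration conf;

    GiraphConf(Configuration conf) { this.conf = conf; }

    int getMinWorkers() { return conf.getInt(MIN_WORKERS, 1); }

    void setMinWorkers(int n) { conf.set(MIN_WORKERS, Integer.toString(n)); }

    /** Escape hatch for code that still needs raw access. */
    Configuration getConfiguration() { return conf; }
  }

  /** Models an entry point that receives Hadoop's Configuration and wraps it. */
  static int entryPoint(Configuration fromHadoop) {
    GiraphConf giraphConf = new GiraphConf(fromHadoop); // one wrapper per entry point
    return giraphConf.getMinWorkers();
  }

  public static void main(String[] args) {
    Configuration hadoopConf = new Configuration();
    new GiraphConf(hadoopConf).setMinWorkers(4);
    System.out.println(entryPoint(hadoopConf)); // 4
  }
}
```

Each entry point allocates one wrapper object; in this sketch the wrapper stores only a reference, so the key/value data itself is not duplicated. Whether that per-entry-point allocation is acceptable is the trade-off raised above.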

Jakob made the point that it is not desirable to place all of our config
definitions into GiraphConf directly, since some are domain specific and others
are application specific. He felt it was clearer to let users see the
definitions in the local code where their user-set values are read from the
job's Configuration. That means application authors still make raw calls to
getInt, getClass, etc. to extract their custom values. Giving even the
application layer this raw Configuration access from within GiraphConf, without
getters and setters, seems to negate the whole purpose of GiraphConf again.
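To make the pattern Jakob describes concrete: an application class declares its own key and default next to the code that reads them, using a raw `getInt` call on the job's Configuration. This is a sketch only; `Configuration` is a simplified stand-in (Hadoop's real class has `getInt(String, int)` and `setInt(String, int)` with these shapes), and `SourceVertexApp` and the key `app.sourceId` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: an application-specific option defined and read where it is used. */
class LocalKeyDemo {
  /** Simplified stand-in for Hadoop's Configuration (not the real class). */
  static class Configuration {
    private final Map<String, String> props = new HashMap<>();

    void setInt(String key, int value) { props.put(key, Integer.toString(value)); }

    int getInt(String key, int defaultValue) {
      String v = props.get(key);
      return v == null ? defaultValue : Integer.parseInt(v.trim());
    }
  }

  /** Hypothetical application class; its option lives next to the code that reads it. */
  static class SourceVertexApp {
    /** Hypothetical application key, not a Giraph-defined constant. */
    static final String SOURCE_ID = "app.sourceId";
    static final int SOURCE_ID_DEFAULT = 1;

    private final int sourceId;

    SourceVertexApp(Configuration conf) {
      // Raw accessor call: key, default and use are visible in one place.
      this.sourceId = conf.getInt(SOURCE_ID, SOURCE_ID_DEFAULT);
    }

    boolean isSource(int vertexId) { return vertexId == sourceId; }
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    System.out.println(new SourceVertexApp(conf).isSource(1)); // true (default)
    conf.setInt(SourceVertexApp.SOURCE_ID, 7);
    System.out.println(new SourceVertexApp(conf).isSource(7)); // true
  }
}
```

Nothing in this class benefits from a GiraphConf getter, which is the tension described above: application keys stay readable locally, but only through raw Configuration access.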

Before I go further on this, I would really like to gather some opinions about
whether it is practical to do this at all at this stage in Giraph's
development. Perhaps this should wait until a future JIRA refactors our
coupling with the Hadoop framework, so that it can be implemented in a useful
way?

If there's a nice option 3 I have not thought of, please feel free to tell me
your idea or attempt it. Most of the grunt work is already done for you in the
"option1" patch here, which is consistent (for now) with the current trunk.



                
> GiraphJob should have configuration split out of it to be cleaner (GiraphConf)
> ------------------------------------------------------------------------------
>
>                 Key: GIRAPH-214
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-214
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Avery Ching
>            Assignee: Eli Reisman
>            Priority: Minor
>         Attachments: GIRAPH-214-1.patch, GIRAPH-214-2.patch,
>                      GIRAPH-214-3.patch, GIRAPH-214-4.patch, GIRAPH-214-5-option1.patch
>
>
> Currently all the configuration for Giraph is part of GiraphJob, making 
> things messy for GiraphJob.
> It would be better if we added a GiraphConf (similar to Hive) that is 
> responsible for handling configuration of the Job.
> i.e.
> public class GiraphConf extends Configuration....
> To simplify config, we should make get/set methods for as many of the 
> parameters as possible.
> We are targeting configuration such as
>   /**
>    * Set the vertex class (required)
>    *
>    * @param vertexClass Runs vertex computation
>    */
>   public final void setVertexClass(Class<?> vertexClass) {
>     getConfiguration().setClass(VERTEX_CLASS, vertexClass, BasicVertex.class);
>   }
>   /**
>    * Set the vertex input format class (required)
>    *
>    * @param vertexInputFormatClass Determines how graph is input
>    */
>   public final void setVertexInputFormatClass(
>       Class<?> vertexInputFormatClass) {
>     getConfiguration().setClass(VERTEX_INPUT_FORMAT_CLASS,
>         vertexInputFormatClass,
>         VertexInputFormat.class);
>   }
>   /**
>    * Set the vertex output format class (optional)
>    *
>    * @param vertexOutputFormatClass Determines how graph is output
>    */
>   public final void setVertexOutputFormatClass(
>       Class<?> vertexOutputFormatClass) {
>     getConfiguration().setClass(VERTEX_OUTPUT_FORMAT_CLASS,
>         vertexOutputFormatClass,
>         VertexOutputFormat.class);
>   }
>   /**
>    * Set the vertex combiner class (optional)
>    *
>    * @param vertexCombinerClass Determines how vertex messages are combined
>    */
>   public final void setVertexCombinerClass(Class<?> vertexCombinerClass) {
>     getConfiguration().setClass(VERTEX_COMBINER_CLASS,
>         vertexCombinerClass,
>         VertexCombiner.class);
>   }
>   /**
>    * Set the graph partitioner class (optional)
>    *
>    * @param graphPartitionerFactoryClass Determines how the graph is
>    *        partitioned
>    */
>   public final void setGraphPartitionerFactoryClass(
>       Class<?> graphPartitionerFactoryClass) {
>     getConfiguration().setClass(GRAPH_PARTITIONER_FACTORY_CLASS,
>         graphPartitionerFactoryClass,
>         GraphPartitionerFactory.class);
>   }
>   /**
>    * Set the vertex resolver class (optional)
>    *
>    * @param vertexResolverClass Determines how vertex mutations are resolved
>    */
>   public final void setVertexResolverClass(Class<?> vertexResolverClass) {
>     getConfiguration().setClass(VERTEX_RESOLVER_CLASS,
>         vertexResolverClass,
>         VertexResolver.class);
>   }
>   /**
>    * Set the worker context class (optional)
>    *
>    * @param workerContextClass Determines what code is executed on each
>    *        worker before and after each superstep and computation
>    */
>   public final void setWorkerContextClass(Class<?> workerContextClass) {
>     getConfiguration().setClass(WORKER_CONTEXT_CLASS,
>         workerContextClass,
>         WorkerContext.class);
>   }
> ...etc. 
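The description asks for get/set methods but only lists setters. A minimal sketch of one setter with its matching getter, assuming GiraphConf extends Configuration as proposed: the stand-in `Configuration` imitates the shape of Hadoop's `setClass(name, theClass, xface)` and `getClass(name, defaultValue, xface)` but stores `Class` objects directly instead of class names, and `BasicVertex`/`MyVertex` are stand-ins rather than Giraph's real types.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: a typed get/set pair on a hypothetical GiraphConf extends Configuration. */
class GetSetPairDemo {
  /** Stand-in for Hadoop's setClass/getClass; stores Class objects directly. */
  static class Configuration {
    private final Map<String, Class<?>> classes = new HashMap<>();

    void setClass(String name, Class<?> theClass, Class<?> xface) {
      if (!xface.isAssignableFrom(theClass)) {
        throw new RuntimeException(theClass + " not " + xface.getName());
      }
      classes.put(name, theClass);
    }

    <U> Class<? extends U> getClass(String name, Class<? extends U> defaultValue, Class<U> xface) {
      Class<?> c = classes.get(name);
      if (c == null) {
        return defaultValue;
      }
      return c.asSubclass(xface); // re-checks the stored class against the interface
    }
  }

  /** Stand-in for Giraph's vertex base type. */
  interface BasicVertex { }

  static class MyVertex implements BasicVertex { }

  /** Hypothetical GiraphConf: one typed setter and its matching getter. */
  static class GiraphConf extends Configuration {
    static final String VERTEX_CLASS = "giraph.vertexClass";

    void setVertexClass(Class<? extends BasicVertex> vertexClass) {
      setClass(VERTEX_CLASS, vertexClass, BasicVertex.class);
    }

    /** Returns null when no vertex class has been set. */
    Class<? extends BasicVertex> getVertexClass() {
      return getClass(VERTEX_CLASS, null, BasicVertex.class);
    }
  }

  public static void main(String[] args) {
    GiraphConf conf = new GiraphConf();
    conf.setVertexClass(MyVertex.class);
    System.out.println(conf.getVertexClass().getSimpleName()); // MyVertex
  }
}
```

Typing the setter parameter as `Class<? extends BasicVertex>` instead of the quoted `Class<?>` is an optional choice that catches a wrong class at compile time; the runtime check in `setClass` still guards callers that go through the raw method.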

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
