[jira] [Commented] (GIRAPH-214) GiraphJob should have configuration split out of it to be cleaner (GiraphConf)

Eli Reisman (JIRA) Tue, 14 Aug 2012 09:30:40 -0700

    [ 
https://issues.apache.org/jira/browse/GIRAPH-214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434235#comment-13434235
 ]


Eli Reisman commented on GIRAPH-214:
------------------------------------

I too like the idea of a single repository of internal values since its easy 
for new folks to review the cmd-line options in one place, although perhaps 
having domain-specific comments replace the in-code definitions so people who 
look to that code to find them or how they operate can be pointed to GiraphConf 
as well.

There is a bigger problem. Its used all over the place, leaving our side from 
GiraphJob and coming back (as a copy of the Configuration or GiraphConf we put 
in, not an original) wrapped in a org.apache.hadoop.mapred.JobConf inside the 
Mapper.Context passed to GraphMapper and spread around EVERYWHERE from there 
via said Context (sometimes in the form of a JobContext.) This is where the 
inheritance gets ugly:

The class hierarchy coming out of Hadoop is Configuration (no interface to 
route calls from, just a class) and the JobConf inherits from that. When I 
attempt to make GiraphConf inherit from JobConf so I can do the cast when we 
get the JobConf, it compiles but (as you see in option 1) does not make it past 
the runtime and the tests. I was hoping this was because I picked the wrong 
JobConf (there's .mapred, .mapreduce, .mappred_v2 stuff in the Hadoop code) but 
there's only one JobConfImpl I see in the Hadoop code coming out of .mapred 
just as GiraphConf does. So for some reason it won't cast.

If I try to wrap the JobConf rather than inherit, I have to constantly allow 
"raw access" to the Conf through a method such as getRawConf() which crops up 
anytime a user or anyone else does requires raw access to things like getInt() 
setClass() etc for application-specific stuff that should not be defined 
internally. This seems to add confusion at the user level which started to get 
frustrating and/or defeat the purpose of keeping users and some plumbing out of 
the raw access and keep them using getters/setters. I could add an input arg to 
give the wrapped instance directly to the places where GraphMapper hands it out 
to all the other components inside of a JobContext or Mapper.Context, but if I 
let those components get it as they do now (or in the case of some of the IO 
stuff if they too get it from Hadoop) then we have to create a bunch of 
GiraphConf wrappers on every node, and that seems wasteful too.

I could certainly attempt this if you like, this was option 2. Ideally, I'd 
like to figure out why it won't let us just inherit JobConf. I think it's 
because its a copy of the "raw Configuration" wrapped in JobConf when it comes 
back to us, and not our original GiraphConf any more, but then why can't I copy 
on the way back out of Hadoop when my GiraphConf inherits from JobConf and 
ought to have that ability (also sets off the compiler or tests.)

I will play with it more today but its very puzzling. And it does compile 
without complaint as is!

So basically: group vote on central config options in general (Avery is a 
strong 'yes' and I'm a bit on the fence but more 'yes' than 'no', Jakob's a 
strong 'no', others say...?) and calling all Hadoop masters -- whats up with 
the JobConf?!?

                
> GiraphJob should have configuration split out of it to be cleaner (GiraphConf)
> ------------------------------------------------------------------------------
>
>                 Key: GIRAPH-214
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-214
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Avery Ching
>            Assignee: Eli Reisman
>            Priority: Minor
>         Attachments: GIRAPH-214-1.patch, GIRAPH-214-2.patch, 
> GIRAPH-214-3.patch, GIRAPH-214-4.patch, GIRAPH-214-5-option1.patch, 
> GIRAPH-214-6-option1.patch
>
>
> Currently all the configuration for Giraph is part of GiraphJob, making 
> things messy for GiraphJob.
> It would be better if we added a GiraphConf (similar to Hive) that is 
> responsible for handling configuration of the Job.
> i.e.
> public class GiraphJob extends Configuration....
> To simplify config, we should make get/set methods for as many of the 
> parameters as possible.
> We are targeting configuration such as
>   /**
>    * Set the vertex class (required)
>    *
>    * @param vertexClass Runs vertex computation
>    */
>   public final void setVertexClass(Class<?> vertexClass) {
>     getConfiguration().setClass(VERTEX_CLASS, vertexClass, BasicVertex.class);
>   }
>   /**
>    * Set the vertex input format class (required)
>    *
>    * @param vertexInputFormatClass Determines how graph is input
>    */
>   public final void setVertexInputFormatClass(
>       Class<?> vertexInputFormatClass) {
>     getConfiguration().setClass(VERTEX_INPUT_FORMAT_CLASS,
>         vertexInputFormatClass,
>         VertexInputFormat.class);
>   }
>   /**
>    * Set the vertex output format class (optional)
>    *
>    * @param vertexOutputFormatClass Determines how graph is output
>    */
>   public final void setVertexOutputFormatClass(
>       Class<?> vertexOutputFormatClass) {
>     getConfiguration().setClass(VERTEX_OUTPUT_FORMAT_CLASS,
>         vertexOutputFormatClass,
>         VertexOutputFormat.class);
>   }
>   /**
>    * Set the vertex combiner class (optional)
>    *
>    * @param vertexCombinerClass Determines how vertex messages are combined
>    */
>   public final void setVertexCombinerClass(Class<?> vertexCombinerClass) {
>     getConfiguration().setClass(VERTEX_COMBINER_CLASS,
>         vertexCombinerClass,
>         VertexCombiner.class);
>   }
>   /**
>    * Set the graph partitioner class (optional)
>    *
>    * @param graphPartitionerFactoryClass Determines how the graph is 
> partitioned
>    */
>   public final void setGraphPartitionerFactoryClass(
>       Class<?> graphPartitionerFactoryClass) {
>     getConfiguration().setClass(GRAPH_PARTITIONER_FACTORY_CLASS,
>         graphPartitionerFactoryClass,
>         GraphPartitionerFactory.class);
>   }
>   /**
>    * Set the vertex resolver class (optional)
>    *
>    * @param vertexResolverClass Determines how vertex mutations are resolved
>    */
>   public final void setVertexResolverClass(Class<?> vertexResolverClass) {
>     getConfiguration().setClass(VERTEX_RESOLVER_CLASS,
>         vertexResolverClass,
>         VertexResolver.class);
>   }
>   /**
>    * Set the worker context class (optional)
>    *
>    * @param workerContextClass Determines what code is executed on a each
>    *        worker before and after each superstep and computation
>    */
>   public final void setWorkerContextClass(Class<?> workerContextClass) {
>     getConfiguration().setClass(WORKER_CONTEXT_CLASS,
>         workerContextClass,
>         WorkerContext.class);
>   }
> ...etc. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (GIRAPH-214) GiraphJob should have configuration split out of it to be cleaner (GiraphConf)

Reply via email to