[
https://issues.apache.org/jira/browse/GIRAPH-214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434235#comment-13434235
]
Eli Reisman commented on GIRAPH-214:
------------------------------------
I too like the idea of a single repository of internal values since its easy
for new folks to review the cmd-line options in one place, although perhaps
having domain-specific comments replace the in-code definitions so people who
look to that code to find them or how they operate can be pointed to GiraphConf
as well.
There is a bigger problem. Its used all over the place, leaving our side from
GiraphJob and coming back (as a copy of the Configuration or GiraphConf we put
in, not an original) wrapped in a org.apache.hadoop.mapred.JobConf inside the
Mapper.Context passed to GraphMapper and spread around EVERYWHERE from there
via said Context (sometimes in the form of a JobContext.) This is where the
inheritance gets ugly:
The class hierarchy coming out of Hadoop is Configuration (no interface to
route calls from, just a class) and the JobConf inherits from that. When I
attempt to make GiraphConf inherit from JobConf so I can do the cast when we
get the JobConf, it compiles but (as you see in option 1) does not make it past
the runtime and the tests. I was hoping this was because I picked the wrong
JobConf (there's .mapred, .mapreduce, .mappred_v2 stuff in the Hadoop code) but
there's only one JobConfImpl I see in the Hadoop code coming out of .mapred
just as GiraphConf does. So for some reason it won't cast.
If I try to wrap the JobConf rather than inherit, I have to constantly allow
"raw access" to the Conf through a method such as getRawConf() which crops up
anytime a user or anyone else does requires raw access to things like getInt()
setClass() etc for application-specific stuff that should not be defined
internally. This seems to add confusion at the user level which started to get
frustrating and/or defeat the purpose of keeping users and some plumbing out of
the raw access and keep them using getters/setters. I could add an input arg to
give the wrapped instance directly to the places where GraphMapper hands it out
to all the other components inside of a JobContext or Mapper.Context, but if I
let those components get it as they do now (or in the case of some of the IO
stuff if they too get it from Hadoop) then we have to create a bunch of
GiraphConf wrappers on every node, and that seems wasteful too.
I could certainly attempt this if you like, this was option 2. Ideally, I'd
like to figure out why it won't let us just inherit JobConf. I think it's
because its a copy of the "raw Configuration" wrapped in JobConf when it comes
back to us, and not our original GiraphConf any more, but then why can't I copy
on the way back out of Hadoop when my GiraphConf inherits from JobConf and
ought to have that ability (also sets off the compiler or tests.)
I will play with it more today but its very puzzling. And it does compile
without complaint as is!
So basically: group vote on central config options in general (Avery is a
strong 'yes' and I'm a bit on the fence but more 'yes' than 'no', Jakob's a
strong 'no', others say...?) and calling all Hadoop masters -- whats up with
the JobConf?!?
> GiraphJob should have configuration split out of it to be cleaner (GiraphConf)
> ------------------------------------------------------------------------------
>
> Key: GIRAPH-214
> URL: https://issues.apache.org/jira/browse/GIRAPH-214
> Project: Giraph
> Issue Type: Bug
> Reporter: Avery Ching
> Assignee: Eli Reisman
> Priority: Minor
> Attachments: GIRAPH-214-1.patch, GIRAPH-214-2.patch,
> GIRAPH-214-3.patch, GIRAPH-214-4.patch, GIRAPH-214-5-option1.patch,
> GIRAPH-214-6-option1.patch
>
>
> Currently all the configuration for Giraph is part of GiraphJob, making
> things messy for GiraphJob.
> It would be better if we added a GiraphConf (similar to Hive) that is
> responsible for handling configuration of the Job.
> i.e.
> public class GiraphJob extends Configuration....
> To simplify config, we should make get/set methods for as many of the
> parameters as possible.
> We are targeting configuration such as
> /**
> * Set the vertex class (required)
> *
> * @param vertexClass Runs vertex computation
> */
> public final void setVertexClass(Class<?> vertexClass) {
> getConfiguration().setClass(VERTEX_CLASS, vertexClass, BasicVertex.class);
> }
> /**
> * Set the vertex input format class (required)
> *
> * @param vertexInputFormatClass Determines how graph is input
> */
> public final void setVertexInputFormatClass(
> Class<?> vertexInputFormatClass) {
> getConfiguration().setClass(VERTEX_INPUT_FORMAT_CLASS,
> vertexInputFormatClass,
> VertexInputFormat.class);
> }
> /**
> * Set the vertex output format class (optional)
> *
> * @param vertexOutputFormatClass Determines how graph is output
> */
> public final void setVertexOutputFormatClass(
> Class<?> vertexOutputFormatClass) {
> getConfiguration().setClass(VERTEX_OUTPUT_FORMAT_CLASS,
> vertexOutputFormatClass,
> VertexOutputFormat.class);
> }
> /**
> * Set the vertex combiner class (optional)
> *
> * @param vertexCombinerClass Determines how vertex messages are combined
> */
> public final void setVertexCombinerClass(Class<?> vertexCombinerClass) {
> getConfiguration().setClass(VERTEX_COMBINER_CLASS,
> vertexCombinerClass,
> VertexCombiner.class);
> }
> /**
> * Set the graph partitioner class (optional)
> *
> * @param graphPartitionerFactoryClass Determines how the graph is
> partitioned
> */
> public final void setGraphPartitionerFactoryClass(
> Class<?> graphPartitionerFactoryClass) {
> getConfiguration().setClass(GRAPH_PARTITIONER_FACTORY_CLASS,
> graphPartitionerFactoryClass,
> GraphPartitionerFactory.class);
> }
> /**
> * Set the vertex resolver class (optional)
> *
> * @param vertexResolverClass Determines how vertex mutations are resolved
> */
> public final void setVertexResolverClass(Class<?> vertexResolverClass) {
> getConfiguration().setClass(VERTEX_RESOLVER_CLASS,
> vertexResolverClass,
> VertexResolver.class);
> }
> /**
> * Set the worker context class (optional)
> *
> * @param workerContextClass Determines what code is executed on a each
> * worker before and after each superstep and computation
> */
> public final void setWorkerContextClass(Class<?> workerContextClass) {
> getConfiguration().setClass(WORKER_CONTEXT_CLASS,
> workerContextClass,
> WorkerContext.class);
> }
> ...etc.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira