[
https://issues.apache.org/jira/browse/GIRAPH-214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Eli Reisman updated GIRAPH-214:
-------------------------------
Attachment: GIRAPH-214-5-option1.patch
Hi. I am working on two options for this patch. This compiles but won't run,
and here's why:
When we create a GiraphJob, we place a Configuration into it. It is passed to
Hadoop, which COPIES that Configuration (into a JobConf) and stores it. When we
receive this Configuration back on the Giraph side, it comes from the
Mapper.Context or JobContext that Hadoop holds. The Configuration instances
these two Hadoop objects return from getConfiguration() are
org.apache.hadoop.mapred.JobConf in all the Giraph profiles. Even when I make
GiraphConf inherit from JobConf, casting to GiraphConf fails at runtime,
because the objects that arrive at our doorstep are JobConf copies and are no
longer GiraphConfs at all.
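The cast failure described above can be reproduced in plain Java. This is only
a sketch: the nested Configuration, JobConf and GiraphConf classes are
hypothetical, simplified stand-ins (not the real Hadoop or Giraph classes),
roundTrip() merely imitates the copy Hadoop is described as making, and the
key name is illustrative.

```java
import java.util.HashMap;
import java.util.Map;

public class CastSketch {
    // Hypothetical stand-in for Hadoop's Configuration: a bag of string properties.
    static class Configuration {
        protected final Map<String, String> props = new HashMap<>();
        Configuration() {}
        // Copy constructor: copies the values, not the runtime type of 'other'.
        Configuration(Configuration other) { props.putAll(other.props); }
        void set(String key, String value) { props.put(key, value); }
        String get(String key) { return props.get(key); }
    }

    // Hypothetical stand-in for JobConf.
    static class JobConf extends Configuration {
        JobConf() {}
        JobConf(Configuration other) { super(other); }
    }

    // The subclass we would like to get back from the framework.
    static class GiraphConf extends JobConf {
        void setVertexClass(String className) { set("giraph.vertexClass", className); }
    }

    // Imitates the framework side: the submitted object is copied into a new JobConf.
    static Configuration roundTrip(Configuration submitted) {
        return new JobConf(submitted);
    }

    // Returns true only if the object really is a GiraphConf at runtime.
    static boolean castSucceeds(Configuration conf) {
        try {
            GiraphConf giraphConf = (GiraphConf) conf;
            return giraphConf != null;
        } catch (ClassCastException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        GiraphConf conf = new GiraphConf();
        conf.setVertexClass("MyVertex");
        Configuration received = roundTrip(conf);
        System.out.println(received.get("giraph.vertexClass")); // prints "MyVertex"
        System.out.println(castSucceeds(received));             // prints "false"
    }
}
```

The property values survive the copy, but the subclass identity does not, so
making GiraphConf extend JobConf does not help in this sketch.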
Option 2 (not finished yet) involves having GiraphConf own the Configuration
object rather than inherit from it, and routing calls to it. It's a better
approach on the surface, but it requires many instantiations of GiraphConf to
wrap all the different Configuration instances we receive from the Hadoop side
at each entry point back into Giraph code. Given our memory issues, this seems
like a step in the wrong direction just to add some convenience
accessor/mutator methods to Giraph.
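For readers who want to picture option 2, here is a minimal sketch of the
"own rather than inherit" shape. It is not the unfinished patch: the
Configuration class is a hypothetical stand-in for Hadoop's, and the
MAX_WORKERS key and its accessors are illustrative names only.

```java
import java.util.HashMap;
import java.util.Map;

public class WrapperSketch {
    // Hypothetical stand-in for Hadoop's Configuration.
    static class Configuration {
        private final Map<String, String> props = new HashMap<>();
        void setInt(String key, int value) { props.put(key, Integer.toString(value)); }
        int getInt(String key, int defaultValue) {
            String value = props.get(key);
            return value == null ? defaultValue : Integer.parseInt(value);
        }
    }

    // GiraphConf holds a Configuration and routes typed calls to it.
    static final class GiraphConf {
        static final String MAX_WORKERS = "giraph.maxWorkers"; // illustrative key

        private final Configuration conf;

        GiraphConf(Configuration conf) { this.conf = conf; }

        void setMaxWorkers(int maxWorkers) { conf.setInt(MAX_WORKERS, maxWorkers); }
        int getMaxWorkers(int defaultValue) { return conf.getInt(MAX_WORKERS, defaultValue); }

        // Raw access for settings that have no typed accessor.
        Configuration getConfiguration() { return conf; }
    }

    public static void main(String[] args) {
        Configuration fromHadoop = new Configuration();
        // Job-setup side: wrap once to set a value.
        new GiraphConf(fromHadoop).setMaxWorkers(8);
        // Each later entry point has to wrap again whatever instance it is handed.
        GiraphConf atEntryPoint = new GiraphConf(fromHadoop);
        System.out.println(atEntryPoint.getMaxWorkers(1)); // prints "8"
    }
}
```

Because the wrapper works with whatever Configuration it is handed, it avoids
the cast problem, at the price of one new GiraphConf per entry point, which is
the instantiation cost raised above.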
Jakob made the point that it is not desirable to place all of our config
definitions into GiraphConf directly, since some are domain specific and others
are application specific. He felt it was clearer to let users see the
definitions in the local code where their user-set values are read from the
job's Configuration. This means raw calls to getInt, getClass, etc. whenever
the application author needs to extract custom values. Giving even the
application layer this raw Configuration access from within GiraphConf, without
getters and setters, seems to negate the whole purpose of GiraphConf again.
Before I go further on this, I would really like to gather some opinions about
whether it is practical to do this at all at this stage in Giraph's
development. Perhaps this should wait until a future JIRA refactors our
coupling with the Hadoop framework, so that it can be implemented in a useful
way?
If there's a nice option 3 I have not thought of, please feel free to tell me
your idea or attempt it. Most of the grunt work is already done for you in the
"option1" patch here, which is consistent (for now) with the current trunk.
> GiraphJob should have configuration split out of it to be cleaner (GiraphConf)
> ------------------------------------------------------------------------------
>
> Key: GIRAPH-214
> URL: https://issues.apache.org/jira/browse/GIRAPH-214
> Project: Giraph
> Issue Type: Bug
> Reporter: Avery Ching
> Assignee: Eli Reisman
> Priority: Minor
> Attachments: GIRAPH-214-1.patch, GIRAPH-214-2.patch,
> GIRAPH-214-3.patch, GIRAPH-214-4.patch, GIRAPH-214-5-option1.patch
>
>
> Currently all the configuration for Giraph is part of GiraphJob, making
> things messy for GiraphJob.
> It would be better if we added a GiraphConf (similar to Hive) that is
> responsible for handling configuration of the Job.
> i.e.
> public class GiraphJob extends Configuration....
> To simplify config, we should make get/set methods for as many of the
> parameters as possible.
> We are targeting configuration such as
> /**
> * Set the vertex class (required)
> *
> * @param vertexClass Runs vertex computation
> */
> public final void setVertexClass(Class<?> vertexClass) {
> getConfiguration().setClass(VERTEX_CLASS, vertexClass, BasicVertex.class);
> }
> /**
> * Set the vertex input format class (required)
> *
> * @param vertexInputFormatClass Determines how graph is input
> */
> public final void setVertexInputFormatClass(
> Class<?> vertexInputFormatClass) {
> getConfiguration().setClass(VERTEX_INPUT_FORMAT_CLASS,
> vertexInputFormatClass,
> VertexInputFormat.class);
> }
> /**
> * Set the vertex output format class (optional)
> *
> * @param vertexOutputFormatClass Determines how graph is output
> */
> public final void setVertexOutputFormatClass(
> Class<?> vertexOutputFormatClass) {
> getConfiguration().setClass(VERTEX_OUTPUT_FORMAT_CLASS,
> vertexOutputFormatClass,
> VertexOutputFormat.class);
> }
> /**
> * Set the vertex combiner class (optional)
> *
> * @param vertexCombinerClass Determines how vertex messages are combined
> */
> public final void setVertexCombinerClass(Class<?> vertexCombinerClass) {
> getConfiguration().setClass(VERTEX_COMBINER_CLASS,
> vertexCombinerClass,
> VertexCombiner.class);
> }
> /**
> * Set the graph partitioner class (optional)
> *
> * @param graphPartitionerFactoryClass Determines how the graph is
> * partitioned
> */
> public final void setGraphPartitionerFactoryClass(
> Class<?> graphPartitionerFactoryClass) {
> getConfiguration().setClass(GRAPH_PARTITIONER_FACTORY_CLASS,
> graphPartitionerFactoryClass,
> GraphPartitionerFactory.class);
> }
> /**
> * Set the vertex resolver class (optional)
> *
> * @param vertexResolverClass Determines how vertex mutations are resolved
> */
> public final void setVertexResolverClass(Class<?> vertexResolverClass) {
> getConfiguration().setClass(VERTEX_RESOLVER_CLASS,
> vertexResolverClass,
> VertexResolver.class);
> }
> /**
> * Set the worker context class (optional)
> *
> * @param workerContextClass Determines what code is executed on each
> * worker before and after each superstep and computation
> */
> public final void setWorkerContextClass(Class<?> workerContextClass) {
> getConfiguration().setClass(WORKER_CONTEXT_CLASS,
> workerContextClass,
> WorkerContext.class);
> }
> ...etc.