The process I used is outlined here: http://mahout.apache.org/developers/github.html

On Wed, May 28, 2014 at 11:52 AM, Pat Ferrel <[email protected]> wrote:

> Great, I see it today.
>
> This was done with a PR, did you have to use the two local repos method to merge and push to git.wp at apache?
>
> On May 28, 2014, at 10:38 AM, Dmitriy Lyubimov <[email protected]> wrote:
>
> Pat,
>
> (1) Pick the latest master with M-1529. It does the abstraction you are referring to.
>
> (2) The mahoutSparkContext() method that creates the DistributedContext accepts a sparkConf parameter. That's where you can set up whatever additional Spark properties or parameters you want. The Mahout context overrides only the serializer and something else (that I don't remember at this point).
>
> BTW, the additional-parameters mechanism is an issue for the shell: it creates a context before there's a chance for it to be tweaked. Although the shell can always re-create it with implicit val dcs = mahoutSparkContext()...
>
> -d
>
>
> On Wed, May 28, 2014 at 10:15 AM, Pat Ferrel <[email protected]> wrote:
>
> > Actually I’m working on read/write store stuff and that is why I need to access the SparkContext. I mainly want a way to extend the conf before the context is created. I can do it now but it’s not as pretty as I’d like. There will be drivers that call DSL stuff or perform other non-DSL operations, and they may need to mess with the conf before the context is created.
> >
> > The driver is currently something like this:
> >
> >   abstract class MahoutDriver {
> >     protected var mc: SparkContext = _
> >
> >     protected def start(masterUrl: String, appName: String,
> >                         customJars: Traversable[String] = Traversable.empty[String]): Unit = {
> >       mc = mahoutSparkContext(masterUrl, appName, customJars)
> >     }
> >
> >     protected def stop: Unit = {
> >       mc.stop
> >     }
> >
> >     protected def process: Unit
> >
> >     def main(args: Array[String]): Unit
> >   }
> >
> > So I put the conf changes in the child’s .start, like this:
> >
> >   override def start(masterUrl: String = options.master, appName: String = options.appName,
> >                      customJars: Traversable[String] = Traversable.empty[String]): Unit = {
> >     System.setProperty("spark.kryo.referenceTracking", "false")
> >     System.setProperty("spark.kryoserializer.buffer.mb", "100")
> >     System.setProperty("spark.executor.memory", "2g") // todo: need a way to calculate these or let the user pass them in
> >     super.start(masterUrl, appName, customJars)
> >   }
> >
> > This would be a lot cleaner if the conf were separate from the context creation. The wrap is not necessary for this, just a slight refactoring of the sparkbinding, but while we are at it maybe a wrapper is worth considering.
> >
> > Not sure what you mean by data store. The SparkContext is required to read or write files, but any supported H2 URIs will work and H2 supports several schemes. Also you may be reading from HDFS and writing to a DB, which would not use the SparkContext. So it doesn’t seem like a job-conf type thing, let alone a MahoutContext thing.
> >
> > On May 28, 2014, at 9:31 AM, Saikat Kanjilal <[email protected]> wrote:
> >
> > I like this idea, however my question would be: what would be the benefit of creating a wrapper context? Is it to get access to all of the internal domain/config-related objects within Mahout? Also, should the MahoutContext have some relationship to the data store where the data is headed?
> >
> >> From: [email protected]
> >> Subject: SparkContext
> >> Date: Wed, 28 May 2014 09:07:19 -0700
> >> To: [email protected]
> >>
> >> For purposes outside the DSL it seems we need to wrap the SparkContext with something like a MahoutContext. The current sparkBindings object looks pretty DSL-specific. It sets some Kryo properties, but these need to be accessible to code outside the DSL. I’ve been creating a raw SparkContext and passing around the ubiquitous “sc”, which works, but is this the way we should be doing this?
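
To make Dmitriy's point (2) concrete, here is a minimal sketch of building the extra properties into a SparkConf up front and handing it to mahoutSparkContext(), instead of poking System properties from the driver's start(). The sparkConf parameter name comes from the thread; the exact signature, the named arguments, and the master/appName values below are assumptions, so check the sparkbindings package object in your tree before relying on them.

  import org.apache.mahout.sparkbindings._
  import org.apache.spark.SparkConf

  // Build the conf first so drivers (or anything non-DSL) can tweak it
  // before any context exists.
  val conf = new SparkConf()
    .set("spark.kryo.referenceTracking", "false")
    .set("spark.kryoserializer.buffer.mb", "100")
    .set("spark.executor.memory", "2g") // still needs to be calculated or user-supplied, per Pat's todo

  // Assumed call shape: mahoutSparkContext() accepting the prebuilt conf
  // (Dmitriy's point (2)); masterUrl and appName are placeholder values.
  implicit val mc = mahoutSparkContext(
    masterUrl = "local",
    appName = "MahoutDriverSketch",
    sparkConf = conf)

The same shape would cover Dmitriy's shell caveat: the shell could drop its pre-created context and re-create the implicit val from a tweaked conf in exactly this way.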

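And for Saikat's question about what a wrapper would actually buy, here is a purely hypothetical sketch of the kind of thin MahoutContext the original message floats: it owns the conf, applies the serializer override the thread attributes to sparkBindings, and unwraps implicitly so code that expects a raw "sc" keeps working. The class, its members, and the single property it sets are invented for illustration; per Dmitriy, the DistributedContext that landed with M-1529 is the actual abstraction in master.

  import scala.language.implicitConversions
  import org.apache.spark.{SparkConf, SparkContext}

  // Hypothetical wrapper, not Mahout code: callers build or extend the conf,
  // the wrapper adds a Mahout-ish serializer default, and the SparkContext
  // is created lazily so nothing is fixed until first use.
  class MahoutContext(baseConf: SparkConf = new SparkConf()) {
    val conf: SparkConf = baseConf
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    lazy val sc: SparkContext = new SparkContext(conf)
    def close(): Unit = sc.stop()
  }

  object MahoutContext {
    // Implicitly unwrap to SparkContext for code that still takes an sc.
    implicit def toSparkContext(mc: MahoutContext): SparkContext = mc.sc
  }

Usage would look like implicit val mc = new MahoutContext(new SparkConf().set("spark.executor.memory", "2g")), with mc passed anywhere a SparkContext is expected; whether that carries its weight over just reusing DistributedContext is exactly the question the thread leaves open.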