Honestly, merging a PR is no different from merging a branch. If you otherwise know how to merge in Git, you can merge a PR; the only PR-specific part is specifying the PR branch correctly.
And closing PRs via commit messages. Although if you are committing your own PR, then you can also close it via GitHub's "close" button.

On Wed, May 28, 2014 at 12:05 PM, Pat Ferrel <[email protected]> wrote:

> OK, I see how to do this with one local. This is going to be very easy to mess up, so apologies in advance.
>
> On May 28, 2014, at 11:52 AM, Pat Ferrel <[email protected]> wrote:
>
> Great, I see it today.
>
> This was done with a PR. Did you have to use the two-local-repos method to merge and push to git.wp at apache?
>
> On May 28, 2014, at 10:38 AM, Dmitriy Lyubimov <[email protected]> wrote:
>
> Pat,
>
> (1) Pick the latest master with M-1529. It does the abstraction you are referring to.
>
> (2) The mahoutSparkContext() method that creates a DistributedContext accepts a sparkConf parameter. That's where you can set up whatever additional Spark properties or parameters you want. The Mahout context overrides only the serializer and something else (that I don't remember at this point).
>
> BTW, the additional-parameters mechanism is an issue for the shell: it creates a context before there's a chance for it to be tweaked. Although the shell can always re-create implicit val dcs = mahoutSparkContext()...
>
> -d
>
>
> On Wed, May 28, 2014 at 10:15 AM, Pat Ferrel <[email protected]> wrote:
>
> > Actually I'm working on read/write store stuff, and that is why I need to access the SparkContext. I mainly want a way to extend the conf before the context is created. I can do it now, but it's not as pretty as I'd like. There will be drivers that call DSL stuff or perform other non-DSL operations, and they may need to modify the conf before the context is created.
> >
> > The driver is currently something like this:
> >
> > abstract class MahoutDriver {
> >   protected var mc: SparkContext = _
> >
> >   protected def start(masterUrl: String, appName: String,
> >       customJars: Traversable[String] = Traversable.empty[String]): Unit = {
> >     mc = mahoutSparkContext(masterUrl, appName, customJars)
> >   }
> >
> >   protected def stop: Unit = {
> >     mc.stop
> >   }
> >
> >   protected def process: Unit
> >
> >   def main(args: Array[String]): Unit
> > }
> >
> > So I put the conf changes in the child's .start like this:
> >
> > override def start(masterUrl: String = options.master, appName: String = options.appName,
> >     customJars: Traversable[String] = Traversable.empty[String]): Unit = {
> >   System.setProperty("spark.kryo.referenceTracking", "false")
> >   System.setProperty("spark.kryoserializer.buffer.mb", "100")
> >   System.setProperty("spark.executor.memory", "2g") // todo: need a way to calculate these or let the user pass them in.
> >   super.start(masterUrl, appName, customJars)
> > }
> >
> > This would be a lot cleaner if the conf were separate from the context creation. The wrap is not necessary for this, just a slight refactoring of the sparkbindings, but while we are at it maybe a wrapper is worth considering.
> >
> > Not sure what you mean by data store. The SparkContext is required to read or write files, but any supported H2 URI will work, and H2 supports several schemes. Also, you may be reading from HDFS and writing to a DB, which would not use the SparkContext. So it doesn't seem like a job-conf type of thing, let alone a MahoutContext thing.
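For the driver above, this is roughly what (2) would let you do instead of the System.setProperty calls. It's only a sketch: the parameter names (sparkConf in particular) are my assumption about the post-M-1529 signature, and the master URL and app name are placeholders.

import org.apache.spark.SparkConf
import org.apache.mahout.sparkbindings._

// Build the conf up front, then hand it to mahoutSparkContext instead of
// mutating system properties before the context exists.
val conf = new SparkConf()
  .set("spark.kryo.referenceTracking", "false")
  .set("spark.kryoserializer.buffer.mb", "100")
  .set("spark.executor.memory", "2g")  // same values as in the driver above

// "local" and "my-driver" are placeholders; sparkConf as a named parameter is assumed.
implicit val mc = mahoutSparkContext("local", "my-driver", sparkConf = conf)

The subclass override then shrinks to building a SparkConf, and the base start() stays untouched.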
> > On May 28, 2014, at 9:31 AM, Saikat Kanjilal <[email protected]> wrote:
> >
> > I like this idea; however, my question would be: what is the benefit of creating a wrapper context? Is it to get access to all of the internal domain/config-related objects within Mahout? Also, should the MahoutContext have some relationship to the data store where the data is headed?
> >
> > > From: [email protected]
> > > Subject: SparkContext
> > > Date: Wed, 28 May 2014 09:07:19 -0700
> > > To: [email protected]
> > >
> > > For purposes outside the DSL it seems we need to wrap the SparkContext with something like a MahoutContext. The current sparkBindings object looks pretty DSL specific. It sets some Kryo properties, but these need to be accessible to code outside the DSL. I've been creating a raw SparkContext and passing around the ubiquitous "sc", which works, but is this the way we should be doing this?
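On the original question, one possible shape for the wrapper, purely as a sketch: MahoutContext here is a hypothetical class, not something that exists in sparkBindings today, and the property values are just the ones mentioned in this thread.

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical wrapper: owns the Kryo-related defaults that sparkBindings
// currently sets, so non-DSL code can get a properly configured context
// without knowing about them, while still exposing the raw sc.
class MahoutContext(val sc: SparkContext)

object MahoutContext {
  def apply(masterUrl: String, appName: String,
            conf: SparkConf = new SparkConf()): MahoutContext = {
    conf.setMaster(masterUrl)
      .setAppName(appName)
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryo.referenceTracking", "false")
      .set("spark.kryoserializer.buffer.mb", "100")
    new MahoutContext(new SparkContext(conf))
  }
}

Drivers could then take (or construct) a MahoutContext and only reach for .sc where a raw SparkContext is really needed; callers that want extra properties set them on the conf before calling apply().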
