Great, I see it today.

This was done with a PR; did you have to use the two-local-repos method to
merge and push to git.wp at Apache?

On May 28, 2014, at 10:38 AM, Dmitriy Lyubimov <[email protected]> wrote:

Pat,

(1) Pick the latest master with M-1529. It does the abstraction you are
referring to.

(2) The mahoutSparkContext() method that creates the DistributedContext
accepts a sparkConf parameter. That's where you can set whatever additional
Spark properties or parameters you want. The Mahout context overrides only
the serializer and something else (that I don't remember at this point).
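
Roughly, a sketch of what I mean (parameter names here are from memory and
may differ slightly from what landed with M-1529):

    import org.apache.mahout.sparkbindings._
    import org.apache.spark.SparkConf

    // build whatever extra Spark settings you need before the context exists
    val conf = new SparkConf()
      .set("spark.kryoserializer.buffer.mb", "100")
      .set("spark.executor.memory", "2g")

    // Mahout still overrides the serializer on top of this
    implicit val mc = mahoutSparkContext(masterUrl = "local",
      appName = "my-app", sparkConf = conf)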

BTW, the additional-parameters mechanism is an issue for the shell: it
creates a context before there's a chance for it to be tweaked. Although the
shell can always re-create it with implicit val dcs = mahoutSparkContext()...

-d


On Wed, May 28, 2014 at 10:15 AM, Pat Ferrel <[email protected]> wrote:

> Actually I’m working on read/write store stuff and that is why I need to
> access the SparkContext. I mainly want a way to extend the conf before the
> context is created. I can do it now but it’s not as pretty as I’d like.
> There will be drivers that call DSL stuff or perform other non-DSL
> operations and they may need to mess with the conf before the context is
> created.
> 
> The driver is currently something like this:
> 
> abstract class MahoutDriver {
>   protected var mc: SparkContext = _
> 
>   protected def start(masterUrl: String, appName: String,
>       customJars: Traversable[String] = Traversable.empty[String]): Unit = {
>     mc = mahoutSparkContext(masterUrl, appName, customJars)
>   }
> 
>   protected def stop: Unit = {
>     mc.stop
>   }
> 
>   protected def process: Unit
>   def main(args: Array[String]): Unit
> }
> 
> 
> So I put the conf changes in the child’s .start, like this:
> 
>   override def start(masterUrl: String = options.master,
>       appName: String = options.appName,
>       customJars: Traversable[String] = Traversable.empty[String]): Unit = {
>     System.setProperty("spark.kryo.referenceTracking", "false")
>     System.setProperty("spark.kryoserializer.buffer.mb", "100")
>     // todo: need a way to calculate these or let the user pass them in.
>     System.setProperty("spark.executor.memory", "2g")
>     super.start(masterUrl, appName, customJars)
>   }
> 
> This would be a lot cleaner if the conf were separate from the context
> creation. The wrapper is not necessary for this, just a slight refactoring
> of the sparkbindings, but while we are at it maybe a wrapper is worth
> considering.
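> 
> A purely hypothetical sketch of what I mean (names here are just
> illustration, assuming mahoutSparkContext can take the conf through):
> 
>   protected def start(masterUrl: String, appName: String,
>       customJars: Traversable[String] = Traversable.empty[String],
>       sparkConf: SparkConf = new SparkConf()): Unit = {
>     mc = mahoutSparkContext(masterUrl, appName, customJars, sparkConf)
>   }
> 
> Then a child driver builds a SparkConf with its kryo and executor settings
> and passes it in, instead of calling System.setProperty before super.start.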
> 
> Not sure what you mean by data store. The SparkContext is required to read
> or write to files, but any supported H2 URIs will work and H2 supports
> several schemes. Also, you may be reading from HDFS and writing to a DB,
> which would not use the SparkContext. So it doesn’t seem like a job-conf
> type of thing, let alone a MahoutContext thing.
> 
> On May 28, 2014, at 9:31 AM, Saikat Kanjilal <[email protected]> wrote:
> 
> I like this idea; however, my question would be: what is the benefit of
> creating a wrapper context? Is it to get access to all of the internal
> domain/config-related objects within Mahout? Also, should the MahoutContext
> have some relationship to the data store where the data is headed?
> 
>> From: [email protected]
>> Subject: SparkContext
>> Date: Wed, 28 May 2014 09:07:19 -0700
>> To: [email protected]
>> 
>> For purposes outside the DSL it seems we need to wrap the SparkContext
>> with something like a MahoutContext. The current sparkBindings object
>> looks pretty DSL-specific. It sets some Kryo properties but these need to
>> be accessible to code outside the DSL. I’ve been creating a raw
>> SparkContext and passing around the ubiquitous “sc”, which works, but is
>> this the way we should be doing this?
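>> 
>> For reference, the raw approach looks roughly like this (property names
>> from memory):
>> 
>>   import org.apache.spark.{SparkConf, SparkContext}
>> 
>>   val conf = new SparkConf()
>>     .setMaster("local")
>>     .setAppName("my-job")
>>     .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
>>     .set("spark.kryo.referenceTracking", "false")
>> 
>>   val sc = new SparkContext(conf)
>>   // then "sc" gets passed around to anything that reads or writes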
> 
> 
> 
