Actually I’m working on the read/write store code, which is why I need access to 
the SparkContext. Mainly I want a way to extend the conf before the context is 
created. I can do it now, but it’s not as clean as I’d like. There will be 
drivers that call into the DSL or perform other non-DSL operations, and they may 
need to modify the conf before the context is created.

The driver is currently something like this:

abstract class MahoutDriver {

  protected var mc: SparkContext = _

  protected def start(masterUrl: String, appName: String,
      customJars: Traversable[String] = Traversable.empty[String]): Unit = {
    mc = mahoutSparkContext(masterUrl, appName, customJars)
  }

  protected def stop: Unit = {
    mc.stop()
  }

  protected def process: Unit
  def main(args: Array[String]): Unit
}
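
For reference, a concrete driver just fills in process and main. A minimal
sketch (the driver name, input URI, and app name here are made up):

object ExampleDriver extends MahoutDriver {

  // do the actual work with the inherited SparkContext
  protected def process: Unit = {
    val lineCount = mc.textFile("hdfs://namenode:8020/tmp/input.txt").count()
    println("read " + lineCount + " lines")
  }

  def main(args: Array[String]): Unit = {
    start("local[2]", "example-driver")
    process
    stop
  }
}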


So I put the conf changes in the child’s start, like this:

  override def start(masterUrl: String = options.master,
      appName: String = options.appName,
      customJars: Traversable[String] = Traversable.empty[String]): Unit = {
    System.setProperty("spark.kryo.referenceTracking", "false")
    System.setProperty("spark.kryoserializer.buffer.mb", "100")
    // todo: need a way to calculate these or let the user pass them in.
    System.setProperty("spark.executor.memory", "2g")
    super.start(masterUrl, appName, customJars)
  }

This would be a lot cleaner if the conf were separate from the context 
creation. A wrapper is not necessary for that, just a slight refactoring of the 
sparkbindings, but while we are at it maybe a wrapper is worth considering.
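
Concretely, I mean something like this, where the conf-building step is exposed
on its own (mahoutSparkConf is a placeholder name, not an existing method):

import org.apache.spark.{SparkConf, SparkContext}

// sparkbindings would assemble its default conf (Kryo settings etc.)
// without creating a context...
def mahoutSparkConf(masterUrl: String, appName: String): SparkConf =
  new SparkConf()
    .setMaster(masterUrl)
    .setAppName(appName)
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .set("spark.kryo.referenceTracking", "false")

// ...so a driver can extend the conf before the context exists
val conf = mahoutSparkConf("local[2]", "my-driver")
  .set("spark.executor.memory", "2g")
val mc = new SparkContext(conf)

// and if we do go the wrapper route, it could start as a thin shell
class MahoutContext(val sc: SparkContext)

That would keep the Kryo defaults in one place while letting each driver
override what it needs before anything is created.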

I’m not sure what you mean by data store. The SparkContext is required to read 
or write files, but any supported H2 URI will work, and H2 supports several 
schemes. Also, you may be reading from HDFS and writing to a DB, and the write 
would not use the SparkContext. So it doesn’t seem like a job-conf type of 
thing, let alone a MahoutContext thing.
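
To illustrate the HDFS-to-DB case (the URIs below are placeholders): the read
goes through the SparkContext, but the write side never touches it:

// read from HDFS through the SparkContext...
val lines = mc.textFile("hdfs://namenode:8020/path/to/input")

// ...but write straight to a DB over plain JDBC, one connection per
// partition; the SparkContext is never involved in the write
lines.foreachPartition { part =>
  val conn = java.sql.DriverManager.getConnection("jdbc:...") // placeholder JDBC URL
  part.foreach { line =>
    // insert each line via conn (elided)
  }
  conn.close()
}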

On May 28, 2014, at 9:31 AM, Saikat Kanjilal <[email protected]> wrote:

I like this idea; however, my question would be: what is the benefit of 
creating a wrapper context? Is it to get access to all of the internal 
domain/config-related objects within Mahout? Also, should the MahoutContext 
have some relationship to the data store where the data is headed?

> From: [email protected]
> Subject: SparkContext
> Date: Wed, 28 May 2014 09:07:19 -0700
> To: [email protected]
> 
> For purposes outside the DSL it seems we need to wrap the SparkContext with 
> something like a MahoutContext. The current sparkBindings object looks pretty 
> DSL specific. It sets some Kryo properties, but these need to be accessible to 
> code outside the DSL. I’ve been creating a raw SparkContext and passing 
> around the ubiquitous “sc”, which works, but is this the way we should be 
> doing this? 
                                          
