The process I used is outlined here: http://mahout.apache.org/developers/github.html

On Wed, May 28, 2014 at 11:52 AM, Pat Ferrel <[email protected]> wrote:

> Great, I see it today.
>
> This was done with a PR, did you have to use the two local repos method to merge and push to git.wp at apache?
>
> On May 28, 2014, at 10:38 AM, Dmitriy Lyubimov <[email protected]> wrote:
>
> Pat,
>
> (1) Pick the latest master with M-1529. It does the abstraction you are referring to.
>
> (2) The mahoutSparkContext() method that creates the DistributedContext accepts a sparkConf parameter. That's where you can set up whatever additional Spark properties or parameters you want. The Mahout context overrides only the serializer and something else (that I don't remember at this point).
>
> BTW, the additional-parameters mechanism is an issue for the shell: it creates a context before there's a chance for it to be tweaked. Although the shell can always re-create it with implicit val dcs = mahoutSparkContext()...
>
> -d
>
>
> On Wed, May 28, 2014 at 10:15 AM, Pat Ferrel <[email protected]> wrote:
>
> > Actually I’m working on read/write store stuff and that is why I need to access the SparkContext. I mainly want a way to extend the conf before the context is created. I can do it now but it’s not as pretty as I’d like. There will be drivers that call DSL stuff or perform other non-DSL operations, and they may need to mess with the conf before the context is created.
> >
> > The driver is currently something like this:
> >
> >   abstract class MahoutDriver {
> >     protected var mc: SparkContext = _
> >
> >     protected def start(masterUrl: String, appName: String,
> >                         customJars: Traversable[String] = Traversable.empty[String]): Unit = {
> >       mc = mahoutSparkContext(masterUrl, appName, customJars)
> >     }
> >
> >     protected def stop: Unit = {
> >       mc.stop
> >     }
> >
> >     protected def process: Unit
> >
> >     def main(args: Array[String]): Unit
> >   }
> >
> > So I put the conf changes in the child’s .start, like this:
> >
> >   override def start(masterUrl: String = options.master, appName: String = options.appName,
> >                      customJars: Traversable[String] = Traversable.empty[String]): Unit = {
> >     System.setProperty("spark.kryo.referenceTracking", "false")
> >     System.setProperty("spark.kryoserializer.buffer.mb", "100")
> >     System.setProperty("spark.executor.memory", "2g") // todo: need a way to calculate these or let the user pass them in
> >     super.start(masterUrl, appName, customJars)
> >   }
> >
> > This would be a lot cleaner if the conf were separate from the context creation. The wrap is not necessary for this, just a slight refactoring of the sparkbinding, but while we are at it maybe a wrapper is worth considering.
> >
> > Not sure what you mean by data store. The SparkContext is required to read or write files, but any supported H2 URIs will work and H2 supports several schemes. Also you may be reading from HDFS and writing to a DB, which would not use the SparkContext. So it doesn’t seem like a job-conf type thing, let alone a MahoutContext thing.
> >
> > On May 28, 2014, at 9:31 AM, Saikat Kanjilal <[email protected]> wrote:
> >
> > I like this idea, however my question would be: what would be the benefit of creating a wrapper context? Is it to get access to all of the internal domain/config-related objects within Mahout? Also, should the MahoutContext have some relationship to the data store where the data is headed?
> >
> >> From: [email protected]
> >> Subject: SparkContext
> >> Date: Wed, 28 May 2014 09:07:19 -0700
> >> To: [email protected]
> >>
> >> For purposes outside the DSL it seems we need to wrap the SparkContext with something like a MahoutContext. The current sparkBindings object looks pretty DSL-specific. It sets some Kryo properties, but these need to be accessible to code outside the DSL. I’ve been creating a raw SparkContext and passing around the ubiquitous “sc”, which works, but is this the way we should be doing this?
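
To make Dmitriy's point (2) concrete, here is a minimal sketch of building the extra properties into a SparkConf up front and handing it to mahoutSparkContext(), instead of poking System properties from the driver's start(). The sparkConf parameter name comes from the thread; the exact signature, the named arguments, and the master/appName values below are assumptions, so check the sparkbindings package object in your tree before relying on them.

  import org.apache.mahout.sparkbindings._
  import org.apache.spark.SparkConf

  // Build the conf first so drivers (or anything non-DSL) can tweak it
  // before any context exists.
  val conf = new SparkConf()
    .set("spark.kryo.referenceTracking", "false")
    .set("spark.kryoserializer.buffer.mb", "100")
    .set("spark.executor.memory", "2g") // still needs to be calculated or user-supplied, per Pat's todo

  // Assumed call shape: mahoutSparkContext() accepting the prebuilt conf
  // (Dmitriy's point (2)); masterUrl and appName are placeholder values.
  implicit val mc = mahoutSparkContext(
    masterUrl = "local",
    appName = "MahoutDriverSketch",
    sparkConf = conf)

The same shape would cover Dmitriy's shell caveat: the shell could drop its pre-created context and re-create the implicit val from a tweaked conf in exactly this way.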

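And for Saikat's question about what a wrapper would actually buy, here is a purely hypothetical sketch of the kind of thin MahoutContext the original message floats: it owns the conf, applies the serializer override the thread attributes to sparkBindings, and unwraps implicitly so code that expects a raw "sc" keeps working. The class, its members, and the single property it sets are invented for illustration; per Dmitriy, the DistributedContext that landed with M-1529 is the actual abstraction in master.

  import scala.language.implicitConversions
  import org.apache.spark.{SparkConf, SparkContext}

  // Hypothetical wrapper, not Mahout code: callers build or extend the conf,
  // the wrapper adds a Mahout-ish serializer default, and the SparkContext
  // is created lazily so nothing is fixed until first use.
  class MahoutContext(baseConf: SparkConf = new SparkConf()) {
    val conf: SparkConf = baseConf
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    lazy val sc: SparkContext = new SparkContext(conf)
    def close(): Unit = sc.stop()
  }

  object MahoutContext {
    // Implicitly unwrap to SparkContext for code that still takes an sc.
    implicit def toSparkContext(mc: MahoutContext): SparkContext = mc.sc
  }

Usage would look like implicit val mc = new MahoutContext(new SparkConf().set("spark.executor.memory", "2g")), with mc passed anywhere a SparkContext is expected; whether that carries its weight over just reusing DistributedContext is exactly the question the thread leaves open.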