[
https://issues.apache.org/jira/browse/BIGTOP-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14204270#comment-14204270
]
jay vyas edited comment on BIGTOP-1366 at 11/10/14 3:06 AM:
------------------------------------------------------------
Hi guys.
*TL;DR: I reviewed it on Spark 0.9 and created a driver bash script for
submitting Spark jobs to 0.9, since spark-submit isn't available there. We
might need a couple of minor mods to the Spark driver to match the 0.9
SparkContext API. I did this testing in pure Bigtop 0.9 VMs. Details below:*
Okay, I've cobbled together a "spark submit" type script based on some
templates I found online for Bigtop. This will be the way we submit jobs for
*Spark 0.9.x*. When we upgrade to Spark 1.x we can use RJ's exact README
directions above.
{noformat}
source /etc/spark/conf/spark-env.sh
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.71.x86_64/
# system jars:
CLASSPATH=$CLASSPATH:$SPARK_HOME/assembly/lib/*
# app jar:
CLASSPATH=$CLASSPATH:/usr/lib/spark/examples/lib/spark-examples_2.10-0.9.1.jar:/bigtop-home/*jar:/usr/lib/spark/*:/usr/lib/spark/lib/*:/usr/lib/spark/assembly/lib/*
CONFIG_OPTS="-Dspark.master=local -Dspark.jars=target/sparkwordcount-0.0.1-SNAPSHOT.jar"
# main class and its arguments must be on the same line as the java command:
$JAVA_HOME/bin/java -cp "$CLASSPATH" $CONFIG_OPTS org.apache.spark.examples.SparkPi local 2 2
{noformat}
result:
{noformat}
[vagrant@bigtop1 ~]$ ./submit.sh
Reading zipcode data
Read 30891 zipcode entries
Reading name data
Read 86987 first names and 47819 last names
Reading product data
Read 4 product categories
Generating stores...
Done.
Generating customers...
Done.
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.SparkContext.<init>(Lorg/apache/spark/SparkConf;)V
        at com.github.rnowling.bps.datagenerator.spark.SparkDriver$.main(Driver.scala:45)
        at com.github.rnowling.bps.datagenerator.spark.SparkDriver.main(Driver.scala)
{noformat}
So we may need to refactor the way *SparkContext* is instantiated for the
*0.9* API. Otherwise it looks to work quite well: the Spark driver launches
and gives *great* error messages (for a missing *resources/* dir and so on),
which I really like. I did this in Bigtop VMs, and just copied the resources/*
into {{bigtop-home}}.
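For the later move to Spark 1.x, here is a minimal sketch (not run against a real cluster) of how one script could cover both cases: use spark-submit when it exists under SPARK_HOME, and fall back to a plain java launch otherwise. The function names, the /usr/lib/spark default, and the exact command lines printed are illustrative assumptions, not the script above. It only prints the command so it can be inspected before running.

```shell
# Sketch only. SPARK_HOME default and all function names are assumptions.
SPARK_HOME="${SPARK_HOME:-/usr/lib/spark}"

# Print which launcher to use: spark-submit if it is present and executable
# (Spark 1.x), otherwise plain java (Spark 0.9.x).
choose_launcher() {
  if [ -x "$SPARK_HOME/bin/spark-submit" ]; then
    echo "spark-submit"
  else
    echo "java"
  fi
}

# Join all arguments with ':' to form a classpath string.
build_classpath() {
  local IFS=':'
  echo "$*"
}

# Print (do not run) the launch command for a main class and app jar.
# Usage: launch_command MAIN_CLASS APP_JAR [APP_ARGS...]
launch_command() {
  local main_class="$1"
  local app_jar="$2"
  shift 2
  if [ "$(choose_launcher)" = "spark-submit" ]; then
    echo "$SPARK_HOME/bin/spark-submit --class $main_class --master local $app_jar $*"
  else
    local cp
    cp="$(build_classpath "$app_jar" "$SPARK_HOME/assembly/lib/*" "$SPARK_HOME/lib/*")"
    echo "java -cp $cp -Dspark.master=local $main_class $*"
  fi
}
```

Calling launch_command with a main class, an app jar and the app arguments prints the command line for whichever launcher was found. Keeping the classpath wildcards quoted matters, since java expands them itself.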
> Updated, Richer Model for Generating Data for BigPetStore
> ----------------------------------------------------------
>
> Key: BIGTOP-1366
> URL: https://issues.apache.org/jira/browse/BIGTOP-1366
> Project: Bigtop
> Issue Type: Improvement
> Components: blueprints
> Affects Versions: backlog
> Reporter: RJ Nowling
> Assignee: RJ Nowling
> Priority: Minor
> Original Estimate: 8,736h
> Remaining Estimate: 8,736h
>
> BigPetStore uses synthetic data as the basis for its workflow. BPS's current
> model for generating customer data is sufficient for basic testing of the
> Hadoop ecosystem, **but the model is very basic and lacks sufficient
> complexity for embedding interesting patterns into the data**.
> As a result, **more complex, scalable testing such as testing clustering
> algorithms in Mahout on non-trivial data or multidimensional data with
> factors influencing it** is not currently possible.
> Efforts are currently underway to incrementally improve the current model
> (see BIGTOP-1271 and BIGTOP-1272).
> Creating a model that can incorporate **realistic, non-hierarchical
> patterns** and input data to generate rich customer/transaction data with
> interesting correlations will require a re-imagining of the current model and
> its framework.
> To support the improvements to the model in BigPetStore, I have been working
> on an **alternative ab initio model, developed from scratch**. Since the
> development of a new model involves substantial R&D work with more
> specialized tools (mathematical and plotting libraries), I'm doing the
> current work outside of BPS using the IPython Notebook environment. Due to
> the long time frame, the model will be developed on a separate timeline to
> prevent slowing the development of BPS.
> Once the model has stabilized, I will begin incorporating the model into BPS
> itself. One option is to implement the model in Scala for clean
> integration with **Spark**, which is likely to play an increasingly important
> role in the Hadoop ecosystem, and thus will be an important part of
> BigPetStore as a test/blueprint app.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)