Re: Error with --files

2016-04-14 Thread Benjamin Zaitlen
That fixed it! Thank you!

--Ben

On Thu, Apr 14, 2016 at 5:53 PM, Marcelo Vanzin <van...@cloudera.com> wrote:
> On Thu, Apr 14, 2016 at 2:14 PM, Benjamin Zaitlen <quasi...@gmail.com> wrote:
>> spark-submit --master yarn-cluster /home/ubuntu/test_spark.p

Error with --files

2016-04-14 Thread Benjamin Zaitlen
Hi All,

I'm trying to use the --files option with YARN:

    spark-submit --master yarn-cluster /home/ubuntu/test_spark.py --files /home/ubuntu/localtest.txt#appSees.txt

I never see the file in HDFS or in the YARN containers. Am I doing something incorrect? I'm running Spark 1.6.0.

Thanks,
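
The snippet above cuts off Marcelo's reply, but the fix acknowledged in the first message of this thread is consistent with how spark-submit parses its arguments: everything after the primary resource (test_spark.py) is passed to the application itself, so a --files flag placed there is silently ignored. A sketch of the corrected invocation, reusing the paths from the post:

    # spark-submit flags must come BEFORE the application file;
    # anything after test_spark.py becomes an argument to the script
    spark-submit --master yarn-cluster \
      --files /home/ubuntu/localtest.txt#appSees.txt \
      /home/ubuntu/test_spark.py

The #appSees.txt suffix tells YARN to localize the file under that alias in each container's working directory.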

Re: 1.5 Build Errors

2015-10-06 Thread Benjamin Zaitlen
Hi All,

Sean patiently worked with me in solving this issue. The problem was entirely my fault: the MAVEN_OPTS env variable was set and was overriding everything.

--Ben

On Tue, Sep 8, 2015 at 1:37 PM, Benjamin Zaitlen <quasi...@gmail.com> wrote:
> Yes, just reran with the
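
The takeaway from the resolution: a stale MAVEN_OPTS exported in the shell silently wins over whatever the build scripts set. A quick pre-build check (a sketch, not from the thread):

    # confirm nothing stale is exported before invoking the build
    env | grep MAVEN_OPTS
    unset MAVEN_OPTS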

1.5 Build Errors

2015-09-08 Thread Benjamin Zaitlen
Hi All,

I'm trying to build a distribution off of the latest in master and I keep getting errors on MQTT and the build fails. I'm running the build on an m1.large which has 7.5 GB of RAM and no other major processes are running.

    MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M
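
Later in the thread the documented heap recommendation is raised to 3gb. A sketch of a distribution build along those lines (the exact values are extrapolated from the replies, not a verified minimum, and note the eventual root cause in this thread was a stale MAVEN_OPTS rather than the size itself):

    export MAVEN_OPTS="-Xmx3g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
    ./make-distribution.sh --tgz -Pyarn -Phive -Phive-thriftserver \
      -Phadoop-2.4 -Dhadoop.version=2.4.0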

Re: 1.5 Build Errors

2015-09-08 Thread Benjamin Zaitlen
8, 2015 at 1:53 PM, Benjamin Zaitlen <quasi...@gmail.com> wrote:
> Hi All,
>
> I'm trying to build a distribution off of the latest in master and I keep getting errors on MQTT and the build fails. I'm running the build on an m1.large which has

Re: 1.5 Build Errors

2015-09-08 Thread Benjamin Zaitlen
a PR to change any occurrences of lower recommendations to 3gb.

On Tue, Sep 8, 2015 at 3:02 PM, Benjamin Zaitlen <quasi...@gmail.com> wrote:
> Ah, right. Should've caught that.
>
> The docs seem to recommend 2gb. Should that be increased as we

Re: 1.5 Build Errors

2015-09-08 Thread Benjamin Zaitlen
> s -Pyarn -Phive -Phive-thriftserver -Phadoop-2.4 -Dhadoop.version=2.4.0

So the heap size is still 2g even with MAVEN_OPTS set with 4g. I noticed that within build/mvn _COMPILE_JVM_OPTS is set to 2g and this is what ZINC_OPTS is set to.
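
The observation here points at the likely culprit: build/mvn launches the zinc compile server with its own _COMPILE_JVM_OPTS default rather than with MAVEN_OPTS. One way around it, as a sketch that assumes your copy of build/mvn only falls back to _COMPILE_JVM_OPTS when ZINC_OPTS is unset (check the script in your checkout before relying on this):

    # pre-set ZINC_OPTS so the zinc compile server gets a bigger heap
    export ZINC_OPTS="-Xmx4g -XX:MaxPermSize=512M"
    ./build/mvn -Pyarn -Phive -Phive-thriftserver -Phadoop-2.4 \
      -Dhadoop.version=2.4.0 -DskipTests clean package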

Re: 1.5 Build Errors

2015-09-08 Thread Benjamin Zaitlen
hile compiling?

Cheers

On Tue, Sep 8, 2015 at 7:56 AM, Benjamin Zaitlen <quasi...@gmail.com> wrote:
> I'm still getting errors with 3g. I've increased to 4g and I'll report back.
>
> To be clear:
>
> export MAVEN_OPTS="

Submitting Python Applications from Remote to Master

2014-11-14 Thread Benjamin Zaitlen
Hi All,

I'm not quite clear on whether submitting a Python application to Spark standalone on EC2 is possible. Am I reading this correctly:

*A common deployment strategy is to submit your application from a gateway machine that is physically co-located with your worker machines (e.g. Master
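
For context: on a standalone cluster, Python applications run in client mode, so the driver runs wherever spark-submit is invoked; that is why the docs passage quoted above recommends submitting from a gateway machine co-located with the cluster (such as the EC2 master) rather than from a distant laptop. A sketch, with the master hostname and script name as placeholders rather than values from the thread:

    # run from a machine near the cluster; the driver stays on this
    # machine and must be reachable by the workers
    spark-submit --master spark://<master-hostname>:7077 my_app.py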

Re: iPython notebook ec2 cluster matlabplot not found?

2014-09-29 Thread Benjamin Zaitlen
Hi Andy,

I built an anaconda/spark AMI a few months ago. I'm still iterating on it, so if things break please report them. If you want to give it a whirl:

    ./spark-ec2 -k my_key -i ~/.ssh/mykey.rsa -a ami-3ecd0c56

The nice thing about anaconda is that it comes pre-baked with ipython-notebook,
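
The command above appears to be cut off before the spark-ec2 action; a fuller invocation would add launch and a cluster name (the name below is a placeholder, not from the thread):

    # spark-ec2 needs an action and a cluster name in addition to the
    # key pair, identity file, and custom AMI shown in the post
    ./spark-ec2 -k my_key -i ~/.ssh/mykey.rsa -a ami-3ecd0c56 launch my-cluster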

TimeStamp selection with SparkSQL

2014-09-04 Thread Benjamin Zaitlen
I may have missed this, but is it possible to select on datetime in a SparkSQL query?

    jan1 = sqlContext.sql("SELECT * FROM Stocks WHERE datetime = '2014-01-01'")

Additionally, is there a guide as to what SQL is valid? The guide says, "Note that Spark SQL currently uses a very basic SQL parser." It
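
One detail worth noting about the query itself: equality against '2014-01-01' matches only rows whose timestamp is exactly midnight, so a whole day is usually selected with a range predicate. A sketch, not from the thread, assuming the Stocks table is already registered and has a timestamp column named datetime as in the question (depending on the Spark version, the string literal may need an explicit CAST to timestamp):

    # select all of Jan 1 rather than just the midnight instant
    jan1 = sqlContext.sql(
        "SELECT * FROM Stocks "
        "WHERE datetime >= '2014-01-01' AND datetime < '2014-01-02'")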

Re: Anaconda Spark AMI

2014-07-12 Thread Benjamin Zaitlen
, 2014 at 11:54 AM, Benjamin Zaitlen quasi...@gmail.com wrote:

Hi All, I'm a dev at Continuum and we are developing a fair amount of tooling around Spark. A few days ago someone expressed interest in numpy+pyspark and Anaconda came up as a reasonable solution. I spent a number of hours

Anaconda Spark AMI

2014-07-03 Thread Benjamin Zaitlen
Hi All,

I'm a dev at Continuum and we are developing a fair amount of tooling around Spark. A few days ago someone expressed interest in numpy+pyspark and Anaconda came up as a reasonable solution. I spent a number of hours yesterday trying to rework the base Spark AMI on EC2 but sadly was
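
The usual wiring for this kind of setup, as a sketch rather than anything from the thread (the install path is an assumption about where Anaconda lives on the image):

    # point pyspark at the Anaconda interpreter so numpy and friends are
    # importable; the same interpreter path must exist on the workers
    export PYSPARK_PYTHON=/opt/anaconda/bin/python
    ./bin/pyspark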