Re: Replacing Spark's native scheduler with Sparrow

2014-11-07 Thread Nicholas Chammas
r point to note is that this is the total time to run a null job, so >> this includes scheduling + task launch + time to send back results etc. >> >> Shivaram >> >> On Fri, Nov 7, 2014 at 9:23 PM, Nicholas Chammas < >> nicholas.cham...@gmail.com> wrote: >>

Re: Replacing Spark's native scheduler with Sparrow

2014-11-07 Thread Nicholas Chammas
wrote: > > > On Fri, Nov 7, 2014 at 8:04 PM, Nicholas Chammas < > nicholas.cham...@gmail.com> wrote: > >> Sounds good. I'm looking forward to tracking improvements in this area. >> >> Also, just to connect some more dots here, I just remembered that there is &

Re: Replacing Spark's native scheduler with Sparrow

2014-11-07 Thread Nicholas Chammas
Sounds good. I'm looking forward to tracking improvements in this area. Also, just to connect some more dots here, I just remembered that there is currently an initiative to add an IndexedRDD interface. Some interesting use cases mentioned there i

Re: Replacing Spark's native scheduler with Sparrow

2014-11-07 Thread Nicholas Chammas
> > If, for example, you have a cluster of 100 machines, this means the > scheduler can launch 150 tasks per machine per second. Did you mean 15 tasks per machine per second here? Or alternatively, 10 machines? I don't know of any existing Spark clusters that have a large enough number > of mach

Replacing Spark's native scheduler with Sparrow

2014-11-07 Thread Nicholas Chammas
I just watched Kay's talk from 2013 on Sparrow. Is replacing Spark's native scheduler with Sparrow still on the books? The Sparrow repo hasn't been updated recently, and I don't see any JIRA issues about it. It woul

Re: Using partitioning to speed up queries in Shark

2014-11-06 Thread Nicholas Chammas
Did you mean to send this to the user list? This is the dev list, where we discuss things related to development on Spark itself. On Thu, Nov 6, 2014 at 5:01 PM, Gordon Benjamin wrote: > Hi All, > > I'm using Spark/Shark as the foundation for some reporting that I'm doing > and have a customers

Re: JIRA + PR backlog

2014-11-06 Thread Nicholas Chammas
I think better tooling will make it much easier for committers to trim the list of stale JIRA issues and PRs. Convenience enables action. - Spark PR Dashboard: Additional filters for stale PRs or

Re: create_image.sh contains broken hadoop web link

2014-11-05 Thread Nicholas Chammas
Yup, I just stumbled on that. I'll submit a PR to fix that link. Thanks Ted. On Wed, Nov 5, 2014 at 11:13 PM, Ted Yu wrote: > The artifacts are in archive: > http://archive.apache.org/dist/hadoop/common/hadoop-2.4.1/ > > Cheers > > On Nov 5, 2014, at 8:07 PM, Nicholas Cha

Re: create_image.sh contains broken hadoop web link

2014-11-05 Thread Nicholas Chammas
> http://search-hadoop.com/m/LgpTk2Pnw6O/andrew+apache+mirror&subj=Re+All+mirrored+download+links+from+the+Apache+Hadoop+site+are+broken > > Cheers > > On Wed, Nov 5, 2014 at 7:36 PM, Nicholas Chammas < > nicholas.cham...@gmail.com> wrote: > >> As part of my wor

create_image.sh contains broken hadoop web link

2014-11-05 Thread Nicholas Chammas
As part of my work for SPARK-3821, I tried building an AMI today using create_image.sh. This line appears to be broken now (it wasn’t a week or

Re: [VOTE] Designating maintainers for some Spark components

2014-11-05 Thread Nicholas Chammas
+1 on this proposal. On Wed, Nov 5, 2014 at 8:55 PM, Nan Zhu wrote: > Will these maintainers have a cleanup for those pending PRs upon we start > to apply this model? I second Nan's question. I would like to see this initiative drive a reduction in the number of stale PRs we have out there. We

Re: Surprising Spark SQL benchmark

2014-11-05 Thread Nicholas Chammas
ecord >> at the Nürburgring in a 2014 1000hp LaFerrari and somehow forgetting to >> mention that the last record was held by a 2001 Toyota Celica. >> >> - Steve >> >> >> From: Nicholas Chammas >> Date: Wednesday, November 5, 2014 at 15:56 >> To: Stev

Re: Surprising Spark SQL benchmark

2014-11-05 Thread Nicholas Chammas
Steve Nunez, I believe the information behind the links below should address your concerns earlier about Databricks's submission to the Daytona Gray benchmark. On Wed, Nov 5, 2014 at 6:43 PM, Nicholas Chammas wrote: > On Fri, Oct 31, 2014 at 3:45 PM, Nicholas Chammas < >

Re: Surprising Spark SQL benchmark

2014-11-05 Thread Nicholas Chammas
On Fri, Oct 31, 2014 at 3:45 PM, Nicholas Chammas < nicholas.cham...@gmail.com> wrote: I believe that benchmark has a pending certification on it. See > http://sortbenchmark.org under "Process". > Regarding this comment, Reynold has just announced that this benc

Re: Build fails on master (f90ad5d)

2014-11-04 Thread Nicholas Chammas
omehow being spent in downloading and building > dependencies. > > Anyway, if sbt is supported it would be great to add docs about somewhere, > especially since, as you point out, most devs are using it. > > Thanks for your help. > > Alex > > On Tue, Nov 4, 2014 at 5:42 PM, Nic

Re: Build fails on master (f90ad5d)

2014-11-04 Thread Nicholas Chammas
t; StreamingKMeans in package mllib > [WARNING] import org.apache.spark.mllib.clustering.StreamingKMeans > Are they expected? > > Also, mvn complains about not having zinc. Is this a problem? > > [WARNING] Zinc server is not available at port 3030 - reverting to normal > incremen

Re: Build fails on master (f90ad5d)

2014-11-04 Thread Nicholas Chammas
FWIW, the "official" build instructions are here: https://github.com/apache/spark#building-spark On Tue, Nov 4, 2014 at 5:11 PM, Ted Yu wrote: > I built based on this commit today and the build was successful. > > What command did you use ? > > Cheers > > On Tue, Nov 4, 2014 at 2:08 PM, Alessand

Re: branch-1.2 has been cut

2014-11-03 Thread Nicholas Chammas
Minor question, but when would be the right time to update the default Spark version in the EC2 script? On Mon, Nov 3, 2014 at 3:55 AM, Patrick Wendell wrote: > Hi All, > > I've just cut the rele

Re: Surprising Spark SQL benchmark

2014-11-01 Thread Nicholas Chammas
dera / etc.; we'll share the code on the > list as soon as we're done. > > -Kay > > On Fri, Oct 31, 2014 at 12:45 PM, Nicholas Chammas < > nicholas.cham...@gmail.com > > wrote: > >> I believe that benchmark has a pending certification on it. See >&

Re: Surprising Spark SQL benchmark

2014-11-01 Thread Nicholas Chammas
S3. I could reproduce some of > the data using HiBench but not the web corpus sub sample. As a result, for > all the hard work put into documenting it, it's still hard to reproduce :( > > On Friday, October 31, 2014, Nicholas Chammas > wrote: > >> I believe that bench

Re: Surprising Spark SQL benchmark

2014-10-31 Thread Nicholas Chammas
ark in future > to avoid the stigma of vendor reported benchmarks and publish enough > information and code to let others repeat the exercise easily. > > - Steve > > > > On 10/31/14, 11:30, "Nicholas Chammas" > wrote: > > >Thanks for the response,

Re: Surprising Spark SQL benchmark

2014-10-31 Thread Nicholas Chammas
e benchmark, I'd invite them to > directly involve Spark SQL developers in the future. Until then, I > wouldn't give much credence to this or any other similar vendor > benchmark. > > - Patrick > > On Fri, Oct 31, 2014 at 10:38 AM, Nicholas Chammas > > wrote: >

Surprising Spark SQL benchmark

2014-10-31 Thread Nicholas Chammas
I know we don't want to be jumping at every benchmark someone posts out there, but this one surprised me: http://www.citusdata.com/blog/86-making-postgresql-scale-hadoop-style This benchmark has Spark SQL failing to complete several queries in the TPC-H benchmark. I don't understand much about th

Re: Potential areas for working

2014-10-26 Thread Nicholas Chammas
Have y’all taken a look at these links? - https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-StarterTasks - https://issues.apache.org/jira/browse/SPARK-3740?jql=project%20%3D%20SPARK%20AND%20labels%20%3D%20Starter%20AND%20status%20in%20(O

Re: Building and Running Spark on OS X

2014-10-20 Thread Nicholas Chammas
So back to my original question... :) If we wanted to post this guide to the user list or to a gist for easy reference, would we rather have Maven or SBT listed? And is there anything else about the steps that should be modified? Nick On Mon, Oct 20, 2014 at 8:25 PM, Sean Owen wrote: > Oh righ

Re: Building and Running Spark on OS X

2014-10-20 Thread Nicholas Chammas
even have to brew install it. Surely SBT isn't in the dev tools even? > I recall I had to install it. I'd be surprised to hear it required > zero setup. > > On Mon, Oct 20, 2014 at 8:04 PM, Nicholas Chammas > wrote: > > Yeah, I would use sbt too, but I thought if I w

Re: Building and Running Spark on OS X

2014-10-20 Thread Nicholas Chammas
y use SBT on Mac and that one doesn't require any setup ... > > > On Mon, Oct 20, 2014 at 4:43 PM, Nicholas Chammas < > nicholas.cham...@gmail.com> wrote: > >> If one were to put together a short but comprehensive guide to setting up >> Spark to run locally on

Building and Running Spark on OS X

2014-10-20 Thread Nicholas Chammas
If one were to put together a short but comprehensive guide to setting up Spark to run locally on OS X, would it look like this? # Install Maven. On OS X, we suggest using Homebrew. brew install maven # Set some important Java and Maven environment variables. export JAVA_HOME=$(/usr/libexec/java_ho

Re: short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-17 Thread Nicholas Chammas
the repos. the problem > w/this is that there might be a window where changes haven't made it to the > local mirror and tests run against it. more fun stuff to think about... > > now that i have some stats, and a list of all of the times/dates of the > failures, i will

Using Docker to Parallelize Tests

2014-10-17 Thread Nicholas Chammas
https://news.ycombinator.com/item?id=8471812 The parent thread has lots of interesting use cases for Docker, and the linked comment seems most relevant to our testing predicament. I might look into this after I finish something presentable with Packer and our EC2 scripts, but if anyone else is in

Re: short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-16 Thread Nicholas Chammas
On Thu, Oct 16, 2014 at 3:55 PM, shane knapp wrote: > i really, truly hate non-deterministic failures. Amen bruddah.

Re: short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-16 Thread Nicholas Chammas
quency that these happen has decreased > significantly (3 in the past ~18hr). > > seems like the git plugin downgrade has helped relieve the problem, but > hasn't fixed it. i'll be looking in to this more today. > > On Wed, Oct 15, 2014 at 7:05 PM, Nicholas Chammas < &

Re: short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-15 Thread Nicholas Chammas
ilds triggered and no timeouts. :crossestoes: :) > > > > On Wed, Oct 15, 2014 at 2:19 PM, shane knapp > wrote: > > > >> ok, we're up and building... :crossesfingersfortheumpteenthtime: > >> > >> On Wed, Oct 15, 2014 at 1:59 PM, Nicholas Chammas <

Re: short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-15 Thread Nicholas Chammas
I support this effort. :thumbsup: On Wed, Oct 15, 2014 at 4:52 PM, shane knapp wrote: > i'm going to be downgrading our git plugin (from 2.2.7 to 2.2.2) to see if > that helps w/the git fetch timeouts. > > this will require a short downtime (~20 mins for builds to finish, ~20 mins > to downgrade

Re: new jenkins update + tentative release date

2014-10-13 Thread Nicholas Chammas
> i set this to 20 minutes... let's see if that helps. > > On Mon, Oct 13, 2014 at 2:48 PM, Nicholas Chammas < > nicholas.cham...@gmail.com> wrote: > >> Ah, that sucks. Thank you for looking into this. >> >> On Mon, Oct 13, 2014 at 5:43 PM, shane knapp wro

Re: new jenkins update + tentative release date

2014-10-13 Thread Nicholas Chammas
Ah, that sucks. Thank you for looking into this. On Mon, Oct 13, 2014 at 5:43 PM, shane knapp wrote: > On Mon, Oct 13, 2014 at 2:28 PM, Nicholas Chammas < > nicholas.cham...@gmail.com> wrote: > >> Thanks for doing this work Shane. >> >> So is Jenkins in the n

Re: new jenkins update + tentative release date

2014-10-13 Thread Nicholas Chammas
Thanks for doing this work Shane. So is Jenkins in the new datacenter now? Do you know if the problems with checking out patches from GitHub should be resolved now? Here's an example from the past hour. Nick On

Re: Scalastyle improvements / large code reformatting

2014-10-13 Thread Nicholas Chammas
On Mon, Oct 13, 2014 at 11:57 AM, Patrick Wendell wrote: > That would even work for imports as well, > you'd just have a thing where if anyone modified some imports they > would have to fix all the imports in that file. It's at least worth a > try. > OK, that sounds like a fair compromise. I've

Re: Scalastyle improvements / large code reformatting

2014-10-13 Thread Nicholas Chammas
The arguments against large scale refactorings make sense. Doing them, if at all, during QA cycles or around releases sounds like a promising idea. Coupled with that, would it be useful to implement new rules outside of these potential windows for refactoring in such a way that they report on styl

Re: Trouble running tests

2014-10-10 Thread Nicholas Chammas
est". The "hive/test" part takes the >> longest, so I usually leave that out until just before submitting unless my >> changes are hive specific. >> >> On Thu, Oct 9, 2014 at 11:40 AM, Nicholas Chammas < >> nicholas.cham...@gmail.com >> > wrote

spark-prs and mesos/spark-ec2

2014-10-09 Thread Nicholas Chammas
Does it make sense to point the Spark PR review board to read from mesos/spark-ec2 as well? PRs submitted against that repo may reference Spark JIRAs and need review just like any other Spark PR. Nick

Re: Trouble running tests

2014-10-09 Thread Nicholas Chammas
_RUN_SQL_TESTS needs to be true as well. Those two _... variables get set correctly when tests are run on Jenkins. They’re not meant to be manipulated directly by testers. Did you want to run SQL tests only locally? You can try faking being Jenkins by setting AMPLAB_JENKINS=true before calling run

spark-ec2 can't initialize spark-standalone module

2014-10-08 Thread Nicholas Chammas
This line in setup.sh initializes several modules, which are defined here . # Install / Init module

Re: Extending Scala style checks

2014-10-08 Thread Nicholas Chammas
PARK-3850: Scala style: Disallow trailing spaces <https://issues.apache.org/jira/browse/SPARK-3850>. Nick On Tue, Oct 7, 2014 at 4:45 PM, Nicholas Chammas wrote: > For starters, do we have a list of all the Scala style rules that are > currently not enforced automatically but are likely

Re: Unneeded branches/tags

2014-10-08 Thread Nicholas Chammas
> > On Tue, Oct 7, 2014 at 6:27 PM, Reynold Xin wrote: > > Those branches are no longer active. However, I don't think we can delete > > branches from github due to the way ASF mirroring works. I might be wrong > > there. > > > > > > > > On T

Unneeded branches/tags

2014-10-07 Thread Nicholas Chammas
Just curious: Are there branches and/or tags on the repo that we don’t need anymore? What are the scala-2.9 and streaming branches for, for example? And do we still need branches for older versions of Spark that we are not backporting stuff to, like branch-0.5? Nick

Re: Extending Scala style checks

2014-10-07 Thread Nicholas Chammas
wrote: > Since we can easily catch the list of all changed files in a PR, I think > we can start with adding the no trailing space check for newly changed > files only? > > > On 10/2/14 9:24 AM, Nicholas Chammas wrote: > >> Yeah, I remember that hell when I added PEP 8 to the

Re: EC2 clusters ready in launch time + 30 seconds

2014-10-06 Thread Nicholas Chammas
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-cli-commands.html > > On Sat, Oct 4, 2014 at 7:28 AM, Nicholas Chammas < > nicholas.cham...@gmail.com> wrote: > >> Thanks for posting that script, Patrick. It looks like a good place to >> start. >>

Re: EC2 clusters ready in launch time + 30 seconds

2014-10-04 Thread Nicholas Chammas
would help a lot with random issues around port and filesystem > contention we have for unit tests. > > I'm not sure if the long term place for this would be inside the spark > codebase or a community library or what. But it would definitely be > very valuable to have if so

Re: EC2 clusters ready in launch time + 30 seconds

2014-10-03 Thread Nicholas Chammas
Packer template. That's very cool. I'll be looking into this. Nick On Thu, Oct 2, 2014 at 8:23 PM, Nicholas Chammas wrote: > Thanks for the update, Nate. I'm looking forward to seeing how these > projects turn out. > > David, Packer looks very, very interesting. I

Re: EC2 clusters ready in launch time + 30 seconds

2014-10-02 Thread Nicholas Chammas
ev list once we get on top of our own product release and > the bigtop work > > Nate > > > -Original Message- > From: David Rowe [mailto:davidr...@gmail.com] > Sent: Thursday, October 02, 2014 4:44 PM > To: Nicholas Chammas > Cc: dev; Shivaram Venkataraman >

Re: EC2 clusters ready in launch time + 30 seconds

2014-10-02 Thread Nicholas Chammas
ld be improved by removing unnecessary copies > 3. We could make less frequently used modules like Tachyon, persistent hdfs > not a part of the default setup. > > [1] https://github.com/mesos/spark-ec2/blob/v3/copy-dir.sh#L42 > > Thanks > Shivaram > > > > > On Sat,

Re: Extending Scala style checks

2014-10-01 Thread Nicholas Chammas
Does anyone know if Scala has something equivalent to autopep8 <https://pypi.python.org/pypi/autopep8>? It would help patch up the existing code base a lot quicker as we add in new style rules. ​ On Wed, Oct 1, 2014 at 9:24 PM, Nicholas Chammas wrote: > Yeah, I remember that hell whe

Re: Extending Scala style checks

2014-10-01 Thread Nicholas Chammas
Oct 1, 2014 at 6:13 PM, Nicholas Chammas < > nicholas.cham...@gmail.com> wrote: > >> Ah, since there appears to be a built-in rule for end-of-line whitespace, >> Michael and Cheng, y'all should be able to add this in pretty easily. >> >> Nick >> >>

Re: Extending Scala style checks

2014-10-01 Thread Nicholas Chammas
w.scalastyle.org/rules-0.1.0.html > > > > Cheers > > > > On Wed, Oct 1, 2014 at 2:01 PM, Nicholas Chammas < > nicholas.cham...@gmail.com > >> wrote: > > > >> As discussed here <https://github.com/apache/spark/pull/2619>, it > would be

Extending Scala style checks

2014-10-01 Thread Nicholas Chammas
As discussed here, it would be good to extend our Scala style checks to programmatically enforce as many of our style rules as possible. Does anyone know if it's relatively straightforward to enforce additional rules like the "no trailing spaces" rule me

Re: amplab jenkins is down

2014-10-01 Thread Nicholas Chammas
getting there. my guess would be early next week > for the switchover. > > On Wed, Oct 1, 2014 at 12:53 PM, Nicholas Chammas < > nicholas.cham...@gmail.com> wrote: > >> On Thu, Sep 4, 2014 at 4:19 PM, shane knapp wrote: >> >>> on a side note, this inc

Re: do MIMA checking before all test cases start?

2014-10-01 Thread Nicholas Chammas
it first. Wouldn't hurt. > > On Thu, Sep 25, 2014 at 6:39 AM, Nicholas Chammas > wrote: > > It might still make sense to make this change if MIMA checks are always > > relatively quick, for the same reason we do style checks first. > > > > On Thu, Sep 25, 2014 a

Re: amplab jenkins is down

2014-10-01 Thread Nicholas Chammas
On Thu, Sep 4, 2014 at 4:19 PM, shane knapp wrote: > on a side note, this incident will be accelerating our plan to move the > entire jenkins infrastructure in to a managed datacenter environment. > this > will be our major push over the next couple of weeks. more details about > this, also, as

thank you for reviewing our patches

2014-09-26 Thread Nicholas Chammas
I recently came across this mailing list post by Linus Torvalds about the value of reviewing even “trivial” patches. The following passages stood out to me: I think that much more important than the patch is the fact that people get used to the notion that th

Re: Spark SQL use of alias in where clause

2014-09-25 Thread Nicholas Chammas
That is correct. Aliases in the SELECT clause can only be referenced in the ORDER BY and HAVING clauses. Otherwise, you'll have to just repeat the statement, like concat() in this case. A more elegant alternative, which is probably not available in Spark SQL yet, is to use Common Table Expressions
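To illustrate the two options mentioned above, here is a minimal sketch using Python's built-in sqlite3 module rather than Spark SQL. The table, column names, and sample rows are made up for the example, and SQLite's `||` operator stands in for concat(). SQLite is more lenient about aliases than most engines, so this only shows the shape of the two queries, not Spark SQL's actual behavior.

```python
import sqlite3

# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (first TEXT, last TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Ada", "Lovelace"), ("Alan", "Turing")],
)

# Option 1: repeat the expression in WHERE instead of referencing the alias.
repeated = conn.execute(
    "SELECT first || ' ' || last AS full_name "
    "FROM people "
    "WHERE first || ' ' || last = 'Ada Lovelace'"
).fetchall()

# Option 2: define the alias inside a Common Table Expression,
# then filter on it by name in the outer query.
with_cte = conn.execute(
    "WITH named AS ("
    "  SELECT first || ' ' || last AS full_name FROM people"
    ") "
    "SELECT full_name FROM named WHERE full_name = 'Ada Lovelace'"
).fetchall()

print(repeated)
print(with_cte)
```

Both queries should return the same rows; the CTE version just avoids writing the expression twice.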

Re: do MIMA checking before all test cases start?

2014-09-25 Thread Nicholas Chammas
It might still make sense to make this change if MIMA checks are always relatively quick, for the same reason we do style checks first. On Thu, Sep 25, 2014 at 12:25 AM, Nan Zhu wrote: > yeah, I tried that, but there is always an issue when I ran dev/mima, > > it always gives me some binary comp

Re: Tests and Test Infrastructure

2014-09-14 Thread Nicholas Chammas
I fully support this. A smoothly running test infrastructure helps everybody’s work just flow better. The Jenkins Pull Request Builder is mostly functioning again. However, we are working on a simpler technical pipeline for testing patches, as this plug-in has been a constant source of downtime an

Re: don't trigger tests when only .md files are changed

2014-09-12 Thread Nicholas Chammas
We could still have Jenkins post a message to the effect of “this patch only modifies .md files; no tests will be run”. ​ On Fri, Sep 12, 2014 at 3:48 PM, Nicholas Chammas < nicholas.cham...@gmail.com> wrote: > Would it make sense to have Jenkins *not* trigger tests when the only >

don't trigger tests when only .md files are changed

2014-09-12 Thread Nicholas Chammas
Would it make sense to have Jenkins *not* trigger tests when the only files that have changed are .md files (example)? Those don’t even need RAT checks, right? I can make this change if it makes sense. Nick

Re: Announcing Spark 1.1.0!

2014-09-11 Thread Nicholas Chammas
Nice work everybody! I'm looking forward to trying out this release! On Thu, Sep 11, 2014 at 8:12 PM, Patrick Wendell wrote: > I am happy to announce the availability of Spark 1.1.0! Spark 1.1.0 is > the second release on the API-compatible 1.X line. It is Spark's > largest release ever, with co

Re: yet another jenkins restart early thursday morning -- 730am PDT (and a brief update on our new jenkins infra)

2014-09-10 Thread Nicholas Chammas
I'm looking forward to this. :) Looks like Jenkins is having trouble triggering builds for new commits or after user requests (e.g. ). Hopefully that will be resolved tomorrow. Nick On Tue, Sep 9, 2014 at 5:00 PM, shane knapp wrot

Re: Unit tests in < 5 minutes

2014-09-07 Thread Nicholas Chammas
On Fri, Aug 8, 2014 at 1:12 PM, Reynold Xin wrote: > Nick, > > Would you like to file a ticket to track this? > SPARK-3431 : Parallelize execution of tests > Sub-task: SPARK-3432 : Fix logging of

Re: jenkins failed all tests?

2014-09-07 Thread Nicholas Chammas
Yeah, it feels like Jenkins has become a lot more flaky recently. Or maybe it’s just our tests. Here are some more examples: - https://github.com/apache/spark/pull/2310#issuecomment-54741169 - https://github.com/apache/spark/pull/2313#issuecomment-54752766 Nick ​ On Sun, Sep 7, 2014 at 4:

Re: Scala's Jenkins setup looks neat

2014-09-06 Thread Nicholas Chammas
has been no for security reasons. > > On Saturday, September 6, 2014, Nicholas Chammas < > nicholas.cham...@gmail.com> wrote: > >> After reading Erik's email, I found this Scala PR >> <https://github.com/scala/scala/pull/3963> and immediately noticed a few

Scala's Jenkins setup looks neat

2014-09-06 Thread Nicholas Chammas
After reading Erik's email, I found this Scala PR and immediately noticed a few cool things: - Jenkins is hooked directly into GitHub somehow, so you get the "All is well" message in the merge status window, presumably based on the last test stat

trimming unnecessary test output

2014-09-06 Thread Nicholas Chammas
Continuing the discussion started here , I’m wondering if people already know that certain test output is unnecessary and should be trimmed. For example , I see a bunch

Re: amplab jenkins is down

2014-09-05 Thread Nicholas Chammas
Looks like Jenkins is back! lol The poor guy has like a million builds <https://amplab.cs.berkeley.edu/jenkins/view/Pull%20Request%20Builders/job/SparkPullRequestBuilder/> to catch up on. On Fri, Sep 5, 2014 at 4:15 PM, Nicholas Chammas wrote: > How's it going? > > It l

Re: amplab jenkins is down

2014-09-05 Thread Nicholas Chammas
for testing are triggering builds. On Fri, Sep 5, 2014 at 1:23 PM, shane knapp wrote: > it's looking like everything except the pull request builders are working. > i'm going to be working on getting this resolved today. > > > On Fri, Sep 5, 2014 at 8:18 AM, Nicholas Ch

Re: amplab jenkins is down

2014-09-05 Thread Nicholas Chammas
;s exactly the behavior i saw earlier, and will be figuring out > first thing tomorrow morning. i bet it's an environment issues on the > slaves. > > > On Thu, Sep 4, 2014 at 7:10 PM, Nicholas Chammas < > nicholas.cham...@gmail.com> wrote: > >> Looks like duri

Re: amplab jenkins is down

2014-09-04 Thread Nicholas Chammas
if that fixes things. > > > On Thu, Sep 4, 2014 at 4:56 PM, shane knapp wrote: > >> looking >> >> >> On Thu, Sep 4, 2014 at 4:21 PM, Nicholas Chammas < >> nicholas.cham...@gmail.com> wrote: >> >>> It appears that our main man is h

Re: amplab jenkins is down

2014-09-04 Thread Nicholas Chammas
4, 2014 at 5:49 PM, shane knapp wrote: > i'd ping the Jenkinsmench... the master was completely offline, so any new > jobs wouldn't have reached it. any jobs that were queued when power was > lost probably started up, but jobs that were running would fail. > > >

Re: amplab jenkins is down

2014-09-04 Thread Nicholas Chammas
Woohoo! Thanks Shane. Do you know if queued PR builds will automatically be picked up? Or do we have to ping the Jenkinmensch manually from each PR? Nick On Thu, Sep 4, 2014 at 5:37 PM, shane knapp wrote: > AND WE'RE UP! > > sorry that this took so long... i'll send out a more detailed expla

Re: [VOTE] Release Apache Spark 1.1.0 (RC4)

2014-09-04 Thread Nicholas Chammas
On Thu, Sep 4, 2014 at 1:50 PM, Gurvinder Singh wrote: > There is a regression when using pyspark to read data > from HDFS. > Could you open a JIRA with a brief repro? We'll look into it. (You could also provide a repro in a separate thread.) Nick

spark-ec2 depends on stuff in the Mesos repo

2014-09-03 Thread Nicholas Chammas
Spawned by this discussion . See these 2 lines in spark_ec2.py: - spark_ec2 L42 - spark_ec2 L566

Re: [VOTE] Release Apache Spark 1.1.0 (RC4)

2014-09-03 Thread Nicholas Chammas
On Wed, Sep 3, 2014 at 3:24 AM, Patrick Wendell wrote: > == What default changes should I be aware of? == > 1. The default value of "spark.io.compression.codec" is now "snappy" > --> Old behavior can be restored by switching to "lzf" > > 2. PySpark now performs external spilling during aggregatio

Re: [VOTE] Release Apache Spark 1.1.0 (RC3)

2014-09-02 Thread Nicholas Chammas
In light of the discussion on SPARK-, I'll revoke my "-1" vote. The issue does not appear to be serious. On Sun, Aug 31, 2014 at 5:14 PM, Nicholas Chammas < nicholas.cham...@gmail.com> wrote: > -1: I believe I've found a regression from 1.0.2. The report is ca

Re: hey spark developers! intro from shane knapp, devops engineer @ AMPLab

2014-09-02 Thread Nicholas Chammas
Hi Shane! Thank you for doing the Jenkins upgrade last week. It's nice to know that infrastructure is gonna get some dedicated TLC going forward. Welcome aboard! Nick On Tue, Sep 2, 2014 at 1:35 PM, shane knapp wrote: > so, i had a meeting w/the databricks guys on friday and they recommended

Re: Run the "Big Data Benchmark" for new releases

2014-09-01 Thread Nicholas Chammas
ths of specific tasks, etc). > > > > Matei > > > > On September 1, 2014 at 10:03:20 PM, Nicholas Chammas ( > nicholas.cham...@gmail.com) wrote: > > > > Oh, that's sweet. So, a related question then. > > > > Did those tests pick up the performance issue

Re: Run the "Big Data Benchmark" for new releases

2014-09-01 Thread Nicholas Chammas
> Hi Nicholas, > > At Databricks we already run https://github.com/databricks/spark-perf for > each release, which is a more comprehensive performance test suite. > > Matei > > On September 1, 2014 at 8:22:05 PM, Nicholas Chammas ( > nicholas.cham...@gmail.com) wrote: >

Run the "Big Data Benchmark" for new releases

2014-09-01 Thread Nicholas Chammas
What do people think of running the Big Data Benchmark (repo ) as part of preparing every new release of Spark? We'd run it just for Spark and effectively use it as another type of test to track any performance progre

Re: [VOTE] Release Apache Spark 1.1.0 (RC3)

2014-09-01 Thread Nicholas Chammas
If this is not a confirmed regression from 1.0.2, I think it's better to report it in a separate thread or JIRA. I believe serious regressions are generally the only reason to block a new release. Otherwise, if this is an old issue, it should be handled separately. On Monday, September 1, 2014, chutium wrote

Re: [VOTE] Release Apache Spark 1.1.0 (RC3)

2014-08-31 Thread Nicholas Chammas
On Sun, Aug 31, 2014 at 6:38 PM, chutium wrote: > has anyone tried to build it on hadoop.version=2.0.0-mr1-cdh4.3.0 or > hadoop.version=1.0.3-mapr-3.0.3 ? > Is the behavior you're seeing a regression from 1.0.2, or does 1.0.2 have this same problem? Nick

Re: [VOTE] Release Apache Spark 1.1.0 (RC3)

2014-08-31 Thread Nicholas Chammas
-1: I believe I've found a regression from 1.0.2. The report is captured in SPARK- . On Sat, Aug 30, 2014 at 6:07 PM, Patrick Wendell wrote: > Please vote on releasing the following candidate as Apache Spark version > 1.1.0! > > The tag to b

Re: Handling stale PRs

2014-08-30 Thread Nicholas Chammas
On Tue, Aug 26, 2014 at 2:02 AM, Patrick Wendell wrote: > it's actually procedurally difficult for us to close pull requests Just an FYI: Seems like the GitHub-sanctioned work-around to having issues-only permissions is to have a second, issues-only repository

Re: [VOTE] Release Apache Spark 1.1.0 (RC2)

2014-08-29 Thread Nicholas Chammas
> Oh darn - I missed this update. GRR, unfortunately I think this means > I'll need to cut a new RC. Thanks for catching this Nick. > > On Fri, Aug 29, 2014 at 10:18 AM, Nicholas Chammas > wrote: > > [Let me know if I should be posting these comments in a different >

Re: [VOTE] Release Apache Spark 1.1.0 (RC2)

2014-08-29 Thread Nicholas Chammas
:55 PM, Patrick Wendell wrote: > Hey Nicholas, > > Thanks for this, we can merge in doc changes outside of the actual > release timeline, so we'll make sure to loop those changes in before > we publish the final 1.1 docs. > > - Patrick > > On Fri, Aug 29, 2014

Re: [VOTE] Release Apache Spark 1.1.0 (RC2)

2014-08-29 Thread Nicholas Chammas
There were several formatting and typographical errors in the SQL docs that I've fixed in this PR. Dunno if we want to roll that into the release. On Fri, Aug 29, 2014 at 12:17 PM, Patrick Wendell wrote: > Okay I'll plan to add cdh4 binary as well for

Re: jenkins maintenance/downtime, aug 28th, 730am-9am PDT

2014-08-27 Thread Nicholas Chammas
Looks like we're currently at 1.568 so we should be getting a nice slew of UI tweaks and bug fixes. Neat! On Wed, Aug 27, 2014 at 7:13 PM, shane knapp wrote: > tomorrow morning i will be upgrading jenkins to the latest/greatest > (1.577). > > at 730am, i will put jenkins in to a quiet period, s

Re: Handling stale PRs

2014-08-27 Thread Nicholas Chammas
. > > The source is at https://github.com/databricks/spark-pr-dashboard (pull > requests and issues welcome!) > > On August 27, 2014 at 2:11:41 PM, Nicholas Chammas ( > nicholas.cham...@gmail.com) wrote: > > On Tue, Aug 26, 2014 at 2:21 PM, Josh Rosen wrote: > >> L

Re: Handling stale PRs

2014-08-27 Thread Nicholas Chammas
On Tue, Aug 26, 2014 at 2:21 PM, Josh Rosen wrote: > Last weekend, I started hacking on a Google App Engine app for helping > with pull request review (screenshot: http://i.imgur.com/wwpZKYZ.png). > BTW Josh, how can we stay up-to-date on your work on this tool? A JIRA issue, perhaps? Nick

Re: Handling stale PRs

2014-08-26 Thread Nicholas Chammas
p with Spark QA’s credentials in order to allow it to post comments on > issues, etc. > > - Josh > > On August 26, 2014 at 11:38:08 AM, Nicholas Chammas ( > nicholas.cham...@gmail.com) wrote: > > OK, that sounds pretty cool. > > Josh, > > Do you see this a

spark-ec2 1.0.2 creates EC2 cluster at wrong version

2014-08-26 Thread Nicholas Chammas
I downloaded the source code release for 1.0.2 from here and launched an EC2 cluster using spark-ec2. After the cluster finishes launching, I fire up the shell and check the version: scala> sc.version res1: String = 1.0.1 The startup banner also shows the

Re: Handling stale PRs

2014-08-26 Thread Nicholas Chammas
c preview > version; if we find this tool useful, I’ll clean it up and open-source the > app so folks can contribute to it. > > - Josh > > On August 26, 2014 at 8:16:46 AM, Nicholas Chammas ( > nicholas.cham...@gmail.com) wrote: > > On Tue, Aug 26, 2014 at 2:02 AM, Patrick We

Re: Handling stale PRs

2014-08-26 Thread Nicholas Chammas
On Tue, Aug 26, 2014 at 2:02 AM, Patrick Wendell wrote: > I'd prefer if we took the approach of politely explaining why in the > current form the patch isn't acceptable and closing it (potentially w/ tips > on how to improve it or narrow the scope). Amen to this. Aiming for such a culture would
