Some replies inline

On Wed, Jul 15, 2015 at 1:08 AM, Sean Owen <so...@cloudera.com> wrote:

> The code can continue to be a good reference implementation, no matter
> where it lives. In fact, it can be a better, more complete one, and
> easier to update.
>
> I agree that ec2/ needs to retain some kind of pointer to the new
> location. Yes, maybe a script as well that does the checkout as you
> say. We have to be careful that the effect here isn't to make people
> think this code is still part of the blessed bits of a Spark release,
> since it isn't. But I suppose the point is that it isn't quite now
> either (isn't tested, isn't fully contained in apache/spark) and
> that's what we're fixing.
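
(As a rough illustration of the stub idea above: a minimal sketch of what a
launcher left behind in ec2/ could look like, assuming the scripts end up
somewhere like amplab/spark-ec2. The repo URL, checkout directory, and driver
script name here are illustrative only, not a settled layout.)

#!/usr/bin/env python
# Illustrative stub only -- the repo URL, branch, and driver script name are
# assumptions, not an agreed-upon layout. The idea is just: ec2/ keeps a tiny
# launcher that fetches the externally hosted scripts and forwards all
# arguments to them.
import os
import subprocess
import sys

REPO_URL = "https://github.com/amplab/spark-ec2.git"  # hypothetical location
CHECKOUT_DIR = os.path.join(
    os.path.dirname(os.path.realpath(__file__)), "spark-ec2-checkout")

def main():
    # Clone the external repo on first use; reuse the checkout afterwards.
    if not os.path.isdir(CHECKOUT_DIR):
        subprocess.check_call(["git", "clone", REPO_URL, CHECKOUT_DIR])
    # Delegate to the real driver script, passing every argument through.
    driver = os.path.join(CHECKOUT_DIR, "spark_ec2.py")  # name assumed
    os.execv(sys.executable, [sys.executable, driver] + sys.argv[1:])

if __name__ == "__main__":
    main()
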
>
> I still don't like the idea of using the ASF JIRA for Spark to track
> issues in a separate project, as these kinds of splits are what we're
> trying to get rid of. I think it's a plus to be able to only bother
> with the Github PR/issue system, and not parallel JIRAs as well. I
> also worry that this blurs the line between code that is formally
> tested and blessed in a Spark release, and that which is not. You fix
> an issue in this separate repo and mark it "fixed in Spark 1.5" --
> what does that imply?
>
I am not sure why the ASF JIRA can only be used to track one set of
artifacts that are packaged and released together. I agree that marking a
fix version as 1.5 for a change in another repo doesn't make a lot of
sense, but we could just not use fix versions for the EC2 issues?


> I think the issue is people don't like the sense this is getting
> pushed outside the wall, or 'removed' from Spark. On the one hand I
> argue it hasn't really properly been part of Spark -- that's why we
> need this change to happen. But, I also think this is easy to resolve
> other ways: spark-packages.org, the pointer in the repo, prominent
> notes in the wiki, etc.
>
My concerns are less about it being pushed out, etc. For better or worse, we
have had the EC2 scripts be a part of the Spark distribution from a very
early stage (from version 0.5.0 if my git history reading is correct). So
users will assume that any error with the EC2 scripts belongs to the Spark
project. In addition, almost all the contributions to the EC2 scripts come
from Spark developers, and so keeping the issues in the same mailing list /
JIRA seems natural. This, I guess, again relates to the question of managing
issues for code that isn't part of the Spark release artifact.

> I suggest Shivaram owns this, and that amplab/spark-ec2 is used to
> host? I'm not qualified to help make the new copy or repo admin but
> would be happy to help with the rest, like triaging, if you can give
> me rights to open issues.
>
I'll create the amplab/spark-ec2 repo over the next couple of days unless
there are more comments on this thread. This will at least alleviate some
of the naming confusion over using a repository under the mesos
organization, and I'll give Sean, Nick, and Matthew commit access to it. I
am still not convinced about moving the issues over, though.

Thanks
Shivaram
