Yeah, I agree reflection is the best solution. Whenever we use
reflection we should clearly document in the code which YARN API
version corresponds to which code path. I'm guessing that since YARN
keeps adding new features, we'll just have to do this over time.
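
For what it's worth, here is a rough sketch (not actual Spark code) of what
that convention could look like; the class and method names below are
placeholders, not real YARN APIs:

import java.lang.reflect.Method

object YarnCompat {
  // YARN 2.4.0+ code path: the new API exists, so look it up reflectively.
  // YARN 2.2.x / 2.3.x code path: the method is absent, so we fall back to None.
  private val newApiMethod: Option[Method] =
    try {
      Some(Class.forName("org.apache.hadoop.yarn.api.SomeNewClass")
        .getMethod("someNewMethod", classOf[String]))
    } catch {
      case _: ClassNotFoundException | _: NoSuchMethodException => None
    }

  // target must be an instance of the class looked up above
  def callIfSupported(target: AnyRef, arg: String): Unit =
    newApiMethod.foreach(_.invoke(target, arg))
}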

- Patrick

On Fri, Jul 25, 2014 at 3:35 PM, Reynold Xin <r...@databricks.com> wrote:
> Actually, reflection is probably a better, lighter-weight approach for this.
> An extra project brings more overhead for something simple.
>
> On Fri, Jul 25, 2014 at 3:09 PM, Colin McCabe <cmcc...@alumni.cmu.edu>
> wrote:
>
>> So, I'm leaning more towards using reflection for this.  Maven profiles
>> could work, but it's tough since we have new stuff coming in 2.4, 2.5,
>> etc., and the number of profiles will multiply quickly if we have to do
>> it that way.  Reflection is the approach HBase took in a similar
>> situation.
>>
>> best,
>> Colin
>>
>>
>> On Fri, Jul 25, 2014 at 11:23 AM, Colin McCabe <cmcc...@alumni.cmu.edu>
>> wrote:
>>
>> > I have a similar issue with SPARK-1767.  There are basically three ways
>> > to resolve the issue:
>> >
>> > 1. Use reflection to access classes newer than 0.21 (or whatever the
>> > oldest version of Hadoop is that Spark supports)
>> > 2. Add a build variant (in Maven this would be a profile) that deals with
>> > this.
>> > 3. Auto-detect which classes are available and use those.
>> >
>> > #1 is the easiest for end-users, but it can lead to some ugly code.
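>> >
>> > As a rough illustration of the kind of "ugly" inline code #1 leads to
>> > (the class name, field name, and fallback value here are made up, not a
>> > real Hadoop API):
>> >
>> > val marker: String =
>> >   try {
>> >     Class.forName("org.apache.hadoop.yarn.api.SomeNewConstants")
>> >       .getField("SOME_NEW_CONSTANT")
>> >       .get(null)
>> >       .asInstanceOf[String]
>> >   } catch {
>> >     case _: ClassNotFoundException | _: NoSuchFieldException =>
>> >       "fallback"  // value used when running against older Hadoop
>> >   }
>> >
>> > Every call site that touches a newer API needs some variation of this,
>> > unless it is hidden behind a small compatibility shim.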
>> >
>> > #2 makes the code look nicer, but requires some effort on the part of
>> > people building Spark.  This can also lead to headaches for IDEs, if
>> > people don't remember to select the new profile.  (For example, in
>> > IntelliJ, you can't see any of the YARN classes when you import the
>> > project from Maven without the YARN profile selected.)
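>> >
>> > To make #2 concrete, here is a sketch of what a version-gated source
>> > tree could look like in sbt; this is not Spark's actual build, and the
>> > property name and directory layout are made up:
>> >
>> > // build.sbt (sketch)
>> > val yarnVersion = sys.props.getOrElse("yarn.version", "2.2.0")
>> >
>> > // naive check that assumes a 2.x version string such as "2.2.0" or "2.4.1"
>> > val isYarn24Plus = yarnVersion.split("\\.")(1).toInt >= 4
>> >
>> > unmanagedSourceDirectories in Compile += {
>> >   if (isYarn24Plus) baseDirectory.value / "src" / "yarn-2.4" / "scala"
>> >   else baseDirectory.value / "src" / "yarn-pre-2.4" / "scala"
>> > }
>> >
>> > Each additional YARN line would need another branch here, or another
>> > profile on the Maven side.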
>> >
>> > #3 is something that... I don't know how to do in sbt or Maven.  I've
>> > been told that an antrun task might work here, but it seems like it
>> > could get really tricky.
>> >
>> > Overall, I'd lean more towards #2 here.
>> >
>> > best,
>> > Colin
>> >
>> >
>> > On Tue, Jul 22, 2014 at 12:47 AM, innowireless TaeYun Kim <
>> > taeyun....@innowireless.co.kr> wrote:
>> >
>> >> (I'm resending this mail since it seems that it was not sent. Sorry if
>> >> this was already sent.)
>> >>
>> >> Hi,
>> >>
>> >>
>> >>
>> >> A couple of months ago, I made a pull request to fix
>> >> https://issues.apache.org/jira/browse/SPARK-1825.
>> >>
>> >> My pull request is here: https://github.com/apache/spark/pull/899
>> >>
>> >>
>> >>
>> >> But that pull request has problems:
>> >>
>> >> - It is Hadoop 2.4.0+ only. It won't compile on the versions below it.
>> >>
>> >> - The related Hadoop API is marked as '@Unstable'.
>> >>
>> >>
>> >>
>> >> Here is an idea to remedy the problems: a new Spark configuration
>> >> variable.
>> >>
>> >> Maybe it could be named "spark.yarn.submit.crossplatform".
>> >>
>> >> If it is set to "true" (the default is false), the related Spark code can
>> >> use hard-coded strings that are the same as the ones the Hadoop API
>> >> provides, thus avoiding compile errors on Hadoop versions below 2.4.0.
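>> >>
>> >> As a rough sketch of the idea (not an existing Spark feature), assuming
>> >> the hard-coded strings in question are the cross-platform "{{VAR}}"
>> >> expansion markers that the 2.4.0+ API would otherwise supply; the helper
>> >> name below is made up:
>> >>
>> >> import org.apache.spark.SparkConf
>> >> import org.apache.hadoop.yarn.api.ApplicationConstants.Environment
>> >>
>> >> def javaHomeRef(conf: SparkConf): String =
>> >>   if (conf.getBoolean("spark.yarn.submit.crossplatform", false)) {
>> >>     "{{JAVA_HOME}}"            // literal string, expanded on the YARN side
>> >>   } else {
>> >>     Environment.JAVA_HOME.$()  // existing API: expands for the client platform
>> >>   }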
>> >>
>> >>
>> >>
>> >> If this idea is acceptable, could someone implement this feature?
>> >>
>> >> Currently my knowledge of the Spark source code and Scala is too limited
>> >> to implement it myself.
>> >>
>> >> For the right person, the modification should be trivial.
>> >>
>> >> You can refer to the source code changes in my pull request.
>> >>
>> >>
>> >>
>> >> Thanks.
>> >>
>> >>
>> >>
>> >>
>> >
>>
