We could also do this, though it would be great if the Hadoop project provided this version number as at least a baseline. It's up to distributors to decide which version they report but I imagine they won't remove stuff that's in the reported version number.
Matei

On Jul 27, 2014, at 1:57 PM, Sean Owen <so...@cloudera.com> wrote:

> Good idea, although it gets difficult in the context of multiple
> distributions. Say change X is not present in version A, but present
> in version B. If you depend on X, what version can you look for to
> detect it? The distribution will return "A" or "A+X" or somesuch, but
> testing for "A" will give an incorrect answer, and the code can't be
> expected to look for everyone's "A+X" versions. Actually inspecting
> the code is more robust, if a bit messier.
>
> On Sun, Jul 27, 2014 at 9:50 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
>> For this particular issue, it would be good to know if Hadoop provides an
>> API to determine the Hadoop version. If not, maybe that can be added to
>> Hadoop in its next release, and we can check for it with reflection. We
>> recently added a SparkContext.version() method in Spark to let you tell the
>> version.
>>
>> Matei
>>
>> On Jul 27, 2014, at 12:19 PM, Patrick Wendell <pwend...@gmail.com> wrote:
>>
>>> Hey Ted,
>>>
>>> We always intend Spark to work with newer Hadoop versions and
>>> encourage Spark users to use the newest Hadoop versions for best
>>> performance.
>>>
>>> We do try to be liberal in terms of supporting older versions as well.
>>> This is because many people run older HDFS versions and we want Spark
>>> to read and write data from them. So far we've been willing to do this
>>> despite some maintenance cost.
>>>
>>> The reason is that for many users it's very expensive to do a
>>> wholesale upgrade of HDFS, but trying out new versions of Spark is
>>> much easier. For instance, some of the largest-scale Spark users run
>>> fairly old or forked HDFS versions.
>>>
>>> - Patrick
>>>
>>> On Sun, Jul 27, 2014 at 12:01 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>> Thanks for replying, Patrick.
>>>>
>>>> The intention of my first email was to utilize newer Hadoop releases for
>>>> their bug fixes.
>>>> I am still looking for a clean way of passing the Hadoop release
>>>> version number to individual classes.
>>>> Using newer Hadoop releases would encourage pushing bug fixes / new
>>>> features upstream. Ultimately the Spark code would become cleaner.
>>>>
>>>> Cheers
>>>>
>>>> On Sun, Jul 27, 2014 at 8:52 AM, Patrick Wendell <pwend...@gmail.com> wrote:
>>>>
>>>>> Ted - technically I think you are correct, although I wouldn't
>>>>> recommend disabling this lock. This lock is not expensive (acquired
>>>>> once per task, as are many other locks already). Also, we've seen some
>>>>> cases where Hadoop concurrency bugs ended up requiring multiple fixes
>>>>> - concurrency of client access is not well tested in the Hadoop
>>>>> codebase, since most of the Hadoop tools do not use concurrent access.
>>>>> So in general it's good to be conservative in what we expect of the
>>>>> Hadoop client libraries.
>>>>>
>>>>> If you'd like to discuss this further, please fork a new thread, since
>>>>> this is a vote thread. Thanks!
>>>>>
>>>>> On Fri, Jul 25, 2014 at 10:14 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>> HADOOP-10456 is fixed in Hadoop 2.4.1.
>>>>>>
>>>>>> Does this mean that synchronization
>>>>>> on HadoopRDD.CONFIGURATION_INSTANTIATION_LOCK can be bypassed for Hadoop
>>>>>> 2.4.1?
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>> On Fri, Jul 25, 2014 at 6:00 PM, Patrick Wendell <pwend...@gmail.com> wrote:
>>>>>>
>>>>>>> The most important issue in this release is actually an amendment to
>>>>>>> an earlier fix. The original fix caused a deadlock, which was a
>>>>>>> regression from 1.0.0 -> 1.0.1:
>>>>>>>
>>>>>>> Issue:
>>>>>>> https://issues.apache.org/jira/browse/SPARK-1097
>>>>>>>
>>>>>>> 1.0.1 fix:
>>>>>>> https://github.com/apache/spark/pull/1273/files (had a deadlock)
>>>>>>>
>>>>>>> 1.0.2 fix:
>>>>>>> https://github.com/apache/spark/pull/1409/files
>>>>>>>
>>>>>>> I failed to correctly label this on JIRA, but I've updated it!
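To make the CONFIGURATION_INSTANTIATION_LOCK discussion above concrete, here is a minimal, self-contained sketch of the guard pattern being debated: Configuration construction is serialized through one global lock because the constructor mutated shared static state before HADOOP-10456, and the lock is taken once per task, not per record. This is an illustration, not the actual HadoopRDD code; java.util.Properties stands in for org.apache.hadoop.conf.Configuration so the sketch compiles without Hadoop on the classpath.

```java
import java.util.Properties;

public class ConfigurationLock {

    // Global lock, analogous in spirit to HadoopRDD.CONFIGURATION_INSTANTIATION_LOCK.
    private static final Object CONFIGURATION_INSTANTIATION_LOCK = new Object();

    // Acquired once per task, so the cost is a single (usually uncontended)
    // lock acquisition per task rather than per record.
    static Properties newJobConf(String jobId) {
        synchronized (CONFIGURATION_INSTANTIATION_LOCK) {
            // Stand-in for `new Configuration()`, whose constructor was not
            // thread-safe before HADOOP-10456.
            Properties conf = new Properties();
            conf.setProperty("mapreduce.job.id", jobId);
            return conf;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Many concurrent "tasks" each create a configuration; the lock makes
        // the constructor calls sequential even under concurrency.
        Thread[] workers = new Thread[8];
        for (int i = 0; i < workers.length; i++) {
            final String id = "job_" + i;
            workers[i] = new Thread(() -> newJobConf(id));
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        System.out.println("all tasks created their conf");
    }
}
```

This also illustrates Patrick's cost argument: even if a given Hadoop version no longer needs the guard, a once-per-task lock is cheap enough that removing it conditionally buys little.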
>>>>>>>
>>>>>>> On Fri, Jul 25, 2014 at 5:35 PM, Michael Armbrust
>>>>>>> <mich...@databricks.com> wrote:
>>>>>>>> That query is looking at "Fix Version", not "Target Version". The fact
>>>>>>>> that the first one is still open is only because the bug is not
>>>>>>>> resolved in master. It is fixed in 1.0.2. The second one is partially
>>>>>>>> fixed in 1.0.2, but is not worth blocking the release for.
>>>>>>>>
>>>>>>>> On Fri, Jul 25, 2014 at 4:23 PM, Nicholas Chammas
>>>>>>>> <nicholas.cham...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> TD, there are a couple of unresolved issues slated for 1.0.2
>>>>>>>>> <https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%201.0.2%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC>.
>>>>>>>>> Should they be edited somehow?
>>>>>>>>>
>>>>>>>>> On Fri, Jul 25, 2014 at 7:08 PM, Tathagata Das
>>>>>>>>> <tathagata.das1...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Please vote on releasing the following candidate as Apache Spark
>>>>>>>>>> version 1.0.2.
>>>>>>>>>>
>>>>>>>>>> This release fixes a number of bugs in Spark 1.0.1.
>>>>>>>>>> Some of the notable ones are:
>>>>>>>>>> - SPARK-2452: Known issue in Spark 1.0.1 caused by the attempted fix
>>>>>>>>>> for SPARK-1199. The fix was reverted for 1.0.2.
>>>>>>>>>> - SPARK-2576: NoClassDefFoundError when executing a Spark SQL query
>>>>>>>>>> on an HDFS CSV file.
>>>>>>>>>> The full list is at http://s.apache.org/9NJ
>>>>>>>>>>
>>>>>>>>>> The tag to be voted on is v1.0.2-rc1 (commit 8fb6f00e):
>>>>>>>>>> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=8fb6f00e195fb258f3f70f04756e07c259a2351f
>>>>>>>>>>
>>>>>>>>>> The release files, including signatures, digests, etc. can be found at:
>>>>>>>>>> http://people.apache.org/~tdas/spark-1.0.2-rc1/
>>>>>>>>>>
>>>>>>>>>> Release artifacts are signed with the following key:
>>>>>>>>>> https://people.apache.org/keys/committer/tdas.asc
>>>>>>>>>>
>>>>>>>>>> The staging repository for this release can be found at:
>>>>>>>>>> https://repository.apache.org/content/repositories/orgapachespark-1024/
>>>>>>>>>>
>>>>>>>>>> The documentation corresponding to this release can be found at:
>>>>>>>>>> http://people.apache.org/~tdas/spark-1.0.2-rc1-docs/
>>>>>>>>>>
>>>>>>>>>> Please vote on releasing this package as Apache Spark 1.0.2!
>>>>>>>>>>
>>>>>>>>>> The vote is open until Tuesday, July 29, at 23:00 UTC and passes if
>>>>>>>>>> a majority of at least 3 +1 PMC votes are cast.
>>>>>>>>>> [ ] +1 Release this package as Apache Spark 1.0.2
>>>>>>>>>> [ ] -1 Do not release this package because ...
>>>>>>>>>>
>>>>>>>>>> To learn more about Apache Spark, please see
>>>>>>>>>> http://spark.apache.org/
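The reflection-based feature check that Sean and Matei discuss at the top of the thread could look roughly like the sketch below: instead of parsing a distribution's version string ("A", "A+X", and so on), probe the classpath directly for the class or method the code depends on. This is an illustrative sketch, not Spark's actual code; it demonstrates against JDK classes so it runs anywhere, with the Hadoop names (e.g. org.apache.hadoop.util.VersionInfo.getVersion()) mentioned only in a comment as the real-world target.

```java
import java.lang.reflect.Method;

public class HadoopFeatureCheck {

    // Returns true if the named class can be loaded on this classpath.
    static boolean hasClass(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    // Returns true if the named class exists and declares the given
    // public no-arg method.
    static boolean hasMethod(String className, String methodName) {
        try {
            Class.forName(className).getMethod(methodName);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Against a real Hadoop classpath one would probe Hadoop itself, e.g.:
        //   hasMethod("org.apache.hadoop.util.VersionInfo", "getVersion")
        // Here we probe JDK classes so the sketch is runnable standalone.
        System.out.println(hasClass("java.util.ArrayList"));
        System.out.println(hasClass("org.example.NoSuchClass"));
        System.out.println(hasMethod("java.lang.String", "trim"));
    }
}
```

This is exactly why Sean calls inspecting the code "more robust if a bit messier": the probe answers "is feature X actually present?" regardless of what version string a vendor distribution reports.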