Binding +1 and the vote passes. I’ll upload the release today/this weekend.
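For context, the passing condition quoted in the vote email below ("passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes") amounts to a small predicate. This is only an illustration of the stated rule, not part of any ASF tooling:

```python
def vote_passes(binding_plus_ones, binding_minus_ones):
    """Sketch of the stated rule: at least three binding +1 votes,
    and more binding +1s than binding -1s."""
    return binding_plus_ones >= 3 and binding_plus_ones > binding_minus_ones

# The tally below: seven binding +1s and no -1s, so the vote passes.
assert vote_passes(7, 0)
# A vote with only two +1s would fail the three-vote minimum.
assert not vote_passes(2, 0)
```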
On a personal note, I hope everyone is doing as well as possible with the pandemic and police violence. I've been grappling with the implications of our work as a community.

+1s (* binding):
* Wenchen Fan
* Sean Owen
* DB Tsai
* Prashant Sharma
* Mridul Muralidharan
* Tom Graves
* Dongjoon Hyun

0s: None

-1s: None

Comments without a vote:
Nicholas Chammas
Xiao Li

On Wed, Jun 3, 2020 at 12:49 PM Holden Karau <hol...@pigscanfly.ca> wrote:

> If this is something we expect to mostly impact new users, I think we can
> push them towards Spark 3 instead of introducing a behaviour change in 2.4.6.
>
> On Wed, Jun 3, 2020 at 12:34 PM Mridul Muralidharan <mri...@gmail.com> wrote:
>
>> Is this a behavior change in 2.4.x from an earlier version?
>> Or are we proposing to introduce new functionality to help with adoption?
>>
>> Regards,
>> Mridul
>>
>> On Wed, Jun 3, 2020 at 10:32 AM Xiao Li <gatorsm...@gmail.com> wrote:
>>
>>> Yes. Spark 3.0 RC2 works well.
>>>
>>> I think the current behavior in Spark 2.4 hurts adoption, especially for
>>> new users who want to try Spark in their local environment. It impacts
>>> all our built-in clients, like the Scala shell and PySpark. Should we
>>> consider back-porting it to 2.4?
>>>
>>> Although this fixes the bug, it also introduces a behavior change. We
>>> should publicly document it and mention it in the release notes. Let us
>>> review it more carefully and understand the risk and impact.
>>>
>>> Thanks,
>>>
>>> Xiao
>>>
>>> On Wed, Jun 3, 2020 at 10:12 AM Nicholas Chammas
>>> <nicholas.cham...@gmail.com> wrote:
>>>
>>>> I believe that was fixed in 3.0, and there was a decision not to
>>>> backport the fix: SPARK-31170
>>>> <https://issues.apache.org/jira/browse/SPARK-31170>
>>>>
>>>> On Wed, Jun 3, 2020 at 1:04 PM Xiao Li <gatorsm...@gmail.com> wrote:
>>>>
>>>>> Just downloaded it on my local MacBook. Trying to create a table using
>>>>> the pre-built PySpark.
>>>>> It sounds like the conf "spark.sql.warehouse.dir" does not take
>>>>> effect: Spark is trying to create a directory at
>>>>> "file:/user/hive/warehouse/t1". I have not done any investigation yet.
>>>>> Have any of you hit the same issue?
>>>>>
>>>>> C02XT0U7JGH5:bin lixiao$ ./pyspark --conf spark.sql.warehouse.dir="/Users/lixiao/Downloads/spark-2.4.6-bin-hadoop2.6"
>>>>> Python 2.7.16 (default, Jan 27 2020, 04:46:15)
>>>>> [GCC 4.2.1 Compatible Apple LLVM 10.0.1 (clang-1001.0.37.14)] on darwin
>>>>> Type "help", "copyright", "credits" or "license" for more information.
>>>>> 20/06/03 09:56:11 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>>>> Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
>>>>> Setting default log level to "WARN".
>>>>> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
>>>>> Welcome to
>>>>>       ____              __
>>>>>      / __/__  ___ _____/ /__
>>>>>     _\ \/ _ \/ _ `/ __/ '_/
>>>>>    /__ / .__/\_,_/_/ /_/\_\   version 2.4.6
>>>>>       /_/
>>>>>
>>>>> Using Python version 2.7.16 (default, Jan 27 2020 04:46:15)
>>>>> SparkSession available as 'spark'.
>>>>> >>> spark.sql("set spark.sql.warehouse.dir").show(truncate=False)
>>>>> +-----------------------+-------------------------------------------------+
>>>>> |key                    |value                                            |
>>>>> +-----------------------+-------------------------------------------------+
>>>>> |spark.sql.warehouse.dir|/Users/lixiao/Downloads/spark-2.4.6-bin-hadoop2.6|
>>>>> +-----------------------+-------------------------------------------------+
>>>>>
>>>>> >>> spark.sql("create table t1 (col1 int)")
>>>>> 20/06/03 09:56:29 WARN HiveMetaStore: Location: file:/user/hive/warehouse/t1 specified for non-external table:t1
>>>>> Traceback (most recent call last):
>>>>>   File "<stdin>", line 1, in <module>
>>>>>   File "/Users/lixiao/Downloads/spark-2.4.6-bin-hadoop2.6/python/pyspark/sql/session.py", line 767, in sql
>>>>>     return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
>>>>>   File "/Users/lixiao/Downloads/spark-2.4.6-bin-hadoop2.6/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
>>>>>   File "/Users/lixiao/Downloads/spark-2.4.6-bin-hadoop2.6/python/pyspark/sql/utils.py", line 69, in deco
>>>>>     raise AnalysisException(s.split(': ', 1)[1], stackTrace)
>>>>> pyspark.sql.utils.AnalysisException: u'org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:file:/user/hive/warehouse/t1 is not a directory or unable to create one);'
>>>>>
>>>>> On Wed, Jun 3, 2020 at 9:18 AM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>>>>
>>>>>> +1
>>>>>>
>>>>>> Bests,
>>>>>> Dongjoon
>>>>>>
>>>>>> On Wed, Jun 3, 2020 at 5:59 AM Tom Graves <tgraves...@yahoo.com.invalid> wrote:
>>>>>>
>>>>>>> +1
>>>>>>>
>>>>>>> Tom
>>>>>>>
>>>>>>> On Sunday, May 31, 2020, 06:47:09 PM CDT, Holden Karau <hol...@pigscanfly.ca> wrote:
>>>>>>>
>>>>>>> Please vote on releasing the following candidate as Apache Spark version 2.4.6.
>>>>>>>
>>>>>>> The vote is open until June 5th at 9 AM PST and passes if a majority
>>>>>>> of +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>>>>>
>>>>>>> [ ] +1 Release this package as Apache Spark 2.4.6
>>>>>>> [ ] -1 Do not release this package because ...
>>>>>>>
>>>>>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>>>>>
>>>>>>> There are currently no issues targeting 2.4.6 (try: project = SPARK
>>>>>>> AND "Target Version/s" = "2.4.6" AND status in (Open, Reopened, "In Progress")).
>>>>>>>
>>>>>>> The tag to be voted on is v2.4.6-rc8
>>>>>>> (commit 807e0a484d1de767d1f02bd8a622da6450bdf940):
>>>>>>> https://github.com/apache/spark/tree/v2.4.6-rc8
>>>>>>>
>>>>>>> The release files, including signatures, digests, etc., can be found at:
>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v2.4.6-rc8-bin/
>>>>>>>
>>>>>>> Signatures used for Spark RCs can be found in this file:
>>>>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>>>>>
>>>>>>> The staging repository for this release can be found at:
>>>>>>> https://repository.apache.org/content/repositories/orgapachespark-1349/
>>>>>>>
>>>>>>> The documentation corresponding to this release can be found at:
>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v2.4.6-rc8-docs/
>>>>>>>
>>>>>>> The list of bug fixes going into 2.4.6 can be found at the following URL:
>>>>>>> https://issues.apache.org/jira/projects/SPARK/versions/12346781
>>>>>>>
>>>>>>> This release uses the release script from the tag v2.4.6-rc8.
>>>>>>>
>>>>>>> FAQ
>>>>>>>
>>>>>>> =========================
>>>>>>> What happened to the other RCs?
>>>>>>> =========================
>>>>>>>
>>>>>>> The parallel Maven build caused some flakiness, so I wasn't comfortable
>>>>>>> releasing them. I backported the fix from the 3.0 branch for this
>>>>>>> release. I've also proposed a change to the build script so that, in
>>>>>>> the future, we only push tags once the build succeeds, but that does
>>>>>>> not block this release.
>>>>>>>
>>>>>>> =========================
>>>>>>> How can I help test this release?
>>>>>>> =========================
>>>>>>>
>>>>>>> If you are a Spark user, you can help us test this release by taking
>>>>>>> an existing Spark workload, running it on this release candidate, and
>>>>>>> reporting any regressions.
>>>>>>>
>>>>>>> If you're working in PySpark, you can set up a virtual env, install
>>>>>>> the current RC, and see if anything important breaks. In Java/Scala,
>>>>>>> you can add the staging repository to your project's resolvers and
>>>>>>> test with the RC (make sure to clean up the artifact cache before and
>>>>>>> after so you don't end up building with an out-of-date RC going
>>>>>>> forward).
>>>>>>>
>>>>>>> ===========================================
>>>>>>> What should happen to JIRA tickets still targeting 2.4.6?
>>>>>>> ===========================================
>>>>>>>
>>>>>>> The current list of open tickets targeted at 2.4.6 can be found at
>>>>>>> https://issues.apache.org/jira/projects/SPARK by searching for
>>>>>>> "Target Version/s" = 2.4.6.
>>>>>>>
>>>>>>> Committers should look at those and triage. Extremely important bug
>>>>>>> fixes, documentation, and API tweaks that impact compatibility should
>>>>>>> be worked on immediately. Everything else, please retarget to an
>>>>>>> appropriate release.
>>>>>>>
>>>>>>> ==================
>>>>>>> But my bug isn't fixed?
>>>>>>> ==================
>>>>>>>
>>>>>>> In order to make timely releases, we will typically not hold the
>>>>>>> release unless the bug in question is a regression from the previous
>>>>>>> release. That said, if there is a regression that has not been
>>>>>>> correctly targeted, please ping me or a committer to help target the
>>>>>>> issue.
>>>>>>>
>>>>>>> --
>>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>>> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
>>>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>
> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau