Binding +1 and the vote passes. I’ll upload the release today/this weekend.
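For context, the passing condition quoted in the vote email below ("passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes") amounts to a small predicate. This is only an illustration of the stated rule, not part of any ASF tooling:

```python
def vote_passes(binding_plus_ones, binding_minus_ones):
    """Sketch of the stated rule: at least three binding +1 votes,
    and more binding +1s than binding -1s."""
    return binding_plus_ones >= 3 and binding_plus_ones > binding_minus_ones

# The tally below: seven binding +1s and no -1s, so the vote passes.
assert vote_passes(7, 0)
# A vote with only two +1s would fail the three-vote minimum.
assert not vote_passes(2, 0)
```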
On a personal note, I hope everyone is doing as well as possible with the pandemic and police violence. I've been grappling with the implications of our work as a community.

+1s (* binding):
* Wenchen Fan
* Sean Owen
* DB Tsai
* Prashant Sharma
* Mridul Muralidharan
* Tom Graves
* Dongjoon Hyun

0s: None

-1s: None

Comments without a vote:
Nicholas Chammas
Xiao Li

On Wed, Jun 3, 2020 at 12:49 PM Holden Karau <hol...@pigscanfly.ca> wrote:

> If this is something we expect to mostly impact new users, I think we can
> push them towards Spark 3 instead of introducing a behaviour change in 2.4.6.
>
> On Wed, Jun 3, 2020 at 12:34 PM Mridul Muralidharan <mri...@gmail.com> wrote:
>
>> Is this a behavior change in 2.4.x from an earlier version?
>> Or are we proposing to introduce new functionality to help with adoption?
>>
>> Regards,
>> Mridul
>>
>> On Wed, Jun 3, 2020 at 10:32 AM Xiao Li <gatorsm...@gmail.com> wrote:
>>
>>> Yes. Spark 3.0 RC2 works well.
>>>
>>> I think the current behavior in Spark 2.4 hurts adoption, especially for
>>> new users who want to try Spark in their local environment. It impacts
>>> all our built-in clients, like the Scala shell and PySpark. Should we
>>> consider back-porting it to 2.4?
>>>
>>> Although this fixes the bug, it also introduces a behavior change. We
>>> should publicly document it and mention it in the release notes. Let us
>>> review it more carefully and understand the risk and impact.
>>>
>>> Thanks,
>>>
>>> Xiao
>>>
>>> On Wed, Jun 3, 2020 at 10:12 AM Nicholas Chammas
>>> <nicholas.cham...@gmail.com> wrote:
>>>
>>>> I believe that was fixed in 3.0, and there was a decision not to
>>>> backport the fix: SPARK-31170
>>>> <https://issues.apache.org/jira/browse/SPARK-31170>
>>>>
>>>> On Wed, Jun 3, 2020 at 1:04 PM Xiao Li <gatorsm...@gmail.com> wrote:
>>>>
>>>>> Just downloaded it on my local MacBook. Trying to create a table using
>>>>> the pre-built PySpark.
>>>>> It sounds like the conf "spark.sql.warehouse.dir" does not take
>>>>> effect: Spark is trying to create a directory at
>>>>> "file:/user/hive/warehouse/t1". I have not done any investigation yet.
>>>>> Have any of you hit the same issue?
>>>>>
>>>>> C02XT0U7JGH5:bin lixiao$ ./pyspark --conf spark.sql.warehouse.dir="/Users/lixiao/Downloads/spark-2.4.6-bin-hadoop2.6"
>>>>> Python 2.7.16 (default, Jan 27 2020, 04:46:15)
>>>>> [GCC 4.2.1 Compatible Apple LLVM 10.0.1 (clang-1001.0.37.14)] on darwin
>>>>> Type "help", "copyright", "credits" or "license" for more information.
>>>>> 20/06/03 09:56:11 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>>>> Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
>>>>> Setting default log level to "WARN".
>>>>> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
>>>>> Welcome to
>>>>>       ____              __
>>>>>      / __/__  ___ _____/ /__
>>>>>     _\ \/ _ \/ _ `/ __/ '_/
>>>>>    /__ / .__/\_,_/_/ /_/\_\   version 2.4.6
>>>>>       /_/
>>>>>
>>>>> Using Python version 2.7.16 (default, Jan 27 2020 04:46:15)
>>>>> SparkSession available as 'spark'.
>>>>> >>> spark.sql("set spark.sql.warehouse.dir").show(truncate=False)
>>>>> +-----------------------+-------------------------------------------------+
>>>>> |key                    |value                                            |
>>>>> +-----------------------+-------------------------------------------------+
>>>>> |spark.sql.warehouse.dir|/Users/lixiao/Downloads/spark-2.4.6-bin-hadoop2.6|
>>>>> +-----------------------+-------------------------------------------------+
>>>>>
>>>>> >>> spark.sql("create table t1 (col1 int)")
>>>>> 20/06/03 09:56:29 WARN HiveMetaStore: Location: file:/user/hive/warehouse/t1 specified for non-external table:t1
>>>>> Traceback (most recent call last):
>>>>>   File "<stdin>", line 1, in <module>
>>>>>   File "/Users/lixiao/Downloads/spark-2.4.6-bin-hadoop2.6/python/pyspark/sql/session.py", line 767, in sql
>>>>>     return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
>>>>>   File "/Users/lixiao/Downloads/spark-2.4.6-bin-hadoop2.6/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
>>>>>   File "/Users/lixiao/Downloads/spark-2.4.6-bin-hadoop2.6/python/pyspark/sql/utils.py", line 69, in deco
>>>>>     raise AnalysisException(s.split(': ', 1)[1], stackTrace)
>>>>> pyspark.sql.utils.AnalysisException: u'org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:file:/user/hive/warehouse/t1 is not a directory or unable to create one);'
>>>>>
>>>>> On Wed, Jun 3, 2020 at 9:18 AM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>>>>
>>>>>> +1
>>>>>>
>>>>>> Bests,
>>>>>> Dongjoon
>>>>>>
>>>>>> On Wed, Jun 3, 2020 at 5:59 AM Tom Graves <tgraves...@yahoo.com.invalid> wrote:
>>>>>>
>>>>>>> +1
>>>>>>>
>>>>>>> Tom
>>>>>>>
>>>>>>> On Sunday, May 31, 2020, 06:47:09 PM CDT, Holden Karau <hol...@pigscanfly.ca> wrote:
>>>>>>>
>>>>>>> Please vote on releasing the following candidate as Apache Spark version 2.4.6.
>>>>>>>
>>>>>>> The vote is open until June 5th at 9 AM PST and passes if a majority
>>>>>>> of +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>>>>>
>>>>>>> [ ] +1 Release this package as Apache Spark 2.4.6
>>>>>>> [ ] -1 Do not release this package because ...
>>>>>>>
>>>>>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>>>>>
>>>>>>> There are currently no issues targeting 2.4.6 (try: project = SPARK
>>>>>>> AND "Target Version/s" = "2.4.6" AND status in (Open, Reopened, "In Progress")).
>>>>>>>
>>>>>>> The tag to be voted on is v2.4.6-rc8
>>>>>>> (commit 807e0a484d1de767d1f02bd8a622da6450bdf940):
>>>>>>> https://github.com/apache/spark/tree/v2.4.6-rc8
>>>>>>>
>>>>>>> The release files, including signatures, digests, etc., can be found at:
>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v2.4.6-rc8-bin/
>>>>>>>
>>>>>>> Signatures used for Spark RCs can be found in this file:
>>>>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>>>>>
>>>>>>> The staging repository for this release can be found at:
>>>>>>> https://repository.apache.org/content/repositories/orgapachespark-1349/
>>>>>>>
>>>>>>> The documentation corresponding to this release can be found at:
>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v2.4.6-rc8-docs/
>>>>>>>
>>>>>>> The list of bug fixes going into 2.4.6 can be found at the following URL:
>>>>>>> https://issues.apache.org/jira/projects/SPARK/versions/12346781
>>>>>>>
>>>>>>> This release uses the release script from the tag v2.4.6-rc8.
>>>>>>>
>>>>>>> FAQ
>>>>>>>
>>>>>>> =========================
>>>>>>> What happened to the other RCs?
>>>>>>> =========================
>>>>>>>
>>>>>>> The parallel Maven build caused some flakiness, so I wasn't comfortable
>>>>>>> releasing them. I backported the fix from the 3.0 branch for this
>>>>>>> release. I've also proposed a change to the build script so that, in
>>>>>>> the future, we only push tags once the build succeeds, but that does
>>>>>>> not block this release.
>>>>>>>
>>>>>>> =========================
>>>>>>> How can I help test this release?
>>>>>>> =========================
>>>>>>>
>>>>>>> If you are a Spark user, you can help us test this release by taking
>>>>>>> an existing Spark workload, running it on this release candidate, and
>>>>>>> reporting any regressions.
>>>>>>>
>>>>>>> If you're working in PySpark, you can set up a virtual env, install
>>>>>>> the current RC, and see if anything important breaks. In Java/Scala,
>>>>>>> you can add the staging repository to your project's resolvers and
>>>>>>> test with the RC (make sure to clean up the artifact cache before and
>>>>>>> after so you don't end up building with an out-of-date RC going
>>>>>>> forward).
>>>>>>>
>>>>>>> ===========================================
>>>>>>> What should happen to JIRA tickets still targeting 2.4.6?
>>>>>>> ===========================================
>>>>>>>
>>>>>>> The current list of open tickets targeted at 2.4.6 can be found at
>>>>>>> https://issues.apache.org/jira/projects/SPARK by searching for
>>>>>>> "Target Version/s" = 2.4.6.
>>>>>>>
>>>>>>> Committers should look at those and triage. Extremely important bug
>>>>>>> fixes, documentation, and API tweaks that impact compatibility should
>>>>>>> be worked on immediately. Everything else, please retarget to an
>>>>>>> appropriate release.
>>>>>>>
>>>>>>> ==================
>>>>>>> But my bug isn't fixed?
>>>>>>> ==================
>>>>>>>
>>>>>>> In order to make timely releases, we will typically not hold the
>>>>>>> release unless the bug in question is a regression from the previous
>>>>>>> release. That said, if there is a regression that has not been
>>>>>>> correctly targeted, please ping me or a committer to help target the
>>>>>>> issue.
>>>>>>>
>>>>>>> --
>>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>>> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
>>>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>
> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau