If this is something we expect to mostly impact new users, I think we can push them towards Spark 3 instead of introducing a behaviour change in 2.4.6.
On Wed, Jun 3, 2020 at 12:34 PM Mridul Muralidharan <mri...@gmail.com> wrote:
>
> Is this a behavior change in 2.4.x from an earlier version?
> Or are we proposing to introduce a functionality to help with adoption?
>
> Regards,
> Mridul
>
>
> On Wed, Jun 3, 2020 at 10:32 AM Xiao Li <gatorsm...@gmail.com> wrote:
>
>> Yes. Spark 3.0 RC2 works well.
>>
>> I think the current behavior in Spark 2.4 affects adoption, especially
>> for new users who want to try Spark in their local environment.
>>
>> It impacts all our built-in clients, like the Scala shell and PySpark.
>> Should we consider back-porting it to 2.4?
>>
>> Although this fixes the bug, it will also introduce a behavior change.
>> We should publicly document it and mention it in the release notes. Let
>> us review it more carefully and understand the risk and impact.
>>
>> Thanks,
>>
>> Xiao
>>
>> Nicholas Chammas <nicholas.cham...@gmail.com> wrote on Wednesday, June 3, 2020 at 10:12 AM:
>>
>>> I believe that was fixed in 3.0 and there was a decision not to
>>> backport the fix: SPARK-31170
>>> <https://issues.apache.org/jira/browse/SPARK-31170>
>>>
>>> On Wed, Jun 3, 2020 at 1:04 PM Xiao Li <gatorsm...@gmail.com> wrote:
>>>
>>>> Just downloaded it on my local MacBook. Trying to create a table
>>>> using the pre-built PySpark. It looks like the conf
>>>> "spark.sql.warehouse.dir" does not take effect. It is trying to
>>>> create a directory at "file:/user/hive/warehouse/t1". I have not
>>>> done any investigation yet. Have any of you hit the same issue?
>>>>
>>>> C02XT0U7JGH5:bin lixiao$ ./pyspark --conf spark.sql.warehouse.dir="/Users/lixiao/Downloads/spark-2.4.6-bin-hadoop2.6"
>>>> Python 2.7.16 (default, Jan 27 2020, 04:46:15)
>>>> [GCC 4.2.1 Compatible Apple LLVM 10.0.1 (clang-1001.0.37.14)] on darwin
>>>> Type "help", "copyright", "credits" or "license" for more information.
>>>> 20/06/03 09:56:11 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>>> Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
>>>> Setting default log level to "WARN".
>>>> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
>>>> Welcome to
>>>>       ____              __
>>>>      / __/__  ___ _____/ /__
>>>>     _\ \/ _ \/ _ `/ __/  '_/
>>>>    /__ / .__/\_,_/_/ /_/\_\   version 2.4.6
>>>>       /_/
>>>>
>>>> Using Python version 2.7.16 (default, Jan 27 2020 04:46:15)
>>>> SparkSession available as 'spark'.
>>>> >>> spark.sql("set spark.sql.warehouse.dir").show(truncate=False)
>>>> +-----------------------+-------------------------------------------------+
>>>> |key                    |value                                            |
>>>> +-----------------------+-------------------------------------------------+
>>>> |spark.sql.warehouse.dir|/Users/lixiao/Downloads/spark-2.4.6-bin-hadoop2.6|
>>>> +-----------------------+-------------------------------------------------+
>>>>
>>>> >>> spark.sql("create table t1 (col1 int)")
>>>> 20/06/03 09:56:29 WARN HiveMetaStore: Location: file:/user/hive/warehouse/t1 specified for non-external table:t1
>>>> Traceback (most recent call last):
>>>>   File "<stdin>", line 1, in <module>
>>>>   File "/Users/lixiao/Downloads/spark-2.4.6-bin-hadoop2.6/python/pyspark/sql/session.py", line 767, in sql
>>>>     return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
>>>>   File "/Users/lixiao/Downloads/spark-2.4.6-bin-hadoop2.6/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
>>>>   File "/Users/lixiao/Downloads/spark-2.4.6-bin-hadoop2.6/python/pyspark/sql/utils.py", line 69, in deco
>>>>     raise AnalysisException(s.split(': ', 1)[1], stackTrace)
>>>> pyspark.sql.utils.AnalysisException: u'org.apache.hadoop.hive.ql.metadata.HiveException:
>>>> MetaException(message:file:/user/hive/warehouse/t1 is not a directory or unable to create one);'
>>>>
>>>> Dongjoon Hyun <dongjoon.h...@gmail.com> wrote on Wednesday, June 3, 2020 at 9:18 AM:
>>>>
>>>>> +1
>>>>>
>>>>> Bests,
>>>>> Dongjoon
>>>>>
>>>>> On Wed, Jun 3, 2020 at 5:59 AM Tom Graves <tgraves...@yahoo.com.invalid> wrote:
>>>>>
>>>>>> +1
>>>>>>
>>>>>> Tom
>>>>>>
>>>>>> On Sunday, May 31, 2020, 06:47:09 PM CDT, Holden Karau <hol...@pigscanfly.ca> wrote:
>>>>>>
>>>>>> Please vote on releasing the following candidate as Apache Spark version 2.4.6.
>>>>>>
>>>>>> The vote is open until June 5th at 9 AM PST and passes if a majority of +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>>>>
>>>>>> [ ] +1 Release this package as Apache Spark 2.4.6
>>>>>> [ ] -1 Do not release this package because ...
>>>>>>
>>>>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>>>>
>>>>>> There are currently no issues targeting 2.4.6 (try project = SPARK AND "Target Version/s" = "2.4.6" AND status in (Open, Reopened, "In Progress"))
>>>>>>
>>>>>> The tag to be voted on is v2.4.6-rc8 (commit 807e0a484d1de767d1f02bd8a622da6450bdf940):
>>>>>> https://github.com/apache/spark/tree/v2.4.6-rc8
>>>>>>
>>>>>> The release files, including signatures, digests, etc.,
>>>>>> can be found at:
>>>>>> https://dist.apache.org/repos/dist/dev/spark/v2.4.6-rc8-bin/
>>>>>>
>>>>>> Signatures used for Spark RCs can be found in this file:
>>>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>>>>
>>>>>> The staging repository for this release can be found at:
>>>>>> https://repository.apache.org/content/repositories/orgapachespark-1349/
>>>>>>
>>>>>> The documentation corresponding to this release can be found at:
>>>>>> https://dist.apache.org/repos/dist/dev/spark/v2.4.6-rc8-docs/
>>>>>>
>>>>>> The list of bug fixes going into 2.4.6 can be found at the following URL:
>>>>>> https://issues.apache.org/jira/projects/SPARK/versions/12346781
>>>>>>
>>>>>> This release is using the release script of the tag v2.4.6-rc8.
>>>>>>
>>>>>> FAQ
>>>>>>
>>>>>> =========================
>>>>>> What happened to the other RCs?
>>>>>> =========================
>>>>>>
>>>>>> The parallel Maven build caused some flakiness, so I wasn't comfortable releasing them. I backported the fix from the 3.0 branch for this release. For the future, I've got a proposed change to the build script so that we only push tags once the build succeeds, but it does not block this release.
>>>>>>
>>>>>> =========================
>>>>>> How can I help test this release?
>>>>>> =========================
>>>>>>
>>>>>> If you are a Spark user, you can help us test this release by taking an existing Spark workload, running it on this release candidate, and reporting any regressions.
>>>>>>
>>>>>> If you're working in PySpark, you can set up a virtual env, install the current RC, and see if anything important breaks. In Java/Scala, you can add the staging repository to your project's resolvers and test with the RC (make sure to clean up the artifact cache before/after so you don't end up building with an out-of-date RC going forward).
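[Editor's note: a minimal sketch of the "add the staging repository to your project's resolvers" step from the vote email, shown here for an sbt build. The resolver name is arbitrary, and the `spark-sql` dependency is only an assumed example of a Spark artifact a project might use; the only detail taken from the email is the staging repository URL.]

```scala
// build.sbt (fragment) — hypothetical sketch, not part of the original email.
// Point sbt at the RC staging repository so that version 2.4.6 resolves to
// the release candidate artifacts rather than a published release.
resolvers += "Spark 2.4.6 RC8 staging" at
  "https://repository.apache.org/content/repositories/orgapachespark-1349/"

// Example dependency (assumed): compile against the RC under test.
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.6" % Provided
```

As the email advises, clear the local artifact cache (for Ivy-based sbt builds, typically under `~/.ivy2/cache/org.apache.spark`) before and after testing so later builds don't silently pick up the RC.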
>>>>>>
>>>>>> ===========================================
>>>>>> What should happen to JIRA tickets still targeting 2.4.6?
>>>>>> ===========================================
>>>>>>
>>>>>> The current list of open tickets targeted at 2.4.6 can be found at:
>>>>>> https://issues.apache.org/jira/projects/SPARK and search for "Target Version/s" = 2.4.6
>>>>>>
>>>>>> Committers should look at those and triage. Extremely important bug fixes, documentation, and API tweaks that impact compatibility should be worked on immediately. Everything else, please retarget to an appropriate release.
>>>>>>
>>>>>> ==================
>>>>>> But my bug isn't fixed?
>>>>>> ==================
>>>>>>
>>>>>> In order to make timely releases, we will typically not hold the release unless the bug in question is a regression from the previous release. That being said, if there is something which is a regression that has not been correctly targeted, please ping me or a committer to help target the issue.
>>>>>>
>>>>>> --
>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
>>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau

--
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau