Re: [VOTE] Apache Spark 2.1.1 (RC3)

Wenchen Fan Mon, 24 Apr 2017 22:23:32 -0700

see https://issues.apache.org/jira/browse/SPARK-19611


On Mon, Apr 24, 2017 at 2:22 PM, Holden Karau <[email protected]> wrote:

> What's the regression from 2.0 that this fixed in 2.1?
>
> On Fri, Apr 21, 2017 at 7:45 PM, Wenchen Fan <[email protected]>
> wrote:
>
>> IIRC, the new "spark.sql.hive.caseSensitiveInferenceMode" stuff will
>> scan all table files only once and write the inferred schema back to
>> the metastore, so that we don't need to do the schema inference again.
>>
>> So technically this introduces a performance regression for the first
>> query, but compared to branch-2.0 it's not a performance regression. And
>> this patch fixed a regression in branch-2.1: queries that work in
>> branch-2.0 but break in branch-2.1.
>> Personally, I think we should keep INFER_AND_SAVE as the default mode.
>>
>> + [Eric], what do you think?
>>
>> On Sat, Apr 22, 2017 at 1:37 AM, Michael Armbrust <[email protected]>
>> wrote:
>>
>>> Thanks for pointing this out, Michael. Based on the conversation on
>>> the PR
>>> <https://github.com/apache/spark/pull/16944#issuecomment-285529275>,
>>> this seems like a risky change to include in a release branch with a
>>> default other than NEVER_INFER.
>>>
>>> +Wenchen?  What do you think?
>>>
>>> On Thu, Apr 20, 2017 at 4:14 PM, Michael Allman <[email protected]>
>>> wrote:
>>>
>>>> We've identified the cause of the change in behavior. It is related to
>>>> the SQL conf key "spark.sql.hive.caseSensitiveInferenceMode". This key
>>>> and its related functionality were absent from our previous build. The
>>>> default setting in the current build was causing Spark to attempt to scan
>>>> all table files during query analysis. Changing this setting to NEVER_INFER
>>>> disabled this operation and resolved the issue we had.
>>>>
>>>> Michael
>>>>
>>>>
>>>> On Apr 20, 2017, at 3:42 PM, Michael Allman <[email protected]>
>>>> wrote:
>>>>
>>>> I want to caution that, in testing a build from this morning's
>>>> branch-2.1, we found that Hive partition pruning was not working. We found
>>>> that Spark SQL was fetching all Hive table partitions for a very simple
>>>> query, whereas in a build from several weeks ago it was fetching only the
>>>> required partitions. I cannot currently think of a reason for the
>>>> regression outside of some difference between branch-2.1 from our previous
>>>> build and branch-2.1 from this morning.
>>>>
>>>> That's all I know right now. We are actively investigating to find the
>>>> root cause of this problem, and specifically whether this is a problem in
>>>> the Spark codebase or not. I will report back when I have an answer to that
>>>> question.
>>>>
>>>> Michael
>>>>
>>>>
>>>> On Apr 18, 2017, at 11:59 AM, Michael Armbrust <[email protected]>
>>>> wrote:
>>>>
>>>> Please vote on releasing the following candidate as Apache Spark
>>>> version 2.1.1. The vote is open until Fri, April 21st, 2017 at 13:00
>>>> PST and passes if a majority of at least 3 +1 PMC votes are cast.
>>>>
>>>> [ ] +1 Release this package as Apache Spark 2.1.1
>>>> [ ] -1 Do not release this package because ...
>>>>
>>>>
>>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>>
>>>> The tag to be voted on is v2.1.1-rc3
>>>> <https://github.com/apache/spark/tree/v2.1.1-rc3>
>>>> (2ed19cff2f6ab79a718526e5d16633412d8c4dd4)
>>>>
>>>> List of JIRA tickets resolved can be found with this filter
>>>> <https://issues.apache.org/jira/browse/SPARK-20134?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.1.1>.
>>>>
>>>> The release files, including signatures, digests, etc. can be found at:
>>>> http://home.apache.org/~pwendell/spark-releases/spark-2.1.1-rc3-bin/
>>>>
>>>> Release artifacts are signed with the following key:
>>>> https://people.apache.org/keys/committer/pwendell.asc
>>>>
>>>> The staging repository for this release can be found at:
>>>> https://repository.apache.org/content/repositories/orgapachespark-1230/
>>>>
>>>> The documentation corresponding to this release can be found at:
>>>> http://people.apache.org/~pwendell/spark-releases/spark-2.1.1-rc3-docs/
>>>>
>>>>
>>>> *FAQ*
>>>>
>>>> *How can I help test this release?*
>>>>
>>>> If you are a Spark user, you can help us test this release by taking an
>>>> existing Spark workload and running it on this release candidate, then
>>>> reporting any regressions.
>>>>
>>>> *What should happen to JIRA tickets still targeting 2.1.1?*
>>>>
>>>> Committers should look at those and triage. Extremely important bug
>>>> fixes, documentation, and API tweaks that impact compatibility should be
>>>> worked on immediately. Everything else please retarget to 2.1.2 or 2.2.0.
>>>>
>>>> *But my bug isn't fixed!??!*
>>>>
>>>> In order to make timely releases, we will typically not hold the
>>>> release unless the bug in question is a regression from 2.1.0.
>>>>
>>>> *What happened to RC1?*
>>>>
>>>> There were issues with the release packaging, and as a result it was
>>>> skipped.
>>>>
>>>>
>>>>
>>>>
>>>
>>
>
>
> --
> Cell : 425-233-8271
> Twitter: https://twitter.com/holdenkarau
>
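
For reference, a minimal sketch of applying the NEVER_INFER workaround discussed in this thread, assuming a Scala application compiled against Spark 2.1.1 with the Hive module on the classpath. The object name and application name are hypothetical; only the key `spark.sql.hive.caseSensitiveInferenceMode` and the value NEVER_INFER come from the thread. The same setting can also be passed at launch with `--conf spark.sql.hive.caseSensitiveInferenceMode=NEVER_INFER`.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example application; not part of the release under vote.
object NeverInferExample {
  def main(args: Array[String]): Unit = {
    // Build a Hive-enabled session with case-sensitive schema inference
    // disabled (per the report above, this avoids scanning all table files
    // during query analysis).
    val spark = SparkSession.builder()
      .appName("never-infer-example")
      .config("spark.sql.hive.caseSensitiveInferenceMode", "NEVER_INFER")
      .enableHiveSupport()
      .getOrCreate()

    // Read the effective value back to confirm the setting was applied.
    println(spark.conf.get("spark.sql.hive.caseSensitiveInferenceMode"))

    spark.stop()
  }
}
```

The trade-off, as Wenchen notes, is that INFER_AND_SAVE pays for the file scan once and writes the result back to the metastore, while NEVER_INFER skips the scan entirely and relies on whatever schema the metastore already holds.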
