see https://issues.apache.org/jira/browse/SPARK-19611
On Mon, Apr 24, 2017 at 2:22 PM, Holden Karau <hol...@pigscanfly.ca> wrote:
> What's the regression this fixed in 2.1 from 2.0?
>
> On Fri, Apr 21, 2017 at 7:45 PM, Wenchen Fan <wenc...@databricks.com>
> wrote:
>
>> IIRC, the new "spark.sql.hive.caseSensitiveInferenceMode" stuff will
>> scan all table files only once, and write back the inferred schema to
>> the metastore so that we don't need to do the schema inference again.
>>
>> So technically this will introduce a performance regression for the first
>> query, but compared to branch-2.0, it's not a performance regression. And
>> this patch fixed a regression in branch-2.1 for queries that run fine in
>> branch-2.0. Personally, I think we should keep INFER_AND_SAVE as the
>> default mode.
>>
>> + [Eric], what do you think?
>>
>> On Sat, Apr 22, 2017 at 1:37 AM, Michael Armbrust <mich...@databricks.com>
>> wrote:
>>
>>> Thanks for pointing this out, Michael. Based on the conversation on
>>> the PR
>>> <https://github.com/apache/spark/pull/16944#issuecomment-285529275>
>>> this seems like a risky change to include in a release branch with a
>>> default other than NEVER_INFER.
>>>
>>> +Wenchen? What do you think?
>>>
>>> On Thu, Apr 20, 2017 at 4:14 PM, Michael Allman <mich...@videoamp.com>
>>> wrote:
>>>
>>>> We've identified the cause of the change in behavior. It is related to
>>>> the SQL conf key "spark.sql.hive.caseSensitiveInferenceMode". This key
>>>> and its related functionality were absent from our previous build. The
>>>> default setting in the current build was causing Spark to attempt to scan
>>>> all table files during query analysis. Changing this setting to NEVER_INFER
>>>> disabled this operation and resolved the issue we had.
>>>>
>>>> Michael
>>>>
>>>>
>>>> On Apr 20, 2017, at 3:42 PM, Michael Allman <mich...@videoamp.com>
>>>> wrote:
>>>>
>>>> I want to caution that in testing a build from this morning's
>>>> branch-2.1 we found that Hive partition pruning was not working.
>>>> We found that Spark SQL was fetching all Hive table partitions for a very
>>>> simple query, whereas a build from several weeks ago was fetching only the
>>>> required partitions. I cannot currently think of a reason for the
>>>> regression outside of some difference between branch-2.1 from our previous
>>>> build and branch-2.1 from this morning.
>>>>
>>>> That's all I know right now. We are actively investigating to find the
>>>> root cause of this problem, and specifically whether this is a problem in
>>>> the Spark codebase or not. I will report back when I have an answer to that
>>>> question.
>>>>
>>>> Michael
>>>>
>>>>
>>>> On Apr 18, 2017, at 11:59 AM, Michael Armbrust <mich...@databricks.com>
>>>> wrote:
>>>>
>>>> Please vote on releasing the following candidate as Apache Spark
>>>> version 2.1.1. The vote is open until Fri, April 21st, 2017 at 13:00
>>>> PST and passes if a majority of at least 3 +1 PMC votes are cast.
>>>>
>>>> [ ] +1 Release this package as Apache Spark 2.1.1
>>>> [ ] -1 Do not release this package because ...
>>>>
>>>>
>>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>>
>>>> The tag to be voted on is v2.1.1-rc3
>>>> <https://github.com/apache/spark/tree/v2.1.1-rc3>
>>>> (2ed19cff2f6ab79a718526e5d16633412d8c4dd4)
>>>>
>>>> The list of JIRA tickets resolved can be found with this filter
>>>> <https://issues.apache.org/jira/browse/SPARK-20134?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.1.1>.
>>>>
>>>> The release files, including signatures, digests, etc.
>>>> can be found at:
>>>> http://home.apache.org/~pwendell/spark-releases/spark-2.1.1-rc3-bin/
>>>>
>>>> Release artifacts are signed with the following key:
>>>> https://people.apache.org/keys/committer/pwendell.asc
>>>>
>>>> The staging repository for this release can be found at:
>>>> https://repository.apache.org/content/repositories/orgapachespark-1230/
>>>>
>>>> The documentation corresponding to this release can be found at:
>>>> http://people.apache.org/~pwendell/spark-releases/spark-2.1.1-rc3-docs/
>>>>
>>>>
>>>> *FAQ*
>>>>
>>>> *How can I help test this release?*
>>>>
>>>> If you are a Spark user, you can help us test this release by taking an
>>>> existing Spark workload, running it on this release candidate, and then
>>>> reporting any regressions.
>>>>
>>>> *What should happen to JIRA tickets still targeting 2.1.1?*
>>>>
>>>> Committers should look at those and triage. Extremely important bug
>>>> fixes, documentation, and API tweaks that impact compatibility should be
>>>> worked on immediately. Everything else should be retargeted to 2.1.2 or
>>>> 2.2.0.
>>>>
>>>> *But my bug isn't fixed!??!*
>>>>
>>>> In order to make timely releases, we will typically not hold the
>>>> release unless the bug in question is a regression from 2.1.0.
>>>>
>>>> *What happened to RC1?*
>>>>
>>>> There were issues with the release packaging, and as a result it was
>>>> skipped.
>>>>
>>>
>>
>
>
> --
> Cell : 425-233-8271
> Twitter: https://twitter.com/holdenkarau
>
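For reference, here is a minimal sketch of the workaround Michael Allman describes, assuming you want it applied cluster-wide through `spark-defaults.conf` (whether that is appropriate, and where the file lives, depends on your deployment). It only sets the conf key named in the thread to NEVER_INFER:

```
# spark-defaults.conf: do not infer a case-sensitive schema from Hive table data files;
# fall back to the (case-insensitive) schema stored in the metastore instead
spark.sql.hive.caseSensitiveInferenceMode  NEVER_INFER
```

The same value can be passed for a single job with `spark-submit --conf spark.sql.hive.caseSensitiveInferenceMode=NEVER_INFER`. The trade-off, per SPARK-19611, is that tables backed by files with mixed-case field names may not be read correctly without the inferred schema.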
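Wenchen's point that INFER_AND_SAVE pays the inference cost only once can be illustrated with a toy model. This is not Spark code: `ToyHiveTable` and `resolve_schema` are hypothetical names, and the model only counts how often all data files would be scanned under INFER_AND_SAVE versus NEVER_INFER (INFER_ONLY is deliberately left out).

```python
from dataclasses import dataclass


@dataclass
class ToyHiveTable:
    """Hypothetical stand-in for a Hive table entry in the metastore."""
    # True once a case-sensitive schema has been written back to the table properties.
    has_saved_schema: bool = False
    # Number of times every data file was read to infer a schema.
    full_scans: int = 0


def resolve_schema(table: ToyHiveTable, mode: str) -> str:
    """Return which schema a query would use; count full file scans as a side effect."""
    if mode == "NEVER_INFER":
        # Never touch the data files; use the case-insensitive metastore schema.
        return "metastore"
    if mode == "INFER_AND_SAVE":
        if not table.has_saved_schema:
            # Only the first query pays for scanning every data file...
            table.full_scans += 1
            # ...because the inferred schema is written back to the table properties.
            table.has_saved_schema = True
        return "inferred"
    raise ValueError(f"mode not covered by this toy model: {mode}")


if __name__ == "__main__":
    for mode in ("INFER_AND_SAVE", "NEVER_INFER"):
        table = ToyHiveTable()
        for _ in range(5):  # five queries against the same table
            resolve_schema(table, mode)
        print(mode, "full scans:", table.full_scans)
```

Running it prints one full scan for INFER_AND_SAVE and zero for NEVER_INFER across five queries, which is the "first query only" cost Wenchen refers to. In the real system that first scan can still be expensive on tables with many partitions, which matches what Michael Allman observed.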
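For anyone checking the RC artifacts against their published digests, here is a minimal Python sketch. It assumes the digest file holds a hex SHA-512 value, possibly prefixed with the file name and split into whitespace-separated groups (the layout `gpg --print-md SHA512` produces); check the actual files in the RC directory before relying on it. It does not replace verifying the GPG signatures against the signing key listed in the vote email.

```python
import hashlib
from pathlib import Path


def sha512_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Lowercase hex SHA-512 of a file, read in chunks so large tarballs are not loaded at once."""
    digest = hashlib.sha512()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def normalize_published_digest(text: str) -> str:
    """Drop an optional 'filename:' prefix and all whitespace, then lowercase.

    Assumption: the published value is hex, possibly split into groups and lines.
    """
    _, _, hex_part = text.rpartition(":")
    return "".join(hex_part.split()).lower()


def matches_published_digest(path: Path, published_text: str) -> bool:
    """True if the file's SHA-512 equals the digest in the published text."""
    return sha512_of_file(path) == normalize_published_digest(published_text)
```

Usage would look like `matches_published_digest(Path("spark-2.1.1-bin-hadoop2.7.tgz"), Path("spark-2.1.1-bin-hadoop2.7.tgz.sha").read_text())`, where both file names are hypothetical examples rather than a confirmed listing of the RC directory.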