Thank you, Shane!

Xiao
On Tue, Feb 4, 2020 at 2:16 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:

> Thank you, Shane! :D
>
> Bests,
> Dongjoon
>
> On Tue, Feb 4, 2020 at 13:28 shane knapp ☠ <skn...@berkeley.edu> wrote:
>
>> all the 3.0 builds have been created and are currently churning away!
>>
>> (the failed builds were due to a silly bug in the build scripts sneaking
>> its way back in, but that's resolved now)
>>
>> shane
>>
>> On Sat, Feb 1, 2020 at 6:16 PM Reynold Xin <r...@databricks.com> wrote:
>>
>>> Note that branch-3.0 was cut. Please focus on testing and polish, and
>>> let's get the release out!
>>>
>>> On Wed, Jan 29, 2020 at 3:41 PM, Reynold Xin <r...@databricks.com> wrote:
>>>
>>>> Just a reminder - code freeze is coming this Fri!
>>>>
>>>> There can always be exceptions, but those should be exceptions and
>>>> discussed on a case-by-case basis rather than becoming the norm.
>>>>
>>>> On Tue, Dec 24, 2019 at 4:55 PM, Jungtaek Lim <
>>>> kabhwan.opensou...@gmail.com> wrote:
>>>>
>>>>> Jan 31 sounds good to me.
>>>>>
>>>>> Just curious: do we allow any exceptions to the code freeze? One case
>>>>> that came to mind is a feature with multiple subtasks, where some
>>>>> subtasks have been merged and the remaining ones are still in review.
>>>>> In that case, do we allow those subtasks a few more days to get
>>>>> reviewed and merged later?
>>>>>
>>>>> Happy Holidays!
>>>>>
>>>>> Thanks,
>>>>> Jungtaek Lim (HeartSaVioR)
>>>>>
>>>>> On Wed, Dec 25, 2019 at 8:36 AM Takeshi Yamamuro <
>>>>> linguin....@gmail.com> wrote:
>>>>>
>>>>>> Looks nice. Happy holidays, all!
>>>>>>
>>>>>> Bests,
>>>>>> Takeshi
>>>>>>
>>>>>> On Wed, Dec 25, 2019 at 3:56 AM Dongjoon Hyun <
>>>>>> dongjoon.h...@gmail.com> wrote:
>>>>>>
>>>>>>> +1 for January 31st.
>>>>>>>
>>>>>>> Bests,
>>>>>>> Dongjoon.
>>>>>>>
>>>>>>> On Tue, Dec 24, 2019 at 7:11 AM Xiao Li <lix...@databricks.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Jan 31 is pretty reasonable. Happy Holidays!
>>>>>>>>
>>>>>>>> Xiao
>>>>>>>>
>>>>>>>> On Tue, Dec 24, 2019 at 5:52 AM Sean Owen <sro...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Yep, always happens. Is earlier realistic, like Jan 15? It's all
>>>>>>>>> arbitrary, but indeed this has been in progress for a while, and
>>>>>>>>> there's a downside to not releasing it: it makes the gap to 3.0
>>>>>>>>> larger. On my end I don't know of anything that's holding up a
>>>>>>>>> release; is it basically DSv2?
>>>>>>>>>
>>>>>>>>> BTW, these are the items still targeted to 3.0.0, some of which
>>>>>>>>> may not have been legitimately tagged. It may be worth reviewing
>>>>>>>>> what's still open and necessary, and what should be untargeted.
>>>>>>>>>
>>>>>>>>> SPARK-29768 nondeterministic expression fails column pruning
>>>>>>>>> SPARK-29345 Add an API that allows a user to define and observe arbitrary metrics on streaming queries
>>>>>>>>> SPARK-29348 Add observable metrics
>>>>>>>>> SPARK-29429 Support Prometheus monitoring natively
>>>>>>>>> SPARK-29577 Implement p-value simulation and unit tests for chi2 test
>>>>>>>>> SPARK-28900 Test Pyspark, SparkR on JDK 11 with run-tests
>>>>>>>>> SPARK-28883 Fix a flaky test: ThriftServerQueryTestSuite
>>>>>>>>> SPARK-28717 Update SQL ALTER TABLE RENAME to use TableCatalog API
>>>>>>>>> SPARK-28588 Build a SQL reference doc
>>>>>>>>> SPARK-28629 Capture the missing rules in HiveSessionStateBuilder
>>>>>>>>> SPARK-28684 Hive module support JDK 11
>>>>>>>>> SPARK-28548 explain() shows wrong result for persisted DataFrames after some operations
>>>>>>>>> SPARK-28264 Revisiting Python / pandas UDF
>>>>>>>>> SPARK-28301 fix the behavior of table name resolution with multi-catalog
>>>>>>>>> SPARK-28155 do not leak SaveMode to file source v2
>>>>>>>>> SPARK-28103 Cannot infer filters from union table with empty local relation table properly
>>>>>>>>> SPARK-27986 Support Aggregate Expressions with filter
>>>>>>>>> SPARK-28024 Incorrect numeric values when out of range
>>>>>>>>> SPARK-27936 Support local dependency uploading from --py-files
>>>>>>>>> SPARK-27780 Shuffle server & client should be versioned to enable smoother upgrade
>>>>>>>>> SPARK-27714 Support Join Reorder based on Genetic Algorithm when the # of joined tables > 12
>>>>>>>>> SPARK-27471 Reorganize public v2 catalog API
>>>>>>>>> SPARK-27520 Introduce a global config system to replace hadoopConfiguration
>>>>>>>>> SPARK-24625 put all the backward compatible behavior change configs under spark.sql.legacy.*
>>>>>>>>> SPARK-24941 Add RDDBarrier.coalesce() function
>>>>>>>>> SPARK-25017 Add test suite for ContextBarrierState
>>>>>>>>> SPARK-25083 remove the type erasure hack in data source scan
>>>>>>>>> SPARK-25383 Image data source supports sample pushdown
>>>>>>>>> SPARK-27272 Enable blacklisting of node/executor on fetch failures by default
>>>>>>>>> SPARK-27296 Efficient User Defined Aggregators
>>>>>>>>> SPARK-25128 multiple simultaneous job submissions against k8s backend cause driver pods to hang
>>>>>>>>> SPARK-26664 Make DecimalType's minimum adjusted scale configurable
>>>>>>>>> SPARK-21559 Remove Mesos fine-grained mode
>>>>>>>>> SPARK-24942 Improve cluster resource management with jobs containing barrier stage
>>>>>>>>> SPARK-25914 Separate projection from grouping and aggregate in logical Aggregate
>>>>>>>>> SPARK-20964 Make some keywords reserved along with the ANSI/SQL standard
>>>>>>>>> SPARK-26221 Improve Spark SQL instrumentation and metrics
>>>>>>>>> SPARK-26425 Add more constraint checks in file streaming source to avoid checkpoint corruption
>>>>>>>>> SPARK-25843 Redesign rangeBetween API
>>>>>>>>> SPARK-25841 Redesign window function rangeBetween API
>>>>>>>>> SPARK-25752 Add trait to easily whitelist logical operators that produce named output from CleanupAliases
>>>>>>>>> SPARK-25640 Clarify/Improve EvalType for grouped aggregate and window aggregate
>>>>>>>>> SPARK-25531 new write APIs for data source v2
>>>>>>>>> SPARK-25547 Pluggable jdbc connection factory
>>>>>>>>> SPARK-20845 Support specification of column names in INSERT INTO
>>>>>>>>> SPARK-24724 Discuss necessary info and access in barrier mode + Kubernetes
>>>>>>>>> SPARK-24725 Discuss necessary info and access in barrier mode + Mesos
>>>>>>>>> SPARK-25074 Implement maxNumConcurrentTasks() in MesosFineGrainedSchedulerBackend
>>>>>>>>> SPARK-23710 Upgrade the built-in Hive to 2.3.5 for hadoop-3.2
>>>>>>>>> SPARK-25186 Stabilize Data Source V2 API
>>>>>>>>> SPARK-25376 Scenarios we should handle but missed in 2.4 for barrier execution mode
>>>>>>>>> SPARK-7768 Make user-defined type (UDT) API public
>>>>>>>>> SPARK-14922 Alter Table Drop Partition Using Predicate-based Partition Spec
>>>>>>>>> SPARK-15694 Implement ScriptTransformation in sql/core
>>>>>>>>> SPARK-18134 SQL: MapType in Group BY and Joins not working
>>>>>>>>> SPARK-19842 Informational Referential Integrity Constraints Support in Spark
>>>>>>>>> SPARK-22231 Support of map, filter, withColumn, dropColumn in nested list of structures
>>>>>>>>> SPARK-22386 Data Source V2 improvements
>>>>>>>>> SPARK-24723 Discuss necessary info and access in barrier mode + YARN
>>>>>>>>>
>>>>>>>>> On Mon, Dec 23, 2019 at 5:48 PM Reynold Xin <r...@databricks.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> We've pushed out 3.0 multiple times. The latest release window
>>>>>>>>>> documented on the website
>>>>>>>>>> <http://spark.apache.org/versioning-policy.html> says we'd code
>>>>>>>>>> freeze and cut branch-3.0 in early December. It looks like we are
>>>>>>>>>> suffering a bit from the tragedy of the commons: nobody is
>>>>>>>>>> pushing to get the release out. I understand that each
>>>>>>>>>> individual's natural tendency is to finish or extend the feature
>>>>>>>>>> or bug fix they have been working on. At some point we need to
>>>>>>>>>> say "this is it" and get the release out. I'm happy to help drive
>>>>>>>>>> this process.
>>>>>>>>>>
>>>>>>>>>> To be realistic, I don't think we should just code freeze *today*.
>>>>>>>>>> Although we have updated the website, contributors have all been
>>>>>>>>>> operating under the assumption that all active development is
>>>>>>>>>> still going on. I propose we *cut the branch on Jan 31, code
>>>>>>>>>> freeze and switch over to bug-squashing mode, and try to get the
>>>>>>>>>> 3.0 official release out in Q1*. That is, by default no new
>>>>>>>>>> features can go into the branch starting Jan 31.
>>>>>>>>>>
>>>>>>>>>> What do you think?
>>>>>>>>>>
>>>>>>>>>> And happy holidays, everybody.
>>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> <https://databricks.com/sparkaisummit/north-america>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> ---
>>>>>> Takeshi Yamamuro
>>>>>
>>>
>>
>> --
>> Shane Knapp
>> Computer Guy / Voice of Reason
>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>> https://rise.cs.berkeley.edu
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
--
<https://databricks.com/sparkaisummit/north-america>
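
Sean's "still targeted to 3.0.0" list above can be regenerated against the
ASF JIRA at any time. The following is a minimal Python sketch, not taken
from the thread: it assumes the standard JIRA REST v2 search endpoint and
the "Target Version/s" custom field the Spark project uses for targeting,
so the JQL string and status names may need adjusting.

    # Regenerate the list of open SPARK issues still targeted at 3.0.0.
    # Assumptions (not from the thread): the JQL below and the
    # "Target Version/s" field name match the ASF JIRA configuration.
    import requests

    JIRA_SEARCH_URL = "https://issues.apache.org/jira/rest/api/2/search"
    JQL = ('project = SPARK AND "Target Version/s" = 3.0.0 '
           'AND status in (Open, Reopened, "In Progress")')

    def targeted_issues(page_size: int = 200):
        """Yield (key, summary) pairs for issues still targeted at 3.0.0."""
        start = 0
        while True:
            resp = requests.get(
                JIRA_SEARCH_URL,
                params={
                    "jql": JQL,
                    "fields": "summary",
                    "startAt": start,
                    "maxResults": page_size,
                },
                timeout=30,
            )
            resp.raise_for_status()
            data = resp.json()
            for issue in data["issues"]:
                yield issue["key"], issue["fields"]["summary"]
            start += len(data["issues"])
            # Stop when the paged results are exhausted.
            if not data["issues"] or start >= data["total"]:
                break

    if __name__ == "__main__":
        for key, summary in targeted_issues():
            print(f"{key} {summary}")

Running it prints "SPARK-xxxxx <summary>" lines in the same shape as the
list quoted above, which makes it easy to diff against what was posted.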