I'm in favor of adding SPARK-25299 - Use remote storage for persisting shuffle data <https://issues.apache.org/jira/browse/SPARK-25299>, if that is far enough along to get onto the roadmap.

On Wed, Sep 11, 2019 at 11:37 AM Sean Owen <sro...@apache.org> wrote:

> I'm curious what current feelings are about ramping down towards a
> Spark 3 release. It feels close to ready. There is no fixed date,
> though in the past we had informally tossed around "back end of 2019".
> For reference, Spark 1 was May 2014, Spark 2 was July 2016. I'd expect
> Spark 2 to last longer, so to speak, but it feels like Spark 3 is
> coming due.
>
> What are the few major items that must get done for Spark 3, in your
> opinion? Below are all of the open JIRAs for 3.0 (which everyone
> should feel free to update with things that aren't really needed for
> Spark 3; I already triaged some).
>
> For me, it's:
> - DSv2?
> - Finishing touches on the Hive and JDK 11 updates
>
> What about considering a preview release earlier, as happened for
> Spark 2, to get feedback much earlier than the RC cycle? Could that
> even happen ... about now?
>
> I'm also wondering what a realistic estimate of the Spark 3 release
> is. My guess is quite early 2020, from here.
>
> SPARK-29014 DataSourceV2: Clean up current, default, and session catalog uses
> SPARK-28900 Test Pyspark, SparkR on JDK 11 with run-tests
> SPARK-28883 Fix a flaky test: ThriftServerQueryTestSuite
> SPARK-28717 Update SQL ALTER TABLE RENAME to use TableCatalog API
> SPARK-28588 Build a SQL reference doc
> SPARK-28629 Capture the missing rules in HiveSessionStateBuilder
> SPARK-28684 Hive module support JDK 11
> SPARK-28548 explain() shows wrong result for persisted DataFrames after some operations
> SPARK-28372 Document Spark WEB UI
> SPARK-28476 Support ALTER DATABASE SET LOCATION
> SPARK-28264 Revisiting Python / pandas UDF
> SPARK-28301 fix the behavior of table name resolution with multi-catalog
> SPARK-28155 do not leak SaveMode to file source v2
> SPARK-28103 Cannot infer filters from union table with empty local relation table properly
> SPARK-28024 Incorrect numeric values when out of range
> SPARK-27936 Support local dependency uploading from --py-files
> SPARK-27884 Deprecate Python 2 support in Spark 3.0
> SPARK-27763 Port test cases from PostgreSQL to Spark SQL
> SPARK-27780 Shuffle server & client should be versioned to enable smoother upgrade
> SPARK-27714 Support Join Reorder based on Genetic Algorithm when the # of joined tables > 12
> SPARK-27471 Reorganize public v2 catalog API
> SPARK-27520 Introduce a global config system to replace hadoopConfiguration
> SPARK-24625 put all the backward compatible behavior change configs under spark.sql.legacy.*
> SPARK-24640 size(null) returns null
> SPARK-24702 Unable to cast to calendar interval in spark sql.
> SPARK-24838 Support uncorrelated IN/EXISTS subqueries for more operators
> SPARK-24941 Add RDDBarrier.coalesce() function
> SPARK-25017 Add test suite for ContextBarrierState
> SPARK-25083 remove the type erasure hack in data source scan
> SPARK-25383 Image data source supports sample pushdown
> SPARK-27272 Enable blacklisting of node/executor on fetch failures by default
> SPARK-27296 User Defined Aggregating Functions (UDAFs) have a major efficiency problem
> SPARK-25128 multiple simultaneous job submissions against k8s backend cause driver pods to hang
> SPARK-26731 remove EOLed spark jobs from jenkins
> SPARK-26664 Make DecimalType's minimum adjusted scale configurable
> SPARK-21559 Remove Mesos fine-grained mode
> SPARK-24942 Improve cluster resource management with jobs containing barrier stage
> SPARK-25914 Separate projection from grouping and aggregate in logical Aggregate
> SPARK-26022 PySpark Comparison with Pandas
> SPARK-20964 Make some keywords reserved along with the ANSI/SQL standard
> SPARK-26221 Improve Spark SQL instrumentation and metrics
> SPARK-26425 Add more constraint checks in file streaming source to avoid checkpoint corruption
> SPARK-25843 Redesign rangeBetween API
> SPARK-25841 Redesign window function rangeBetween API
> SPARK-25752 Add trait to easily whitelist logical operators that produce named output from CleanupAliases
> SPARK-23210 Introduce the concept of default value to schema
> SPARK-25640 Clarify/Improve EvalType for grouped aggregate and window aggregate
> SPARK-25531 new write APIs for data source v2
> SPARK-25547 Pluggable jdbc connection factory
> SPARK-20845 Support specification of column names in INSERT INTO
> SPARK-24417 Build and Run Spark on JDK11
> SPARK-24724 Discuss necessary info and access in barrier mode + Kubernetes
> SPARK-24725 Discuss necessary info and access in barrier mode + Mesos
> SPARK-25074 Implement maxNumConcurrentTasks() in MesosFineGrainedSchedulerBackend
> SPARK-23710 Upgrade the built-in Hive to 2.3.5 for hadoop-3.2
> SPARK-25186 Stabilize Data Source V2 API
> SPARK-25376 Scenarios we should handle but missed in 2.4 for barrier execution mode
> SPARK-25390 data source V2 API refactoring
> SPARK-7768 Make user-defined type (UDT) API public
> SPARK-14922 Alter Table Drop Partition Using Predicate-based Partition Spec
> SPARK-15691 Refactor and improve Hive support
> SPARK-15694 Implement ScriptTransformation in sql/core
> SPARK-16217 Support SELECT INTO statement
> SPARK-16452 basic INFORMATION_SCHEMA support
> SPARK-18134 SQL: MapType in Group BY and Joins not working
> SPARK-18245 Improving support for bucketed table
> SPARK-19842 Informational Referential Integrity Constraints Support in Spark
> SPARK-22231 Support of map, filter, withColumn, dropColumn in nested list of structures
> SPARK-22632 Fix the behavior of timestamp values for R's DataFrame to respect session timezone
> SPARK-22386 Data Source V2 improvements
> SPARK-24723 Discuss necessary info and access in barrier mode + YARN
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org