Thank you, Fokko. Probably best to discuss further off-list. I'm almost embarrassed to describe our current workaround: it involves, among other things, a custom Shader implementation for the Maven Shade plugin.
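[Editor's note: for readers unfamiliar with shading, a workaround like the one mentioned above usually builds on the Shade plugin's standard relocation mechanism. This is only a hypothetical sketch of that standard part (package names and versions are made up); the custom Shader implementation itself is not shown here.]

```xml
<!-- Hypothetical sketch: relocate Avro into a private package so the
     application's Avro cannot clash with the one Spark/Parquet pull in.
     The shadedPattern prefix is an example, not a real project. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.2.1</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>org.apache.avro</pattern>
            <shadedPattern>com.example.shaded.org.apache.avro</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```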
michael

> On Sep 13, 2019, at 3:07 AM, Driesprong, Fokko <fo...@driesprong.frl> wrote:
>
> Michael Heuer, that's an interesting issue.
>
> 1.8.2 to 1.9.0 is almost binary compatible (94%):
> http://people.apache.org/~busbey/avro/1.9.0-RC4/1.8.2_to_1.9.0RC4_compat_report.html
> Most of the changes remove the Jackson and Netty APIs from Avro's public
> API and deprecate the Joda library. I would strongly advise moving to 1.9.1,
> since 1.9.0 has some regressions; for Java the most important is
> https://jira.apache.org/jira/browse/AVRO-2400
>
> I'd love to dive into the issue you describe, and I'm curious whether it
> is still there with Avro 1.9.1. I'm a bit busy at the moment but might
> have some time this weekend to dive into it.
>
> Cheers, Fokko Driesprong
>
>
> On Fri, Sep 13, 2019 at 02:32, Reynold Xin <r...@databricks.com> wrote:
> +1! Long due for a preview release.
>
>
> On Thu, Sep 12, 2019 at 5:26 PM, Holden Karau <hol...@pigscanfly.ca> wrote:
> I like the idea from the PoV of giving folks something to start testing
> against and exploring, so they can raise issues with us earlier in the
> process and we have more time to make calls around this.
>
> On Thu, Sep 12, 2019 at 4:15 PM John Zhuge <jzh...@apache.org> wrote:
> +1 Like the idea as a user and a DSv2 contributor.
>
> On Thu, Sep 12, 2019 at 4:10 PM Jungtaek Lim <kabh...@gmail.com> wrote:
> +1 (as a contributor) from me to having a preview release of Spark 3, as
> it would help to test the features.
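[Editor's note: for anyone auditing a similar Avro upgrade, two quick checks are common (hypothetical invocations; adjust paths and coordinates to your own build): ask Maven which Avro versions the build actually resolves, and diff the two jars with the kind of API-compatibility tool that produced the report linked above.]

```shell
# Which Avro does the build actually resolve (direct and transitive)?
mvn dependency:tree -Dincludes=org.apache.avro

# Compare the two jars' public APIs directly. The linked report appears to
# come from a tool of this kind; jar paths here are placeholders.
japi-compliance-checker avro-1.8.2.jar avro-1.9.1.jar
```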
> When to cut the preview release is an open question, as major work is
> ideally done before that. If we intend to introduce new features before the
> official release, that should work regardless of this; but if we intend to
> have the opportunity to test earlier, ideally it should.
>
> As one of the contributors in the structured streaming area, I'd like to
> add some items for Spark 3.0, both "must be done" and "better to have". For
> "better to have", I pick some items for new features which committers
> reviewed a couple of rounds and dropped without a soft reject (no valid
> reason to stop). For Spark 2.4 users, the only feature added for structured
> streaming is Kafka delegation token (given we consider revising the Kafka
> consumer pool an improvement). I hope we provide some gifts for structured
> streaming users in the Spark 3.0 envelope.
>
>
> must be done
> * SPARK-26154 Stream-stream joins - left outer join gives inconsistent output
> It's a correctness issue reported by multiple users, first reported in
> Nov. 2018. There's a way to reproduce it consistently, and we have a patch
> submitted in Jan. 2019 to fix it.
>
>
> better to have
> * SPARK-23539 Add support for Kafka headers in Structured Streaming
> * SPARK-26848 Introduce new option to Kafka source - specify timestamp to
> start and end offset
> * SPARK-20568 Delete files after processing in structured streaming
>
> There are some more new feature/improvement items in SS, but given we're
> talking about ramping down, the above list might be a realistic one.
>
>
>
> On Thu, Sep 12, 2019 at 9:53 AM Jean Georges Perrin <j...@jgp.net> wrote:
> As a user/non-committer, +1
>
> I love the idea of an early 3.0.0 so we can test current dev against it. I
> know the final 3.x will probably need another round of testing when it
> gets out, but less, for sure... I know I could check out and compile, but
> having a "packaged" preview version is great if it does not take too much
> of the team's time...
>
> jg
>
>
> On Sep 11, 2019, at 20:40, Hyukjin Kwon <gurwls...@gmail.com> wrote:
>
>> +1 from me too, but I would like to know what other people think too.
>>
>> On Thu, Sep 12, 2019 at 9:07 AM, Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>> Thank you, Sean.
>>
>> I'm also +1 for the following three.
>>
>> 1. Start to ramp down (by the official branch-3.0 cut)
>> 2. Apache Spark 3.0.0-preview in 2019
>> 3. Apache Spark 3.0.0 in early 2020
>>
>> For the JDK11 clean-up, it will meet the timeline, and `3.0.0-preview`
>> helps it a lot.
>>
>> After this discussion, can we have some timeline for the `Spark 3.0
>> Release Window` in our versioning-policy page?
>>
>> - https://spark.apache.org/versioning-policy.html
>>
>> Bests,
>> Dongjoon.
>>
>>
>> On Wed, Sep 11, 2019 at 11:54 AM Michael Heuer <heue...@gmail.com> wrote:
>> I would love to see Spark + Hadoop + Parquet + Avro compatibility
>> problems resolved, e.g.
>>
>> https://issues.apache.org/jira/browse/SPARK-25588
>> https://issues.apache.org/jira/browse/SPARK-27781
>>
>> Note that Avro is now at 1.9.1, binary-incompatible with 1.8.x. As far
>> as I know, Parquet has not cut a release based on this new version.
>>
>> Then out of curiosity, are the new Spark Graph APIs targeting 3.0?
>>
>> https://github.com/apache/spark/pull/24851
>> https://github.com/apache/spark/pull/24297
>>
>> michael
>>
>>
>>> On Sep 11, 2019, at 1:37 PM, Sean Owen <sro...@apache.org> wrote:
>>>
>>> I'm curious what current feelings are about ramping down towards a
>>> Spark 3 release. It feels close to ready.
>>> There is no fixed date,
>>> though in the past we had informally tossed around "back end of 2019".
>>> For reference, Spark 1 was May 2014, Spark 2 was July 2016. I'd expect
>>> Spark 2 to last longer, so to speak, but it feels like Spark 3 is
>>> coming due.
>>>
>>> What are the few major items that must get done for Spark 3, in your
>>> opinion? Below are all of the open JIRAs for 3.0 (which everyone
>>> should feel free to update with things that aren't really needed for
>>> Spark 3; I already triaged some).
>>>
>>> For me, it's:
>>> - DSv2?
>>> - Finishing touches on the Hive, JDK 11 update
>>>
>>> What about considering a preview release earlier, as happened for
>>> Spark 2, to get feedback much earlier than the RC cycle? Could that
>>> even happen ... about now?
>>>
>>> I'm also wondering what a realistic estimate of the Spark 3 release
>>> is. My guess is quite early 2020, from here.
>>>
>>>
>>>
>>> SPARK-29014 DataSourceV2: Clean up current, default, and session catalog uses
>>> SPARK-28900 Test Pyspark, SparkR on JDK 11 with run-tests
>>> SPARK-28883 Fix a flaky test: ThriftServerQueryTestSuite
>>> SPARK-28717 Update SQL ALTER TABLE RENAME to use TableCatalog API
>>> SPARK-28588 Build a SQL reference doc
>>> SPARK-28629 Capture the missing rules in HiveSessionStateBuilder
>>> SPARK-28684 Hive module support JDK 11
>>> SPARK-28548 explain() shows wrong result for persisted DataFrames after some operations
>>> SPARK-28372 Document Spark WEB UI
>>> SPARK-28476 Support ALTER DATABASE SET LOCATION
>>> SPARK-28264 Revisiting Python / pandas UDF
>>> SPARK-28301 fix the behavior of table name resolution with multi-catalog
>>> SPARK-28155 do not leak SaveMode to file source v2
>>> SPARK-28103 Cannot infer filters from union table with empty local relation table properly
>>> SPARK-28024 Incorrect numeric values when out of range
>>> SPARK-27936 Support local dependency uploading from --py-files
>>> SPARK-27884 Deprecate Python 2 support in Spark 3.0
>>> SPARK-27763 Port test cases from PostgreSQL to Spark SQL
>>> SPARK-27780 Shuffle server & client should be versioned to enable smoother upgrade
>>> SPARK-27714 Support Join Reorder based on Genetic Algorithm when the # of joined tables > 12
>>> SPARK-27471 Reorganize public v2 catalog API
>>> SPARK-27520 Introduce a global config system to replace hadoopConfiguration
>>> SPARK-24625 put all the backward compatible behavior change configs under spark.sql.legacy.*
>>> SPARK-24640 size(null) returns null
>>> SPARK-24702 Unable to cast to calendar interval in spark sql.
>>> SPARK-24838 Support uncorrelated IN/EXISTS subqueries for more operators
>>> SPARK-24941 Add RDDBarrier.coalesce() function
>>> SPARK-25017 Add test suite for ContextBarrierState
>>> SPARK-25083 remove the type erasure hack in data source scan
>>> SPARK-25383 Image data source supports sample pushdown
>>> SPARK-27272 Enable blacklisting of node/executor on fetch failures by default
>>> SPARK-27296 User Defined Aggregating Functions (UDAFs) have a major efficiency problem
>>> SPARK-25128 multiple simultaneous job submissions against k8s backend cause driver pods to hang
>>> SPARK-26731 remove EOLed spark jobs from jenkins
>>> SPARK-26664 Make DecimalType's minimum adjusted scale configurable
>>> SPARK-21559 Remove Mesos fine-grained mode
>>> SPARK-24942 Improve cluster resource management with jobs containing barrier stage
>>> SPARK-25914 Separate projection from grouping and aggregate in logical Aggregate
>>> SPARK-26022 PySpark Comparison with Pandas
>>> SPARK-20964 Make some keywords reserved along with the ANSI/SQL standard
>>> SPARK-26221 Improve Spark SQL instrumentation and metrics
>>> SPARK-26425 Add more constraint checks in file streaming source to avoid checkpoint corruption
>>> SPARK-25843 Redesign rangeBetween API
>>> SPARK-25841 Redesign window function rangeBetween API
>>> SPARK-25752 Add trait to easily whitelist logical operators that produce named output from CleanupAliases
>>> SPARK-23210 Introduce the concept of default value to schema
>>> SPARK-25640 Clarify/Improve EvalType for grouped aggregate and window aggregate
>>> SPARK-25531 new write APIs for data source v2
>>> SPARK-25547 Pluggable jdbc connection factory
>>> SPARK-20845 Support specification of column names in INSERT INTO
>>> SPARK-24417 Build and Run Spark on JDK11
>>> SPARK-24724 Discuss necessary info and access in barrier mode + Kubernetes
>>> SPARK-24725 Discuss necessary info and access in barrier mode + Mesos
>>> SPARK-25074 Implement maxNumConcurrentTasks() in MesosFineGrainedSchedulerBackend
>>> SPARK-23710 Upgrade the built-in Hive to 2.3.5 for hadoop-3.2
>>> SPARK-25186 Stabilize Data Source V2 API
>>> SPARK-25376 Scenarios we should handle but missed in 2.4 for barrier execution mode
>>> SPARK-25390 data source V2 API refactoring
>>> SPARK-7768 Make user-defined type (UDT) API public
>>> SPARK-14922 Alter Table Drop Partition Using Predicate-based Partition Spec
>>> SPARK-15691 Refactor and improve Hive support
>>> SPARK-15694 Implement ScriptTransformation in sql/core
>>> SPARK-16217 Support SELECT INTO statement
>>> SPARK-16452 basic INFORMATION_SCHEMA support
>>> SPARK-18134 SQL: MapType in Group BY and Joins not working
>>> SPARK-18245 Improving support for bucketed table
>>> SPARK-19842 Informational Referential Integrity Constraints Support in Spark
>>> SPARK-22231 Support of map, filter, withColumn, dropColumn in nested list of structures
>>> SPARK-22632 Fix the behavior of timestamp values for R's DataFrame to respect session timezone
>>> SPARK-22386 Data Source V2 improvements
>>> SPARK-24723 Discuss necessary info and access in barrier mode + YARN
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>
>>
>
>
> --
> Name : Jungtaek Lim
> Blog : http://medium.com/@heartsavior
> Twitter : http://twitter.com/heartsavior
> LinkedIn : http://www.linkedin.com/in/heartsavior
>
> --
> John Zhuge
>
>
> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau