I would like to know how we should handle the two Kinesis-related modules in Spark 4.0. They see very few code updates, and because their tests are not continuously executed in any GitHub Actions pipeline, I think they significantly lack quality assurance. On top of that, I am not certain whether the test cases in these modules that require AWS credentials are actually verified for each Spark release.
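For concreteness, these tests are opt-in rather than run by default. Below is a minimal sketch of the kind of environment-variable guard involved; the ENABLE_KINESIS_TESTS flag matches the convention the module's KinesisTestUtils uses, but the helper around it is illustrative, not the actual implementation:

    // Sketch of a credential-gated test guard in the spirit of the
    // Kinesis module. ENABLE_KINESIS_TESTS matches the flag the module
    // checks; this object itself is illustrative.
    object KinesisTestGuard {
      // Opt-in flag: tests run only when the developer explicitly enables them.
      val shouldRunTests: Boolean =
        sys.env.get("ENABLE_KINESIS_TESTS").contains("1")

      // Wrap a test body so it is skipped unless the flag (and, by
      // implication, valid AWS credentials) are present.
      def runIfEnabled(testName: String)(body: => Unit): Unit = {
        if (shouldRunTests) body
        else println(s"Skipping '$testName': set ENABLE_KINESIS_TESTS=1 " +
          "and provide AWS credentials to run it")
      }
    }

Because nothing in CI sets that flag, these suites effectively never run, which is the quality-assurance gap I am worried about.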
Thanks,
Jie Yang

On 2023/08/08 08:28:37 Cheng Pan wrote:
> What do you think about removing HiveContext and even SQLContext?
>
> And as an extension of this question, should we re-implement Hive using the
> DSv2 API in Spark 4?
>
> A developer who wants to implement a custom DataSource plugin may want to
> learn from a Spark built-in one [1], and Hive is a good candidate. A legacy
> implementation may confuse developers.
>
> It was discussed/requested in [2][3][4][5].
>
> There have been requests for multiple Hive metastore (HMS) support [6], and
> I have seen users choose Presto/Trino over Spark because the former supports
> multiple metastores.
>
> BTW, there are known third-party Hive DSv2 implementations [7][8].
>
> [1] https://www.mail-archive.com/dev@spark.apache.org/msg30353.html
> [2] https://www.mail-archive.com/dev@spark.apache.org/msg25715.html
> [3] https://issues.apache.org/jira/browse/SPARK-31241
> [4] https://issues.apache.org/jira/browse/SPARK-39797
> [5] https://issues.apache.org/jira/browse/SPARK-44518
> [6] https://www.mail-archive.com/dev@spark.apache.org/msg30228.html
> [7] https://github.com/permanentstar/spark-sql-dsv2-extension
> [8] https://github.com/apache/kyuubi/tree/master/extensions/spark/kyuubi-spark-connector-hive
>
> Thanks,
> Cheng Pan
>
> > On Aug 8, 2023, at 10:09, Wenchen Fan <cloud0...@gmail.com> wrote:
> >
> > I think the principle is that we should remove things that block us from
> > supporting new things like Java 21, or that come with a significant
> > maintenance cost. If there is no benefit to removing deprecated APIs (just
> > to keep the codebase clean?), I'd prefer to leave them there and not
> > bother.
> >
> > On Tue, Aug 8, 2023 at 9:00 AM Jia Fan <fanjiaemi...@qq.com.invalid> wrote:
> > Thanks Sean for opening this discussion.
> >
> > 1. I think dropping Scala 2.12 is a good option.
> >
> > 2. Personally, I think we should remove most methods deprecated since
> > 2.x/1.x unless no good replacement exists. The 3.x line has already served
> > as a buffer, and I don't think it is good practice to keep using methods
> > deprecated in 2.x on 4.x.
> >
> > 3. For Mesos, I think we should remove it from the docs first.
> > ________________________
> >
> > Jia Fan
> >
> >> On Aug 8, 2023, at 05:47, Sean Owen <sro...@gmail.com> wrote:
> >>
> >> While we're noodling on the topic, what else might be worth removing in
> >> Spark 4?
> >>
> >> For example, it looks like we're finally hitting problems supporting
> >> Java 8 through 21 all at once, related to Scala 2.13.x updates. It would
> >> be reasonable to require Java 11, or even 17, as a baseline for the
> >> multi-year lifecycle of Spark 4.
> >>
> >> Dare I ask: drop Scala 2.12? Supporting 2.12 / 2.13 / 3.0 might get hard
> >> otherwise.
> >>
> >> There was a good discussion about whether old deprecated methods should
> >> be removed. They can't be removed at other times, but that doesn't mean
> >> they all should be. createExternalTable was brought up as a first
> >> example. What deprecated methods are worth removing?
> >>
> >> There's Mesos support, long since deprecated, which seems like something
> >> to prune.
> >>
> >> Are there old Hive/Hadoop version combos we should just stop supporting?
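As a concrete illustration of the createExternalTable point raised above: a minimal sketch of the migration path, assuming the Catalog API as it has stood since Spark 2.2 (the table name and path below are illustrative):

    import org.apache.spark.sql.SparkSession

    object CreateTableMigration {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("create-table-migration-sketch")
          .master("local[*]")
          .getOrCreate()

        // Deprecated since Spark 2.2, and a removal candidate for 4.0:
        // spark.catalog.createExternalTable("events", "/tmp/events", "parquet")

        // Replacement: createTable with an explicit path likewise creates an
        // unmanaged (external) table over the existing data.
        spark.catalog.createTable("events", "/tmp/events", "parquet")

        spark.stop()
      }
    }

Since the replacement has existed for the whole 3.x line, this seems like exactly the kind of deprecated method that is cheap to drop in 4.0.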