I would like to know how we should handle the two Kinesis-related modules in 
Spark 4.0. They see very infrequent code updates, and because the 
corresponding tests are not continuously executed in any GitHub Actions 
pipeline, I think they significantly lack quality assurance. On top of that, 
I am not certain whether the test cases in these modules that require AWS 
credentials are verified during each Spark version release.
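
For context, the pattern such credential-gated tests typically follow is 
sketched below (illustrative only; the exact environment variables the 
kinesis-asl module checks may differ):

    import org.scalatest.funsuite.AnyFunSuite

    class KinesisSmokeSuite extends AnyFunSuite {
      // CI never sets these variables, so the suite silently skips there.
      private val haveCredentials =
        sys.env.contains("AWS_ACCESS_KEY_ID") &&
        sys.env.contains("AWS_SECRET_ACCESS_KEY")

      test("create and describe a test stream") {
        assume(haveCredentials, "AWS credentials not set; skipping")
        // ... exercise the Kinesis client against a real stream here ...
      }
    }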

Thanks,
Jie Yang

On 2023/08/08 08:28:37 Cheng Pan wrote:
> What do you think about removing HiveContext and even SQLContext?
> 
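> For anyone still on those entry points, the migration is mechanical. A 
> minimal sketch of the SparkSession replacement (illustrative, not a full 
> porting guide):
> 
>     import org.apache.spark.sql.SparkSession
> 
>     // Before: new SQLContext(sc) / new HiveContext(sc)
>     val spark = SparkSession.builder()
>       .appName("migration-sketch")
>       .enableHiveSupport()   // covers what HiveContext provided
>       .getOrCreate()
>     val df = spark.sql("SELECT 1 AS id")
> 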
> And as an extension of this question: should we re-implement the Hive data 
> source using the DSv2 API in Spark 4?
> 
> Developers who want to implement a custom DataSource plugin may want to 
> learn from the Spark built-in ones[1], and Hive is a good candidate. Keeping 
> a legacy-style implementation around may confuse those developers.
> 
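> To make that concrete, a skeletal sketch of the DSv2 entry point such a 
> plugin implements (class and package names here are hypothetical):
> 
>     import java.util
>     import org.apache.spark.sql.connector.catalog.{Table, TableProvider}
>     import org.apache.spark.sql.connector.expressions.Transform
>     import org.apache.spark.sql.types.StructType
>     import org.apache.spark.sql.util.CaseInsensitiveStringMap
> 
>     // Spark discovers this class via format("...") or a catalog plugin.
>     class HiveLikeProvider extends TableProvider {
>       // Real code would ask the metastore for the table schema.
>       override def inferSchema(options: CaseInsensitiveStringMap): StructType =
>         new StructType()
> 
>       override def getTable(
>           schema: StructType,
>           partitioning: Array[Transform],
>           properties: util.Map[String, String]): Table =
>         throw new UnsupportedOperationException("sketch only")
>     }
> 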
> It was discussed/requested in [2][3][4][5].
> 
> There have been requests to support multiple Hive metastores[6], and I have 
> seen users choose Presto/Trino over Spark because the former supports 
> multiple HMS instances.
> 
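> Concretely, with a DSv2 catalog implementation, multiple metastores become 
> just multiple catalog registrations -- a sketch, with a hypothetical catalog 
> class name:
> 
>     import org.apache.spark.sql.SparkSession
> 
>     val spark = SparkSession.builder()
>       .config("spark.sql.catalog.hms_a", "com.example.hive.HiveCatalog")
>       .config("spark.sql.catalog.hms_a.hive.metastore.uris", "thrift://metastore-a:9083")
>       .config("spark.sql.catalog.hms_b", "com.example.hive.HiveCatalog")
>       .config("spark.sql.catalog.hms_b.hive.metastore.uris", "thrift://metastore-b:9083")
>       .getOrCreate()
> 
>     // Tables from both metastores are addressable in a single query.
>     spark.sql("SELECT * FROM hms_a.db.t UNION ALL SELECT * FROM hms_b.db.t")
> 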
> BTW, there are known third-party Hive DSv2 implementations[7][8].
> 
> [1] https://www.mail-archive.com/dev@spark.apache.org/msg30353.html
> [2] https://www.mail-archive.com/dev@spark.apache.org/msg25715.html
> [3] https://issues.apache.org/jira/browse/SPARK-31241
> [4] https://issues.apache.org/jira/browse/SPARK-39797
> [5] https://issues.apache.org/jira/browse/SPARK-44518
> [6] https://www.mail-archive.com/dev@spark.apache.org/msg30228.html
> [7] https://github.com/permanentstar/spark-sql-dsv2-extension
> [8] 
> https://github.com/apache/kyuubi/tree/master/extensions/spark/kyuubi-spark-connector-hive
> 
> Thanks,
> Cheng Pan
> 
> 
> > On Aug 8, 2023, at 10:09, Wenchen Fan <cloud0...@gmail.com> wrote:
> > 
> > I think the principle is that we should remove things that block us from 
> > supporting new things like Java 21, or that come with a significant 
> > maintenance cost. If there is no benefit to removing deprecated APIs (just 
> > to keep the codebase clean?), I'd prefer to leave them there and not bother.
> > 
> > On Tue, Aug 8, 2023 at 9:00 AM Jia Fan <fanjiaemi...@qq.com.invalid> wrote:
> > Thanks Sean for opening this discussion.
> > 
> > 1. I think dropping Scala 2.12 is a good option.
> > 
> > 2. Personally, I think we should remove most methods that have been 
> > deprecated since 1.x/2.x unless no good replacement exists. The 3.x line 
> > has already served as a buffer, and I don't think it is good practice to 
> > keep relying on methods deprecated in 2.x when on 4.x.
> > 
> > 3. For Mesos, I think we should remove it from the docs first.
> > ________________________
> > 
> > Jia Fan
> > 
> > 
> > 
> >> On Aug 8, 2023, at 05:47, Sean Owen <sro...@gmail.com> wrote:
> >> 
> >> While we're noodling on the topic, what else might be worth removing in 
> >> Spark 4?
> >> 
> >> For example, looks like we're finally hitting problems supporting Java 8 
> >> through 21 all at once, related to Scala 2.13.x updates. It would be 
> >> reasonable to require Java 11, or even 17, as a baseline for the 
> >> multi-year lifecycle of Spark 4.
> >> 
> >> Dare I ask: drop Scala 2.12? Supporting 2.12 / 2.13 / 3.0 all at once 
> >> might get hard otherwise.
> >> 
> >> There was a good discussion about whether old deprecated methods should be 
> >> removed. A major release is the only time they can be removed, but that 
> >> doesn't mean they all should be. createExternalTable was brought up as a 
> >> first example. What deprecated methods are worth removing?
> >> 
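> >> For createExternalTable at least, the replacement is one-to-one -- a 
> >> sketch (spark is the usual session object):
> >> 
> >>     // Deprecated since 2.2:
> >>     val t1 = spark.catalog.createExternalTable("logs", "/data/logs")
> >>     // Its replacement, available since 2.2:
> >>     val t2 = spark.catalog.createTable("logs", "/data/logs")
> >> 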
> >> There's Mesos support, long since deprecated, which seems like something 
> >> to prune.
> >> 
> >> Are there old Hive/Hadoop version combos we should just stop supporting?
> > 

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org