A few additional observations about StarRocks...

- As far as I can tell, StarRocks has an ASF-incompatible license (Elastic License 2.0).
- It appears to be a hard fork of Apache Doris, a project still in the incubator (and it looks like it is probably destructive to the Doris project).
- The project has only existed for ~2 months.
On Sun, Nov 7, 2021 at 7:34 PM OpenInx <open...@gmail.com> wrote:

> Any thoughts for adding StarRocks integration to the roadmap ?
>
> I think the guys from StarRocks community can provide more background and
> inputs.
>
> On Thu, Nov 4, 2021 at 5:59 PM OpenInx <open...@gmail.com> wrote:
>
>> Update:
>>
>> StarRocks[1] is a next-gen sub-second MPP database for full analysis
>> scenarios, including multi-dimensional analytics, real-time analytics and
>> ad-hoc query. Their team is planning to integrate iceberg tables as
>> StarRocks external tables in the next month [2], so that people could
>> connect the data lake and StarRocks warehouse in the same engine.
>> The excellent performance of StarRocks will also help accelerate the
>> analysis and access of the iceberg table, I think this is a great thing for
>> both the iceberg community and the StarRocks community. I think we can
>> add an extra project about StarRocks integration work in the apache iceberg
>> roadmap [3] ?
>>
>> [1]. https://github.com/StarRocks/starrocks
>> [2]. https://github.com/StarRocks/starrocks/issues/1030
>> [3]. https://github.com/apache/iceberg/projects
>>
>> On Mon, Nov 1, 2021 at 11:52 PM Ryan Blue <b...@tabular.io> wrote:
>>
>>> I closed the upgrade project and marked the FLIP-27 project priority 1.
>>> Thanks for all the work to get this done!
>>>
>>> On Sun, Oct 31, 2021 at 8:10 PM OpenInx <open...@gmail.com> wrote:
>>>
>>>> Update:
>>>>
>>>> I think the project [Flink: Upgrade to 1.13.2][1] in RoadMap can be
>>>> closed now, because all of the issues have been addressed.
>>>>
>>>> [1]. https://github.com/apache/iceberg/projects/12
>>>>
>>>> On Tue, Sep 21, 2021 at 6:17 PM Eduard Tudenhoefner <edu...@dremio.com> wrote:
>>>>
>>>>> I created a Roadmap section in
>>>>> https://github.com/apache/iceberg/pull/3163 that links to the
>>>>> planning boards that Jack created.
>>>>> I figured it makes sense if we link
>>>>> available Design Docs directly on those Boards (as was already done),
>>>>> because then the Design docs are closer to the set of related issues.
>>>>>
>>>>> On Mon, Sep 20, 2021 at 10:02 PM Ryan Blue <b...@tabular.io> wrote:
>>>>>
>>>>>> Thanks, Jack!
>>>>>>
>>>>>> Eduard, I think that's a good idea. We should have a roadmap page as
>>>>>> well that links to the projects that Jack just created.
>>>>>>
>>>>>> On Mon, Sep 20, 2021 at 12:57 PM Jack Ye <yezhao...@gmail.com> wrote:
>>>>>>
>>>>>>> It seems like we have reached some consensus around the projects
>>>>>>> listed here. I have created corresponding Github projects for each:
>>>>>>> https://github.com/apache/iceberg/projects
>>>>>>>
>>>>>>> Related design docs are also linked there.
>>>>>>>
>>>>>>> Best,
>>>>>>> Jack Ye
>>>>>>>
>>>>>>> On Sun, Sep 19, 2021 at 11:18 PM Eduard Tudenhoefner <edu...@dremio.com> wrote:
>>>>>>>
>>>>>>>> Would it make sense to have a section on the website where we
>>>>>>>> collect all the links to the design docs/specs as that would be
>>>>>>>> easier to find than searching for things on the ML?
>>>>>>>>
>>>>>>>> I was thinking about something like for each component:
>>>>>>>> * link to the ML discussion
>>>>>>>> * link to the actual Spec/Design Doc
>>>>>>>>
>>>>>>>> Thoughts?
>>>>>>>>
>>>>>>>> On Fri, Sep 10, 2021 at 11:38 PM Ryan Blue <b...@tabular.io> wrote:
>>>>>>>>
>>>>>>>>> Hi everyone,
>>>>>>>>>
>>>>>>>>> At the last sync meeting, we brought up publishing a community
>>>>>>>>> roadmap and brainstormed the many features and initiatives that the
>>>>>>>>> community is working on. In this thread, I want to make sure that we
>>>>>>>>> have a good list of what people are thinking about and I think we
>>>>>>>>> should try to categorize the projects by size and general priority.
>>>>>>>>> When we reach a rough agreement, I’ll write this up and post it on
>>>>>>>>> the ASF site along with links to some projects in Github.
>>>>>>>>>
>>>>>>>>> My rationale for attempting to prioritize projects is that if we
>>>>>>>>> try to do too many things, it will be slower progress across
>>>>>>>>> everything rather than getting a few important items done. I know
>>>>>>>>> that priorities don’t align very cleanly in practice, but it is
>>>>>>>>> hopefully worth trying. To come up with a priority, I’m trying to
>>>>>>>>> keep top priority items to a minimum by including only one from
>>>>>>>>> each group (Spark, Flink, Python, etc.). The remaining items are
>>>>>>>>> split between priority 2 and 3. Priority 3 is not urgent, including
>>>>>>>>> things that can be plugged in (like other IO libraries), docs, etc.
>>>>>>>>> Everything else is priority 2.
>>>>>>>>>
>>>>>>>>> That something isn’t priority 1 doesn’t mean it isn’t important or
>>>>>>>>> progressing, just that it isn’t the current focus. I think of it
>>>>>>>>> this way: if someone has extra time to review something, what
>>>>>>>>> should be next? That’s top priority.
>>>>>>>>>
>>>>>>>>> Here’s my rough categorization. If you disagree, please speak up:
>>>>>>>>>
>>>>>>>>>    - If you think that something should be top priority, what
>>>>>>>>>    gets moved to priority 2?
>>>>>>>>>    - Should the priority for a project in 2 or 3 change?
>>>>>>>>>    - Is the S/M/L size of a project wrong?
>>>>>>>>>
>>>>>>>>> Top priority, 1:
>>>>>>>>>
>>>>>>>>>    - API: Iceberg 1.0 [medium]
>>>>>>>>>    - Spark: Merge-on-read plans [large]
>>>>>>>>>    - Maintenance: Delete file compaction [medium]
>>>>>>>>>    - Flink: Upgrade to 1.13.2 (document compatibility) [medium]
>>>>>>>>>    - Python: Pythonic refactor [medium]
>>>>>>>>>
>>>>>>>>> Priority 2:
>>>>>>>>>
>>>>>>>>>    - ORC: Support delete files stored as ORC [small]
>>>>>>>>>    - Spark: DSv2 streaming improvements [small]
>>>>>>>>>    - Flink: Inline file compaction [small]
>>>>>>>>>    - Flink: Support UPSERT [small]
>>>>>>>>>    - Views: Spec [medium]
>>>>>>>>>    - Spec: Z-ordering / Space-filling curves [medium]
>>>>>>>>>    - Spec: Snapshot tagging and branching [small]
>>>>>>>>>    - Spec: Secondary indexes [large]
>>>>>>>>>    - Spec v3: Encryption [large]
>>>>>>>>>    - Spec v3: Relative paths [large]
>>>>>>>>>    - Spec v3: Default field values [medium]
>>>>>>>>>
>>>>>>>>> Priority 3:
>>>>>>>>>
>>>>>>>>>    - Docs: versioned docs [medium]
>>>>>>>>>    - IO: Support Aliyun OSS/DLF [medium]
>>>>>>>>>    - IO: Support Dell ECS [medium]
>>>>>>>>>
>>>>>>>>> External:
>>>>>>>>>
>>>>>>>>>    - Trino: Bucketed joins [small]
>>>>>>>>>    - Trino: Row-level delete support [medium]
>>>>>>>>>    - Trino: Merge-on-read plans [medium]
>>>>>>>>>    - Trino: Multi-catalog support [small]
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Ryan Blue
>>>>>>>>> Tabular
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Ryan Blue
>>>>>> Tabular
>>>>>>
>>>>>
>>>
>>> --
>>> Ryan Blue
>>> Tabular
>>>
>>