Any thoughts on adding StarRocks integration to the roadmap? I think people from the StarRocks community can provide more background and input.
On Thu, Nov 4, 2021 at 5:59 PM OpenInx <open...@gmail.com> wrote:

> Update:
>
> StarRocks[1] is a next-gen sub-second MPP database for full analysis
> scenarios, including multi-dimensional analytics, real-time analytics and
> ad-hoc query. Their team is planning to integrate iceberg tables as
> StarRocks external tables in the next month [2], so that people could
> connect the data lake and StarRocks warehouse in the same engine.
> The excellent performance of StarRocks will also help accelerate the
> analysis and access of the iceberg table, I think this is a great thing for
> both the iceberg community and the StarRocks community. I think we can
> add an extra project about StarRocks integration work in the apache iceberg
> roadmap [3] ?
>
> [1]. https://github.com/StarRocks/starrocks
> [2]. https://github.com/StarRocks/starrocks/issues/1030
> [3]. https://github.com/apache/iceberg/projects
>
> On Mon, Nov 1, 2021 at 11:52 PM Ryan Blue <b...@tabular.io> wrote:
>
>> I closed the upgrade project and marked the FLIP-27 project priority 1.
>> Thanks for all the work to get this done!
>>
>> On Sun, Oct 31, 2021 at 8:10 PM OpenInx <open...@gmail.com> wrote:
>>
>>> Update:
>>>
>>> I think the project [Flink: Upgrade to 1.13.2][1] in RoadMap can be
>>> closed now, because all of the issues have been addressed.
>>>
>>> [1]. https://github.com/apache/iceberg/projects/12
>>>
>>> On Tue, Sep 21, 2021 at 6:17 PM Eduard Tudenhoefner <edu...@dremio.com>
>>> wrote:
>>>
>>>> I created a Roadmap section in
>>>> https://github.com/apache/iceberg/pull/3163 that links to the
>>>> planning boards that Jack created. I figured it makes sense if we link
>>>> available Design Docs directly on those Boards (as was already done),
>>>> because then the Design docs are closer to the set of related issues.
>>>>
>>>> On Mon, Sep 20, 2021 at 10:02 PM Ryan Blue <b...@tabular.io> wrote:
>>>>
>>>>> Thanks, Jack!
>>>>>
>>>>> Eduard, I think that's a good idea. We should have a roadmap page as
>>>>> well that links to the projects that Jack just created.
>>>>>
>>>>> On Mon, Sep 20, 2021 at 12:57 PM Jack Ye <yezhao...@gmail.com> wrote:
>>>>>
>>>>>> It seems like we have reached some consensus around the projects
>>>>>> listed here. I have created corresponding Github projects for each:
>>>>>> https://github.com/apache/iceberg/projects
>>>>>>
>>>>>> Related design docs are also linked there.
>>>>>>
>>>>>> Best,
>>>>>> Jack Ye
>>>>>>
>>>>>> On Sun, Sep 19, 2021 at 11:18 PM Eduard Tudenhoefner <
>>>>>> edu...@dremio.com> wrote:
>>>>>>
>>>>>>> Would it make sense to have a section on the website where we
>>>>>>> collect all the links to the design docs/specs as that would be
>>>>>>> easier to find than searching for things on the ML?
>>>>>>>
>>>>>>> I was thinking about something like for each component:
>>>>>>> * link to the ML discussion
>>>>>>> * link to the actual Spec/Design Doc
>>>>>>>
>>>>>>> Thoughts?
>>>>>>>
>>>>>>> On Fri, Sep 10, 2021 at 11:38 PM Ryan Blue <b...@tabular.io> wrote:
>>>>>>>
>>>>>>>> Hi everyone,
>>>>>>>>
>>>>>>>> At the last sync meeting, we brought up publishing a community
>>>>>>>> roadmap and brainstormed the many features and initiatives that
>>>>>>>> the community is working on. In this thread, I want to make sure
>>>>>>>> that we have a good list of what people are thinking about and I
>>>>>>>> think we should try to categorize the projects by size and
>>>>>>>> general priority. When we reach a rough agreement, I’ll write
>>>>>>>> this up and post it on the ASF site along with links to some
>>>>>>>> projects in Github.
>>>>>>>>
>>>>>>>> My rationale for attempting to prioritize projects is that if we
>>>>>>>> try to do too many things, it will be slower progress across
>>>>>>>> everything rather than getting a few important items done. I
>>>>>>>> know that priorities don’t align very cleanly in practice, but
>>>>>>>> it is hopefully worth trying. To come up with a priority, I’m
>>>>>>>> trying to keep top priority items to a minimum by including only
>>>>>>>> one from each group (Spark, Flink, Python, etc.). The remaining
>>>>>>>> items are split between priority 2 and 3. Priority 3 is not
>>>>>>>> urgent, including things that can be plugged in (like other IO
>>>>>>>> libraries), docs, etc. Everything else is priority 2.
>>>>>>>>
>>>>>>>> That something isn’t priority 1 doesn’t mean it isn’t important
>>>>>>>> or progressing, just that it isn’t the current focus. I think of
>>>>>>>> it this way: if someone has extra time to review something, what
>>>>>>>> should be next? That’s top priority.
>>>>>>>>
>>>>>>>> Here’s my rough categorization. If you disagree, please speak up:
>>>>>>>>
>>>>>>>> - If you think that something should be top priority, what gets
>>>>>>>>   moved to priority 2?
>>>>>>>> - Should the priority for a project in 2 or 3 change?
>>>>>>>> - Is the S/M/L size of a project wrong?
>>>>>>>>
>>>>>>>> Top priority, 1:
>>>>>>>>
>>>>>>>> - API: Iceberg 1.0 [medium]
>>>>>>>> - Spark: Merge-on-read plans [large]
>>>>>>>> - Maintenance: Delete file compaction [medium]
>>>>>>>> - Flink: Upgrade to 1.13.2 (document compatibility) [medium]
>>>>>>>> - Python: Pythonic refactor [medium]
>>>>>>>>
>>>>>>>> Priority 2:
>>>>>>>>
>>>>>>>> - ORC: Support delete files stored as ORC [small]
>>>>>>>> - Spark: DSv2 streaming improvements [small]
>>>>>>>> - Flink: Inline file compaction [small]
>>>>>>>> - Flink: Support UPSERT [small]
>>>>>>>> - Views: Spec [medium]
>>>>>>>> - Spec: Z-ordering / Space-filling curves [medium]
>>>>>>>> - Spec: Snapshot tagging and branching [small]
>>>>>>>> - Spec: Secondary indexes [large]
>>>>>>>> - Spec v3: Encryption [large]
>>>>>>>> - Spec v3: Relative paths [large]
>>>>>>>> - Spec v3: Default field values [medium]
>>>>>>>>
>>>>>>>> Priority 3:
>>>>>>>>
>>>>>>>> - Docs: versioned docs [medium]
>>>>>>>> - IO: Support Aliyun OSS/DLF [medium]
>>>>>>>> - IO: Support Dell ECS [medium]
>>>>>>>>
>>>>>>>> External:
>>>>>>>>
>>>>>>>> - Trino: Bucketed joins [small]
>>>>>>>> - Trino: Row-level delete support [medium]
>>>>>>>> - Trino: Merge-on-read plans [medium]
>>>>>>>> - Trino: Multi-catalog support [small]
>>>>>>>>
>>>>>>>> --
>>>>>>>> Ryan Blue
>>>>>>>> Tabular
>>>>>>>>
>>>>>>>
>>>>>
>>>>> --
>>>>> Ryan Blue
>>>>> Tabular
>>>>>
>>>>
>>
>> --
>> Ryan Blue
>> Tabular
>>
>