Re: [DISCUSS] Iceberg roadmap

OpenInx Sun, 07 Nov 2021 19:34:42 -0800

Any thoughts on adding StarRocks integration to the roadmap?

I think folks from the StarRocks community can provide more background and
input.


On Thu, Nov 4, 2021 at 5:59 PM OpenInx <[email protected]> wrote:

> Update:
>
> StarRocks [1] is a next-gen sub-second MPP database for full analysis
> scenarios, including multi-dimensional analytics, real-time analytics and
> ad-hoc query. Their team is planning to integrate Iceberg tables as
> StarRocks external tables in the next month [2], so that people can
> connect the data lake and the StarRocks warehouse in the same engine.
> The excellent performance of StarRocks will also help accelerate the
> analysis of and access to Iceberg tables. I think this is a great thing for
> both the Iceberg community and the StarRocks community. Could we add an
> extra project for the StarRocks integration work to the Apache Iceberg
> roadmap [3]?
>
> [1].  https://github.com/StarRocks/starrocks
> [2].  https://github.com/StarRocks/starrocks/issues/1030
> [3].  https://github.com/apache/iceberg/projects
>
> On Mon, Nov 1, 2021 at 11:52 PM Ryan Blue <[email protected]> wrote:
>
>> I closed the upgrade project and marked the FLIP-27 project priority 1.
>> Thanks for all the work to get this done!
>>
>> On Sun, Oct 31, 2021 at 8:10 PM OpenInx <[email protected]> wrote:
>>
>>> Update:
>>>
>>> I think the project [Flink: Upgrade to 1.13.2][1] in the roadmap can be
>>> closed now, because all of the issues have been addressed.
>>>
>>> [1]. https://github.com/apache/iceberg/projects/12
>>>
>>> On Tue, Sep 21, 2021 at 6:17 PM Eduard Tudenhoefner <[email protected]>
>>> wrote:
>>>
>>>> I created a Roadmap section in
>>>> https://github.com/apache/iceberg/pull/3163 that links to the
>>>> planning boards that Jack created. I figured it makes sense if we link
>>>> available Design Docs directly on those Boards (as was already done),
>>>> because then the Design docs are closer to the set of related issues.
>>>>
>>>> On Mon, Sep 20, 2021 at 10:02 PM Ryan Blue <[email protected]> wrote:
>>>>
>>>>> Thanks, Jack!
>>>>>
>>>>> Eduard, I think that's a good idea. We should have a roadmap page as
>>>>> well that links to the projects that Jack just created.
>>>>>
>>>>> On Mon, Sep 20, 2021 at 12:57 PM Jack Ye <[email protected]> wrote:
>>>>>
>>>>>> It seems like we have reached some consensus around the projects
>>>>>> listed here. I have created corresponding Github projects for each:
>>>>>> https://github.com/apache/iceberg/projects
>>>>>>
>>>>>> Related design docs are also linked there.
>>>>>>
>>>>>> Best,
>>>>>> Jack Ye
>>>>>>
>>>>>> On Sun, Sep 19, 2021 at 11:18 PM Eduard Tudenhoefner <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Would it make sense to have a section on the website where we
>>>>>>> collect all the links to the design docs/specs as that would be
>>>>>>> easier to find than searching for things on the ML?
>>>>>>>
>>>>>>> I was thinking about something like for each component:
>>>>>>> * link to the ML discussion
>>>>>>> * link to the actual Spec/Design Doc
>>>>>>>
>>>>>>> Thoughts?
>>>>>>>
>>>>>>> On Fri, Sep 10, 2021 at 11:38 PM Ryan Blue <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi everyone,
>>>>>>>>
>>>>>>>> At the last sync meeting, we brought up publishing a community
>>>>>>>> roadmap and brainstormed the many features and initiatives that the
>>>>>>>> community is working on. In this thread, I want to make sure that we
>>>>>>>> have a good list of what people are thinking about and I think we
>>>>>>>> should try to categorize the projects by size and general priority.
>>>>>>>> When we reach a rough agreement, I’ll write this up and post it on
>>>>>>>> the ASF site along with links to some projects in Github.
>>>>>>>>
>>>>>>>> My rationale for attempting to prioritize projects is that if we
>>>>>>>> try to do too many things, it will be slower progress across everything
>>>>>>>> rather than getting a few important items done. I know that priorities
>>>>>>>> don’t align very cleanly in practice, but it is hopefully worth
>>>>>>>> trying. To come up with a priority, I’m trying to keep top priority
>>>>>>>> items to a minimum by including only one from each group (Spark,
>>>>>>>> Flink, Python, etc.). The remaining items are split between priority
>>>>>>>> 2 and 3. Priority 3 is not urgent, including things that can be
>>>>>>>> plugged in (like other IO libraries), docs, etc. Everything else is
>>>>>>>> priority 2.
>>>>>>>>
>>>>>>>> That something isn’t priority 1 doesn’t mean it isn’t important or
>>>>>>>> progressing, just that it isn’t the current focus. I think of it this
>>>>>>>> way: if someone has extra time to review something, what should be
>>>>>>>> next? That’s top priority.
>>>>>>>>
>>>>>>>> Here’s my rough categorization. If you disagree, please speak up:
>>>>>>>>
>>>>>>>>    - If you think that something should be top priority, what gets
>>>>>>>>    moved to priority 2?
>>>>>>>>    - Should the priority for a project in 2 or 3 change?
>>>>>>>>    - Is the S/M/L size of a project wrong?
>>>>>>>>
>>>>>>>> Top priority, 1:
>>>>>>>>
>>>>>>>>    - API: Iceberg 1.0 [medium]
>>>>>>>>    - Spark: Merge-on-read plans [large]
>>>>>>>>    - Maintenance: Delete file compaction [medium]
>>>>>>>>    - Flink: Upgrade to 1.13.2 (document compatibility) [medium]
>>>>>>>>    - Python: Pythonic refactor [medium]
>>>>>>>>
>>>>>>>> Priority 2:
>>>>>>>>
>>>>>>>>    - ORC: Support delete files stored as ORC [small]
>>>>>>>>    - Spark: DSv2 streaming improvements [small]
>>>>>>>>    - Flink: Inline file compaction [small]
>>>>>>>>    - Flink: Support UPSERT [small]
>>>>>>>>    - Views: Spec [medium]
>>>>>>>>    - Spec: Z-ordering / Space-filling curves [medium]
>>>>>>>>    - Spec: Snapshot tagging and branching [small]
>>>>>>>>    - Spec: Secondary indexes [large]
>>>>>>>>    - Spec v3: Encryption [large]
>>>>>>>>    - Spec v3: Relative paths [large]
>>>>>>>>    - Spec v3: Default field values [medium]
>>>>>>>>
>>>>>>>> Priority 3:
>>>>>>>>
>>>>>>>>    - Docs: versioned docs [medium]
>>>>>>>>    - IO: Support Aliyun OSS/DLF [medium]
>>>>>>>>    - IO: Support Dell ECS [medium]
>>>>>>>>
>>>>>>>> External:
>>>>>>>>
>>>>>>>>    - Trino: Bucketed joins [small]
>>>>>>>>    - Trino: Row-level delete support [medium]
>>>>>>>>    - Trino: Merge-on-read plans [medium]
>>>>>>>>    - Trino: Multi-catalog support [small]
>>>>>>>>
>>>>>>>> --
>>>>>>>> Ryan Blue
>>>>>>>> Tabular
>>>>>>>>
>>>>>>>
>>>>>
>>>>> --
>>>>> Ryan Blue
>>>>> Tabular
>>>>>
>>>>
>>
>> --
>> Ryan Blue
>> Tabular
>>
>
