Hi everyone,

At the last sync meeting, we brought up publishing a community roadmap and
brainstormed the many features and initiatives that the community is
working on. In this thread, I want to make sure that we have a good list of
what people are thinking about. I also think we should try to categorize the
projects by size and general priority. When we reach a rough agreement,
I’ll write this up and post it on the ASF site along with links to some
projects on GitHub.

My rationale for attempting to prioritize projects is that if we try to do
too many things, we will make slower progress across everything rather than
getting a few important items done. I know that priorities don’t align very
cleanly in practice, but it is hopefully worth trying. To come up with a
priority, I’m trying to keep top priority items to a minimum by including
only one from each group (Spark, Flink, Python, etc.). The remaining items
are split between priority 2 and 3. Priority 3 is for items that are not
urgent, including things that can be plugged in (like other IO libraries),
docs, etc. Everything else is priority 2.

An item not being priority 1 doesn’t mean it isn’t important or
progressing, just that it isn’t the current focus. I think of it this way:
if someone has extra time to review something, what should be next? That’s
top priority.

Here’s my rough categorization. If you disagree, please speak up:

   - If you think that something should be top priority, what gets moved to
   priority 2?
   - Should the priority for a project in 2 or 3 change?
   - Is the S/M/L size of a project wrong?

Top priority, 1:

   - API: Iceberg 1.0 [medium]
   - Spark: Merge-on-read plans [large]
   - Maintenance: Delete file compaction [medium]
   - Flink: Upgrade to 1.13.2 (document compatibility) [medium]
   - Python: Pythonic refactor [medium]

Priority 2:

   - ORC: Support delete files stored as ORC [small]
   - Spark: DSv2 streaming improvements [small]
   - Flink: Inline file compaction [small]
   - Flink: Support UPSERT [small]
   - Views: Spec [medium]
   - Spec: Z-ordering / Space-filling curves [medium]
   - Spec: Snapshot tagging and branching [small]
   - Spec: Secondary indexes [large]
   - Spec v3: Encryption [large]
   - Spec v3: Relative paths [large]
   - Spec v3: Default field values [medium]

Priority 3:

   - Docs: versioned docs [medium]
   - IO: Support Aliyun OSS/DLF [medium]
   - IO: Support Dell ECS [medium]

External:

   - Trino: Bucketed joins [small]
   - Trino: Row-level delete support [medium]
   - Trino: Merge-on-read plans [medium]
   - Trino: Multi-catalog support [small]

-- 
Ryan Blue
Tabular
