The RFC-13 Flink writer has some bottlenecks that make it hard to adapter
to production:

- The InstantGeneratorOperator is parallelism 1, which is a limit for
high-throughput consumption; because all the split inputs drain to a single
thread, the network IO would gains pressure too
- The WriteProcessOperator handles inputs by partition, that means, within
each partition write process, the BUCKETs are written one by one, the FILE
IO is limit to adapter to high-throughput inputs
- It buffers the data by checkpoints, which is too hard to be robust for
production, the checkpoint function is blocking and should not have IO
operations.
- The FlinkHoodieIndex is only valid for a per-job scope, it does not work
for existing bootstrap data or for different Flink jobs

Thus, here I propose a new design for the Flink writer to solve these
problems[1]. Overall, the new design tries to remove the single parallelism
operators and make the index more powerful and scalable.

I plan to solve these bottlenecks incrementally (4 steps), there are
already some local POCs for these proposals.

I'm looking forward to your feedback. Any suggestions are appreciated ~

[1]
https://docs.google.com/document/d/1oOcU0VNwtEtZfTRt3v9z4xNQWY-Hy5beu7a1t5B-75I/edit?usp=sharing

Reply via email to