Re: [DISCUSS] PIP-381: Handle large PositionInfo state

PengHui Li Tue, 24 Sep 2024 15:17:28 -0700

Thanks for driving the proposal.

I would like to share some related context from discussions that took place
many years ago:

- https://lists.apache.org/thread/y0r9kk0968ydpxtf16x6ql3x6kwy7dc1
- https://lists.apache.org/thread/hfv18cg0yckt5cqd0fc66rp7tth036kf

We have two major approaches:

1. Minimize the persistent size of cursor data:
• Example: PR:9292 and cursor data compression, possibly with a compressed
bitset implementation (RoaringBitmap).

2. Split the ack cursor data into multiple chunks:
• Example: PIP-81, PIP-381.

LinLin and I previously worked on PIP-81. Personally, I am not a big fan of
this solution.
While working on PIP-81 and cursor data compression, we found that
compression works well in most cases,
even when there are millions or tens of millions of ack ranges. I recall we
shared data on this before, though I can’t seem to find it now.
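
To illustrate why compression can work well on large sets of ack ranges, here
is a minimal sketch in Python. It is not Pulsar's cursor serialization: the
bitset layout, the helper names ranges_to_bitset and bitset_to_ranges, the use
of zlib, and the synthetic workload are all assumptions made for illustration.
A strictly alternating ack pattern is close to a best case for a general-purpose
compressor, so real ratios will differ.

```python
import zlib


def ranges_to_bitset(ranges, size):
    """Pack half-open (start, end) ack ranges into a bitset of `size` bits."""
    bits = bytearray((size + 7) // 8)
    for start, end in ranges:
        for i in range(start, end):
            bits[i // 8] |= 1 << (i % 8)
    return bytes(bits)


def bitset_to_ranges(bits):
    """Recover the half-open (start, end) ranges of set bits."""
    ranges = []
    start = None
    total = len(bits) * 8
    for i in range(total):
        acked = (bits[i // 8] >> (i % 8)) & 1
        if acked and start is None:
            start = i
        elif not acked and start is not None:
            ranges.append((start, i))
            start = None
    if start is not None:
        ranges.append((start, total))
    return ranges


# Hypothetical workload: every other entry is acked, giving 100,000 ranges.
TOTAL_ENTRIES = 200_000
ranges = [(i, i + 1) for i in range(0, TOTAL_ENTRIES, 2)]

raw = ranges_to_bitset(ranges, TOTAL_ENTRIES)
compressed = zlib.compress(raw)
restored = bitset_to_ranges(zlib.decompress(compressed))

print("ranges:", len(ranges))
print("bitset bytes:", len(raw))
print("compressed bytes:", len(compressed))
print("round trip ok:", restored == ranges)
```

The point of the sketch is only that a bitset of individually acked entries is
highly repetitive, which is the kind of input a compressor or a compressed
bitmap such as RoaringBitmap is designed to shrink.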

From a user perspective, most users are satisfied with the current
solution, and only a few need compression enabled.
The simplicity of the solution is vital for community users, which was the
main reason we gave up on PIP-81 earlier.
Pulsar is already complex, so having a pluggable solution for the long term
would be more beneficial.
This way, most users get a clear, simple version, while others needing
enhanced solutions can create their plugins, managing the complexity
themselves.

I’m not going to block this proposal, but a few points need clarification:

• Feature Toggle: Add a flag that allows users to enable this feature
(keeping it disabled by default until there is higher demand).
Managed ledger and cursor complexities are well-known, so a smooth opt-in
process is crucial for users to adopt new features gradually.

• Compatibility Concerns: Since the persistent data structure will change,
we need to address rollback scenarios.
For instance, if a user has 10MB of cursor data, upgrades to a new version
with the PIP changes, and then needs to roll back to the older version,
will that user lose their 10MB cursor data? What steps are required for a
rollback to ensure data consistency?
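
The two points above can be made concrete with a toy model in Python. Everything
here is hypothetical: the JSON layout, CHUNK_SIZE, the chunking_enabled flag, and
the functions write_cursor, legacy_read, and new_read are invented for
illustration and do not describe Pulsar's storage format or the design in
PIP-381. The sketch only shows the shape of the risk: with the toggle off,
nothing changes for an older reader, while data written with the toggle on can
be silently truncated when read by code that predates the new layout.

```python
import json

CHUNK_SIZE = 4  # hypothetical: ack ranges per chunk, kept tiny for the demo


def write_cursor(ranges, chunking_enabled=False):
    """Return the stored entries for a cursor; chunking is off by default."""
    if not chunking_enabled:
        # Legacy layout: a single entry holding every range.
        return [json.dumps({"ranges": ranges})]
    chunks = [ranges[i:i + CHUNK_SIZE] for i in range(0, len(ranges), CHUNK_SIZE)]
    return [
        json.dumps({"chunk": n, "of": len(chunks), "ranges": chunk})
        for n, chunk in enumerate(chunks)
    ]


def legacy_read(entries):
    """Older reader: assumes the last entry holds the complete state."""
    return json.loads(entries[-1])["ranges"]


def new_read(entries):
    """Newer reader: concatenates all entries, so it handles both layouts."""
    result = []
    for entry in entries:
        result.extend(json.loads(entry)["ranges"])
    return result


ranges = [[i, i + 1] for i in range(0, 20, 2)]  # 10 ack ranges

legacy_entries = write_cursor(ranges)                          # toggle off
chunked_entries = write_cursor(ranges, chunking_enabled=True)  # toggle on

print("legacy reader, legacy data :", len(legacy_read(legacy_entries)), "ranges")
print("new reader, chunked data   :", len(new_read(chunked_entries)), "ranges")
print("legacy reader, chunked data:", len(legacy_read(chunked_entries)), "ranges")
```

In this toy model the older reader recovers only the last chunk after a
rollback, which is the kind of outcome the rollback procedure would need to
rule out or document.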

Regards,
Penghui

On Tue, Sep 24, 2024 at 1:42 AM Lari Hotari <lhot...@apache.org> wrote:

> On Tue, 24 Sept 2024 at 05:01, Rajan Dhabalia <rdhaba...@apache.org>
> wrote:
> > However, there are multiple other PRs related to key-shared sub, stats,
> > cursor performance, and other PRs are still blocked by others and people
> > just block it because they think they don't have this usecase. It's so
> > unfortunate that people easily merge implementations which only handle
> > small-scale usecases but the usecases for which Pulsar was built are
> > being blocked or take a long time to merge. It's just that I don't have
> > that energy to keep following up for useful and important changes for
> > Pulsar. And this is one of these examples as well. I have also started
> > discussion about improving the PIP process because it has become painful
> > in many cases.
>
> It's not that individuals want to block changes for no reason. It
> seems that the main reason for blocking changes is the fear of
> regressions. Some areas of the Pulsar codebase aren't well covered in
> our test suites. For example, we don't have performance tests as part
> of the Apache Pulsar repositories. We have a lot of tests, but most of
> them are written in a way that tests the code as the author expects it
> to work. There are very few tests that evaluate features from the
> end-user API perspective or as system tests.
>
> Writing new tests is slow, and the developer experience is poor with
> the current test infrastructure. Adding more tests to the main build
> would slow down Pulsar CI even more. This isn't a new problem; it's
> been around for many years. I'd love to see more proposals and active
> contributions to improve the "safety nets" of Apache Pulsar so that we
> wouldn't fear change. I'm not saying that this is only a testing
> problem. Testability impacts architecture too. Balancing all different
> aspects of the system isn't easy, and it requires effort and
> dedication. We don't currently have enough contributors who are
> investing their time in enabling others to contribute effectively. I
> hope that we can improve together and address the problems we have
> that cause the fear of change. When that is addressed, there would be
> more confidence in accepting new PIPs and changes even when the
> reviewer doesn't have the use case or when they aren't familiar with
> the problem that the PIP is targeting to solve.
>
> -Lari
>
