Hi all,

I’d like to restart the object-storage-mock discussion.
The PR discussion has gone in a few directions, and I think we should
decide the test-infra question explicitly.

A quick recap of where we are:

- The earlier `[DISCUSS] Object store functionality` [1] thread was about the
  broader object-storage-ops and purge-table work.
- In review of that broader work [3], there was concern about depending on
  Nessie test artifacts directly.
- So the test utilities were split out into a separate PR [4].
- Review of that split-out PR then raised the other question: should Polaris
  accept and maintain that copied code, or should we use existing libraries
  such as Adobe S3Mock instead?
- The current `object-storage-mock` PR [2] is narrower than both earlier PRs.
  It is only about the object-storage mock test utility.

So the question here is not whether to approve the full object-storage-ops
work. The question is what test infrastructure Polaris wants for object-store
behavior.

For the object-storage-ops and purge-table work, we need tests that go through
real SDK/FileIO HTTP interactions, but where the test can still control and
check object-store behavior precisely. For example: generated objects,
synthetic listings, metadata, conditional responses, intercepted
writes/deletes, and targeted failures.

A filesystem fixture, a Map-backed fixture, and a normal local S3 emulator
are all useful for other tests, but none of them gives that operation-level
control.

Adobe S3Mock is useful when a test needs a local S3-compatible service.
The object-storage-mock is different: it exposes selected S3/GCS/ADLS/STS
protocol surfaces while letting the test define bucket behavior per operation.
That is what lets the current object-storage-ops and purge-table tests
validate real client interactions without depending on cloud services.
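
As a rough illustration of what "bucket behavior per operation" means, here is
a minimal sketch that uses only JDK classes. It is not the object-storage-mock
API and it does not speak the S3, GCS, ADLS, or STS protocols. The class name
`PerOperationMockSketch`, the `Reply` record, the `on(...)` method, and the
`/bucket/...` paths are all hypothetical names chosen for this example. The
only point is the shape: a real HTTP client talks to a local endpoint, and the
test decides what each operation returns.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical illustration only; this is NOT the object-storage-mock API. */
public class PerOperationMockSketch {

  /** Status code and body returned for one request. */
  public record Reply(int status, String body) {}

  // One behavior per HTTP method; each behavior sees the requested object path.
  private final Map<String, Function<String, Reply>> behaviors = new ConcurrentHashMap<>();
  private final HttpServer server;

  public PerOperationMockSketch() throws IOException {
    // Port 0 asks the OS for a free ephemeral port on the loopback interface.
    server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
    server.createContext("/", exchange -> {
      try {
        Function<String, Reply> behavior = behaviors.get(exchange.getRequestMethod());
        Reply reply = behavior != null
            ? behavior.apply(exchange.getRequestURI().getPath())
            : new Reply(501, "operation not mocked");
        byte[] bytes = reply.body().getBytes(StandardCharsets.UTF_8);
        // A length of -1 tells the JDK server that no response body follows.
        exchange.sendResponseHeaders(reply.status(), bytes.length == 0 ? -1 : bytes.length);
        if (bytes.length > 0) {
          try (OutputStream out = exchange.getResponseBody()) {
            out.write(bytes);
          }
        }
      } finally {
        exchange.close();
      }
    });
    server.start();
  }

  /** Registers the behavior the test wants for one HTTP method. */
  public void on(String method, Function<String, Reply> behavior) {
    behaviors.put(method, behavior);
  }

  public URI uri(String path) {
    return URI.create("http://127.0.0.1:" + server.getAddress().getPort() + path);
  }

  public void stop() {
    server.stop(0);
  }

  /** Sends one body-less request through a real HTTP client and returns the response. */
  public static HttpResponse<String> send(String method, URI uri)
      throws IOException, InterruptedException {
    HttpRequest request =
        HttpRequest.newBuilder(uri).method(method, HttpRequest.BodyPublishers.noBody()).build();
    return HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
  }

  public static void main(String[] args) throws Exception {
    PerOperationMockSketch mock = new PerOperationMockSketch();
    try {
      // Reads return generated content instead of stored bytes.
      mock.on("GET", path -> new Reply(200, "generated:" + path));
      // Deletes fail only for one prefix, to exercise a caller's error handling.
      mock.on("DELETE", path ->
          path.startsWith("/bucket/fail/") ? new Reply(503, "SlowDown") : new Reply(204, ""));

      System.out.println(send("GET", mock.uri("/bucket/a")).body());
      System.out.println(send("DELETE", mock.uri("/bucket/a")).statusCode());
      System.out.println(send("DELETE", mock.uri("/bucket/fail/b")).statusCode());
    } finally {
      mock.stop();
    }
  }
}
```

A Map-backed or filesystem fixture has no place to express the `DELETE` rule
above, and a general-purpose emulator would need the failure to be provoked
indirectly. That difference, not the HTTP plumbing, is what the requirements
above are about.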

Across the reviews, two reasonable concerns came up:

- avoiding a Nessie test dependency;
- avoiding unnecessary copied code.

However, we need to choose a path, because the object-storage-ops and
purge-table work depend on this level of testing.

I see at least these options:

1. accept `object-storage-mock` into Polaris as test-only infrastructure,
   subject to the normal ASF provenance/license checks
2. use the Nessie test artifacts directly
3. identify existing libraries that satisfy the same requirements
4. defer the object-storage-ops / purge-table work that depends on this
   testing until the test-infra question is resolved.

My preference is option 1: keep it test-only, limit it to protocol behavior
needed by Polaris tests, and require future protocol additions to come with
concrete Polaris test cases.

If neither option 1 nor option 2 is acceptable, then option 3 needs to name
the specific library or combination of libraries and check it against the
requirements above.
If there is another path, I would like to understand it.
Otherwise we are effectively choosing option 4 for the work that depends on
these tests.

Robert

[1] https://lists.apache.org/thread/0z8nb3w58zb9s617gsoyhzlnz53rt9zx ([DISCUSS] Object store functionality)
[2] https://github.com/apache/polaris/pull/4570 (Add object-storage-mock test utility)
[3] https://github.com/apache/polaris/pull/3256 (Object store functionality)
[4] https://github.com/apache/polaris/pull/3513 (Test libraries for storage operations, closed)
