Hi all,

I’d like to restart the object-storage-mock discussion. The PR discussion has gone in a few directions, and I think we should decide the test-infra question explicitly.

A quick recap of where we are:

- The earlier `[DISCUSS] Object store functionality` [1] thread was about the broader object-storage-ops and purge-table work.
- In review of that broader work [3], there was concern about depending on Nessie test artifacts directly.
- So the test utilities were split out into a separate PR [4].
- Review of that split-out PR then raised the other question: should Polaris accept and maintain that copied code, or should we use existing libraries such as Adobe S3Mock instead?
- The current `object-storage-mock` PR [2] is narrower than both earlier PRs. It is only about the object-storage mock test utility.

So the question here is not whether to approve the full object-storage-ops work. The question is what test infrastructure Polaris wants for object-store behavior.

For the object-storage-ops and purge-table work, we need tests that go through real SDK/FileIO HTTP interactions, but where the test can still control and check object-store behavior precisely. For example: generated objects, synthetic listings, metadata, conditional responses, intercepted writes/deletes, and targeted failures.

A filesystem fixture, a Map-backed fixture, or a normal local S3 emulator are all useful for other tests, but they do not give that level of operation-level control.

Adobe S3Mock is useful when a test needs a local S3-compatible service. The object-storage-mock is different: it exposes selected S3/GCS/ADLS/STS protocol surfaces while letting the test define bucket behavior per operation. That is what lets the current object-storage-ops and purge-table tests validate real client interactions without depending on cloud services.

Across the reviews, two reasonable concerns came up:

- avoiding a Nessie test dependency;
- avoiding unnecessary copied code.

However, we need to choose a path, because the object-storage-ops and purge-table work depend on this level of testing.

I see at least these options:

1. accept `object-storage-mock` into Polaris as test-only infrastructure, subject to the normal ASF provenance/license checks
2. use the Nessie test artifacts directly
3. identify existing libraries that satisfy the same requirements
4. defer the object-storage-ops / purge-table work that depends on this testing until the test-infra question is resolved

My preference is option 1: keep it test-only, limit it to protocol behavior needed by Polaris tests, and require future protocol additions to come with concrete Polaris test cases.

If option 1 or 2 is not acceptable, then option 3 needs to name the specific library or combination of libraries and check it against the requirements above.

If there is another path, I would like to understand it. Otherwise we are effectively choosing option 4 for the work that depends on these tests.

Robert

[1] https://lists.apache.org/thread/0z8nb3w58zb9s617gsoyhzlnz53rt9zx ([DISCUSS] Object store functionality)
[2] https://github.com/apache/polaris/pull/4570 (Add object-storage-mock test utility)
[3] https://github.com/apache/polaris/pull/3256 (Object store functionality)
[4] https://github.com/apache/polaris/pull/3513 (Test libraries for storage operations, closed)
