[
https://issues.apache.org/jira/browse/HDDS-14967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18079945#comment-18079945
]
Krishna Kumar Asawa commented on HDDS-14967:
--------------------------------------------
cc [~ashishkr] [~schintalapati]
> S3 Vector Support
> -----------------
>
> Key: HDDS-14967
> URL: https://issues.apache.org/jira/browse/HDDS-14967
> Project: Apache Ozone
> Issue Type: New Feature
> Components: OM, s3gateway, SCM
> Reporter: Chu Cheng Li
> Assignee: Chu Cheng Li
> Priority: Major
>
> h2. Background
> Amazon S3 Vectors introduces a new S3 API family for vector buckets,
> vector indexes, vector upserts, deletes, listing, and approximate nearest
> neighbor queries. AWS positions it as a low-cost, durable vector storage
> service with strong write consistency and sub-second query latency for
> infrequent queries.
> That API surface is a good match for Apache Ozone:
> - Ozone already has a stateless `s3gateway` tier that can scale out
> horizontally.
> - Ozone Manager (OM) already provides a strongly consistent metadata plane
> backed by Raft and RocksDB.
> - SCM and DataNodes already provide a durable distributed block layer for
> large immutable artifacts.
> The public architecture discussions around object-storage-native vector
> systems are also instructive:
> - AWS documents S3 Vectors as strongly consistent and built around vector
> buckets, vector indexes, float32 vectors, `cosine` / `euclidean`
> distance, and metadata filtering.
> - turbopuffer documents a stateless compute layer, an object-storage WAL,
> NVMe / memory cache, and a split between recent unindexed data and
> asynchronously indexed data.
> - The [SPFresh paper|https://arxiv.org/abs/2410.14452] shows that
> incremental updates on a centroid-based ANN index can avoid full
> rebuilds.
> - The OpenData Vector RFCs describe a practical SPANN-style storage model
> with centroids in memory, posting lists on disk, exact metadata indexes,
> delete bitmaps, and background split / merge / reassign maintenance.
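To make the SPANN-style layout above concrete, here is a minimal sketch, not the proposed implementation. It assumes centroids held in memory, an in-process dict standing in for on-disk posting lists, and a Python set standing in for the delete bitmap. `CentroidIndex`, `nprobe`, and the unique-key assumption are all hypothetical names and simplifications introduced for illustration.

```python
import math


class CentroidIndex:
    """Toy centroid-based ANN index (illustrative only)."""

    def __init__(self, centroids):
        self.centroids = centroids  # small, kept in memory
        # Stand-in for on-disk posting lists: centroid id -> [(key, vector)].
        self.postings = {i: [] for i in range(len(centroids))}
        # Stand-in for a delete bitmap.
        self.deleted = set()

    def _nearest_centroids(self, vec, n):
        order = sorted(
            range(len(self.centroids)),
            key=lambda i: math.dist(vec, self.centroids[i]),
        )
        return order[:n]

    def insert(self, key, vec):
        # Assumes each key is inserted at most once.
        cid = self._nearest_centroids(vec, 1)[0]
        self.postings[cid].append((key, vec))

    def delete(self, key):
        # Deletes only mark the key; posting lists are not rewritten.
        self.deleted.add(key)

    def query(self, vec, top_k, nprobe=2):
        # Scan only the posting lists of the nprobe closest centroids.
        candidates = []
        for cid in self._nearest_centroids(vec, nprobe):
            for key, stored in self.postings[cid]:
                if key not in self.deleted:
                    candidates.append((math.dist(vec, stored), key))
        candidates.sort()
        return candidates[:top_k]
```

Background split / merge / reassign maintenance is deliberately left out; the sketch only shows why a query touches a few posting lists rather than the whole data set.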
> Ozone can combine these ideas in a way that fits its own architecture:
> - Keep the online compute, cache, query planning, and SPFresh build work in
> s3gateway.
> - Use OM only for coordination, durability, and cross-gateway visibility.
> - Use SCM and DataNodes only for the immutable flushed and compacted
> vector-storage artifacts.
> This should let Ozone provide an S3 Vectors-compatible API while improving
> on two areas that are especially important for production systems:
> - stronger read-your-own-write and query-session visibility across multiple
> gateways
> - exact metadata filtering over both recent inline data and flushed data
> h2. Goals
> - Support the core Amazon S3 Vectors resource model:
> -- vector buckets
> -- vector indexes
> -- _*PutVectors*_
> -- _*DeleteVectors*_
> -- _*GetVectors*_
> -- _*ListVectors*_
> -- _*QueryVectors*_
> - Keep the hot write and query path gateway-centric.
> - Use OM RocksDB as the durable inline write layer for recent updates.
> - Use OM Raft log and RocksDB WAL as the durability mechanism for the
> inline path.
> - Flush and compact inline data into immutable artifacts stored on Ozone's
> distributed block layer.
> - Use SPFresh as the on-disk ANN index for flushed data.
> - Support memory and local-NVMe cache in `s3gateway`.
> - Support query-session visibility across multiple gateways without
> requiring gateway affinity.
> - Support union reads from:
> -- visible inline data in OM
> -- visible flushed data in the distributed block layer
> - Preserve strong default semantics for write visibility and listing.
> - Leave room for an eventual-consistency mode as an Ozone extension for
> lower-latency warm queries.
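The union-read goal can be sketched as a merge of two candidate sets. This is one possible shape under stated assumptions: the inline view maps each recently written key to a distance, or to `None` for a delete tombstone, and any inline entry shadows a flushed hit with the same key. The function name and argument layout are hypothetical.

```python
import heapq


def union_top_k(inline, flushed_hits, top_k):
    """Merge inline and flushed candidates into one top-k result.

    inline: dict of key -> distance for a live upsert, or None for a tombstone.
    flushed_hits: iterable of (distance, key) from the flushed ANN index.
    """
    # Flushed hits survive only if no newer inline entry exists for the key.
    merged = {key: dist for dist, key in flushed_hits if key not in inline}
    # Live inline upserts replace whatever the flushed index held.
    merged.update({k: d for k, d in inline.items() if d is not None})
    return heapq.nsmallest(top_k, ((d, k) for k, d in merged.items()))
```

Because shadowed flushed hits are dropped, a real query path would likely need to over-fetch from the flushed index to still return `top_k` results.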
> h2. Non-Goals
> - DataNode-native vector indexing or vector-aware SCM scheduling in the
> first phase.
> - Hybrid BM25 + vector search in the first phase.
> - Cross-index transactions.
> - Full server-side SQL-style query planning.
> - Rebuilding SCM or DN storage formats specifically for vectors in the
> first phase.
> - Requiring a dedicated indexing service outside of `s3gateway`.
> h2. Use Cases
> h3. AWS S3 Vectors Compatibility
> - Use the `s3vectors` API family with SigV4 authentication.
> - Create vector buckets and indexes with the same high-level semantics as
> AWS.
> - Upsert and delete vectors using float32 embeddings.
> - Query vectors by ANN search with filterable and non-filterable metadata.
> h3. Read-Your-Own-Write Across Multiple Gateways
> - A client writes vectors through gateway A.
> - A follow-up query lands on gateway B.
> - The query must still see the write without waiting for background flush.
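One way to picture this use case, as an assumption rather than the design in the issue: every gateway treats a shared, ordered commit log (standing in for OM) as the source of truth and catches up to its latest commit index before answering a read. `MetadataStore` and `Gateway` are hypothetical names.

```python
class MetadataStore:
    """Stand-in for the strongly consistent metadata plane (hypothetical)."""

    def __init__(self):
        self.log = []

    def commit(self, key, vec):
        self.log.append((key, vec))
        return len(self.log)  # commit index of this write

    def read_since(self, index):
        return self.log[index:]


class Gateway:
    """Stateless-ish gateway with a local cache it refreshes before reads."""

    def __init__(self, store):
        self.store = store
        self.applied = 0  # how much of the log this gateway has applied
        self.cache = {}

    def put(self, key, vec):
        return self.store.commit(key, vec)

    def get(self, key):
        # Catch up to the shared log so writes made via other gateways are visible.
        entries = self.store.read_since(self.applied)
        for k, v in entries:
            self.cache[k] = v
        self.applied += len(entries)
        return self.cache.get(key)
```

A write through one `Gateway` instance is then visible to a read through another, with no gateway affinity and no dependency on background flush.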
> h3. Repeatable Query Sessions
> - A client performs paginated `ListVectors`.
> - A client performs multiple `QueryVectors` calls in one logical search
> session.
> - Each call should be able to reuse a stable snapshot token so that the
> visible state does not move underneath the client.
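A stable snapshot token could, for example, pin a commit index together with a pagination cursor. The sketch below assumes each key keeps a small version history tagged with commit indexes; `VersionedIndex` and the `(snapshot, last_key)` token shape are hypothetical, not taken from the issue.

```python
class VersionedIndex:
    """Toy key listing with snapshot-pinned pagination (illustrative only)."""

    def __init__(self):
        self.commit_index = 0
        self.history = {}  # key -> [(commit_index, is_live), ...] in commit order

    def _record(self, key, is_live):
        self.commit_index += 1
        self.history.setdefault(key, []).append((self.commit_index, is_live))

    def put(self, key):
        self._record(key, True)

    def delete(self, key):
        self._record(key, False)

    def _visible(self, key, snapshot):
        live = False
        for idx, is_live in self.history[key]:
            if idx > snapshot:
                break
            live = is_live
        return live

    def list_page(self, limit, token=None):
        # A missing token starts a new session pinned to the current commit index.
        snapshot, start_after = token if token else (self.commit_index, "")
        keys = [
            k for k in sorted(self.history)
            if k > start_after and self._visible(k, snapshot)
        ]
        page = keys[:limit]
        next_token = (snapshot, page[-1]) if len(keys) > limit else None
        return page, next_token
```

Writes that commit after the first page do not change what later pages of the same session return, while a fresh call without a token sees the newest state.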
> h3. High-Rate Ingest Without DN Allocation on Every Write
> - Small and medium upserts should become durable after OM Raft commit,
> without paying block allocation and DN replication on each request.
> - Background flush should amortize the cost of block-layer writes.
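The amortization idea can be sketched as an inline buffer that is drained into an immutable segment once it grows past a threshold. This is a simplified model: the dict stands in for the inline layer in OM, the list of tuples stands in for block-layer artifacts, and `flush_threshold` is a hypothetical tuning knob.

```python
class InlineWriteBuffer:
    """Toy model of inline writes plus batched flush (illustrative only)."""

    def __init__(self, flush_threshold):
        self.flush_threshold = flush_threshold
        self.inline = {}    # recent upserts, assumed durable once committed
        self.segments = []  # immutable flushed artifacts

    def upsert(self, key, vec):
        # No block allocation here: the write only touches the inline layer.
        self.inline[key] = vec

    def maybe_flush(self):
        """Flush the whole inline batch as one segment once it is large enough."""
        if len(self.inline) < self.flush_threshold:
            return False
        self.segments.append(tuple(sorted(self.inline.items())))
        self.inline.clear()
        return True
```

With a threshold of N, one block-layer write is paid per N upserts instead of one per request.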
> h3. Exact Metadata Filtering
> - Filterable metadata should be indexed exactly.
> - Queries should not rely on post-filtering alone, since that can return fewer than the requested number of matches.
> - Filters should work across both recent inline data and flushed data.
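The difference between exact pre-filtering and post-filtering can be illustrated with a small inverted index over metadata. This is a sketch under assumptions: only equality filters combined with AND are modeled, distances are computed by brute force over the allowed keys, and `MetadataIndex` and `filtered_top_k` are hypothetical names.

```python
import math
from collections import defaultdict


class MetadataIndex:
    """Exact inverted index: (field, value) -> set of vector keys."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.keys = set()

    def add(self, key, metadata):
        self.keys.add(key)
        for field, value in metadata.items():
            self.postings[(field, value)].add(key)

    def match(self, filters):
        """Keys whose metadata equals every field/value pair in filters."""
        if not filters:
            return set(self.keys)
        sets = [self.postings.get(item, set()) for item in filters.items()]
        return set.intersection(*sets)


def filtered_top_k(vectors, index, query, filters, top_k):
    # Resolve the filter first, then rank only the keys that satisfy it.
    allowed = index.match(filters)
    scored = sorted((math.dist(query, vectors[k]), k) for k in allowed)
    return scored[:top_k]
```

In the usage below, the unfiltered nearest neighbor of `[0.9]` is `"b"`, which fails the filter; a post-filter over a top-1 result would return nothing, while the pre-filtered search still returns `"a"`.

```python
vectors = {"a": [0.0], "b": [1.0], "c": [2.0]}
index = MetadataIndex()
index.add("a", {"genre": "x"})
index.add("b", {"genre": "y"})
index.add("c", {"genre": "x"})
hits = filtered_top_k(vectors, index, [0.9], {"genre": "x"}, top_k=1)
```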
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)