Re: [PR] SOLR-17780: Add support for scalar quantized dense vectors [solr]

via GitHub Mon, 04 Aug 2025 14:46:49 -0700


liangkaiwen commented on code in PR #3385:
URL: https://github.com/apache/solr/pull/3385#discussion_r2252640594



##########
solr/solr-ref-guide/modules/query-guide/pages/dense-vector-search.adoc:
##########
@@ -240,6 +240,66 @@ client.add(Arrays.asList(d1, d2));
 ====
 ======
 
+=== ScalarQuantizedDenseVectorField
+Because dense vectors can have a costly storage footprint, it may be worthwhile to use a technique called "quantization"
+to reduce the stored representation size at the cost of some precision.
+
+This dense vector type uses a conversion that projects a 32 bit float precision feature down to an 8 bit int (or smaller)
+by linearly mapping the float range down to evenly sized "buckets" of values that fit into an int. A more detailed explanation
+can be found in this https://www.elastic.co/search-labs/blog/scalar-quantization-101[blog post].
+
+As a specific type of DenseVectorField, this field type supports all the same configurable properties outlined above as well
+as some additional ones.
+
+Here is how a ScalarQuantizedDenseVectorField can be defined in the schema:
+
+[source,xml]
+<fieldType name="scalar_quantized_vector" class="solr.ScalarQuantizedDenseVectorField" vectorDimension="4" similarityFunction="cosine"/>
+<field name="vector" type="scalar_quantized_vector" indexed="true" stored="true"/>
+
+`bits`::
++
+[%autowidth,frame=none]
+|===
+s|Optional |Default: `7`
+|===
++
+The number of bits to use for each quantized dimension value
++
+Accepted values: 4 (half byte) or 7 (unsigned byte).
+
+`confidenceInterval`::
++
+[%autowidth,frame=none]
+|===
+s|Optional |Default: `dimension-scaled`
+|===
++
+Statistically, outlier values are rarely meaningfully relevant to searches, so to increase the size of each bucket for
+quantization (and therefore information gain) we can scale the quantization intervals to the middle n % of values and place the remaining
+outliers in the outermost intervals.
++
+For example: 0.9 means scale interval sizes to the middle 90% of values
++
+If this param is omitted a default is used; scaled to the number of dimensions according to `1-1/(vector_dimensions + 1)`
++
+If `0` is provided, confidence intervals will be dynamically adjusted (at segment merge time) optimized by sampling values
++
+Accepted values: `FLOAT32`  (within 0.9 and 1.0) or 0 for dynamically adjusted confidence interval

Review Comment:
   This "Accepted values" section was copied from another parameter and I forgot to
   update it. For this param, it's simply a boolean.
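
For anyone following the thread, here is a rough sketch of the linear bucket mapping the proposed text describes, in case a worked example helps the doc wording. It is not Lucene's or Solr's actual quantizer: the class name `ScalarQuantizationSketch`, the method names, and the explicit `min`/`max` bounds are all hypothetical, and the bounds stand in for whatever range the confidence interval selects. Only the `1 - 1/(dims + 1)` default and the 4/7 bit widths come from the text above.

```java
import java.util.Arrays;

// Illustrative only: a hand-rolled linear scalar quantizer, not the Lucene implementation.
final class ScalarQuantizationSketch {

    /** Default confidence interval as stated in the proposed doc text: 1 - 1/(dims + 1). */
    public static double defaultConfidenceInterval(int dims) {
        return 1.0 - 1.0 / (dims + 1);
    }

    /**
     * Maps each float into one of 2^bits evenly sized buckets spanning [min, max].
     * Values outside the range (the "outliers") are clamped into the outermost buckets.
     * Assumes bits <= 7 so every bucket index fits in a signed Java byte.
     */
    public static byte[] quantize(float[] vector, float min, float max, int bits) {
        int maxBucket = (1 << bits) - 1;          // 127 for 7 bits, 15 for 4 bits
        float scale = maxBucket / (max - min);    // buckets per unit of float range
        byte[] out = new byte[vector.length];
        for (int i = 0; i < vector.length; i++) {
            float clamped = Math.max(min, Math.min(max, vector[i]));
            out[i] = (byte) Math.round((clamped - min) * scale);
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical vector; 7.5f plays the role of an outlier beyond the chosen range.
        float[] v = {-1.0f, -0.5f, 0.0f, 0.5f, 1.0f, 7.5f};
        System.out.println(Arrays.toString(quantize(v, -1.0f, 1.0f, 7)));
        System.out.println(defaultConfidenceInterval(4));
    }
}
```

With 7 bits, the range `[-1, 1]` is split into 128 buckets, so each dimension costs one byte instead of four, and the out-of-range `7.5f` lands in the top bucket alongside `1.0f`.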



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


