Alex created SOLR-16977:
---------------------------
Summary: DenseVector queries fail with unclear error message when
either query or document is an all 0’s vector
Key: SOLR-16977
URL: https://issues.apache.org/jira/browse/SOLR-16977
Project: Solr
Issue Type: Bug
Security Level: Public (Default Security Level. Issues are Public)
Components: query
Affects Versions: 9.2.1
Environment: Solr 9.2.1
Reporter: Alex
Attachments: solr_vector_error_example-1.py
When doing a dense vector search, if the query passes in an all-zero vector,
Solr fails with an uninformative error message.
e.g.
q={!knn f=vector topK=1}[0.0, 0.0, 0.0, 0.0]
Error message from Solr:
```
{'error': {'msg': 'docID must be >= 0 and < maxDoc=2 (got docID=2147483647)',
'trace': 'java.lang.IllegalArgumentException: docID must be >= 0 and < maxDoc=2
(got docID=2147483647)\n\tat
org.apache.lucene.index.BaseCompositeReader.readerIndex(BaseCompositeReader.java:225)\n\tat
org.apache.lucene.index.BaseCompositeReader.document(BaseCompositeReader.java:153)\n\tat
org.apache.solr.search.SolrDocumentFetcher.docNC(SolrDocumentFetcher.java:274)\n\tat
org.apache.solr.search.SolrDocumentFetcher.lambda$doc$0(SolrDocumentFetcher.java:258)\n\tat
org.apache.solr.search.CaffeineCache.computeAsync(CaffeineCache.java:234)\n\tat
org.apache.solr.search.CaffeineCache.computeIfAbsent(CaffeineCache.java:250)\n\tat
org.apache.solr.search.SolrDocumentFetcher.doc(SolrDocumentFetcher.java:258)\n\tat
org.apache.solr.search.SolrDocumentFetcher$RetrieveFieldsOptimizer.getSolrDoc(SolrDocumentFetcher.java:855)\n\tat
org.apache.solr.search.SolrDocumentFetcher.solrDoc(SolrDocumentFetcher.java:307)\n\tat
org.apache.solr.response.DocsStreamer.next(DocsStreamer.java:94)\n\tat
org.apache.solr.response.DocsStreamer.next(DocsStreamer.java:56)\n\tat
org.apache.solr.response.TextResponseWriter.writeDocuments(TextResponseWriter.java:257)\n\tat
org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:196)\n\tat
org.apache.solr.common.util.TextWriter.writeVal(TextWriter.java:47)\n\tat
org.apache.solr.common.util.JsonTextWriter.writeNamedListAsMapWithDups(JsonTextWriter.java:403)\n\tat
org.apache.solr.common.util.JsonTextWriter.writeNamedList(JsonTextWriter.java:311)\n\tat
org.apache.solr.response.JSONWriter.writeResponse(JSONWriter.java:77)\n\tat
org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:63)\n\tat
org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:71)\n\tat
org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:988)\n\tat
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:593)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.dispatch(SolrDispatchFilter.java:252)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.lambda$doFilter$0(SolrDispatchFilter.java:220)\n\tat
org.apache.solr.servlet.ServletUtils.traceHttpRequestExecution2(ServletUtils.java:257)\n\tat
org.apache.solr.servlet.ServletUtils.rateLimitRequest(ServletUtils.java:227)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:215)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197)\n\tat
org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:210)\n\tat
org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1635)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:527)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:131)\n\tat
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:578)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:223)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1570)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:221)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1383)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:176)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:484)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1543)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:174)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1305)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:129)\n\tat
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:149)\n\tat
org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java:228)\n\tat
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:141)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)\n\tat
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:301)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)\n\tat
org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:822)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)\n\tat
org.eclipse.jetty.server.Server.handle(Server.java:563)\n\tat
org.eclipse.jetty.server.HttpChannel.lambda$handle$0(HttpChannel.java:505)\n\tat
org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:762)\n\tat
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:497)\n\tat
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:282)\n\tat
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:314)\n\tat
org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:100)\n\tat
org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.onFillable(SslConnection.java:558)\n\tat
org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:379)\n\tat
org.eclipse.jetty.io.ssl.SslConnection$2.succeeded(SslConnection.java:146)\n\tat
org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:100)\n\tat
org.eclipse.jetty.io.SelectableChannelEndPoint$1.run(SelectableChannelEndPoint.java:53)\n\tat
org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.runTask(AdaptiveExecutionStrategy.java:416)\n\tat
org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.consumeTask(AdaptiveExecutionStrategy.java:385)\n\tat
org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:272)\n\tat
org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.lambda$new$0(AdaptiveExecutionStrategy.java:140)\n\tat
org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:411)\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:934)\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1078)\n\tat
java.base/java.lang.Thread.run(Thread.java:833)\n', 'code': 500}}
```
Similarly, even if the query contains a non-zero vector, a similar error occurs
when a document contains an all-zero embedding vector. None of the test cases
[https://github.com/apache/solr/blob/bee3accb8a38b7420da0919ebacf05ef6060fc94/solr/core/src/test/org/apache/solr/search/neural/KnnQParserTest.java#L420]
include an example with an all-zero vector.
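Until the server reports this case more clearly, one possible client-side workaround is to reject all-zero vectors before they reach Solr, both when building the query and when indexing documents. The following is only a sketch: `is_zero_vector` and `build_knn_query` are hypothetical helper names, not part of Solr or any client library, and the field name `vector` is taken from the example query above.

```python
from typing import Sequence


def is_zero_vector(vector: Sequence[float]) -> bool:
    """Return True when every component is exactly 0.0 (or the vector is empty)."""
    return all(component == 0.0 for component in vector)


def build_knn_query(field: str, vector: Sequence[float], top_k: int = 1) -> str:
    """Build a {!knn ...} query string, refusing all-zero vectors up front.

    Hypothetical helper for illustration; it only formats a string and does
    not contact Solr.
    """
    if is_zero_vector(vector):
        raise ValueError("all-zero vector: cosine similarity is undefined")
    components = ", ".join(repr(float(c)) for c in vector)
    # Doubled braces emit literal '{' and '}' inside the f-string.
    return f"{{!knn f={field} topK={top_k}}}[{components}]"
```

The same `is_zero_vector` check could be applied to each document's embedding before it is sent for indexing.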
This was tested on Solr 9.2.1 with a DenseVector field using cosine similarity.
Cosine similarity divides by the vector norm, which is zero for an all-zero
vector. Even so, an all-zero vector is not just a theoretical input: it can
result from underflow when generating the embedding vector.
Minimal example in python:
[^solr_vector_error_example-1.py]
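Separately from the attached script, the zero-norm problem can be shown without Solr at all. This is a plain-Python sketch under stated assumptions: `cosine_similarity` and `to_float32` are illustrative names, and returning NaN for a zero denominator mimics what IEEE 754 float division gives for 0.0 / 0.0. It does not claim to reproduce how Lucene computes or handles the score internally.

```python
import math
import struct
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns NaN when either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    denominator = norm_a * norm_b
    if denominator == 0.0:
        # Plain Python would raise ZeroDivisionError here; return NaN instead
        # to mirror IEEE 754 behaviour for 0.0 / 0.0.
        return math.nan
    return dot / denominator


def to_float32(value: float) -> float:
    """Round a Python float (64-bit) to 32-bit precision and back."""
    return struct.unpack("f", struct.pack("f", value))[0]


zero = [0.0, 0.0, 0.0, 0.0]
doc = [0.5, 0.1, 0.0, 0.3]
print(cosine_similarity(zero, doc))  # nan: the similarity is undefined

# A tiny but non-zero 64-bit value underflows to exactly 0.0 in 32 bits,
# so an embedding of such values becomes an all-zero vector.
tiny = [1e-50, 1e-50, 1e-50, 1e-50]
print([to_float32(v) for v in tiny])
```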