benwtrent commented on PR #15621: URL: https://github.com/apache/lucene/pull/15621#issuecomment-3810900302
> They'll always return 0 values for dot product and Max inner product, and search won't really be able to differentiate based on those similarity scores.

Technically, this is the same exact problem that vectors have in general. Two different vectors can return the same scores. I don't think this is a good reason. Additionally, "same scores" is possible even in term based search.

> It will silently affect scores, graph geometry and result set, which feels trappy?

I am not sure it will do so silently, again, users do all sorts of things to test, and it seems to me plausible for there to be zero vectors.

> Are there meaningful scenarios where we want to allow zero vectors?

I don't know. But `zero` vectors feel the same as any "uniform" component vector (e.g. all `1` or all `1.23`). Preventing this in cosine only keeps the system sane.
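For illustration, here is a minimal sketch of why cosine is the special case. It uses plain Java with textbook formulas, not Lucene's actual similarity code; the class and method names (`ZeroVectorDemo`, `dotProduct`, `cosine`) are hypothetical. A zero vector gives a well-defined (if uninformative) dot product of 0 against anything, while the textbook cosine divides by a zero norm.

```java
public class ZeroVectorDemo {
    // Plain dot product over two equal-length vectors (illustrative, not Lucene's code).
    public static float dotProduct(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Textbook cosine: dot(a, b) / (|a| * |b|). No guard against zero norms, on purpose.
    public static float cosine(float[] a, float[] b) {
        double norms = Math.sqrt(dotProduct(a, a)) * Math.sqrt(dotProduct(b, b));
        return (float) (dotProduct(a, b) / norms);
    }

    public static void main(String[] args) {
        float[] zero = {0f, 0f, 0f};
        float[] ones = {1f, 1f, 1f};   // a "uniform" component vector
        float[] other = {1f, 2f, 3f};

        System.out.println(dotProduct(zero, other)); // 0.0: defined, just not discriminating
        System.out.println(dotProduct(ones, other)); // 6.0
        System.out.println(cosine(ones, other));     // finite: uniform vectors are fine for cosine
        System.out.println(cosine(zero, other));     // NaN: 0 / 0 in the textbook formula
    }
}
```

Under these assumptions, the zero vector behaves like any other input for dot product, and only the cosine formula has no defined value for it.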
