[
https://issues.apache.org/jira/browse/SOLR-10159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ishan Chattopadhyaya updated SOLR-10159:
----------------------------------------
Description:
h2. Background/History
If a value that was recently updated in place is used in a DBQ, the DBQ doesn't
work at the Lucene level unless there's an explicit commit between the update
and the DBQ, due to LUCENE-7344. To work around this, Yonik suggested that we
call ulog.openRealtimeSearcher() just before the DBQ is performed. This worked fine.
Example:
{code}
ADD: [id=0, dv=200, title="mytitle", \_version\_=100]
UPD: [id=0, dv=300, \_version\_=200]
DBQ: [q="dv:300", \_version\_=300]
{code}
h2. Problem discovered now
Suppose, in the above example, the last two commands are reordered at the
replica. What would happen is: \(i\) the full document (\_version\_ 100) is
received and indexed; (ii) the DBQ is received (out of order) and applied,
no document is deleted \[so far so good\], and the DBQ is buffered in the
ulog.deleteByQueries map; (iii) the in-place update arrives (\_version\_ 200) and
is applied to the document that was added in step i. After that, the buffered
DBQ is applied (at DUH2.addAndDelete()). This buffered DBQ, based on a value
updated immediately before (step iii), fails to delete the document.
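The reordered sequence can be modelled with a small toy replica to make the expected outcome explicit. This is only a sketch of the intended semantics: {{ToyReplica}} and all of its methods are hypothetical names, and nothing here uses Solr or Lucene code.

```python
from dataclasses import dataclass, field


@dataclass
class ToyReplica:
    """Hypothetical stand-in for a replica; not Solr's real data structures."""
    docs: dict = field(default_factory=dict)           # id -> {"dv", "version"}
    buffered_dbqs: list = field(default_factory=list)  # (dv value, DBQ version)

    def add(self, doc_id, dv, version):
        self.docs[doc_id] = {"dv": dv, "version": version}
        self._replay_buffered_dbqs()

    def update_in_place(self, doc_id, dv, version):
        self.docs[doc_id].update(dv=dv, version=version)
        self._replay_buffered_dbqs()

    def delete_by_query(self, dv, version):
        self._apply_dbq(dv, version)
        # Remember the DBQ so it can be replayed against updates that arrive late.
        self.buffered_dbqs.append((dv, version))

    def _apply_dbq(self, dv, dbq_version):
        # Delete matching documents that are older than the DBQ itself.
        doomed = [doc_id for doc_id, doc in self.docs.items()
                  if doc["dv"] == dv and doc["version"] < dbq_version]
        for doc_id in doomed:
            del self.docs[doc_id]

    def _replay_buffered_dbqs(self):
        for dv, dbq_version in self.buffered_dbqs:
            self._apply_dbq(dv, dbq_version)


replica = ToyReplica()
replica.add(0, dv=200, version=100)              # step i
replica.delete_by_query(dv=300, version=300)     # step ii: matches nothing yet
survived_dbq = 0 in replica.docs
replica.update_in_place(0, dv=300, version=200)  # step iii: replays the DBQ
deleted_after_update = 0 not in replica.docs
print(survived_dbq, deleted_after_update)
```

In this model the document survives step ii and is gone after step iii. That is the outcome the real replica is expected to reach, and the one this issue reports it does not.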
h2. What happens exactly?
The initial DBQ query is {{"dv:300"}}, but when it is applied, it is expanded
to {{"\+dv:\[300 TO 300\] -ConstantScore(frange(long(\_version\_)):\[300 TO
*\])"}}. In spite of doing a ulog.openRealtimeSearcher() just before the DBQ,
it doesn't work.
A different version of the query, i.e. {{"\+dv:\[300 TO 300\]
\+\_version\_:\[200 TO 200\]"}}, also doesn't work. As I found out, *this
happened due to the presence of two clauses*! {{"\+dv:\[300 TO 300\]"}} works,
and so does {{"\+\_version\_:\[200 TO 200\]"}}, but the two clauses don't work
together. Also, surprisingly, even {{"\+dv:\[300 TO 300\] \+dv:\[300 TO
300\]"}} doesn't work (same clause repeated).
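As a sanity check on the query logic itself, the variants above can be evaluated against the updated document with a plain boolean model. This is a sketch under assumptions: the {{ConstantScore(frange(...))}} clause is simplified to an open-ended inclusive range, and {{in_range}} and {{matches}} are hypothetical helpers, not Lucene calls.

```python
def in_range(value, low, high):
    """Inclusive range check; None means unbounded on that side."""
    return (low is None or value >= low) and (high is None or value <= high)


def matches(doc, must=(), must_not=()):
    """Evaluate '+' clauses (must) and '-' clauses (must_not).

    Each clause is a (field, low, high) tuple.
    """
    return (all(in_range(doc[f], lo, hi) for f, lo, hi in must)
            and not any(in_range(doc[f], lo, hi) for f, lo, hi in must_not))


# State of the document after the in-place update (version 200).
doc = {"dv": 300, "_version_": 200}

DV = ("dv", 300, 300)
VERSION_200 = ("_version_", 200, 200)
VERSION_GE_300 = ("_version_", 300, None)

variants = {
    "expanded DBQ": matches(doc, must=[DV], must_not=[VERSION_GE_300]),
    "dv and version": matches(doc, must=[DV, VERSION_200]),
    "dv only": matches(doc, must=[DV]),
    "version only": matches(doc, must=[VERSION_200]),
    "dv repeated": matches(doc, must=[DV, DV]),
}
for name, result in variants.items():
    print(name, result)
```

Under this model every variant matches the document, so the failures of the multi-clause variants do not appear to come from the boolean logic of the queries.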
h2. Investigation at Lucene level
Upon some tedious investigation into the internals of Lucene, I discovered that
if I change the internal search (at BufferedUpdates) to use Sort.RELEVANCE
instead of Sort.INDEXORDER (which, I think, is the default when using
weight/scorer), the DBQ is applied correctly.
> DBQ, where query is based on updated value, reordered with the update doesn't
> work
> ----------------------------------------------------------------------------------
>
> Key: SOLR-10159
> URL: https://issues.apache.org/jira/browse/SOLR-10159
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Ishan Chattopadhyaya
> Attachments: SOLR-10159.patch
>
>