cpoerschke commented on code in PR #935:
URL: https://github.com/apache/solr/pull/935#discussion_r925844560
##########
solr/solrj/src/test/org/apache/solr/client/solrj/io/stream/StreamDecoratorTest.java:
##########
@@ -4370,6 +4370,17 @@ public void testClassifyStream() throws Exception {
updateRequest.add(id, String.valueOf(3), "text_s", "a b e e f");
updateRequest.commit(cluster.getSolrClient(), "uknownCollection");
+ expr =
+ "classify("
+ +
+ // use cacheMillis=0 to prevent cached results. it doesn't matter on the first run,
+ // but we want to ensure that when we re-use this expression later after
+ // training another model, we'll still get accurate results.
+ "model(modelCollection, id=\"model\", cacheMillis=0),"
+ + "topic(checkpointCollection, uknownCollection, q=\"*:*\", fl=\"text_s, id\", id=\"1000000\"),"
+ + "field=\"text_s\","
+ + "analyzerField=\"tv_text\")";
Review Comment:
So this is the same expression as above but without `initialCheckpoint=0`.
Reading the (current) docs, leaving it out means the topic starts from _"the
highest version in the index"_. But if the highest version in the index were
used, the first batch in the stream below would not include the documents just
added with ids 2 and 3?
Wondering if the documentation needs tweaking to account for persisted
checkpoints?
https://github.com/apache/solr/blob/releases/solr/9.0.0/solr/solr-ref-guide/modules/query-guide/pages/stream-source-reference.adoc#topic-parameters
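
The persisted-checkpoint question can be pictured with a toy model. This is a sketch under an assumption, not Solr's `TopicStream`: it assumes that a checkpoint already stored for the topic `id` takes precedence over both `initialCheckpoint` and the "highest version in the index" default. The class `TopicCheckpointSketch`, its methods, and the version numbers are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of topic checkpoint resolution. Hypothetical; not Solr code. */
public class TopicCheckpointSketch {
  // Stand-in for the checkpoint collection: topic id -> last version read.
  private final Map<String, Long> persistedCheckpoints = new HashMap<>();
  // Stand-in for the index: one version number per document.
  private final List<Long> indexVersions = new ArrayList<>();

  public void addDocument(long version) {
    indexVersions.add(version);
  }

  private long highestVersion() {
    long max = 0L;
    for (long v : indexVersions) {
      max = Math.max(max, v);
    }
    return max;
  }

  /** A negative initialCheckpoint means the parameter was not set. */
  public long resolveStart(String topicId, long initialCheckpoint) {
    Long persisted = persistedCheckpoints.get(topicId);
    if (persisted != null) {
      return persisted; // assumption: a stored checkpoint wins
    }
    if (initialCheckpoint >= 0) {
      return initialCheckpoint; // explicit starting point, e.g. 0
    }
    return highestVersion(); // documented default when nothing is set
  }

  /** Returns versions newer than the start point, then stores the new checkpoint. */
  public List<Long> read(String topicId, long initialCheckpoint) {
    long start = resolveStart(topicId, initialCheckpoint);
    long newest = start;
    List<Long> batch = new ArrayList<>();
    for (long v : indexVersions) {
      if (v > start) {
        batch.add(v);
        newest = Math.max(newest, v);
      }
    }
    persistedCheckpoints.put(topicId, newest);
    return batch;
  }

  public static void main(String[] args) {
    TopicCheckpointSketch topic = new TopicCheckpointSketch();
    topic.addDocument(10L); // documents indexed before the first run
    topic.addDocument(11L);
    System.out.println(topic.read("1000000", 0L)); // first run, initialCheckpoint=0

    topic.addDocument(12L); // documents added afterwards
    topic.addDocument(13L);
    System.out.println(topic.read("1000000", -1L)); // same id, no initialCheckpoint
    System.out.println(topic.read("fresh", -1L)); // new id, no initialCheckpoint
  }
}
```

In this model the second read with the same `id` still returns the newly added documents because it resumes from the stored checkpoint, while a fresh `id` without `initialCheckpoint` starts at the highest version and returns nothing. If the real stream behaves this way, the docs for `initialCheckpoint` could say that it only applies when no checkpoint exists yet for the topic `id`.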