[ https://issues.apache.org/jira/browse/SOLR-8530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15792021#comment-15792021 ]
Joel Bernstein commented on SOLR-8530:
--------------------------------------

I returned to the HavingStream as part of SOLR-8593. What I found during the implementation is that both approaches described in this ticket can coexist in the same HavingStream implementation.

What [~dpgove] originally described was indexing a document on the fly and then using a Lucene/Solr query to implement the boolean logic.

What I described is implementing the boolean logic as stream operations that would handle typical SQL HAVING comparisons (=, <, >, <>, >=, <=).

I have implemented the HavingStream I described as part of SOLR-8593 with syntax that looks like this:

{code}
having(expr, booleanOp)
{code}

Where booleanOp is a new type of operation that returns *TRUE* or *FALSE* for each tuple. The basic boolean operations have been implemented, such as:

{code}
having(expr, and(gt(field1, 5), lt(field1, 10)))
{code}

This would emit tuples from the underlying expr where field1 is greater than 5 and less than 10.

To implement what [~dpgove] had in mind, we can add a new boolean operation called *match*. The match operation will index the tuple in an in-memory index and then match a Lucene/Solr query against it. Here is the sample syntax:

{code}
having(expr, match("field1:[5 TO 10]"))
{code}

The match boolean operation could then be intermingled with other boolean operations, for example:

{code}
having(expr, and(gt(field2, 8), match("body:(hello world)")))
{code}

Depending on the progress of SOLR-8593, I may strip out the HavingStream implementation and commit it with this ticket, so it can be ready for Solr 6.4.
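To make the composition of boolean operations concrete, here is a minimal sketch in plain Python. It is not the Solr implementation: the names gt, lt, and_ and having are hypothetical stand-ins that mirror the expression syntax above, and tuples are modelled as dicts.

```python
from typing import Callable, Dict, Iterable, Iterator

# Assumption: a tuple is modelled as a dict of field name -> value.
Tup = Dict[str, float]
BooleanOp = Callable[[Tup], bool]


def gt(field: str, value: float) -> BooleanOp:
    """Boolean op that is true when the tuple's field is greater than value."""
    return lambda tup: tup[field] > value


def lt(field: str, value: float) -> BooleanOp:
    """Boolean op that is true when the tuple's field is less than value."""
    return lambda tup: tup[field] < value


def and_(*ops: BooleanOp) -> BooleanOp:
    """Boolean op that is true only when every nested op is true."""
    return lambda tup: all(op(tup) for op in ops)


def having(stream: Iterable[Tup], op: BooleanOp) -> Iterator[Tup]:
    """Emit only the tuples for which the boolean op evaluates to true."""
    for tup in stream:
        if op(tup):
            yield tup


if __name__ == "__main__":
    tuples = [{"field1": 3}, {"field1": 7}, {"field1": 10}, {"field1": 9}]
    for tup in having(tuples, and_(gt("field1", 5), lt("field1", 10))):
        print(tup)
```

Because every operation has the same shape (tuple in, boolean out), a match operation backed by an in-memory index could be nested inside and_ in the same way as gt and lt.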
> Add HavingStream to Streaming API and StreamingExpressions
> ----------------------------------------------------------
>
>                 Key: SOLR-8530
>                 URL: https://issues.apache.org/jira/browse/SOLR-8530
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrJ
>    Affects Versions: 6.0
>            Reporter: Dennis Gove
>            Priority: Minor
>
> The goal here is to support something similar to SQL's HAVING clause where
> one can filter documents based on data that is not available in the index.
> For example, filter the output of a reduce(....) based on the calculated
> metrics.
> {code}
> having(
>   reduce(
>     search(.....),
>     sum(cost),
>     on=customerId
>   ),
>   q="sum(cost):[500 TO *]"
> )
> {code}
> This example would return all where the total spent by each distinct customer
> is >= 500. The total spent is calculated via the sum(cost) metric in the
> reduce stream.
> The intent is to support as the filters in the having(...) clause the full
> query syntax of a search(...) clause. I see this being possible in one of two
> ways.
> 1. Use Lucene's MemoryIndex and as each tuple is read out of the underlying
> stream creating an instance of MemoryIndex and apply the query to it. If the
> result of that is >0 then the tuple should be returned from the HavingStream.
> 2. Create an in-memory solr index via something like RamDirectory, read all
> tuples into that in-memory index using the UpdateStream, and then stream out
> of that all the matching tuples from the query.
> There are benefits to each approach but I think the easiest and most direct
> one is the MemoryIndex approach. With MemoryIndex it isn't necessary to read
> all incoming tuples before returning a single tuple. With a MemoryIndex there
> is a need to parse the solr query parameters and create a valid Lucene query
> but I suspect that can be done using existing QParser implementations.
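The streaming property of the per-tuple approach (option 1 in the quoted description) can be sketched in plain Python. This is an illustration only: it does not use Lucene's MemoryIndex, the inclusive range check stands in for a parsed query such as sum(cost):[500 TO *], and the names in_range, having_per_tuple and totals are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Optional

Tup = Dict[str, object]


def in_range(value: float, low: Optional[float], high: Optional[float]) -> bool:
    """Inclusive range check; None is an open bound, like * in [500 TO *]."""
    if low is not None and value < low:
        return False
    if high is not None and value > high:
        return False
    return True


def having_per_tuple(stream: Iterable[Tup], field: str,
                     low: Optional[float], high: Optional[float]) -> Iterator[Tup]:
    """Test each tuple as it is read and emit it immediately if it matches.

    Nothing is buffered, so the first match is available before the
    underlying stream has been fully consumed.
    """
    for tup in stream:
        if in_range(tup[field], low, high):
            yield tup


def totals() -> Iterator[Tup]:
    """Made-up sample data standing in for the output of a reduce stream."""
    yield {"customerId": "a", "sum(cost)": 750.0}
    yield {"customerId": "b", "sum(cost)": 120.0}
    yield {"customerId": "c", "sum(cost)": 500.0}


if __name__ == "__main__":
    for tup in having_per_tuple(totals(), "sum(cost)", 500, None):
        print(tup["customerId"], tup["sum(cost)"])
```

Option 2 would correspond to collecting every tuple into one index before emitting anything, which trades the early first result for the ability to run a single query over the whole set.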