[ https://issues.apache.org/jira/browse/SOLR-8530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15090316#comment-15090316 ]
Joel Bernstein commented on SOLR-8530:
--------------------------------------
I think it makes sense to have two implementations:
*MatchStream*: Uses an in-memory index to match Tuples.
*HavingStream*: Uses a ComparisonOperation to match Tuples.
Another thing we can think about is a specific stream for doing *parallel
alerting*. The MatchStream is a step in that direction.
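As a rough illustration of the HavingStream idea, here is a minimal sketch in plain Java. It is not the proposed implementation: tuples are modeled as a Map rather than Solr's Tuple class, and the names HavingSketch, tuple, atLeast and having are hypothetical. A comparison operation is represented by a Predicate that is applied to each tuple as it is pulled from the underlying stream.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical sketch: a tuple is a plain Map here, not Solr's Tuple class.
final class HavingSketch {

    // Builds a tuple shaped like the output of reduce(..., sum(cost), on=customerId).
    static Map<String, Object> tuple(String customerId, double total) {
        Map<String, Object> t = new HashMap<>();
        t.put("customerId", customerId);
        t.put("sum(cost)", total);
        return t;
    }

    // Stand-in for a comparison operation: passes tuples whose numeric field is >= threshold.
    static Predicate<Map<String, Object>> atLeast(String field, double threshold) {
        return t -> {
            Object value = t.get(field);
            return value instanceof Number && ((Number) value).doubleValue() >= threshold;
        };
    }

    // Wraps an upstream iterator and yields only the matching tuples, one at a time.
    static Iterator<Map<String, Object>> having(Iterator<Map<String, Object>> upstream,
                                                Predicate<Map<String, Object>> op) {
        return new Iterator<Map<String, Object>>() {
            private Map<String, Object> buffered; // next matching tuple, or null if none is buffered

            @Override
            public boolean hasNext() {
                while (buffered == null && upstream.hasNext()) {
                    Map<String, Object> candidate = upstream.next();
                    if (op.test(candidate)) {
                        buffered = candidate;
                    }
                }
                return buffered != null;
            }

            @Override
            public Map<String, Object> next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                Map<String, Object> result = buffered;
                buffered = null;
                return result;
            }
        };
    }

    public static void main(String[] args) {
        List<Map<String, Object>> reduced = Arrays.asList(
            tuple("a", 750.0), tuple("b", 120.0), tuple("c", 500.0));
        Iterator<Map<String, Object>> out = having(reduced.iterator(), atLeast("sum(cost)", 500));
        while (out.hasNext()) {
            System.out.println(out.next().get("customerId")); // prints a, then c
        }
    }
}
```

Because the wrapper only pulls from the underlying stream when asked for the next tuple, it never needs to hold more than one tuple at a time.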
> Add HavingStream to Streaming API and StreamingExpressions
> ----------------------------------------------------------
>
> Key: SOLR-8530
> URL: https://issues.apache.org/jira/browse/SOLR-8530
> Project: Solr
> Issue Type: Improvement
> Components: SolrJ
> Affects Versions: Trunk
> Reporter: Dennis Gove
> Priority: Minor
>
> The goal here is to support something similar to SQL's HAVING clause where
> one can filter documents based on data that is not available in the index.
> For example, filter the output of a reduce(....) based on the calculated
> metrics.
> {code}
> having(
>   reduce(
>     search(.....),
>     sum(cost),
>     on=customerId
>   ),
>   q="sum(cost):[500 TO *]"
> )
> {code}
> This example would return all customers whose total spent is >= 500. The
> total spent is calculated via the sum(cost) metric in the reduce stream.
> The intent is for the filters in the having(...) clause to support the full
> query syntax of a search(...) clause. I see this being possible in one of two
> ways.
> 1. Use Lucene's MemoryIndex: as each tuple is read out of the underlying
> stream, create an instance of MemoryIndex and apply the query to it. If the
> result of that is >0 then the tuple should be returned from the HavingStream.
> 2. Create an in-memory Solr index via something like RAMDirectory, read all
> tuples into that in-memory index using the UpdateStream, and then stream all
> the tuples matching the query out of it.
> There are benefits to each approach, but I think the easiest and most direct
> one is the MemoryIndex approach. With MemoryIndex it isn't necessary to read
> all incoming tuples before returning a single tuple. It does require parsing
> the Solr query parameters into a valid Lucene query, but I suspect that can
> be done using existing QParser implementations.
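The difference between the two approaches in when the first tuple can be returned can be sketched in plain Java. This is an illustration only, under stated assumptions: no Lucene or Solr classes are used, each tuple is reduced to its sum(cost) value, an IntPredicate stands in for the parsed query, and the names MatchOrderSketch, CountingSource, firstMatchPerTuple and firstMatchAfterLoadingAll are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical sketch: each "tuple" is just its sum(cost) value.
final class MatchOrderSketch {

    // Upstream source that counts how many tuples have been read from it.
    static final class CountingSource implements Iterator<Integer> {
        private final Iterator<Integer> inner;
        int reads = 0;

        CountingSource(List<Integer> totals) {
            this.inner = totals.iterator();
        }

        @Override
        public boolean hasNext() {
            return inner.hasNext();
        }

        @Override
        public Integer next() {
            reads++;
            return inner.next();
        }
    }

    // Stand-in for approach 1: test each tuple as it arrives, return the first match or null.
    static Integer firstMatchPerTuple(CountingSource source, IntPredicate query) {
        while (source.hasNext()) {
            int total = source.next();
            if (query.test(total)) {
                return total;
            }
        }
        return null;
    }

    // Stand-in for approach 2: load every tuple first, then query the loaded set.
    static Integer firstMatchAfterLoadingAll(CountingSource source, IntPredicate query) {
        List<Integer> loaded = new ArrayList<>();
        while (source.hasNext()) {
            loaded.add(source.next());
        }
        for (int total : loaded) {
            if (query.test(total)) {
                return total;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Integer> totals = Arrays.asList(120, 750, 90, 500);
        IntPredicate query = total -> total >= 500; // plays the role of sum(cost):[500 TO *]

        CountingSource a = new CountingSource(totals);
        System.out.println(firstMatchPerTuple(a, query) + " after " + a.reads + " reads");        // 750 after 2 reads

        CountingSource b = new CountingSource(totals);
        System.out.println(firstMatchAfterLoadingAll(b, query) + " after " + b.reads + " reads"); // 750 after 4 reads
    }
}
```

Both stand-ins find the same first match, but the per-tuple version can return it after reading only two tuples, while the load-everything version has to consume the whole underlying stream first.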