[jira] [Commented] (SOLR-8530) Add HavingStream to Streaming API and StreamingExpressions

Joel Bernstein (JIRA) Fri, 08 Jan 2016 17:26:52 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-8530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15090316#comment-15090316
 ]


Joel Bernstein commented on SOLR-8530:
--------------------------------------

I think it makes sense to have two implementations:

*MatchStream*: Uses an in-memory index to match Tuples.
*HavingStream*: Uses a ComparisionOperation to match Tuples.

One of the things we can think over is a specific stream for doing *parallel 
alerting*. The MatchStream is step in that direction.

> Add HavingStream to Streaming API and StreamingExpressions
> ----------------------------------------------------------
>
>                 Key: SOLR-8530
>                 URL: https://issues.apache.org/jira/browse/SOLR-8530
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrJ
>    Affects Versions: Trunk
>            Reporter: Dennis Gove
>            Priority: Minor
>
> The goal here is to support something similar to SQL's HAVING clause where 
> one can filter documents based on data that is not available in the index. 
> For example, filter the output of a reduce(....) based on the calculated 
> metrics.
> {code}
> having(
>   reduce(
>     search(.....),
>     sum(cost),
>     on=customerId
>   ),
>   q="sum(cost):[500 TO *]"
> )
> {code}
> This example would return all where the total spent by each distinct customer 
> is >= 500. The total spent is calculated via the sum(cost) metric in the 
> reduce stream.
> The intent is to support as the filters in the having(...) clause the full 
> query syntax of a search(...) clause. I see this being possible in one of two 
> ways. 
> 1. Use Lucene's MemoryIndex and as each tuple is read out of the underlying 
> stream creating an instance of MemoryIndex and apply the query to it. If the 
> result of that is >0 then the tuple should be returned from the HavingStream.
> 2. Create an in-memory solr index via something like RamDirectory, read all 
> tuples into that in-memory index using the UpdateStream, and then stream out 
> of that all the matching tuples from the query.
> There are benefits to each approach but I think the easiest and most direct 
> one is the MemoryIndex approach. With MemoryIndex it isn't necessary to read 
> all incoming tuples before returning a single tuple. With a MemoryIndex there 
> is a need to parse the solr query parameters and create a valid Lucene query 
> but I suspect that can be done using existing QParser implementations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (SOLR-8530) Add HavingStream to Streaming API and StreamingExpressions

Reply via email to