[jira] [Commented] (SOLR-8530) Add HavingStream to Streaming API and StreamingExpressions

Dennis Gove (JIRA) Fri, 08 Jan 2016 15:57:49 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-8530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15090228#comment-15090228
 ]


Dennis Gove commented on SOLR-8530:
-----------------------------------

This is another good option. 

My thinking for using an index is threefold. First, I don't want to ask users
to learn yet another way to do comparisons; if they already know the Solr
query syntax, they can use it directly in this stream. Second, it supports
even the non-simple comparisons, such as a date range filter, without our
having to implement them. (This assumes that at some point we'll support
metrics over dates, but I think that's a reasonable assumption.) Third, given
the JDBCStream, this provides a way to do text-based queries over a subset of
documents coming out of a join of Solr-supplied and non-Solr-supplied
documents. Obviously one could do a textual search over the Solr-supplied
stream directly, but that may not be possible over the JDBC-supplied stream.

That said, I'm not averse to a ComparisonOperation. I just feel that full
index support gives us a lot of power going forward.

> Add HavingStream to Streaming API and StreamingExpressions
> ----------------------------------------------------------
>
>                 Key: SOLR-8530
>                 URL: https://issues.apache.org/jira/browse/SOLR-8530
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrJ
>    Affects Versions: Trunk
>            Reporter: Dennis Gove
>            Priority: Minor
>
> The goal here is to support something similar to SQL's HAVING clause where 
> one can filter documents based on data that is not available in the index. 
> For example, filter the output of a reduce(....) based on the calculated 
> metrics.
> {code}
> having(
>   reduce(
>     search(.....),
>     sum(cost),
>     on=customerId
>   ),
>   q="sum(cost):[500 TO *]"
> )
> {code}
> This example would return all tuples where the total spent by a distinct
> customer is >= 500. The total spent is calculated via the sum(cost) metric
> in the reduce stream.
> The intent is for the filters in the having(...) clause to support the full
> query syntax of a search(...) clause. I see this being possible in one of
> two ways.
> 1. Use Lucene's MemoryIndex: as each tuple is read out of the underlying
> stream, create an instance of MemoryIndex for it and apply the query to it.
> If the result of that is >0 then the tuple should be returned from the
> HavingStream.
> 2. Create an in-memory Solr index via something like RAMDirectory, read all
> tuples into that in-memory index using the UpdateStream, and then stream all
> the tuples matching the query out of it.
> There are benefits to each approach, but I think the easiest and most direct
> one is the MemoryIndex approach. With MemoryIndex it isn't necessary to read
> all incoming tuples before returning a single tuple. It does require parsing
> the Solr query parameters and creating a valid Lucene query, but I suspect
> that can be done using existing QParser implementations.
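
To make the "no need to read all incoming tuples first" point concrete, here is a minimal sketch in plain Java. It is not the Solr TupleStream API and it does not use MemoryIndex: tuples are modeled as Map<String, Object>, and the query q="sum(cost):[500 TO *]" is replaced by a hand-written predicate. The names HavingSketch, HavingIterator, atLeast, tuple and matchingCustomers are all hypothetical, made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical sketch only: plain Java stand-ins, not Solr's streaming classes
// and not Lucene's MemoryIndex.
class HavingSketch {

    /** Wraps an upstream tuple iterator and yields only tuples that pass the filter. */
    static final class HavingIterator implements Iterator<Map<String, Object>> {
        private final Iterator<Map<String, Object>> upstream;
        private final Predicate<Map<String, Object>> filter;
        private Map<String, Object> next; // one-tuple look-ahead; null when empty

        HavingIterator(Iterator<Map<String, Object>> upstream,
                       Predicate<Map<String, Object>> filter) {
            this.upstream = upstream;
            this.filter = filter;
        }

        @Override
        public boolean hasNext() {
            // Pull upstream tuples one at a time until one matches or input ends.
            // Nothing beyond the single look-ahead tuple is ever buffered.
            while (next == null && upstream.hasNext()) {
                Map<String, Object> candidate = upstream.next();
                if (filter.test(candidate)) {
                    next = candidate;
                }
            }
            return next != null;
        }

        @Override
        public Map<String, Object> next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            Map<String, Object> result = next;
            next = null;
            return result;
        }
    }

    /** Stand-in for q="field:[min TO *]": inclusive lower bound on a numeric field. */
    static Predicate<Map<String, Object>> atLeast(String field, double min) {
        return t -> {
            Object value = t.get(field);
            return value instanceof Number && ((Number) value).doubleValue() >= min;
        };
    }

    /** Builds a tuple shaped like the output of reduce(..., sum(cost), on=customerId). */
    static Map<String, Object> tuple(String customerId, double sumCost) {
        Map<String, Object> t = new HashMap<>();
        t.put("customerId", customerId);
        t.put("sum(cost)", sumCost);
        return t;
    }

    /** Runs the filter over the tuples and collects matching customerIds in order. */
    static List<String> matchingCustomers(List<Map<String, Object>> tuples, double min) {
        List<String> ids = new ArrayList<>();
        Iterator<Map<String, Object>> it =
            new HavingIterator(tuples.iterator(), atLeast("sum(cost)", min));
        while (it.hasNext()) {
            ids.add((String) it.next().get("customerId"));
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> reduced = Arrays.asList(
            tuple("a", 120.0), tuple("b", 500.0), tuple("c", 975.5));
        System.out.println(matchingCustomers(reduced, 500.0)); // prints [b, c]
    }
}
```

In a MemoryIndex-based version, the predicate would presumably be swapped for "index this one tuple, run the parsed query against it, keep the tuple if it matches"; the one-tuple-at-a-time shape of the iterator would stay the same.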



