[jira] [Commented] (SOLR-7525) Add ComplementStream to the Streaming API and Streaming Expressions

Jason Gerlowski (JIRA) Thu, 17 Dec 2015 20:15:07 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063407#comment-15063407
 ]


Jason Gerlowski commented on SOLR-7525:
---------------------------------------

Hi all.

I wanted to take a stab at adding the missing parallel tests that Joel
alluded to in his most recent comment.

When I went to pull it down though, I realized that this patch no longer
applies cleanly on top of the recent changes to ReduceOperation/ReducerStream.

The main highlights of the recent ReducerStream changes are:
  1.) ReducerStream now requires a ReduceOperation.
  2.) Currently, the only ReduceOperation implementation is {{GroupOperation}}.
  3.) {{GroupOperation}} requires a {{StreamComparator}} and an int 'size'.  The
      size limits the number of tuples held in each grouping.  When the upper
      bound is reached, the least tuple (according to the comparator) is dropped.
  4.) The only {{StreamComparator}} implementations are {{FieldComparator}} and
      {{MultiFieldComparator}}, both of which require a field name.
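
To make point 3 concrete, here is a rough sketch of size-limited grouping as I
understand it.  {{BoundedGroupSketch}} is a hypothetical stand-in written purely
for illustration; it is not the real {{GroupOperation}}, and it holds plain
values rather than Tuples.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-in for a size-limited group; NOT the real GroupOperation.
public class BoundedGroupSketch<T> {
    private final PriorityQueue<T> heap; // min-heap: the least item sits at the head
    private final int size;              // upper bound on items kept in the group

    public BoundedGroupSketch(Comparator<T> comp, int size) {
        this.heap = new PriorityQueue<>(comp);
        this.size = size;
    }

    public void add(T item) {
        heap.add(item);
        if (heap.size() > size) {
            heap.poll(); // over the bound: drop the least item per the comparator
        }
    }

    // Returns the retained items in ascending comparator order.
    public List<T> values() {
        List<T> out = new ArrayList<>(heap);
        out.sort(heap.comparator());
        return out;
    }

    public static void main(String[] args) {
        BoundedGroupSketch<Integer> group =
            new BoundedGroupSketch<>(Comparator.<Integer>naturalOrder(), 2);
        for (int v : new int[] {5, 1, 9, 3}) {
            group.add(v);
        }
        System.out.println(group.values()); // [5, 9]
    }
}
```

With a bound of 2, the values 1 and 3 are silently discarded, which is exactly
the behavior a logical operation like intersect or complement would not want.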

The net effect of these changes is that IntersectStream and ComplementStream
need a field name at creation time (because they rely on ReducerStream, which
relies on ReduceOperation, which...).

As I see it, {{IntersectStream}} and {{ComplementStream}} shouldn't need
this chain of objects.  AFAICT, since their job is to do logical operations,
it'd be wrong for their internal {{ReducerStream}} to drop tuples based on an
arbitrary limit.  And since we don't want to drop tuples, there's no need for a
StreamComparator either.

Two resolutions come to mind here:
  1.) Modify GroupOperation so that the 'size' (and comparator) can be optional.
  2.) Create a no-op StreamComparator, or one that always returns "equal", to
      pass into the existing GroupOperation.

I'm leaning towards the first option.  It seems more generally useful, and
creating a no-op class seems like a bit of a hack.
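
For comparison, here is a minimal sketch of the two options side by side.
Everything in it ({{GroupOptionsSketch}}, {{Group}}, {{alwaysEqual}}) is a
hypothetical name made up for illustration, and the "size <= 0 means no limit"
convention is an assumption of mine, not something in the existing code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-ins for illustration; none of this is the real Solr code.
public class GroupOptionsSketch {

    /** Option 1: the size limit and the comparator are both optional. */
    public static class Group<T> {
        private final Comparator<T> comp; // may be null when there is no limit
        private final int size;           // assumed convention: <= 0 keeps everything
        private final List<T> held = new ArrayList<>();

        public Group() {
            this(null, 0);
        }

        public Group(Comparator<T> comp, int size) {
            if (size > 0 && comp == null) {
                throw new IllegalArgumentException("a size limit needs a comparator");
            }
            this.comp = comp;
            this.size = size;
        }

        public void add(T item) {
            held.add(item);
            if (size > 0 && held.size() > size) {
                // Over the limit: drop the least item according to the comparator.
                held.remove(Collections.min(held, comp));
            }
        }

        public List<T> values() {
            return held;
        }
    }

    /** Option 2: a comparator that reports every pair as equal. */
    public static <T> Comparator<T> alwaysEqual() {
        return (a, b) -> 0;
    }

    public static void main(String[] args) {
        Group<String> unbounded = new Group<>();
        Group<String> noOp =
            new Group<>(GroupOptionsSketch.<String>alwaysEqual(), Integer.MAX_VALUE);
        for (String s : new String[] {"a", "b", "c"}) {
            unbounded.add(s);
            noOp.add(s);
        }
        System.out.println(unbounded.values()); // [a, b, c]
        System.out.println(noOp.values());      // [a, b, c]
    }
}
```

In this sketch the second option still has to pass some size, so the caller ends
up supplying an effectively unlimited value alongside the no-op comparator.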

Anyone have opinions/thoughts on this?  Have I missed something obvious/simple
here, or misread the code entirely?  Is there another option to resolve this
conflict that I missed?

In any case, I just wanted to get some feedback on the best way to resolve this
conflict before I move on to actually adding the new tests.


> Add ComplementStream to the Streaming API and Streaming Expressions
> -------------------------------------------------------------------
>
>                 Key: SOLR-7525
>                 URL: https://issues.apache.org/jira/browse/SOLR-7525
>             Project: Solr
>          Issue Type: New Feature
>          Components: SolrJ
>            Reporter: Joel Bernstein
>            Priority: Minor
>         Attachments: SOLR-7525.patch
>
>
> This ticket adds a ComplementStream to the Streaming API and Streaming 
> Expression language.
> The ComplementStream will wrap two TupleStreams (StreamA, StreamB) and emit 
> Tuples from StreamA that are not in StreamB.
> Streaming API Syntax:
> {code}
> ComplementStream cstream = new ComplementStream(streamA, streamB, comp);
> {code}
> Streaming Expression syntax:
> {code}
> complement(search(...), search(...), on(...))
> {code}
> Internal implementation will rely on the ReducerStream. The ComplementStream 
> can be parallelized using the ParallelStream.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

