[ 
https://issues.apache.org/jira/browse/SOLR-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gerlowski updated SOLR-7535:
----------------------------------
    Attachment: SOLR-7535.patch

After a bit of thought and a holiday break, I've got my first attempt at this 
ready for some feedback.

h5. Notes about this Patch
1.) No tests yet.  The patch does work (I tried it out manually), but it's 
late here and I wanted to post it in case someone has time to take a look and 
give me some feedback before I sit back down to work on this again tomorrow 
evening.  I am planning on adding tests to {{StreamExpressionTest}} and 
{{StreamExpressionToExpessionTest}}.
2.) I didn't make any attempt to restrict the {{TupleStream}} implementations 
that {{UpdateStream}} can wrap, partly because I haven't gotten around to it 
yet, but also because, IMO, there are use cases where a user wouldn't need a 
{{SelectStream}} (for example, if they're doing field filtering in their 
initial Solr query/search() expression).  Happy to change this in a subsequent 
patch.  Just wanted to see what people thought.
3.) I kept my original tuple-to-input-doc mapping intact.  It's limited, but 
as Joel mentioned, it will probably do the job for a first pass.
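
To make note 3 concrete, here is a rough sketch of what a flat 
tuple-to-input-doc mapping could look like.  This is not the code from the 
patch.  It uses plain {{java.util.Map}} objects as stand-ins for the tuple's 
fields and for the input document so that it runs without SolrJ on the 
classpath, and the class and method names ({{TupleToDocSketch}}, {{toDoc}}) 
are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

public class TupleToDocSketch {

    /**
     * Hypothetical helper (not from the patch): copies every field of a
     * tuple, represented here as a plain Map, into a flat "document" Map.
     * Field names are carried over one-to-one; no renaming or filtering.
     */
    static Map<String, Object> toDoc(Map<String, Object> tupleFields) {
        Map<String, Object> doc = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : tupleFields.entrySet()) {
            Object value = field.getValue();
            if (value instanceof Collection) {
                // Multi-valued field: copy the values so the document
                // does not share a mutable collection with the tuple.
                doc.put(field.getKey(), new ArrayList<Object>((Collection<?>) value));
            } else {
                // Single-valued field: carried over unchanged.
                doc.put(field.getKey(), value);
            }
        }
        return doc;
    }
}
```

A mapping like this is limited in the same way as the one described above: 
it assumes the tuple's field names and types already match the destination 
collection's schema.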

h5. Questions about Surrounding Code
These aren't necessarily related to this JIRA/patch, but working on this patch 
made me think of a few questions that I couldn't figure out answers to on my 
own.

1.) Many of the {{TupleStream}} implementations require a collection to be 
explicitly stated as the first argument (e.g. {{search(gettingstarted...)}}).  
However, the collection name is already specified in the URL path (e.g. 
{{localhost:7574/solr/gettingstarted/stream?...}}).  Are these values ever 
allowed to be different?
2.) Many of the Stream Expressions are specified using a syntax that mixes 
named parameters (rows, sort, zkHost, etc.), and unnamed parameters 
('collection' is probably the most common).  Are there any guidelines/logic 
around which parameters are named, and which are unnamed?  If I'm creating a 
new TupleStream type (as we are here), are there any guidelines on what the 
expression interface should look like?


Thanks in advance if anyone can help clarify some of those things for me.  
Should be back online soon to revise this further. 

> Add UpdateStream to Streaming API and Streaming Expression
> ----------------------------------------------------------
>
>                 Key: SOLR-7535
>                 URL: https://issues.apache.org/jira/browse/SOLR-7535
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, SolrJ
>            Reporter: Joel Bernstein
>            Priority: Minor
>         Attachments: SOLR-7535.patch
>
>
> The ticket adds an UpdateStream implementation to the Streaming API and 
> streaming expressions. The UpdateStream will wrap a TupleStream and send the 
> Tuples it reads to a SolrCloud collection to be indexed.
> This will allow users to pull data from different Solr Cloud collections, 
> merge and transform the streams and send the transformed data to another Solr 
> Cloud collection.



