This is an interesting discussion. There is a really nice way to build up an expression from inside a client. Take a look at the toExpression() method in every TupleStream impl and you'll see how to build a StreamExpression. A StreamExpression is an intermediate format that can become either a live TupleStream or a String expression.
One of the key features coming very soon is a REPL client that will support interactively building up expressions. This was made possible by the introduction of variables in Solr 6.6, described here: ( http://joelsolr.blogspot.com/2017/05/exploring-solrs-new-time-series-and.html ). The REPL client will be a Java command-line client that lives outside Solr and can connect to any SolrCloud via its ZooKeeper URL. The REPL client will also ship with SolrJ.

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Apr 26, 2017 at 10:55 AM, Dyer, James <[email protected]> wrote:

> Thank you for the quick replies. I can see how it would be powerful to be
> able to execute streaming expressions outside of solr, giving yourself the
> option of moving some of the work to the client. I wouldn't necessarily
> tie it into core because being able to join a solr stream with a rdbms
> result -- either within solr, or in your driver program -- that could be a
> nice set of options to have. But the patch on SOLR-1015 seems to get this
> right in (it seems from a quick look) that it uses the core's classloader
> when it is available, and falls back when it is not. It might be nice --
> especially as the streaming code base grows -- to consider packaging it
> separately from the solrj client itself.
>
> Along these lines: I was initially confused by the examples in
> https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions in
> that the cURL example at the top is materially different from the SolrJ
> example following it. That is, with the cURL example, all of the work
> occurs in Solr and only the final result is streamed back. With the SolrJ
> example, some of that work is now being done in the client. This is easy
> to discover if you try the JDBC expression: following the cURL example,
> the query originates in Solr; on the SolrJ example, the query originates
> on the client -- the server has no involvement at all.
>
> Is my understanding here correct? I can see how this design has great
> advantage as it gives us the ability to write driver programs that use the
> solr cores as worker nodes. But this wasn't immediately clear to me. I
> also wonder: do we have an (easy) way with SolrJ currently to simply
> execute a (chain of) streaming expression(s) and get the result back, like
> in the cURL example (besides using JDBC)?
>
> *James Dyer*
> Ingram Content Group
>
> *From:* Joel Bernstein [mailto:[email protected]]
> *Sent:* Tuesday, April 25, 2017 6:25 PM
> *To:* lucene dev <[email protected]>
> *Subject:* Re: JDBCStream and loading drivers
>
> There are a few stream impl's that have access to SolrCore
> (ClassifyStream, AnalyzeEvaluator) because they use analyzers. These
> classes have been added to core. We could move the JdbcStream to core as
> well if it makes the user experience nicer.
>
> Originally the idea was that you could run the Streaming API Java classes
> like you would other Solrj clients. I think over time this may become
> important again, as I believe there is work underway for spinning up worker
> nodes that are not attached to a SolrCore.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Tue, Apr 25, 2017 at 3:25 PM, Dyer, James <[email protected]>
> wrote:
>
> Using JDBCStream, Solr cannot find my database driver if I put the .jar in
> the shared lib directory ($SOLR_HOME/lib). In order for the classloader to
> find it, the driver has to be in the server's lib directory. Looking at
> why, I see that to get the full classpath, including what is in the shared
> lib directory, we'd typically get a reference to a SolrCore, call
> "getResourceLoader" and then "findClass". This makes use of the
> URLClassLoader that knows about the shared lib.
>
> But fixing JDBCStream to do this might not be so easy? Best I can tell,
> Streaming Expressions are written nearly stand-alone as client code that
> merely executes in the Solr JVM. Is this correct? Indeed, the code itself
> is included with the client, in the SolrJ package, despite it mostly being
> server-side code … Maybe I misunderstand?
>
> On the one hand, it isn't a huge deal as to where you need to put your
> drivers to make this work. But on the other hand, it isn't really the best
> user experience, in my opinion at least, to have to dig around the server
> directories to find where your driver needs to go. And also, if this is
> truly server-side code, why do we ship it with the client jar? Unless
> there is a desire to make a stand-alone Streaming Expression engine that
> interacts with Solr as a client, would it be acceptable to somehow expose
> the SolrCore to it for loading resources like this?
>
> *James Dyer*
> Ingram Content Group
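
For anyone following the class-loading part of this thread, the "use the core's classloader when it is available, and fall back when it is not" idea can be sketched with plain JDK classes. This is only an illustration under assumptions, not the SOLR-1015 patch or any Solr API: DriverClassResolver and resolve() are hypothetical names, and the empty URLClassLoader in main() merely stands in for a server-side loader that knows about the shared lib directory.

```java
import java.net.URL;
import java.net.URLClassLoader;

/** Hypothetical helper for illustration; not part of Solr or SolrJ. */
final class DriverClassResolver {

    private DriverClassResolver() {}

    /**
     * Tries the preferred loader first (for example, one that can see a
     * shared lib directory). If that loader is null or cannot find the
     * class, falls back to the thread context loader, and finally to the
     * loader that loaded this helper class.
     */
    static Class<?> resolve(String className, ClassLoader preferred)
            throws ClassNotFoundException {
        if (preferred != null) {
            try {
                return Class.forName(className, true, preferred);
            } catch (ClassNotFoundException e) {
                // Not visible to the preferred loader; try the fallbacks.
            }
        }
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        if (context != null) {
            try {
                return Class.forName(className, true, context);
            } catch (ClassNotFoundException e) {
                // Not visible to the context loader either; last attempt below.
            }
        }
        // Throws ClassNotFoundException if no loader can find the class.
        return Class.forName(className, true,
                DriverClassResolver.class.getClassLoader());
    }

    public static void main(String[] args) throws Exception {
        // Assumption: an empty URLClassLoader stands in for a server-side loader.
        try (URLClassLoader sharedLib = new URLClassLoader(
                new URL[0], DriverClassResolver.class.getClassLoader())) {
            System.out.println(resolve("java.lang.String", sharedLib).getName());
        }
        // No preferred loader at all, as when running outside the server.
        System.out.println(resolve("java.util.ArrayList", null).getName());
    }
}
```

With a shape like this, client-side code that has no SolrCore would pass null and still load drivers from its own classpath, while server-side code could pass whatever loader it has that sees the shared lib.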
