Thank you for the quick replies. I can see how it would be powerful to be able to execute streaming expressions outside of Solr, giving yourself the option of moving some of the work to the client. I wouldn't necessarily tie it into core, because being able to join a Solr stream with an RDBMS result, either within Solr or in your driver program, could be a nice set of options to have. But the patch on SOLR-1015 seems to get this right: from a quick look, it uses the core's classloader when it is available, and falls back when it is not. It might be nice, especially as the streaming code base grows, to consider packaging it separately from the SolrJ client itself.
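
To make the fallback idea concrete, here is a rough sketch using only the JDK. It is not the code from the patch; the class name `FallbackClassLoading`, the method `findClass`, and the `coreLoader` parameter are all hypothetical, with `coreLoader` standing in for whatever classloader a SolrCore's resource loader would supply when the code runs inside Solr.

```java
// Hypothetical illustration only; none of these names come from Solr or the patch.
final class FallbackClassLoading {

    private FallbackClassLoading() {}

    /**
     * Resolves a class by name. If a core-level loader is supplied (assumed to be
     * the one that can see the shared lib directory), it is used; otherwise the
     * loader that loaded this class is used, as a stand-alone client would do.
     */
    static Class<?> findClass(String className, ClassLoader coreLoader) {
        ClassLoader loader = (coreLoader != null)
                ? coreLoader
                : FallbackClassLoading.class.getClassLoader();
        try {
            // 'true' asks the JVM to initialize the class, which is what lets a
            // JDBC driver class run its static registration block.
            return Class.forName(className, true, loader);
        } catch (ClassNotFoundException e) {
            // Wrapped as unchecked to keep call sites short in this sketch.
            throw new IllegalStateException("Class not found: " + className, e);
        }
    }

    public static void main(String[] args) {
        // No core-level loader available here, so the fallback path is taken.
        System.out.println(findClass("java.util.ArrayList", null).getName());
    }
}
```

With something along these lines, the same streaming class could run in a driver program (passing `null`) or inside Solr (passing the core's loader) without caring which environment it is in.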
Along these lines: I was initially confused by the examples in https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions in that the cURL example at the top is materially different from the SolrJ example following it. That is, with the cURL example, all of the work occurs in Solr and only the final result is streamed back. With the SolrJ example, some of that work is now being done in the client. This is easy to discover if you try the JDBC expression: following the cURL example, the query originates in Solr; with the SolrJ example, the query originates on the client, and the server has no involvement at all. Is my understanding here correct?

I can see how this design has great advantage, as it gives us the ability to write driver programs that use the Solr cores as worker nodes. But this wasn't immediately clear to me. I also wonder: do we have an (easy) way with SolrJ currently to simply execute a (chain of) streaming expression(s) and get the result back, like in the cURL example (besides using JDBC)?

James Dyer
Ingram Content Group

From: Joel Bernstein [mailto:[email protected]]
Sent: Tuesday, April 25, 2017 6:25 PM
To: lucene dev <[email protected]>
Subject: Re: JDBCStream and loading drivers

There are a few stream impl's that have access to SolrCore (ClassifyStream, AnalyzeEvaluator) because they use analyzers. These classes have been added to core. We could move the JdbcStream to core as well if it makes the user experience nicer.

Originally the idea was that you could run the Streaming API Java classes like you would other SolrJ clients. I think over time this may become important again, as I believe there is work underway for spinning up worker nodes that are not attached to a SolrCore.

Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Apr 25, 2017 at 3:25 PM, Dyer, James <[email protected]> wrote:

Using JDBCStream, Solr cannot find my database driver if I put the .jar in the shared lib directory ($SOLR_HOME/lib). In order for the classloader to find it, the driver has to be in the server's lib directory. Looking at why, I see that to get the full classpath, including what is in the shared lib directory, we'd typically get a reference to a SolrCore, call "getResourceLoader" and then "findClass". This makes use of the URLClassLoader that knows about the shared lib.

But fixing JDBCStream to do this might not be so easy? Best I can tell, Streaming Expressions are written nearly stand-alone as client code that merely executes in the Solr JVM. Is this correct? Indeed, the code itself is included with the client, in the SolrJ package, despite it mostly being server-side code … Maybe I misunderstand?

On the one hand, it isn't a huge deal as to where you need to put your drivers to make this work. But on the other hand, it isn't really the best user experience, in my opinion at least, to have to dig around the server directories to find where your driver needs to go. And also, if this is truly server-side code, why do we ship it with the client jar? Unless there is a desire to make a stand-alone Streaming Expression engine that interacts with Solr as a client, would it be acceptable to somehow expose the SolrCore to it for loading resources like this?

James Dyer
Ingram Content Group
