Thank you for the quick replies.  I can see how it would be powerful to be able 
to execute streaming expressions outside of Solr, giving yourself the option of 
moving some of the work to the client.  I wouldn't necessarily tie it into core, 
because being able to join a Solr stream with an RDBMS result, either within 
Solr or in your driver program, is a nice set of options to have.  But the 
patch on SOLR-1015 seems (from a quick look) to get this right: it uses the 
core's classloader when one is available and falls back when it is not.  It 
might be nice, especially as the streaming code base grows, to consider 
packaging it separately from the SolrJ client itself.
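The "use the core's classloader when available, otherwise fall back" behaviour can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the SOLR-1015 patch: `FallbackClassLoading`, `loadClass`, and the `coreLoader` parameter are hypothetical names, and `coreLoader` simply stands in for whatever loader the core's resource loader would supply when the code runs inside Solr.

```java
/**
 * Hypothetical helper (not a Solr API): resolve a class through the core's
 * loader when one is available, otherwise fall back to the loader that a
 * plain SolrJ client would have.
 */
public class FallbackClassLoading {

    /**
     * @param className  fully qualified class name, e.g. a JDBC driver
     * @param coreLoader loader supplied by a running core, or null when the
     *                   code runs outside Solr (assumption for this sketch)
     */
    public static Class<?> loadClass(String className, ClassLoader coreLoader)
            throws ClassNotFoundException {
        if (coreLoader != null) {
            // Inside Solr: this loader can also see jars in the shared lib directory.
            return Class.forName(className, true, coreLoader);
        }
        // Outside Solr: use the thread's context loader, else this class's own loader.
        ClassLoader fallback = Thread.currentThread().getContextClassLoader();
        if (fallback == null) {
            fallback = FallbackClassLoading.class.getClassLoader();
        }
        return Class.forName(className, true, fallback);
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // No core available here, so the fallback loader resolves the class.
        System.out.println(loadClass("java.util.ArrayList", null).getName());
    }
}
```

Keeping the decision in one place means the same stream code can run unchanged in a driver program or inside the Solr JVM; only the loader handed to it differs.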

Along these lines:  I was initially confused by the examples in 
https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions in that 
the cURL example at the top is materially different from the SolrJ example 
following it.  That is, with the cURL example, all of the work occurs in Solr 
and only the final result is streamed back.  With the SolrJ example, some of 
that work is now being done in the client.  This is easy to discover if you try 
the JDBC expression:  following the cURL example, the query originates in Solr; 
with the SolrJ example, the query originates on the client, and the server has no 
involvement at all.
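To keep all of the work in Solr from Java, as the cURL example does, one option is simply to POST the expression to the collection's `/stream` handler and read back the result. The sketch below uses only the JDK's `java.net.http` client (Java 11+); the base URL, the collection name `collection1`, and the `search(...)` expression are placeholders, not values from this thread. I believe SolrJ's `SolrStream`, with its `qt` parameter set to `/stream`, can do the same, but that is not what is shown here.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class ServerSideStream {

    /** Form-encodes the expression as the "expr" parameter, like curl --data-urlencode. */
    public static String formBody(String expr) {
        return "expr=" + URLEncoder.encode(expr, StandardCharsets.UTF_8);
    }

    /** Builds a POST to {solrBaseUrl}/{collection}/stream (URL layout assumed). */
    public static HttpRequest streamRequest(String solrBaseUrl, String collection, String expr) {
        return HttpRequest.newBuilder(URI.create(solrBaseUrl + "/" + collection + "/stream"))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody(expr)))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder expression and endpoint; adjust for a real collection.
        String expr = "search(collection1, q=\"*:*\", fl=\"id\", sort=\"id asc\")";
        HttpRequest req = streamRequest("http://localhost:8983/solr", "collection1", expr);
        HttpResponse<String> resp = HttpClient.newHttpClient()
                .send(req, HttpResponse.BodyHandlers.ofString());
        // The whole expression was evaluated inside Solr; only the result comes back.
        System.out.println(resp.body());
    }
}
```

Because the expression string travels to Solr unevaluated, a `jdbc(...)` source inside it would be opened by the server, which matches the cURL behaviour described above.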

Is my understanding here correct?  I can see how this design has a great 
advantage, as it gives us the ability to write driver programs that use the Solr 
cores as worker nodes.  But this wasn't immediately clear to me.  I also 
wonder:  do we have an (easy) way with SolrJ currently to simply execute a 
(chain of) streaming expression(s) and get the result back, like in the cURL 
example (besides using JDBC)?

James Dyer
Ingram Content Group

From: Joel Bernstein
Sent: Tuesday, April 25, 2017 6:25 PM
To: lucene dev
Subject: Re: JDBCStream and loading drivers

There are a few stream impls that have access to SolrCore (ClassifyStream, 
AnalyzeEvaluator) because they use analyzers. These classes have been added to 
core. We could move the JdbcStream to core as well if it makes the user 
experience nicer.

Originally the idea was that you could run the Streaming API Java classes like 
you would other Solrj clients. I think over time this may become important 
again, as I believe there is work underway for spinning up worker nodes that 
are not attached to a SolrCore.

Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Apr 25, 2017 at 3:25 PM, Dyer, James wrote:
Using JDBCStream, Solr cannot find my database driver if I put the .jar in the 
shared lib directory ($SOLR_HOME/lib).  In order for the classloader to find 
it, the driver has to be in the server's lib directory.  Looking at why, I see 
that to get the full classpath, including what is in the shared lib directory, 
we'd typically get a reference to a SolrCore, call "getResourceLoader" and then 
"findClass".  This makes use of the URLClassLoader that knows about the shared 
lib.
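A rough stand-alone approximation of what the core's resource loader provides here might look like the following: a `URLClassLoader` over the jars in the shared lib directory, used to instantiate the driver directly. `SharedLibDriverLoader` and its methods are hypothetical names for illustration, not Solr APIs. Calling `Driver.connect` directly is a deliberate choice: `DriverManager` skips registered drivers whose class is not visible to the caller's class loader, so a driver loaded only by a child loader may not be found through `DriverManager.getConnection`.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.Driver;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical helper: load a JDBC driver from a shared lib directory. */
public class SharedLibDriverLoader {

    /** Collects every *.jar directly under libDir (e.g. $SOLR_HOME/lib) as a URL. */
    public static URL[] jarUrls(Path libDir) throws IOException {
        List<URL> urls = new ArrayList<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(libDir, "*.jar")) {
            for (Path jar : jars) {
                urls.add(jar.toUri().toURL());
            }
        }
        return urls.toArray(new URL[0]);
    }

    /** A loader that sees the shared lib jars and delegates to parent first. */
    public static URLClassLoader sharedLibLoader(Path libDir, ClassLoader parent) throws IOException {
        return new URLClassLoader(jarUrls(libDir), parent);
    }

    /**
     * Instantiates the driver through the given loader and connects directly,
     * bypassing DriverManager. Returns null if the driver rejects jdbcUrl.
     */
    public static Connection connect(ClassLoader loader, String driverClass,
                                     String jdbcUrl, Properties props)
            throws ReflectiveOperationException, SQLException {
        Driver driver = (Driver) Class.forName(driverClass, true, loader)
                .getDeclaredConstructor().newInstance();
        return driver.connect(jdbcUrl, props);
    }
}
```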

But fixing JDBCStream to do this might not be so easy?  Best I can tell, 
Streaming Expressions are written nearly stand-alone as client code that merely 
executes in the Solr JVM.  Is this correct?  Indeed, the code itself is 
included with the client, in the SolrJ package, despite it mostly being 
server-side code … Maybe I misunderstand?

On the one hand, it isn't a huge deal where you need to put your drivers 
to make this work.  But on the other hand, it isn't really the best user 
experience, in my opinion at least, to have to dig around the server 
directories to find where your driver needs to go.  And also, if this is truly 
server-side code, why do we ship it with the client jar?  Unless there is a 
desire to make a stand-alone Streaming Expression engine that interacts with 
Solr as a client, would it be acceptable to somehow expose the SolrCore to it 
for loading resources like this?

James Dyer
Ingram Content Group


