After digging some more, I'm feeling like the SolrJ API is slightly trappy WRT streaming. There's a positive example to follow in the ref guide, but unless you have a handy dandy Gandalf to tell you "stay on the path" you could easily get yourself into murky territory :). Neither the available classes, nor the package names, nor the javadoc, nor the (official) documentation warns you against going down the "Hard way" described in Erick Erickson's blog post <https://lucidworks.com/2017/12/06/streaming-expressions-in-solrj/>. In past projects I have typically used the recommended way, more or less because it's nice to be able to test stuff in the Solr admin UI, but his blog post made me say "Yikes!" because I could totally see myself blundering into the hard way while looking for ways to avoid 80 lines of string concatenation in my Java class, or while trying to reuse repetitive pieces of a 300-line expression that is mostly repeated patterns... both are pain points I have had but have not yet had time to refactor.
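As a rough illustration of a middle ground (keep the expression a plain string, as Erick's post recommends, but stop hand-concatenating it and stop copy-pasting repeated fragments), here is a minimal sketch. The `Expr` class and its methods are hypothetical helpers, not part of SolrJ. It only assumes the usual `search(...)` and `merge(..., on=...)` expression syntax, and it rejects values containing double quotes rather than guessing at the expression parser's escaping rules.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, NOT part of SolrJ: composes streaming expression strings
// from small reusable pieces so the result is still a plain string.
final class Expr {
    private Expr() {}

    // Wraps a parameter value in double quotes. Values that already contain a
    // double quote are rejected instead of assuming an escaping convention.
    static String quote(String value) {
        if (value.indexOf('"') >= 0) {
            throw new IllegalArgumentException("value contains a double quote: " + value);
        }
        return "\"" + value + "\"";
    }

    // Builds: search(collection, q="...", fl="...", sort="...")
    static String search(String collection, String q, String fl, String sort) {
        return "search(" + collection
                + ", q=" + quote(q)
                + ", fl=" + quote(fl)
                + ", sort=" + quote(sort) + ")";
    }

    // Builds: merge(s1, s2, ..., on="...") where every input is sorted by `on`.
    static String merge(List<String> streams, String on) {
        if (streams.size() < 2) {
            throw new IllegalArgumentException("merge needs at least two streams");
        }
        return "merge(" + String.join(", ", streams) + ", on=" + quote(on) + ")";
    }

    // A repeated pattern written once: the same search over several
    // collections, merged on the shared sort order.
    static String searchAll(List<String> collections, String q, String fl, String sort) {
        List<String> searches = collections.stream()
                .map(c -> search(c, q, fl, sort))
                .collect(Collectors.toList());
        return merge(searches, sort);
    }
}
```

The output is still just text, so it can be tested in the admin UI or sent as the `expr` parameter to the `/stream` handler, and none of the low-level streaming classes ever need to be touched.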
Also, streaming has grown into an enormous number of classes that add a lot of weight to the SolrJ jar (343 of the 501 classes, and 39,000 of the 71,000 total lines of Java, under o.a.s.c.solrj are in o.a.s.c.solrj.io). I spent my "code holiday" yesterday digging into the feasibility of making things more intuitive by hiding the majority of the "not recommended for direct use" classes in solr-core, so here are my thoughts/findings:

1. Back compatibility is a key issue that might kill the whole idea. Early implementations may have been based on the pattern shown in the 7.2 ref guide and may reference the classes directly.
2. StreamFactory probably should be (or implement) an interface.
3. To keep all the low-level classes out of the way, we would need to not parse the expression on the client.
4. To do that, we might create a special expression that can be serialized to carry the string and be evaluated on the server.
5. That means explain functionality needs its own round trip, which is where things get hard/irritating.

Such a change is a fairly major undertaking, and maybe not worth it unless lots of other folks care. Looking back at Erick's post, I think it's slightly biased toward the case where zk isn't available. Looking at the ref guide example, it seems like it could be simplified further by letting CloudSolrClient manage the StreamFactory... then the ref guide example <https://lucene.apache.org/solr/guide/7_5/streaming-expressions.html#streaming-requests-and-responses> could look like:

    TupleStream stream = client.constructStream("....");

And maybe the easy-to-remember general guideline we should work towards in SolrJ is that the clients listed in the ref guide <https://lucene.apache.org/solr/guide/7_2/using-solrj.html#types-of-solrclients> are "the path"... If you get it from the client, it's safe/supported to play with (it's "on the path", so to speak). Anything else is "off the path; bring your elven sword and be prepared to hire a hobbit to rescue you".
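To make points 3 to 5 and the `constructStream` idea a bit more concrete, here is a self-contained sketch of what such a facade might look like. Everything in it (`ExpressionTransport`, `DeferredExpressionStream`, `StreamingClient`, `constructStream`) is hypothetical and is not existing SolrJ API. The transport stands in for whatever would actually send the string to the server (for example the `/stream` handler), and iteration is simplified to `Iterable` instead of `TupleStream`'s `open()`/`read()`/`close()` cycle. What it tries to show is that the client never parses the expression, so `explain()` has to cost a second round trip.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical: the only thing the client needs in order to reach the server.
interface ExpressionTransport {
    // Sends expr to the server for evaluation and returns the result tuples.
    List<Map<String, Object>> evaluate(String expr);

    // Asks the server to parse expr and describe the plan (a separate round trip).
    String explain(String expr);
}

// Hypothetical client-side stream: carries the unparsed expression string only.
final class DeferredExpressionStream implements Iterable<Map<String, Object>> {
    private final String expr;
    private final ExpressionTransport transport;

    DeferredExpressionStream(String expr, ExpressionTransport transport) {
        this.expr = expr;
        this.transport = transport;
    }

    // The "serialized" form is just the string; nothing is parsed on the client.
    String expression() {
        return expr;
    }

    // Each iteration triggers one server-side evaluation.
    @Override
    public Iterator<Map<String, Object>> iterator() {
        return transport.evaluate(expr).iterator();
    }

    // Because the client never parsed expr, explain needs its own round trip.
    String explain() {
        return transport.explain(expr);
    }
}

// Hypothetical stand-in for a CloudSolrClient that owns the StreamFactory concern.
final class StreamingClient {
    private final ExpressionTransport transport;

    StreamingClient(ExpressionTransport transport) {
        this.transport = transport;
    }

    // Constructing a stream does no network work; it just wraps the string.
    DeferredExpressionStream constructStream(String expr) {
        return new DeferredExpressionStream(expr, transport);
    }
}
```

In this shape, the only public types a user would see are the client and the returned stream, which matches the "if you get it from the client, it's on the path" guideline; the trade-off is that anything that currently inspects the parsed expression locally (like explain) becomes a server call.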
I also have ideas about substitution/templating, but that's for another thread/ticket.

On Sat, Oct 20, 2018 at 4:32 PM Gus Heck wrote:
> Hi Shawn,
>
> Yeah, I understand that's the general logic, but the recommendation in the
> above link is to avoid using *most* of the classes and supply a string
> representing the expression. A user seeing all these classes in the
> javadoc (or IDE) would easily think that they should be using them.
>
> After looking back at the 7.2 ref guide I see that there used to be an
> example that explicitly set up names for classes, but that's now done
> automatically via o.a.s.streaming.Lang (yay for standardization! :) ), so
> there is no need to reference the vast majority of the streaming classes
> directly (unless ignoring Erick's advice in the linked post).
>
> This morning for fun I tested what it takes to move everything in the
> o.a.s.solrj.io package to core as a package named o.a.s.streaming, and
> only a few unit test configs got sticky. There were no cross-refs to
> constants or uses by outside classes. I think things may have progressed to
> the point where stuff users shouldn't be using could be pulled down to core
> before it gets entangled in dependencies.
>
> Of course one can't just move everything; one has to leave behind
> something to facilitate the execution of streaming... but from the looks of
> the example here:
> https://lucene.apache.org/solr/guide/7_5/streaming-expressions.html this
> process needs only a zkhost, a collection and a string containing the
> expression, so it seems like the string should pass directly to a
> CloudSolrClient (which already knows about a zkhost and collection) and not
> require any special classes beyond the TupleStream return value and Tuple
> (produced by TupleStream)... plus Explanation and StreamComparator, which
> are returned by other methods on TupleStream.
>
> -Gus
>
> On Sat, Oct 20, 2018, 2:39 PM Shawn Heisey wrote:
>
>> On 10/20/2018 8:34 AM, Gus Heck wrote:
>> > To put it another way, I'm not sure why this statement from that
>> > article must be true: "SolrJ is what’s used for the communication
>> > between the Solr node, so this level must be exposed."
>>
>> All communication between Solr nodes uses SolrJ. SolrJ is an integral
>> part of the server as well as a jar providing a standalone client.
>>
>> Many of the string constants that Solr provides are actually located in
>> SolrJ, because they are useful for both client and server operations.
>> Take a look at the CommonParams class.
>>
>> Streaming expressions are something that users want to do with the
>> client, so it makes sense for some significant parts of it to live in
>> the client code.
>>
>> Thanks,
>> Shawn

-- 
http://www.the111shift.com
