It's possible that we could find a design where /select could behave like /export. I think Noble's design of treating a Stream as an iterator is promising. We could change all document result sets to iterators and hide the implementation of how the docs are materialized. This would also affect how output from other search components is handled. Since result sets would no longer be limited to the top N, all summarized data, such as facets, would need to come before the documents. SolrJ would then need to read the summarized data into memory and stream the documents. It's a nice design, but quite a bit of work.
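To make the ordering constraint concrete, here is a rough sketch of the response shape I have in mind. It is Python purely for brevity and is not Solr or SolrJ code; `write_response`, `read_response`, `StreamingResponse` and the frame tuples are all hypothetical names made up for illustration.

```python
from typing import Any, Dict, Iterable, Iterator, Tuple

Frame = Tuple[str, Dict[str, Any]]


def write_response(summary: Dict[str, Any],
                   docs: Iterable[Dict[str, Any]]) -> Iterator[Frame]:
    """Server side (hypothetical): emit the summary before any document."""
    yield ("summary", summary)
    for doc in docs:
        # Documents are pulled one at a time; how they are materialized
        # (priority queue, bitset passes, ...) is hidden behind the iterator.
        yield ("doc", doc)


class StreamingResponse:
    """Client side (hypothetical): summary in memory, documents streamed."""

    def __init__(self, summary: Dict[str, Any],
                 docs: Iterator[Dict[str, Any]]) -> None:
        self.summary = summary  # small: facets, counts, etc.
        self._docs = docs       # possibly huge: consumed lazily, one pass only

    def __iter__(self) -> Iterator[Dict[str, Any]]:
        return self._docs


def read_response(frames: Iterable[Frame]) -> StreamingResponse:
    """Read the leading summary frame eagerly, leave the documents unread."""
    it = iter(frames)
    kind, summary = next(it)
    if kind != "summary":
        raise ValueError("summary must come before the documents")

    def docs() -> Iterator[Dict[str, Any]]:
        for frame_kind, payload in it:
            if frame_kind != "doc":
                raise ValueError("unexpected frame after the summary: " + frame_kind)
            yield payload

    return StreamingResponse(summary, docs())
```

The point of the sketch is only that the reader can return as soon as the summary frame has arrived, and that nothing downstream needs to know whether the documents behind the iterator came from a top-N queue or from an export-style pass over a bitset.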
Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Nov 17, 2016 at 9:26 PM, Yonik Seeley <[email protected]> wrote:

> On Thu, Nov 17, 2016 at 9:16 PM, Joel Bernstein <[email protected]> wrote:
> > There were two issues that make the regular /select handler problematic
> > for large result sets:
> >
> > 1) Use of stored fields, which require lots of disk access. I believe
> > this has been resolved now that the field list can be pulled from the
> > docValues.
> >
> > 2) The /select handler sorts by loading the top N docs into a priority
> > queue.
>
> That feels like it could be optional though. PQ makes sense for small
> top-N that will go in the cache, but makes less sense when you want
> all documents back.
>
> Look at it from the other perspective: if one is retrieving all
> documents that match a query (and let's assume that the number of
> matches is large), is /export ever less efficient in that case? If
> /export is always better in that scenario, that sounds like an
> optimization, not a tradeoff or different design goal, and /select
> should always be using the superior algorithm/mechanism for that case.
>
> -Yonik
>
> > This approach becomes untenable at a certain point. The export
> > handler iterates over a bitset of collected docs in multiple passes.
> > This keeps constant performance as the result set grows. This is harder
> > to make work without avoiding the current select logic.
> >
> > I'm not in full agreement that /select and /export need to come together.
> > They really do have different design goals. /select tries to be very
> > efficient and fast to support high QPS. /export tries to maintain
> > constant memory use and performance as the result set size increases.
> > Trying to find a way to accomplish both may just end up compromising the
> > design so it doesn't serve either use case.
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Thu, Nov 17, 2016 at 9:05 PM, Yonik Seeley <[email protected]> wrote:
> >>
> >> On Thu, Nov 17, 2016 at 6:54 PM, Kevin Risden <[email protected]>
> >> wrote:
> >> > For reference, the SQL/JDBC piece needed ability to specify wildcard
> >> > and figure out the "schema" of the collection including defined
> >> > dynamic fields.
> >>
> >> Out of curiosity, how is this used (and in what contexts)?
> >> I'm wondering the implications of new fields appearing when new
> >> documents are added. Will this mess up the JDBC driver?
> >>
> >> > When testing lately with supporting "select *" type semantics, it
> >> > would be nice to be able to limit to only DocValues fields.
> >>
> >> I'm not sure we should be segregating stored fields this way (by
> >> whether they are column/docValues or not).
> >> By default, all of our non-text fields already have docvalues enabled.
> >> If someone wants to retrieve or operate on row-stored text fields, it
> >> seems like they should be able to do so via the streaming API (or
> >> SQL).
> >>
> >> I guess we could also go the other direction and *only* support
> >> docValues (i.e. scrap row-stored fields). But that seems a little
> >> more extreme, and I'm also not sure if binary docValues would work as
> >> well or could hold text fields of the same size as row-stored fields
> >> can.
> >>
> >> -Yonik
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: [email protected]
> >> For additional commands, e-mail: [email protected]
