There were two issues that made the regular /select handler problematic for large result sets:

1) Use of stored fields, which requires a lot of disk access. I believe this has been resolved now that the field list can be pulled from the docValues.

2) The /select handler sorts by loading the top N docs into a priority queue. This approach becomes untenable at a certain point. The export handler iterates over a bitset of collected docs in multiple passes. This keeps performance constant as the result set grows. This is hard to make work without bypassing the current select logic.

I'm not in full agreement that /select and /export need to come together. They really do have different design goals. /select tries to be very efficient and fast to support high QPS. /export tries to maintain constant memory use and performance as the result set size increases. Trying to find a way to accomplish both may just end up compromising the design so that it doesn't serve either use case.

Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Nov 17, 2016 at 9:05 PM, Yonik Seeley <[email protected]> wrote:

> On Thu, Nov 17, 2016 at 6:54 PM, Kevin Risden <[email protected]> wrote:
> > For reference, the SQL/JDBC piece needed ability to specify wildcard and
> > figure out the "schema" of the collection including defined dynamic fields.
>
> Out of curiosity, how is this used (and in what contexts)?
> I'm wondering the implications of new fields appearing when new
> documents are added. Will this mess up the JDBC driver?
>
> > When testing lately with supporting "select *" type semantics, it would be
> > nice to be able to limit to only DocValues fields.
>
> I'm not sure we should be segregating stored fields this way (by
> whether they are column/docValues or not).
> By default, all of our non-text fields already have docvalues enabled.
> If someone wants to retrieve or operate on row-stored text fields, it
> seems like they should be able to do so via the streaming API (or
> SQL).
>
> I guess we could also go the other direction and *only* support
> docValues (i.e. scrap row-stored fields). But that seems a little
> more extreme, and I'm also not sure if binary docValues would work as
> well or could hold text fields of the same size as row-stored fields
> can.
>
> -Yonik
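
To illustrate point 2 at the top of this message, here is a rough sketch of the multi-pass idea in Python. It is not the actual /export code: the function name, the list-of-booleans "bitset", the sort_value callback, and the batch size are all assumptions made up for the example. Each pass scans the bits that are still set, keeps only the best batch_size docs in a bounded heap, emits them, and clears their bits. That way memory is tied to the batch size rather than to the total number of hits, at the cost of rescanning the bitset on every pass.

```python
import heapq


def export_in_passes(matched_bits, sort_value, batch_size):
    """Yield matched doc ids in ascending sort_value order.

    matched_bits: list of booleans, True where the doc matched (a toy bitset).
    sort_value:   hypothetical callback mapping a doc id to its sort key.
    batch_size:   maximum number of docs selected per pass.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")

    bits = list(matched_bits)  # copy so the caller's bitset is not mutated
    remaining = sum(bits)      # number of docs still to be emitted

    while remaining > 0:
        # One pass over the docs whose bit is still set.
        candidates = (doc for doc, hit in enumerate(bits) if hit)
        # Bounded selection: only batch_size docs are kept at a time.
        batch = heapq.nsmallest(batch_size, candidates, key=sort_value)
        for doc in batch:
            bits[doc] = False  # clear the bit so later passes skip this doc
            yield doc
        remaining -= len(batch)


if __name__ == "__main__":
    values = [5, 3, 9, 1, 7, 2]                   # toy per-doc sort values
    hits = [True, True, False, True, True, True]  # doc 2 did not match
    for doc_id in export_in_passes(hits, lambda d: values[d], batch_size=2):
        print(doc_id, values[doc_id])
```

By contrast, a single priority queue sized to the whole result set has to hold every hit at once, which is the part that stops scaling.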
