Thanks Mike! I'll follow up with results once I have an opportunity to test out an alternate approach.
DrillSidewaysQuery certainly looks set up to avoid lots of duplicate work, and it was a little surprising to find such a different approach in the concurrent version. That said, the code is certainly much simpler to run a bunch of DrillSidewaysQueries in parallel!

Cheers,
-Greg

On Tue, Feb 23, 2021 at 1:32 PM Michael McCandless <[email protected]> wrote:

> Hi Greg,
>
> As far as I know nobody has experimented any further with a concurrent
> implementation for drill sideways. Patches welcome!
>
> I would be curious to know how those two concurrent solutions we support
> today compare with the serial performance of DrillSidewaysQuery. The
> redundant work is indeed frustrating and was the original motivation for
> creating DrillSidewaysQuery in the first place.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Mon, Feb 15, 2021 at 12:05 PM Greg Miller <[email protected]> wrote:
>
>> Hi folks-
>>
>> I'm reaching out to understand if there's been any past exploration into
>> alternative concurrent DrillSideways execution approaches. My understanding
>> of the current approach is that we're achieving some concurrency by using a
>> CollectorManager with IndexSearcher (allowing parallel execution across the
>> shards) but also collecting the different facet results by executing N
>> separate drill down queries, where N is the number of drill downs applied,
>> each with one of the drill down restrictions removed. This approach seems
>> like it would do a large amount of duplicate computational work when
>> executing these queries (e.g., just think of the base query component of
>> each drill down query being executed N times).
>>
>> Michael McCandless brought up
>> <https://issues.apache.org/jira/browse/LUCENE-7588> an alternate
>> approach of sticking with the existing "doc at a time" methodology (rather
>> than implementing this "query at a time" approach), but it's not clear to
>> me if it was explored further. It seems to me like the latency regression
>> of "doc at a time" would likely be fairly small but the overall computation
>> for these searches may drop significantly. Is there any more history on
>> this approach that folks are aware of, or any thoughts on whether or not it
>> would be valuable to explore a "doc at a time" approach (essentially create
>> a single DrillSidewaysQuery and hand that off to IndexSearcher with the
>> CollectorManager instead of scheduling N IndexSearcher searches as is done
>> today)?
>>
>> Thanks in advance for any thoughts/info/discussion!
>>
>> Cheers,
>> -Greg
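
To make the "query at a time" versus "doc at a time" distinction from the thread concrete, here is a rough toy sketch in Python. It is not Lucene code and does not use any Lucene API: the in-memory DOCS list, the base predicate, and the function names query_at_a_time and doc_at_a_time are all hypothetical, and "base evaluations" is only a crude stand-in for the duplicated work being discussed. It models only the sideways counts, not hit collection, scoring, or concurrency.

```python
from collections import Counter

# Hypothetical toy "index": one text field plus one value per facet dimension.
DOCS = [
    {"text": "red shoe", "brand": "acme", "color": "red"},
    {"text": "blue shoe", "brand": "acme", "color": "blue"},
    {"text": "red shoe", "brand": "zeta", "color": "red"},
    {"text": "red hat", "brand": "acme", "color": "red"},
    {"text": "green shoe", "brand": "zeta", "color": "green"},
]


def base(doc):
    """Assumed base query: the text mentions 'shoe'."""
    return "shoe" in doc["text"]


def query_at_a_time(docs, base_query, drill_downs):
    """Run one pass per dimension, each with that dimension's restriction removed."""
    base_evals = 0
    sideways = {}
    for dim in drill_downs:
        counts = Counter()
        for doc in docs:
            base_evals += 1  # the base query is re-evaluated in every pass
            if not base_query(doc):
                continue
            # Apply every drill-down except the one on the current dimension.
            if all(doc[d] == v for d, v in drill_downs.items() if d != dim):
                counts[doc[dim]] += 1
        sideways[dim] = counts
    return sideways, base_evals


def doc_at_a_time(docs, base_query, drill_downs):
    """Single pass: a doc that misses exactly one dimension counts for that dimension only."""
    base_evals = 0
    sideways = {dim: Counter() for dim in drill_downs}
    for doc in docs:
        base_evals += 1  # the base query is evaluated once per document
        if not base_query(doc):
            continue
        failed = [d for d, v in drill_downs.items() if doc[d] != v]
        if not failed:
            # A true hit contributes to the sideways counts of every dimension.
            for dim in drill_downs:
                sideways[dim][doc[dim]] += 1
        elif len(failed) == 1:
            # A "near miss" contributes only to the dimension it failed on.
            sideways[failed[0]][doc[failed[0]]] += 1
    return sideways, base_evals


if __name__ == "__main__":
    selections = {"brand": "acme", "color": "red"}
    print(query_at_a_time(DOCS, base, selections))
    print(doc_at_a_time(DOCS, base, selections))
```

With two drill-downs over these five documents, both functions return the same sideways counts, but the toy query-at-a-time version evaluates the base predicate 10 times against 5 for the single pass. Whether a similar ratio would show up in real Lucene searches is exactly the open question in the thread, and this sketch does not answer it.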
