+1 to make a single merge concurrent! It is horribly frustrating to watch that last merge running on a single core :) I have lost many hours of my life to this frustration.
I do think we need to explore concurrency within terms/postings across
fields in one segment to really see gains in the common case where merge
time is dominated by postings.

Mike McCandless
http://blog.mikemccandless.com

On Tue, Jan 26, 2021 at 9:09 AM Robert Muir <rcm...@gmail.com> wrote:

> On Tue, Jan 26, 2021 at 8:29 AM Adrien Grand <jpou...@gmail.com> wrote:
> > For full text collections, I believe that the bottleneck is usually
> > terms+postings, so it might not save much time. Maybe we could also
> > parallelize on a per-field basis by writing to temporary files and
> > then copying the raw data into the target segment. For instance, for
> > the Wikipedia dataset we use for nightly benchmarks, maybe the
> > inverted indexes for 'title' and 'body' could be merged in parallel
> > this way.
>
> If you want to experiment with something like that, you can hackishly
> simulate it today to quickly see the overhead, correct? It's a small
> hack to PerFieldPostingsFormat to force it to emit files-per-field,
> and then CFS will combine it all together.
>
> But doing it explicitly and then making our own internal "compound"
> seems kinda risky: wouldn't all the offsets be wrong without further
> file changes (e.g. a per-field "start offset" where all the postings
> for that field begin)?
>
> And this does nothing to solve Dawid's problem of slow vectors. If you
> have vectors on, that's always going to be the bottleneck, and those
> are per-doc.
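
To make the per-field concurrency idea above concrete, here is a minimal
fan-out/fan-in sketch of the shape such a merge could take. The
mergeField(...) helper, the field list, and the temporary-file naming are
placeholders for illustration, not real Lucene APIs; a real patch would
also need the per-field "start offset" bookkeeping Robert describes when
the temporary outputs are stitched back together.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PerFieldMergeSketch {

  // Hypothetical helper: merge the postings for a single field into its
  // own temporary output and return that output's name. In a real
  // experiment this would live somewhere near PerFieldPostingsFormat.
  static String mergeField(String field) {
    // ... the actual per-field postings merge would go here ...
    return field + ".tmp";
  }

  public static void main(String[] args) throws Exception {
    // Example fields from the nightly Wikipedia benchmark.
    List<String> fields = List.of("title", "body");
    ExecutorService pool = Executors.newFixedThreadPool(fields.size());
    try {
      // Fan-out: one merge task per field, running concurrently.
      List<Future<String>> pending = new ArrayList<>();
      for (String field : fields) {
        pending.add(pool.submit(() -> mergeField(field)));
      }
      // Fan-in: wait for every per-field merge, then (in a real
      // implementation) concatenate the temporary files into the target
      // segment, recording where each field's postings begin.
      for (Future<String> f : pending) {
        System.out.println("merged: " + f.get());
      }
    } finally {
      pool.shutdown();
    }
  }
}
```

Note this sketch only helps when several large fields can be merged side
by side; as Robert points out, it does nothing for per-document data like
term vectors.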