The key point is that sharding git and sharding Xapian are not
tied together:

git repos are sharded to reduce clone/repack costs, so we shard
them based on size (currently 1G or so).
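
To make the size-based rule concrete, here is a rough Python sketch.
It is not the actual implementation: the function name, the rollover
rule (start a new git shard when the next message would push the
current one past the limit) and the exact byte limit are all
assumptions for illustration only.

```python
# Hypothetical sketch of size-based git sharding; names and the exact
# rollover rule are assumptions, not the real implementation.
LIMIT = 1 << 30  # stands in for "currently 1G or so"


def assign_git_shards(msg_sizes, limit=LIMIT):
    """Return the git shard number for each message, in arrival order."""
    shards = []
    shard, used = 0, 0
    for size in msg_sizes:
        # Roll over only if the current shard already holds something,
        # so a single oversized message still gets a home.
        if used > 0 and used + size > limit:
            shard += 1
            used = 0
        used += size
        shards.append(shard)
    return shards
```

With a toy limit of 10, assign_git_shards([4, 4, 4, 4], limit=10)
puts the first two messages in shard 0 and the next two in shard 1.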

Xapian DBs are sharded to take advantage of SMP during the indexing
phase.
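
As an illustration of partitioning indexing work across cores, here
is a small Python sketch.  Everything in it is assumed for the sake
of the example: splitting by message number modulo the part count,
the function names, and the word-count stand-in for the real
text+term indexing.  It only shows the general shape of the idea.

```python
from concurrent.futures import ProcessPoolExecutor


def partition(docs, nparts):
    """Split (num, text) pairs into nparts lists by num % nparts.

    The modulo rule is an assumption made for this sketch.
    """
    parts = [[] for _ in range(nparts)]
    for num, text in docs:
        parts[num % nparts].append((num, text))
    return parts


def index_part(docs):
    """Stand-in for the slow indexing of one part: count the terms."""
    return sum(len(text.split()) for _num, text in docs)


def index_parallel(docs, nparts):
    """Index every part in its own worker process."""
    with ProcessPoolExecutor(max_workers=nparts) as pool:
        return list(pool.map(index_part, partition(docs, nparts)))
```

Each part is independent of the others, which is what lets the
workers run on separate cores without coordinating.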

Current import times are as follows:

                  git-only:  ~1 minute
                git+SQLite: ~12 minutes
 git+Xapian+SQLite  serial: ~45 minutes
 git+Xapian+SQLite 4 parts: ~15 minutes (2 cores + 2 hyperthreads)

More cores will help since the Xapian text+term indexing is the
slowest step and the only work that is partitioned.

I also tested just the December 2017 archives on an 8-core AMD
FX-8320.  I forget the specifics, but I seem to recall half the
cores on that chip are not full power:

        4 parts: 58s
        8 parts: 45s
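
For reference, the speedups implied by the rough timings above can
be worked out as follows.  The helper names are made up for this
sketch, and the inputs are the approximate figures quoted in this
message, so treat the results as ballpark numbers only.

```python
def speedup(before, after):
    """How many times faster 'after' is than 'before'."""
    return before / after


def efficiency(before, after, scale):
    """Speedup divided by how much the part count was scaled up."""
    return speedup(before, after) / scale


# Full import: serial (~45 min) vs 4 parts (~15 min)
full_speedup = speedup(45, 15)          # 3.0
full_efficiency = efficiency(45, 15, 4)  # 0.75

# December 2017 only on the FX-8320: 4 parts (58s) vs 8 parts (45s)
dec_speedup = round(speedup(58, 45), 2)  # 1.29

print(full_speedup, full_efficiency, dec_speedup)
```

So 4 parts gave roughly a 3x gain over serial, while going from
4 to 8 parts on the FX-8320 gave only about 1.29x.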

Note: I use eatmydata (LD_PRELOAD to disable sync/fsync) for development
and I consider it perfectly safe to use for offline updates/reindexing.
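
If you want to script this, a minimal Python sketch might look like
the one below.  The helper names are hypothetical; the only thing it
relies on is that the eatmydata wrapper is invoked as
"eatmydata COMMAND [ARGS...]".  It falls back to running the command
unwrapped when eatmydata is not installed.

```python
import shutil
import subprocess


def wrap_offline(cmd, have_eatmydata=None):
    """Prefix cmd with eatmydata if it is available.

    have_eatmydata can be forced to True/False; by default we look
    for the wrapper in PATH.
    """
    if have_eatmydata is None:
        have_eatmydata = shutil.which("eatmydata") is not None
    return ["eatmydata", *cmd] if have_eatmydata else list(cmd)


def run_offline(cmd):
    """Run an offline update/reindex command, raising on failure."""
    return subprocess.run(wrap_offline(cmd), check=True)
```

For example, run_offline(["your-reindex-command", "ARGS"]) where the
command name is a placeholder for whatever offline job you run.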

--
unsubscribe: meta+unsubscr...@public-inbox.org
archive: https://public-inbox.org/meta/