[
https://issues.apache.org/jira/browse/COUCHDB-738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867197#action_12867197
]
Damien Katz commented on COUCHDB-738:
-------------------------------------
I've been thinking about this issue, and I think storing the whole rev tree in
the by_seq index is a bad idea.
I'm also thinking of ways to make the compaction faster.
Store the full_doc_info outside the by_id btree, but store the doc_info instead
(with it's pointers to the main doc and conflicts) with a pointer to the
full_doc_info as well. On reads, this avoids the overhead of loading up the
ful_doc_info just to get the main doc. Updates and replications reads (that
request the rev_info) will have to load up the full_doc_info with an extra
read, but unlikely to be an extra disk io since it will be close the most
recent doc revision.
The by_seq index will also have the doc_info and a pointer to the full_doc_info.
Then on compaction, scan the by_seq index, copying over the full_doc_info and
the documents and attachments into a new file. Each full_doc_info should be
linked to the next with an offset to next's file pos.
Then scan the newly written full_doc_info, converting them to doc_infos and
pointers to the full_doc_info, and writing them out consecutively to the end of
the file.
Then sort just this portion of the file, on disk, by the id in the doc_info.
This is the most expensive part of the compaction, but sorting things on disk
is common problem with lots of open libraries out there that are highly
optimized.
Then convert this id sorted portion of the file to btree leaf nodes. Then
rescan leaf nodes and build up the inner nodes, recurse until you have single
root node left. This is now your by_id index.
Then rescan the full_doc_infos, and write out to the end of the file the
doc_infos and pointers back to the full_doc_infos. This is already sorted
by_seq.
Then convert this seq sorted portion of the file to btree leaf nodes. Then
rescan leaf nodes and build up the inner nodes, recurse until you have single
root node left. This is now your by_seq index.
You now a fully compacted database with no wasted btree nodes. I think this
will be a lot faster. With the exception of the by_id sorting phase, this
eliminates random disk seeks.
> more efficient DB compaction (fewer seeks)
> ------------------------------------------
>
> Key: COUCHDB-738
> URL: https://issues.apache.org/jira/browse/COUCHDB-738
> Project: CouchDB
> Issue Type: Improvement
> Components: Database Core
> Affects Versions: 0.9.2, 0.10.1, 0.11
> Reporter: Adam Kocoloski
> Assignee: Adam Kocoloski
> Fix For: 1.1
>
> Attachments: 738-efficient-compaction-v1.patch,
> 738-efficient-compaction-v2.patch
>
>
> CouchDB's database compaction algorithm walks the by_seq btree, then does a
> lookup in the by_id btree for every document in the database. It does this
> because the #full_doc_info{} record with the full revision tree is only
> stored in the by_id tree. I'm proposing instead to store duplicate copies of
> #full_doc_info{} in both trees, and to have the compactor use the by_seq tree
> exclusively. The net effect is significantly fewer calls to pread(), and an
> compaction IO pattern where reads tend to be clustered close to each other in
> the file.
> If the by_id tree is fully cached, or if the id tree nodes are located near
> the seq tree nodes, the performance improvement is small but noticeable (~10%
> in some simple tests). On the other hand, in the worst-case scenario of
> randomly-generated docids and a database much larger than main memory the
> improvement is huge. Joe Williams did some simple benchmarks with a 50k
> document, 600 MB database on a 256MB VPS. The compaction time for that DB
> dropped from 15m to 2m20s, so more than 6x faster.
> Storing the #full_doc_info{} in the seq tree also allows for some similar
> optimizations in the replicator.
> This patch might have downsides when documents have a large number of edits.
> These include an increase in the size of the database and slower view
> indexing. I expect both to be small effects.
> The patch can be applied directly to tr...@934272. Existing DBs are still
> readable, new updates will be written in the new format, and databases can be
> fully upgraded by compacting.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.