[jira] Commented: (COUCHDB-738) more efficient DB compaction (fewer seeks)

Damien Katz (JIRA) Thu, 13 May 2010 10:53:08 -0700

    [ 
https://issues.apache.org/jira/browse/COUCHDB-738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867197#action_12867197
 ]


Damien Katz commented on COUCHDB-738:
-------------------------------------

I've been thinking about this issue, and I think storing the whole rev tree in 
the by_seq index is a bad idea.

I'm also thinking of ways to make the compaction faster.

Store the full_doc_info outside the by_id btree, but store the doc_info instead 
(with it's pointers to the main doc and conflicts) with a pointer to the 
full_doc_info as well. On reads, this avoids the overhead of loading up the 
ful_doc_info just to get the main doc. Updates and replications reads (that 
request the rev_info) will have to load up the full_doc_info with an extra 
read, but unlikely to be an extra disk io since it will be close the most 
recent doc revision.

The by_seq index will also have the doc_info and a pointer to the full_doc_info.

Then on compaction, scan the by_seq index, copying over the full_doc_info and 
the documents and attachments into a new file. Each full_doc_info should be 
linked to the next with an offset to next's file pos.

Then scan the newly written full_doc_info, converting them to doc_infos and 
pointers to the full_doc_info, and writing them out consecutively to the end of 
the file.

Then sort just this portion of the file, on disk, by the id in the doc_info. 
This is the most expensive part of the compaction, but sorting things on disk 
is common problem with lots of open libraries out there that are highly 
optimized.

Then convert this id sorted portion of the file to btree leaf nodes. Then 
rescan leaf nodes and build up the inner nodes, recurse until you have single 
root node left. This is now your by_id index.

Then rescan the full_doc_infos, and write out to the end of the file the 
doc_infos and pointers back to the full_doc_infos. This is already sorted 
by_seq.

Then convert this seq sorted portion of the file to btree leaf nodes. Then 
rescan leaf nodes and build up the inner nodes, recurse until you have single 
root node left. This is now your by_seq index.

You now a fully compacted database with no wasted btree nodes. I think this 
will be a lot faster. With the exception of the by_id sorting phase, this 
eliminates random disk seeks.

> more efficient DB compaction (fewer seeks)
> ------------------------------------------
>
>                 Key: COUCHDB-738
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-738
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: Database Core
>    Affects Versions: 0.9.2, 0.10.1, 0.11
>            Reporter: Adam Kocoloski
>            Assignee: Adam Kocoloski
>             Fix For: 1.1
>
>         Attachments: 738-efficient-compaction-v1.patch, 
> 738-efficient-compaction-v2.patch
>
>
> CouchDB's database compaction algorithm walks the by_seq btree, then does a 
> lookup in the by_id btree for every document in the database.  It does this 
> because the #full_doc_info{} record with the full revision tree is only 
> stored in the by_id tree.  I'm proposing instead to store duplicate copies of 
> #full_doc_info{} in both trees, and to have the compactor use the by_seq tree 
> exclusively.  The net effect is significantly fewer calls to pread(), and an 
> compaction IO pattern where reads tend to be clustered close to each other in 
> the file.
> If the by_id tree is fully cached, or if the id tree nodes are located near 
> the seq tree nodes, the performance improvement is small but noticeable (~10% 
> in some simple tests).  On the other hand, in the worst-case scenario of 
> randomly-generated docids and a database much larger than main memory the 
> improvement is huge.  Joe Williams did some simple benchmarks with a 50k 
> document, 600 MB database on a 256MB VPS.  The compaction time for that DB 
> dropped from 15m to 2m20s, so more than 6x faster.
> Storing the #full_doc_info{} in the seq tree also allows for some similar 
> optimizations in the replicator.
> This patch might have downsides when documents have a large number of edits.  
> These include an increase in the size of the database and slower view 
> indexing.  I expect both to be small effects.
> The patch can be applied directly to tr...@934272.  Existing DBs are still 
> readable, new updates will be written in the new format, and databases can be 
> fully upgraded by compacting.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (COUCHDB-738) more efficient DB compaction (fewer seeks)

Reply via email to