[
https://issues.apache.org/jira/browse/COUCHDB-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799891#action_12799891
]
Paul Joseph Davis commented on COUCHDB-623:
-------------------------------------------
The consistency guarantee refers to the file format: it provides the same
on-disk consistency as the main database file (i.e., tail-append, MVCC style).
It's not a reference to keeping the view in sync with the main db. As you
point out, doing things like querying with stale=ok can give you a view result
that does not reflect the most recent changes to the database, or that
reflects changes from other clients, and so on.
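
For illustration only, the tail-append idea can be sketched roughly as below.
This is a toy model, not CouchDB's actual file format: the record layout, the
MAGIC marker and both function names are assumptions made up for the example.
The writer only ever appends, each commit ends with a small checksummed
header, and a reader trusts the newest header that verifies, so a torn write
at the tail is ignored rather than corrupting earlier state.

```python
import struct
import zlib

MAGIC = b"HDR!"  # hypothetical commit marker, not CouchDB's real header format
HEADER_FMT = ">QII"  # payload offset, payload length, CRC32 of payload
HEADER_SIZE = len(MAGIC) + struct.calcsize(HEADER_FMT)


def append_commit(log: bytearray, payload: bytes) -> None:
    """Append the payload, then a header describing it. Nothing is overwritten."""
    offset = len(log)
    log += payload
    log += MAGIC + struct.pack(HEADER_FMT, offset, len(payload), zlib.crc32(payload))


def last_valid_commit(log):
    """Scan backwards for the newest header whose payload checksum verifies."""
    pos = log.rfind(MAGIC)
    while pos != -1:
        header = log[pos + len(MAGIC): pos + HEADER_SIZE]
        if len(header) == struct.calcsize(HEADER_FMT):  # skip truncated headers
            offset, length, crc = struct.unpack(HEADER_FMT, header)
            payload = log[offset: offset + length]
            if offset + length <= pos and zlib.crc32(payload) == crc:
                return bytes(payload)
        pos = log.rfind(MAGIC, 0, pos)  # try the next older candidate
    return None
```

In this sketch, truncating the file in the middle of the last header makes
last_valid_commit fall back to the previous commit, which is the property the
consistency guarantee is about. It says nothing about how fresh the view is
relative to the main db.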
> File format for views is space and time inefficient - use a better one
> ----------------------------------------------------------------------
>
> Key: COUCHDB-623
> URL: https://issues.apache.org/jira/browse/COUCHDB-623
> Project: CouchDB
> Issue Type: Improvement
> Components: Database Core
> Affects Versions: 0.10
> Reporter: Roger Binns
>
> This was discussed on the dev mailing list over the last few days and noted
> here so it isn't forgotten.
> The main database file format is optimised for data integrity - not losing or
> mangling documents - and rightly so.
> That same append-only format is also used for views where it is a poor fit.
> The more random the ordering of data supplied, the larger the btree. The
> larger the keys (in bytes) the larger the btree. As an example my 2GB of raw
> JSON data turns into a 3.9GB CouchDB database but a 27GB view file (before
> compacting to 900MB). Since views are not replicated, this requires a
> disproportionate amount of disk space on each receiving server (not to
> mention I/O load). The format also affects view generation performance. By
> loading my documents into CouchDB in an order by the most emitted value in
> views I was able to reduce load time from 75 minutes to 40 minutes with the
> view file size being 15GB instead of 27GB, but still very distant from the
> 900MB post compaction.
> Views are a performance enhancement. They save you from having to visit
> every document when doing some queries. The data within a view is
> generated and hence the only consequence of losing view data is a performance
> one and the view can be regenerated anyway. Consequently the file format
> should be one that is optimised for performance and size. The only integrity
> feature needed is the ability to tell that the view is potentially corrupt
> (eg the power failed while it was being generated/updated).
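
One simple way the "potentially corrupt" check requested above could work is a
dirty marker that exists only while the view file is being updated. The sketch
below is an assumption for illustration, not something proposed in the issue
or implemented in CouchDB; the ".dirty" suffix and the function names are
hypothetical, and a real implementation would also need to sync the containing
directory.

```python
import os
from contextlib import contextmanager


@contextmanager
def updating(view_path: str):
    """Keep a marker file on disk for as long as the view is being written."""
    marker = view_path + ".dirty"  # hypothetical marker name
    with open(marker, "w") as f:
        f.flush()
        os.fsync(f.fileno())  # marker must be durable before the update starts
    with open(view_path, "ab") as view:
        yield view
        view.flush()
        os.fsync(view.fileno())  # view data must be durable before unmarking
    os.remove(marker)  # only reached if the update finished without error


def is_potentially_corrupt(view_path: str) -> bool:
    """A leftover marker means some update never completed cleanly."""
    return os.path.exists(view_path + ".dirty")
```

If the process dies (or an exception escapes) inside the with block, the
marker stays behind, and the next open can discard and regenerate the view
instead of trusting it.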