[ 
https://issues.apache.org/jira/browse/COUCHDB-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799891#action_12799891
 ] 

Paul Joseph Davis commented on COUCHDB-623:
-------------------------------------------

The consistency guarantee refers to the file format: it guarantees on-disk 
consistency in the same way as the main database file does (i.e., tail-append, 
MVCC style). It's not a reference to keeping the main db and the view in 
sync. As you point out, querying with stale=ok can give you a view result 
that does not reflect the most recent changes to the database, or that 
reflects changes from other clients, and so on.
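
For reference, stale=ok is a query parameter on the view URL that asks for 
the view as it currently stands on disk, without updating it first. A minimal 
sketch of building such a request URL follows. The helper name, host, 
database, design document and view names are all made up for illustration; 
only the URL layout and the stale=ok parameter come from CouchDB itself.

```python
from urllib.parse import quote, urlencode


def view_url(base, db, ddoc, view, stale_ok=False):
    """Build a CouchDB view query URL (hypothetical helper, not a CouchDB API)."""
    # Escape each path segment separately so names containing "/" stay intact.
    segments = [quote(db, safe=""), "_design", quote(ddoc, safe=""),
                "_view", quote(view, safe="")]
    # stale=ok: return the view index as-is, without bringing it up to date.
    params = {"stale": "ok"} if stale_ok else {}
    query = "?" + urlencode(params) if params else ""
    return base.rstrip("/") + "/" + "/".join(segments) + query


if __name__ == "__main__":
    # All names below are placeholders.
    print(view_url("http://localhost:5984", "mydb", "stats", "by_key",
                   stale_ok=True))
```

Without stale_ok the same helper yields the plain view URL, which makes 
CouchDB update the view before answering.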

> File format for views is space and time inefficient - use a better one
> ----------------------------------------------------------------------
>
>                 Key: COUCHDB-623
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-623
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: Database Core
>    Affects Versions: 0.10
>            Reporter: Roger Binns
>
> This was discussed on the dev mailing list over the last few days and noted 
> here so it isn't forgotten.
> The main database file format is optimised for data integrity - not losing or 
> mangling documents - and rightly so.
> That same append-only format is also used for views where it is a poor fit.  
> The more random the ordering of data supplied, the larger the btree.  The 
> larger the keys (in bytes) the larger the btree.  As an example my 2GB of raw 
> JSON data turns into a 3.9GB CouchDB database but a 27GB view file (before 
> compacting to 900MB).  Since views are not replicated, this requires a 
> disproportionate amount of disk space on each receiving server (not to 
> mention I/O load).  The format also affects view generation performance.  By 
> loading my documents into CouchDB in an order by the most emitted value in 
> views I was able to reduce load time from 75 minutes to 40 minutes with the 
> view file size being 15GB instead of 27GB, but still very distant from the 
> 900MB post compaction.
> Views are a performance enhancement.  They save you from having to visit 
> every document when doing some queries.  The data within a view is 
> generated and hence the only consequence of losing view data is a performance 
> one and the view can be regenerated anyway.  Consequently the file format 
> should be one that is optimised for performance and size.  The only integrity 
> feature needed is the ability to tell that the view is potentially corrupt 
> (eg the power failed while it was being generated/updated).
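
The effect of insert order on an append-only btree, described in the quoted 
issue, can be illustrated with a toy model. This is only a sketch under 
strong assumptions and not CouchDB's actual btree code: the tree shape is 
fixed, keys are the integers 0..n-1, updates arrive in batches, and each 
batch appends one fresh copy of every node on the path from each touched 
leaf up to the root. The function name, fanout and batch size are invented 
for the example.

```python
import random


def appended_nodes(keys, fanout=16, batch_size=100):
    """Count nodes a toy append-only tree writes for a given insert order.

    Assumes `keys` is a permutation of range(len(keys)).
    """
    n = len(keys)
    # Number of levels needed so that the top level is a single root node.
    levels = 1
    while fanout ** levels < n:
        levels += 1
    total = 0
    for start in range(0, n, batch_size):
        batch = keys[start:start + batch_size]
        touched = set()
        for key in batch:
            # Level 1 is the leaf level; the highest level is the root.
            for level in range(1, levels + 1):
                touched.add((level, key // fanout ** level))
        # Every touched node is rewritten once per batch and appended.
        total += len(touched)
    return total


if __name__ == "__main__":
    ordered = list(range(10_000))
    shuffled = ordered[:]
    random.Random(0).shuffle(shuffled)
    print("sorted order :", appended_nodes(ordered))
    print("random order :", appended_nodes(shuffled))
```

In this model a sorted batch lands on a handful of neighbouring leaves that 
share their parent nodes, while a random batch touches a different leaf for 
almost every key, so far more nodes are appended. The model only shows the 
direction of the effect and says nothing about real file sizes.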

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
