[https://issues.apache.org/jira/browse/COUCHDB-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799878#action_12799878]
Roger Binns commented on COUCHDB-623:
-------------------------------------
What are the consistency guarantees that views make? I can't find any
documentation about them anywhere! (There is plenty about the main db, but
nothing about views.)
I can't see how any can be made, since the view data is derived from the
documents and the documents can be changed at any point. For example, while the
first row of a view is being returned, the corresponding document could
have been deleted. The "slow client" example can also lead to inconsistent
data. For example, a client may update a document on one connection, then query
the view on a second connection and, due to timing, get a view that does not
include that document.
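To make the first example concrete, here is a toy Python model. It is hypothetical: it does not use CouchDB's API and is not how CouchDB actually stores views. Every write creates a new immutable snapshot, loosely like an append-only/MVCC store, and a reader that started streaming view rows from one snapshot keeps returning rows for a document that a concurrent writer has already deleted.

```python
# Toy model only (hypothetical, not CouchDB's API or internals).
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple


@dataclass
class Store:
    # Each snapshot maps doc id -> document; snapshots are never mutated,
    # every write appends a new one (a crude stand-in for append-only MVCC).
    snapshots: List[Dict[str, dict]] = field(default_factory=lambda: [{}])

    def current(self) -> Dict[str, dict]:
        return self.snapshots[-1]

    def put(self, doc_id: str, doc: dict) -> None:
        new = dict(self.current())
        new[doc_id] = doc
        self.snapshots.append(new)

    def delete(self, doc_id: str) -> None:
        new = dict(self.current())
        new.pop(doc_id, None)
        self.snapshots.append(new)


def view_rows(snapshot: Dict[str, dict], key_field: str) -> Iterator[Tuple[object, str]]:
    # Like a simple map function: emit (key, doc_id) rows sorted by key,
    # computed from the snapshot the reader started with.
    rows = [(doc[key_field], doc_id)
            for doc_id, doc in snapshot.items() if key_field in doc]
    yield from sorted(rows)


store = Store()
store.put("a", {"name": "alice"})
store.put("b", {"name": "bob"})

# Reader pins the latest snapshot and starts streaming view rows.
pinned = store.current()
rows = view_rows(pinned, "name")
first = next(rows)

# A writer deletes the document behind the row just returned.
store.delete("a")
second = next(rows)

print(first, second, "a" in store.current())
```

The reader's rows stay consistent with the snapshot it started from, but not with the live documents: the row for "a" has been handed out even though "a" no longer exists.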
The only consistency "guarantee" I can see is this: if you do not add, change,
or delete documents from shortly before view retrieval starts until the view is
completely retrieved, then the view will correctly reflect the documents at
that time. If documents change concurrently with view reads, there cannot be
any guarantees unless CouchDB introduces a transaction system.
I do see how the append-only btree/MVCC format makes the view retrieval code
easier to write, but users of CouchDB do not care how hard the code is to write
:-)
> File format for views is space and time inefficient - use a better one
> ----------------------------------------------------------------------
>
> Key: COUCHDB-623
> URL: https://issues.apache.org/jira/browse/COUCHDB-623
> Project: CouchDB
> Issue Type: Improvement
> Components: Database Core
> Affects Versions: 0.10
> Reporter: Roger Binns
>
> This was discussed on the dev mailing list over the last few days and noted
> here so it isn't forgotten.
> The main database file format is optimised for data integrity - not losing or
> mangling documents - and rightly so.
> That same append-only format is also used for views, where it is a poor fit.
> The more random the ordering of the supplied data, the larger the btree. The
> larger the keys (in bytes), the larger the btree. As an example, my 2GB of raw
> JSON data turns into a 3.9GB CouchDB database but a 27GB view file (before
> compacting to 900MB). Since views are not replicated, this requires a
> disproportionate amount of disk space on each receiving server (not to
> mention I/O load). The format also affects view generation performance. By
> loading my documents into CouchDB ordered by the most frequently emitted view
> value, I was able to reduce load time from 75 minutes to 40 minutes, with the
> view file being 15GB instead of 27GB, but that is still far from the 900MB
> size after compaction.
> Views are a performance enhancement. They save you from having to visit
> every document for some queries. The data within a view is generated,
> and hence the only consequence of losing view data is a performance one;
> the view can be regenerated anyway. Consequently the file format
> should be one that is optimised for performance and size. The only integrity
> feature needed is the ability to tell that the view is potentially corrupt
> (e.g. the power failed while it was being generated/updated).
--
This message is automatically generated by JIRA.