Re: [jira] Commented: (COUCHDB-623) File format for views is space and time inefficient - use a better one

Chris Anderson Wed, 13 Jan 2010 14:21:21 -0800

On Wed, Jan 13, 2010 at 2:11 PM, Roger Binns (JIRA) <[email protected]> wrote:
>
>    [ 
> https://issues.apache.org/jira/browse/COUCHDB-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799965#action_12799965
>  ]
>
> Roger Binns commented on COUCHDB-623:
> -------------------------------------
>
> The view consistency stuff is a red herring.  If you are not making changes 
> to the DB then any file format will work and give "consistent" results.
>
> If you are making changes to the docs then no scheme short of 
> transactions/locking will ensure that the view is consistent with the 
> documents.  It will always be possible for documents to be referenced by the 
> view that are not in the DB and for documents to be in the DB that are not in 
> the view.  I see no point in trying to even make the view "consistent" with a 
> point in time while DB changes are happening since it gives no performance 
> efficiency nor any space efficiency - in fact the extreme opposites.
>
> The point of views is to give me information fast that I could only otherwise 
> obtain by visiting all the documents.  That is what they should be optimized 
> for.


the current views are optimized for your red herring. where it
actually matters is the ability to give transactional information
about things like bank account balances.

see: http://books.couchdb.org/relax/reference/recipes for Banking

without MVCC views, there's no way to query accurately at all when
inserts are underway (short of blocking reads during writes).
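
A rough sketch of the point above, in Python rather than anything CouchDB
actually runs. The class AppendOnlyStore, the account names and the amounts
are all made up for illustration; this is a toy model of "readers keep the
version they started with", not CouchDB's btree or file format.

```python
# Toy illustration only: hypothetical names, not CouchDB's storage code.

def interleaved_sum_in_place():
    """A reader summing balances while a writer updates them in place."""
    accounts = {"alice": 100, "bob": 50}
    seen = accounts["alice"]   # reader reads alice (100)
    accounts["alice"] -= 30    # writer moves 30 from alice ...
    accounts["bob"] += 30      # ... to bob, mutating the same structure
    seen += accounts["bob"]    # reader reads bob (now 80)
    return seen                # total mixes old and new state


class AppendOnlyStore:
    """Each commit appends a new version; earlier versions are never changed."""

    def __init__(self, initial):
        self._versions = [dict(initial)]

    def snapshot(self):
        """Return the latest committed version as the reader's fixed view."""
        return self._versions[-1]

    def commit(self, changes):
        """Build a new version from the latest one and append it."""
        new_version = dict(self._versions[-1])
        new_version.update(changes)
        self._versions.append(new_version)


def total_balance(accounts):
    return sum(accounts.values())


store = AppendOnlyStore({"alice": 100, "bob": 50})

reader_view = store.snapshot()           # a query starts here
store.commit({"alice": 70, "bob": 80})   # the transfer lands as one new version

old_total = total_balance(reader_view)       # computed from the old version
new_total = total_balance(store.snapshot())  # computed from the new version
mixed_total = interleaved_sum_in_place()     # computed from a mix of both
```

In this toy, the in-place reader counts the 30 twice, while both snapshot
readers get a total that matches some committed state. The cost is that old
versions pile up until something cleans them out, which is the space
trade-off the ticket is complaining about.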

If you need something with less consistency, you are encouraged to
wrap your own indexing system around couchdb's map reduce runtime, or
even build your own runtime.

has anyone used Hadoop as an external yet?

Chris


>
>> File format for views is space and time inefficient - use a better one
>> ----------------------------------------------------------------------
>>
>>                 Key: COUCHDB-623
>>                 URL: https://issues.apache.org/jira/browse/COUCHDB-623
>>             Project: CouchDB
>>          Issue Type: Improvement
>>          Components: Database Core
>>    Affects Versions: 0.10
>>            Reporter: Roger Binns
>>            Assignee: Damien Katz
>>
>> This was discussed on the dev mailing list over the last few days and noted 
>> here so it isn't forgotten.
>> The main database file format is optimised for data integrity - not losing 
>> or mangling documents - and rightly so.
>> That same append-only format is also used for views where it is a poor fit.  
>> The more random the ordering of data supplied, the larger the btree.  The 
>> larger the keys (in bytes) the larger the btree.  As an example my 2GB of 
>> raw JSON data turns into a 3.9GB CouchDB database but a 27GB view file 
>> (before compacting to 900MB).  Since views are not replicated, this requires 
>> a disproportionate amount of disk space on each receiving server (not to 
>> mention I/O load).  The format also affects view generation performance.  By 
>> loading my documents into CouchDB in an order by the most emitted value in 
>> views I was able to reduce load time from 75 minutes to 40 minutes with the 
>> view file size being 15GB instead of 27GB, but still very distant from the 
>> 900MB post compaction.
>> Views are a performance enhancement.  They save you from having to visit 
>> every document when doing some queries.  The data within a view is 
>> generated and hence the only consequence of losing view data is a 
>> performance one and the view can be regenerated anyway.  Consequently the 
>> file format should be one that is optimised for performance and size.  The 
>> only integrity feature needed is the ability to tell that the view is 
>> potentially corrupt (eg the power failed while it was being 
>> generated/updated).
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>



-- 
Chris Anderson
http://jchrisa.net
http://couch.io
