Hi Volker,

>> Do you mean "emitting the full document ...."?

You already answered my question: by relying on the SDK, the full documents 
will be retrieved by using the IDs obtained through the specified view.
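
For the archive, here is a minimal sketch of that two-step pattern as I
understand it. It is a simulation over an in-memory Map, not the Couchbase SDK
API: `store`, `buildIndex` and `queryWithDocs` are made-up names, and `emit` is
passed in as a third argument only so the snippet runs standalone (in a real
view it is a global).

```javascript
// In-memory stand-in for the key-value store (an assumption, NOT the Couchbase SDK).
const store = new Map([
  ["user::1", { type: "user", name: "Ada", bio: "x".repeat(1000) }],
  ["user::2", { type: "user", name: "Bob", bio: "y".repeat(1000) }],
  ["order::1", { type: "order", total: 42 }],
]);

// View-style map function: emit a key and null, so the index
// holds only keys and document IDs, never the document body.
function mapUsersByName(doc, meta, emit) {
  if (doc.type === "user") {
    emit(doc.name, null);
  }
}

// Hypothetical helper: build sorted index rows by running the map
// function over every stored document.
function buildIndex(mapFn) {
  const rows = [];
  for (const [id, doc] of store) {
    mapFn(doc, { id }, (key, value) => rows.push({ id, key, value }));
  }
  return rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// Hypothetical helper: step 1 reads the IDs from the view,
// step 2 fetches each full document from the store by ID.
function queryWithDocs(mapFn) {
  const rows = buildIndex(mapFn);
  return rows.map((row) => ({ ...row, doc: store.get(row.id) }));
}

const results = queryWithDocs(mapUsersByName);
console.log(results.map((r) => [r.key, r.id, r.doc.name]));
```

In this sketch a document deleted between the two steps would simply come back
as `undefined` in the `doc` field, so the caller has to be ready for that.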

Thanks for the help!

Regards,

-- Tito

On Jun 16, 2014, at 12:52 AM, Volker Mische <[email protected]> wrote:

> Hi Tito,
> 
> On 06/16/2014 06:35 AM, Tito Ciuro wrote:
>> Hi,
>> 
>> I've been using CouchDB for a while and now I'm evaluating Couchbase.
>> I'm wondering what's the best way to determine when to emit data vs
>> null. I typically avoid emitting the whole document if it's too "large"
>> (i.e. 1 MB or so) because the index would grow way too much. In this
>> case, I tend to emit null and then collect the documents via
>> include_docs. However, if the data set is small (or all I need is a
>> subset of the document), then I emit this subset, as it's faster and puts
>> less strain on the storage system. There is also the potential for a
>> race condition. As per CouchDB's documentation
>> (http://wiki.apache.org/couchdb/HTTP%5Fview%5FAPI)
>> 
>>    The include_docs option will include the associated document.
>>    However, the user should keep in mind that there is a race condition
>>    when using this option. It is possible that between reading the view
>>    data and fetching the corresponding document that the document has
>>    changed. If you want to alleviate such concerns you should emit an
>>    object with a _rev attribute as in emit(key, {"_rev": doc._rev}).
>>    This alleviates the race condition but leaves the possibility that
>>    the returned document has been deleted (in which case, it includes
>>    the "_deleted": true attribute). Note: include_docs will cause a
>>    single document lookup per returned view result row. This adds
>>    significant strain on the storage system if you are under high load
>>    or return a lot of rows per request. If you are concerned about
>>    this, you can emit the full doc in each row; this will increase view
>>    index time and space requirements, but will make view reads
>>    optimally fast.
> 
> 
> The Couchbase implementation for include_docs is different. If you use an 
> SDK, it requests the view to get all the IDs and then it fetches the full 
> docs via a memcache GET. In the upcoming version of Couchbase (3.0) the 
> original include_docs of the views will completely go away and it will only 
> be supported through the SDKs (don't worry, the API won't change when you use 
> the SDKs).
> 
>> Since Couchbase utilizes memcache, storing and retrieving data is a
>> whole different game: while in general a CouchDB document should not be
>> split and related into other documents (it's not an RDBMS!), it seems to
>> be perfectly fine in Couchbase. Because get/set/multiget are cheap
>> operations, it's perfectly feasible to "break" a document into smaller
>> pieces and retrieve them piecemeal. It seems this would be great for
>> memcache because it'd allow caching the documents that are used the
>> most. On the other hand, keeping a document "monolithic" not only makes
>> the index larger, but it makes it less efficient to cache (it's an all
>> or nothing proposition.)
>> 
>> So it seems that a valid approach in Couchbase would be to:
>> 
>> 1) break "large" documents into smaller, more manageable ones. Retrieve
>> them via get/multiget (cheap op) and let memcache cache them as
>> efficiently as possible.
>> 2) emit small data subsets as needed, as opposed to the entire document
>> where possible.
>> 3) for those queries where the entire document needs to be retrieved...
>> what then?:
>> 
>>     3.1) should we emit null and include_docs=true?
>>     3.2) should we emit the entire document instead?
> 
> You would emit null and let the SDK do the rest.
> 
>> It's clear that always emitting null in CouchDB puts a lot of pressure
>> on the storage system. But what about Couchbase? Are there any best
>> practices to be followed?
> 
> Do you mean "emitting the full document ...."?
> 
> Cheers,
>  Volker
> 
> -- 
> You received this message because you are subscribed to a topic in the Google 
> Groups "Couchbase" group.
> To unsubscribe from this topic, visit 
> https://groups.google.com/d/topic/couchbase/Y385HZQ73k0/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to 
> [email protected].
> For more options, visit https://groups.google.com/d/optout.
