On Apr 14, 2008, at 02:34, Ralf Nieuwenhuijsen wrote:
Well, that doesn't really apply. I am not looking for a way to create unique
documents.
I'm looking for a way to get a view with only unique documents.

Imagine some portion of all the documents having the key 'adres'.
Then I want a list of unique addresses; a view with only the adres keys for
documents that have it, and then only unique entries.

It seems I can currently solve this problem in two ways:
- creating a separate adres document that stores an array of all unique addresses. But without any sane default merging behavior, this breaks at replication.
- creating a separate document for _each_ adres, using PUT with the md5 of the adres as doc-id. This seems like an enormous waste of space, especially since I will be doing this with almost every key in every document.

In the future this should be doable with the reduce/combinator behavior, I expect. But even there, I think the suggested approach is too limiting. The reducer is going to return one JSON object. I would rather have it emit (key, value) and use the default view operations on it for things like pagination.

Using the above example and assuming the reducer is implemented: how do I get the X most used addresses? The value of X needs to be hard-coded with the suggested implementation, whereas using emit(key, value) in the reducer as well would allow for pagination.

I might be totally off here, but the reduce function actually does only return one key-value pair for the view:

map: /* _id = md5(address) */
function(doc) {
  emit(doc._id, 1);
}

produces:

abc | 1
abc | 1
def | 1
xyz | 1
yyy | 1
yyy | 1
yyy | 1

for fictional _id values.

reduce:
function(keys, values) {
  // Sum the 1s emitted by map to get a count per key.
  var sum = 0;
  for (var i = 0; i < values.length; i++) {
    sum += values[i];
  }

  return sum;
}

produces:

abc | 2
def | 1
xyz | 1
yyy | 3

as the output of the view, which can be paginated just as easily as the list that map alone produces. This gives you a count for all addresses but not yet a sorted list. I have to think about that one a bit more.
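
One possible way to get from the counts to the X most used addresses is to sort the reduced rows on the client. The following is only a rough sketch in plain JavaScript, run outside CouchDB: mapRows is a hypothetical stand-in for the map output above, and groupAndReduce and topAddresses are made-up helper names, not part of any CouchDB API. It merely imitates the per-key grouping locally and makes no claim about how the view engine computes it.

```javascript
// Hypothetical stand-in for the rows emitted by the map function above.
var mapRows = [
  { key: "abc", value: 1 },
  { key: "abc", value: 1 },
  { key: "def", value: 1 },
  { key: "xyz", value: 1 },
  { key: "yyy", value: 1 },
  { key: "yyy", value: 1 },
  { key: "yyy", value: 1 }
];

// Same summing reduce as in the view.
function reduce(keys, values) {
  var sum = 0;
  for (var i = 0; i < values.length; i++) {
    sum += values[i];
  }
  return sum;
}

// Local imitation of grouping by key: collect the values for each key,
// then reduce every group to a single row.
function groupAndReduce(rows) {
  var groups = {};
  rows.forEach(function (row) {
    if (!Object.prototype.hasOwnProperty.call(groups, row.key)) {
      groups[row.key] = [];
    }
    groups[row.key].push(row.value);
  });
  return Object.keys(groups).map(function (key) {
    return { key: key, value: reduce([key], groups[key]) };
  });
}

// Sort by count (descending), break ties by key, keep the first x rows.
function topAddresses(reducedRows, x) {
  return reducedRows
    .slice() // copy, so the caller's array is left untouched
    .sort(function (a, b) {
      return b.value - a.value ||
        (a.key < b.key ? -1 : a.key > b.key ? 1 : 0);
    })
    .slice(0, x);
}

var reduced = groupAndReduce(mapRows);
var top2 = topAddresses(reduced, 2);
console.log(top2);
```

Here X is just an argument, so it does not have to be hard-coded in the view, but the sorting happens on the client and needs all reduced rows to be fetched first.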

Cheers
Jan
--


Greetings,
Ralf

2008/4/13, Chris Anderson <[EMAIL PROTECTED]>:

Ralf,

If you use an algorithm to generate a deterministic _id for records
before PUT-ing them to CouchDB, you can ensure that each unique record
only appears once in the database. This discussion might be relevant
for you:


http://mail-archives.apache.org/mod_mbox/incubator-couchdb-user/200803.mbox/[EMAIL PROTECTED]
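
As a rough illustration of the deterministic _id idea, here is a minimal sketch that assumes Node.js and its built-in crypto module. addressDocId is a hypothetical helper name, and the trim/lowercase normalisation is an assumption added for the example, not something discussed in this thread.

```javascript
var crypto = require("crypto");

// Derive a deterministic document id from an address string.
// The normalisation (trim + lowercase) is an assumption for this example.
function addressDocId(address) {
  var normalized = address.trim().toLowerCase();
  return crypto.createHash("md5").update(normalized, "utf8").digest("hex");
}

var idA = addressDocId("Main Street 1");
var idB = addressDocId("  main street 1 ");
var idC = addressDocId("Other Road 7");

// The same address text maps to the same id; a different one does not.
console.log(idA === idB, idA === idC);
```

The resulting hex string would then be used as the _id when PUT-ing the record, so that saving the same address a second time targets the same document instead of creating a new one.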

Chris


On Sat, Apr 12, 2008 at 8:43 PM, Ralf Nieuwenhuijsen
<[EMAIL PROTECTED]> wrote:
Is it possible to create a view with only unique records?
I assume it would be possible using the future reduce/combinator
behavior?

In what time frame is the reduce behavior planned?

Greetings,
Ralf





--
Chris Anderson
http://jchris.mfdz.com

