On Wed, Nov 5, 2008 at 11:12 PM, Antony Blakey <[EMAIL PROTECTED]> wrote:
>
> On Tue, Nov 4, 2008 at 6:47 PM, Antony Blakey <[EMAIL PROTECTED]> wrote:
>
>> My solution is to use the notification mechanism to maintain (multiple)
>> SQLite databases containing the document keys I need to search over. In
>> the URP example, I store (db, src, dest) records. I also store the last
>> seqno, so that I can do incremental updates to SQLite.
>>
>> Then, using _external, I can make queries of SQLite to get me either
>> (user, permission) pairs using a self-join, or in the case of the
>> arbitrary metadata queries, a list of document ids.
>>
>> The primary difficulties I have with this simple model are:
>>
>> 1) The notification mechanism doesn't give me enough information.
>> Currently I have to do an _all_docs_by_seq and check for deletions by
>> attempting to get each document, which I have to do for every document in
>> any case (unless I use transparent ids) to determine if I'm interested in
>> it, and then get the data. I presume this technique works because
>> deletion is actually a form of update until compaction (I copied it from
>> GeoCouch).
>>
>> ** SUGGESTION ** I would prefer an update interface that gave me a stream
>> of (old, new) document pairs, which covers add/update/delete, plus a
>> (from, to) seqno pair. Having a 'from' seqno lets me know when I have to
>> trigger a full rescan, which I need to do in a variety of circumstances
>> such as a configuration change.
>
> In retrospect this is not a good idea. I think notification handlers
> should do nothing more than mark a view or query source as dirty,
> invalidate a cache such as memcached, and possibly check for mods to a
> config document to enable/disable the query service. The _external/plugin
> query handler should do the subsequent processing and update any private
> data structures or on-disk indexes, just as the map/reduce views do, and
> for the same reason.
> So I don't think the notification mechanism should be changed.
>
> However, that raises a question about external view/query server updating:
> should a view update to the seqnum that is current when the request is
> received, or should it keep looping in an update cycle until the seqnum
> has reached a steady state?
>
> The former would make sense if you just wanted to ensure that the view was
> up-to-date with records a client might have just written in the requesting
> thread, whilst the latter would seem to potentially block forever
> depending on the amount of processing required to update the external view
> and the update rate.
I haven't considered all of the possible ramifications of what information
should be presented to update notification processes, but my current
feelings, in my rather tired state and in no particular order, are:

1. Update notifications should at a bare minimum support DB
   create/update/delete notifications. IIRC, create was missing but was a
   minimal patch. Not sure if it's been committed or not.
2. View resets may be an addition to the notifications.
3. Following from 2, updates to a view may lend credence to having an
   update that is "view updated to seq N".

All of those I could see as having particular use cases.

As to updating, if I'm not mistaken, views internally will update to the
latest sequence that was available when the update started. (The thought
being that if they access the btree in one go, the consistent read state
would mean they don't see new updates until the next read request. I'm not
sure if incoming reads during an update reset this, though.)

When we index outside of Erlang, we don't have the consistent read-state
guarantee if we page through the _all_docs_by_seq view as per the usual
design pattern. We could read the entire view into memory, but that has the
obvious not-scalable side effect. Thus by default all external indexers
(external == accessing CouchDB via HTTP) would have the second form of
updating, looping until they reach a steady state. This indeed could lead
to a race condition with an indexer never quite managing to stay in sync.

> Finally, does anyone have advice about the merits of mnesia vs. sqlite
> (or another sql db) for this kind of auxiliary indexing?

I'd say it really depends on what you're wanting to accomplish. I have for
a while contemplated the relative awesome/fail aspects of having an mnesia
layer that treated CouchDB as its permanent store and exported some sort of
HTTP query API. I'm pretty sure I've convinced myself it would not be
general enough to be worth supporting.
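[For readers following along: the loop-until-steady-state form of updating discussed above can be sketched like this. The `changes_since` callback is a hypothetical stand-in for paging _all_docs_by_seq over HTTP; none of this is actual CouchDB indexer code.]

```python
def catch_up(start_seq, changes_since, apply_change):
    """Keep pulling batches of changes until the database stops moving.

    changes_since(seq) -> list of (seqno, docid) pairs with seqno > seq;
    an empty list means a steady state has been reached. Writers arriving
    faster than we can index would keep this loop running indefinitely --
    the race condition mentioned above.
    """
    seq = start_seq
    while True:
        batch = changes_since(seq)
        if not batch:          # no new changes: steady state reached
            return seq
        for s, docid in batch:
            apply_change(docid)
            seq = s            # remember high-water mark as we go

# Usage with a scripted source: a second batch "arrives" while the first
# is being processed, then the source goes quiet.
batches = [[(1, "a"), (2, "b")], [(3, "c")], []]
seen = []
final = catch_up(0, lambda seq: batches.pop(0), seen.append)
```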
That being said, using it for a specific, well-defined role probably isn't
out of the question.

As for the specifics of mnesia vs. sqlite, I couldn't tell you. My guess is
that the mnesia integration would kick the crap out of the sqlite
integration, seeing as mnesia is part of the core library. But you
mentioned GIS stuff, and I haven't a clue about mnesia support for such
things. Also, mnesia has that whole horizontal-scale thing baked in.

Hopefully that helps more than confounds,
Paul

> Antony Blakey
> --------------------------
> CTO, Linkuistics Pty Ltd
> Ph: 0438 840 787
>
> Lack of will power has caused more failure than lack of intelligence or
> ability.
>   -- Flower A. Newhouse
