On Sun, Jan 18, 2009 at 11:56 PM, Antony Blakey <[email protected]> wrote:
>
> On 19/01/2009, at 2:53 PM, Paul Davis wrote:
>
>> On Sun, Jan 18, 2009 at 10:51 PM, Antony Blakey <[email protected]>
>> wrote:
>>>
>>> I've previously posted a solution using _external that doesn't hit couch
>>> every update, and that maintains MVCC consistency and lazy-update view
>>> behaviour.
>>>
>>
>> Right. I tried looking through MarkMail for a link to your
>> implementation but came up empty handed. I'd contemplated something
>> similar as well. The issue though is that Lucene index writers are
>> AFAIK not reentrant.
>
> Thread 'couchdb' started by Tim Parkin around 20/21 December.
>

Odd. I only noticed the last 2 or 3 posts of that thread before.
Thanks for the tip.

> IndexWriters are mutexed using a lock file.
>

Ew.

>> Thus the headache of coordinating multiple random
>> processes would start to suck. Lots.
>
> My reading of the code was that there was a single process for each
> _external definition (although admittedly that was early in my understanding
> of gen_server). Major consistency issues result if requests to the _external
> aren't serialized.
>

There can be many _external processes for a single definition. So not
only are requests not serialized, they can also be concurrent.

>>> The problem with using notifications is lack of snapshot coordination
>>> between the update process and the external process.
>>>
>>
>> I'd say this is use case dependent.
>
> It does mean that you can't guarantee that an external request (that does
> reference a given MVCC snapshot) is getting data from the same snapshot.
>
> You're right that's use case dependent, but the issue is whether the use
> case is 'free text indexing' or is a client use case. If the latter, then you
> need to handle the situation where it *does* matter, so an implementation
> that has random characteristics is IMO less than optimal.
>

Err, right. It's use case dependent. If your (client defined) use case
requires certain characteristics, the update_notification/_external
process may just not be the right tool for the job.

>>> The synchronisation between sequential _external calls is obvious e.g.
>>> guaranteeing that the _external process sees a monotonic increasing
>>> update_seq.
>>>
>>
>> I don't follow.
>
> I mean you'll never get a request in the context of an update_seq that your
> _external process has already advanced beyond, because the update_seqs seen
> by the external are a) serialized and b) only see a monotonic increasing
> sequence of update_seq values. Hence you can safely run an update process
> and set a 'last_update_seq_seen' (which is the key to avoiding hitting couch
> again) knowing that you never have to backtrack.
>

A single _external process should only see monotonically increasing
update_seq's. I think it's technically possible, though, for a smaller
update_seq to be processed later in time in a different OS process
(later in time <= a few ms).

The ideas from the other thread about having a UUID per db and
compaction are interesting. Are either of those included in the fs
layout stuff you were working on?

Paul

> Antony Blakey
> --------------------------
> CTO, Linkuistics Pty Ltd
> Ph: 0438 840 787
>
> Human beings, who are almost unique in having the ability to learn from the
> experience of others, are also remarkable for their apparent disinclination
> to do so.
>  -- Douglas Adams
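
For illustration, the 'last_update_seq_seen' idea discussed above might be
sketched roughly as follows. This is a minimal Python sketch under stated
assumptions, not code from either implementation mentioned in the thread:
SeqTrackingIndexer, handle_request and fetch_changes are hypothetical names,
and fetch_changes stands in for whatever call would return the changes
between two update_seq values. The lock only serializes requests inside one
OS process.

```python
import threading
from typing import Callable, Dict, Iterable, Tuple

# (seq, doc_id, text): a hypothetical stand-in for one database change.
Change = Tuple[int, str, str]


class SeqTrackingIndexer:
    """Toy indexer that only fetches changes it has not seen yet."""

    def __init__(self, fetch_changes: Callable[[int, int], Iterable[Change]]):
        # Hypothetical callback: returns changes with since < seq <= upto.
        self._fetch_changes = fetch_changes
        # Serializes requests handled by this one process.
        self._lock = threading.Lock()
        self.last_update_seq_seen = 0
        self.index: Dict[str, str] = {}
        self.fetch_count = 0  # how often we had to go back to the database

    def handle_request(self, update_seq: int) -> Dict[str, str]:
        with self._lock:
            # Only hit the database when the request is ahead of the index.
            if update_seq > self.last_update_seq_seen:
                self.fetch_count += 1
                changes = self._fetch_changes(self.last_update_seq_seen, update_seq)
                for _seq, doc_id, text in changes:
                    self.index[doc_id] = text
                self.last_update_seq_seen = update_seq
            # Return a copy so callers cannot mutate the shared index.
            return dict(self.index)


# Fake change feed used only for this example.
CHANGES = [(1, "a", "hello"), (2, "b", "world"), (3, "a", "hello again")]


def fetch_changes(since: int, upto: int) -> Iterable[Change]:
    return [c for c in CHANGES if since < c[0] <= upto]


indexer = SeqTrackingIndexer(fetch_changes)
indexer.handle_request(2)  # fetches seq 1..2
indexer.handle_request(2)  # already seen, no fetch
indexer.handle_request(3)  # fetches seq 3 only
```

Note that the sketch never backtracks: a request carrying an update_seq
lower than last_update_seq_seen is simply answered from the newer index,
which is exactly the case where the snapshot guarantee discussed above would
not hold.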
