Re: Document validation involving other documents

2010-01-25 Thread Brian Candler
On Sun, Jan 24, 2010 at 12:21:25PM -0800, Chris Anderson wrote:
> The problem with this approach is that validation is run during
> replication as well, so any multi-doc data dependencies become
> problematic in ad-hoc clusters.

But not every application makes sense as an ad-hoc cluster. In tight-knit
clusters the databases trust each other and you want the data to be as
coherent as possible, so you'd run replication as a user which has
permit-everything rights in validate_doc_update.

In these models you're more interested in validating the data once at its
point of entry, not at every point of replication.
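
A minimal sketch of that bypass, written in Python purely to show the logic (CouchDB validation functions themselves are normally JavaScript stored in a design document). The role name `replicator`, the `Forbidden` exception and the `amount` rule are assumptions made for illustration; none of them come from this thread.

```python
class Forbidden(Exception):
    """Raised when an update is rejected."""


# Hypothetical role granted only to the account that runs replication.
TRUSTED_ROLE = "replicator"


def validate_doc_update(new_doc, old_doc, user_ctx):
    """Validate at the point of entry, but let trusted replication through."""
    if TRUSTED_ROLE in user_ctx.get("roles", []):
        # Assumed: the document was already validated where it entered the cluster.
        return
    # Illustrative entry rule: every document needs a non-negative amount.
    amount = new_doc.get("amount")
    if not isinstance(amount, (int, float)) or amount < 0:
        raise Forbidden("amount must be a non-negative number")
```

With this shape, a write made by an ordinary user is checked once, while the same document arriving through replication under the trusted account is accepted without being re-checked.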

It would be horrendous to have a document in instance 1 but not in instance
2, just because it was accepted initially according to some set of rules,
but failed to replicate because the rules had changed in the meantime.

This is especially true if the rules themselves are documents, and hence may
be a bit stale. At worst you may accept an update which would be invalid if
you had the most up-to-date rules, or reject one which would be valid, but
the fact that you *did* accept or reject it should be consistent throughout
the cluster.


Document validation involving other documents

2010-01-24 Thread David Richardson
I've been trying to eliminate, or at least vastly reduce the need for 
middleware in a specific class of applications, but this problem probably cuts 
that task short.

We have incoming documents that need to be validated against data stored in 
other docs (rate tables, technical abbreviations, inspection codes). There is a 
high procedural cost to allowing documents in and validating them after saving 
(queue processing), but validation functions don't allow bringing in external 
data (correct me if I'm wrong, please.) Without writing middleware or sideware 
(an external handler), I don't see a way to do this.

What are the arguments against including a native 'lookup' function available 
to validation functions only? This seems to be a side-effect-free use case to 
me. Reads are cheap, correct?
If my memory hasn't completely failed, this was a significant problem in Lotus 
Notes until such a function was added, somewhere around version 3.3 or 4.0.
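
To make the request concrete, here is a rough Python sketch of what validation with such a function might look like. The `lookup` callable is hypothetical (the thread states that validation functions cannot read other documents), and the document id `rate_table` and the fields `job_code` and `rate` are invented for the example.

```python
class Forbidden(Exception):
    """Raised when an incoming document is rejected."""


def validate_timesheet(new_doc, lookup):
    """Check one incoming document against reference data held in another doc.

    `lookup` is the hypothetical read-only function proposed above: it takes
    a document id and returns that document, or None if it does not exist.
    """
    rates = lookup("rate_table")
    if rates is None:
        raise Forbidden("rate table is missing")
    expected = rates["rates"].get(new_doc.get("job_code"))
    if expected is None:
        raise Forbidden("unknown job code: %r" % new_doc.get("job_code"))
    if new_doc.get("rate") != expected:
        raise Forbidden("rate %r does not match the rate table" % new_doc.get("rate"))


# Usage with an in-memory stand-in for the database.
reference_docs = {"rate_table": {"rates": {"INSP-01": 85.0}}}
validate_timesheet({"job_code": "INSP-01", "rate": 85.0}, reference_docs.get)
```

Note that the verdict depends on whichever copy of `rate_table` the validating node happens to hold.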

David Richardson

Re: Document validation involving other documents

2010-01-24 Thread David Richardson
That's exactly what creates the high associated procedure cost - the documents 
are submitted by mobile workers who connect for very short times, sometimes 
very infrequently. If the document is accepted with invalid entries and 
processed later, the opportunity to alert the author of the problem(s) during 
the request/response cycle is lost. This means someone else must fix the 
mistake (extra work), and a very unwieldy process (usually manual) must be 
instituted to alert the author (training/feedback is harder and less effective).

I'm now willing to accept extra moving parts living in front of the db - there 
are other things that are easier to accomplish there too.
The cron async process smells like Greenspun's 10th law, extrapolated to 
transaction managers and message brokers ;-)

David

> The problem with this approach is that validation is run during
> replication as well, so any multi-doc data dependencies become
> problematic in ad-hoc clusters.
>
> The way to do this without middleware is to have a backend
> asynchronous process (maybe node.js) that consumes _changes and acts
> on particular updates. So a user saves the doc with a pending state
> and the _changes handler sees that and initiates validation.
>
> (One trick you can do with this is use _changes heartbeat to generate
> a cron trigger, so your async process can also do things based on
> time.)
>
> I'd like to see this async/cron functionality bundled with Couch
> itself, but there's only so much time in the day and I don't think
> it's worth delaying any releases for. Patches extremely welcome, but I
> think this one might require more discussion before anyone would be
> ready to run off and write it.
>
> Chris
>
> --
> Chris Anderson
> http://jchrisa.net
> http://couch.io
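
The heartbeat trick mentioned in the quoted reply can be sketched roughly as follows, in Python rather than node.js. It assumes the continuous _changes feed delivers one JSON object per line and an empty line for each heartbeat; the function and callback names are made up for the example, and canned lines stand in for a live HTTP connection.

```python
import json


def consume_feed(lines, on_change, on_tick):
    """Dispatch lines from a continuous _changes feed.

    Each non-empty line is one JSON change row; an empty line is a
    heartbeat, which is treated here as a clock tick.
    """
    for raw in lines:
        line = raw.strip()
        if line:
            on_change(json.loads(line))
        else:
            on_tick()


# Usage with canned feed lines instead of a live connection.
changes, ticks = [], []
feed = ['{"seq": 1, "id": "doc-a"}\n', "\n", "\n", '{"seq": 2, "id": "doc-b"}\n']
consume_feed(feed, changes.append, lambda: ticks.append(1))
```

The tick callback is where time-based work would hang; how often it fires depends on the heartbeat interval requested from the feed.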



Re: Document validation involving other documents

2010-01-24 Thread Chris Anderson
On Sun, Jan 24, 2010 at 2:25 PM, David Richardson techni...@enquora.com wrote:
> That's exactly what creates the high associated procedure cost - the
> documents are submitted by mobile workers who connect for very short times,
> sometimes very infrequently. If the document is accepted with invalid entries
> and processed later, the opportunity to alert the author of the problem(s)
> during the request/response cycle is lost. This means someone else must fix
> the mistake (extra work), and a very unwieldy process (usually manual) must
> be instituted to alert the author (training/feedback is harder and less
> effective).
>
> I'm now willing to accept extra moving parts living in front of the db -
> there are other things that are easier to accomplish there too.
> The cron async process smells like Greenspun's 10th law, extrapolated to
> transaction managers and message brokers ;-)


The advantage of having an asynchronous document state machine handler
is a whole lot of decoupling, as the queue is just a side effect of
the database.

> I've been trying to eliminate, or at least vastly reduce the need for
> middleware in a specific class of applications, but this problem probably
> cuts that task short.

I'm wary of the procedural-controller middleware approach because I'm
afraid it encourages you to program as though you have transactions.
(E.g., change these 4 documents at the same time and hope for the best.)

With asynchronous handlers, you can take care to pass the state
through the database _changes feeds in a way that doesn't give devs
a false sense of multi-document transactions.

This is why I think an asynchronous node.js _changes handler (or
browser ajax handlers) or event based programming in any language is
such a good fit for CouchDB. If you transmit all of your important
events through the database, even real-time apps can scale by using
replication.

Each doc update is atomic, so there's room to model larger
transactions as a series of atomic updates.
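
As an illustration of that idea, the sketch below moves a document from pending to valid or invalid in a single write, using an in-memory stand-in for the database. `MemoryDB`, `Conflict`, `handle_change` and the integer revisions are simplifications invented for the example (real CouchDB revisions are strings), and the `state` field is an assumed convention.

```python
class Conflict(Exception):
    """Raised when a write is based on a stale revision."""


class MemoryDB:
    """Tiny in-memory stand-in for a database with per-document revisions."""

    def __init__(self):
        self.docs = {}

    def get(self, doc_id):
        return dict(self.docs[doc_id])

    def put(self, doc):
        current = self.docs.get(doc["_id"])
        if current is not None and current["_rev"] != doc.get("_rev"):
            raise Conflict(doc["_id"])
        stored = dict(doc, _rev=(current["_rev"] + 1 if current else 1))
        self.docs[doc["_id"]] = stored
        return stored


def handle_change(db, doc_id, is_valid):
    """Advance one pending document to 'valid' or 'invalid' in a single write."""
    doc = db.get(doc_id)
    if doc.get("state") != "pending":
        return doc  # already handled; nothing to do
    doc["state"] = "valid" if is_valid(doc) else "invalid"
    return db.put(doc)


db = MemoryDB()
db.put({"_id": "ts-1", "state": "pending", "hours": 8})
handle_change(db, "ts-1", lambda d: d["hours"] <= 24)
```

Each step touches exactly one document, and a handler working from a stale copy gets a conflict instead of silently overwriting newer state.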

Chris



Re: Document validation involving other documents

2010-01-24 Thread Chris Anderson
On Sun, Jan 24, 2010 at 5:28 PM, David Richardson techni...@enquora.com wrote:
> > I'm wary of the procedural-controller middleware approach because I'm
> > afraid it encourages you to program as though you have transactions.
> > (E.g., change these 4 documents at the same time and hope for the best.)
>
> I fear I haven't communicated the use case adequately.
> We're talking about validating a single document using a formula which
> depends on data residing in one of several documents already in the database.
> Think rates, crew lists, technical abbreviations, prior documents. We aren't
> changing anything on either the incoming document or the data documents used
> for validation; we are simply determining whether the state of the incoming
> document is valid.
> The validating data *could* be compiled into the validation function, but
> that function would need to be updated every time the validation data
> changes. Possible, but inelegant.
>
> > With asynchronous handlers, you can take care to pass the state
> > through the database _changes feeds in a way that doesn't give devs
> > a false sense of multi-document transactions.
>
> If we don't validate and communicate failures back to the author during the
> request/response cycle, we may lose the chance to do so for a week or more.
> It also has financial implications - I've seen documents worth $75,000 in
> billings that could not be processed for an extra week until things were
> straightened out.
> It gets tiresome in a hurry.
>
> The documents arrive from clients that are usually not connected to the
> internet. This is the common state of affairs in the mobile world.
> By definition an asynchronous handler is outside the request/response cycle,
> so the communication opportunity is lost. Additionally, the validation
> process is no longer atomic. If that is the eventual path, I'll hitch this
> wagon to a proper message broker/transaction manager any day.

The validation wouldn't be atomic with the original write, but the state
machine model can, e.g., ignore pending docs in views.
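
A rough Python model of such a view follows (real CouchDB map functions are typically JavaScript). The `state` and `amount` fields and the helper `build_view` are assumptions for the sake of the example.

```python
def map_valid_only(doc):
    """Model of a view map function: emit rows only for validated documents."""
    if doc.get("state") == "valid":
        yield doc["_id"], doc.get("amount")


def build_view(docs, map_fn):
    """Apply the map function to every document and sort rows by key."""
    rows = [row for doc in docs for row in map_fn(doc)]
    return sorted(rows, key=lambda row: row[0])


docs = [
    {"_id": "a", "state": "valid", "amount": 10},
    {"_id": "b", "state": "pending", "amount": 99},
    {"_id": "c", "state": "invalid", "amount": 5},
]
rows = build_view(docs, map_valid_only)
```

Pending and invalid documents are still stored, but nothing that reads through this view ever sees them.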

I do see what you are saying about mobile browsers.

One option would be to use CouchDB validations to handle well-formedness
and an async process to handle dependencies on other docs.
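
The synchronous half of that split might look something like this, again modeled in Python for illustration only. The schema in `REQUIRED_FIELDS`, the state names and the `Forbidden` exception are invented; the point is that the check reads nothing but the incoming document.

```python
class Forbidden(Exception):
    """Raised to reject a malformed update."""


# Hypothetical schema: field name -> required Python type.
REQUIRED_FIELDS = {"job_code": str, "hours": (int, float), "author": str}


def check_well_formed(new_doc):
    """Reject documents with missing or wrongly typed fields.

    Uses only the document itself, so the result is the same on every node.
    Checks that depend on other documents are left to the asynchronous
    handler, which is assumed to move the document out of 'pending'.
    """
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in new_doc:
            raise Forbidden("missing field: " + field)
        if not isinstance(new_doc[field], expected_type):
            raise Forbidden("wrong type for field: " + field)
    if new_doc.get("state") not in ("pending", "valid", "invalid"):
        raise Forbidden("state must be pending, valid or invalid")
```

Because the outcome does not depend on any other document, running it again during replication should give the same answer as at the point of entry.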

There are also plans for a security object, which might do the trick for you.

The security object would be a _local document sent to the validation
function along with updates.



> This is exactly the usage scenario WebMachine was designed for, methinks.
> I've seen sporadic references to webmachine in couchdb, including one last
> week I thought.
> Is anything happening there?


I don't think we'd be planning to embed arbitrary webmachine
controllers. We really want to avoid any kind of multi-document
transaction-like api. We also don't want writes to be blocked by other
queries. For these reasons, the async update process makes sense.

For the mobile "need to respond now" use case, I think something like
node.js would be perfect.

It would sit in front of the couch and submit the update. Then it
would listen to the changes feed until it sees the document move from
pending to valid, and respond to the client with OK.
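
The waiting half of that front-end process could be sketched like this, in Python instead of node.js and with a plain list standing in for the live feed. It assumes the feed is read with `include_docs` so each event carries the document; the function names, the `state` field, the event cap and the status codes are illustrative choices, not anything prescribed in this thread.

```python
def wait_for_verdict(changes, doc_id, max_events=1000):
    """Scan change events until `doc_id` leaves the 'pending' state.

    `changes` is any iterable of dicts shaped like {"id": ..., "doc": {...}}.
    Returns 'valid' or 'invalid', or None if no verdict arrives within
    `max_events` events.
    """
    for count, change in enumerate(changes):
        if count >= max_events:
            break
        if change.get("id") != doc_id:
            continue
        state = change.get("doc", {}).get("state")
        if state in ("valid", "invalid"):
            return state
    return None


def respond(state):
    """Map the verdict to an HTTP-style status for the waiting client."""
    if state == "valid":
        return 200, "OK"
    if state == "invalid":
        return 422, "document failed validation"
    return 504, "no verdict before giving up"


events = [
    {"id": "ts-1", "doc": {"state": "pending"}},
    {"id": "ts-2", "doc": {"state": "valid"}},
    {"id": "ts-1", "doc": {"state": "valid"}},
]
status = respond(wait_for_verdict(events, "ts-1"))
```

A real front end would also need a wall-clock timeout, since a mobile client that connects only briefly cannot wait indefinitely for the verdict.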

Does that make sense? I know it's not CouchDB only, but it does do the trick.

Another option would be to write your own mochiweb or couchdb handler
and link it in using the config api. That is simple if you can handle
running on a custom-configured couchdb.

Chris

> cheers,
> David
>
> p.s. I hope this is of some interest to other people. I hear there's been a
> recent small uptick in network activity by mobile clients on unreliable,
> non-homogeneous networks ;-)


