I've been working on a cpf pipeline that needs to resolve cross-references between a large number of documents. Basically, we want to merge some text from the referenced document into the referring document. I'd be grateful if folks could share ideas for a good strategy here.

The challenge with a multi-threaded loading pipeline like cpf is that you don't know whether the referenced document is available yet. There are two main ideas we're working with at the moment:

1) a two-pass strategy where you load *all* the documents and then resolve all the references. This is the most straightforward and (in theory) requires 2N updates, but I can't see how to trigger all the updates in cpf for the second pass. Maybe I'm just being thick, but it seems to me that there will come a time when you need to go and update all the documents one more time (making it 3N) just to set their state and trigger phase 2 processing. And this seems to defeat the whole purpose of cpf, since you then need to build a list of all documents in order to retrigger phase 2.

2) a one-pass bidirectional strategy in which each document pulls content from its resolvable references and pushes its content into documents that reference it. This is completely order-independent, but it results in more updates in a bad case (i.e. lots of cross-references). I think that if the average number of references in a given document is M, then you get something like N + MN/2 updates if the references are distributed evenly, since on average half of the references point at documents that haven't been loaded yet and so need a later push. So if M > 2, this will result in more updates than the 2N of strategy 1: potentially a *lot* more, if M is large.

As an aside: no matter what M is, if all the xrefs are in the last document to be loaded, strategy 2 takes only N updates. If they're all in the first one, then you get about 2N updates.
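
To sanity-check the update counts above, here is a small Python sketch that simply counts writes for each strategy. It is not cpf code: the function names and the counting model are my own assumptions (one write per document load, plus one write each time a newly loaded document has to be pushed into an already-loaded document that references it).

```python
def count_one_pass_updates(load_order, refs):
    """Count writes for the one-pass bidirectional strategy (hypothetical model).

    load_order: list of document ids, in the order they are loaded.
    refs: dict mapping a document id to the set of ids it references.
    """
    loaded = set()
    updates = 0
    for doc in load_order:
        # Loading the document, including pulling from any targets that
        # are already loaded, is counted as a single write.
        updates += 1
        # Every already-loaded document that references `doc` could not
        # resolve that reference earlier, so it needs one more write (a push).
        updates += sum(1 for other in loaded if doc in refs.get(other, set()))
        loaded.add(doc)
    return updates


def count_two_pass_updates(load_order):
    """Count writes for the two-pass strategy: one load and one resolve per document."""
    return 2 * len(load_order)


if __name__ == "__main__":
    order = [0, 1, 2, 3, 4]
    # All xrefs in the last document loaded: everything resolves by pulling.
    print(count_one_pass_updates(order, {4: {0, 1, 2, 3}}))  # 5, i.e. N
    # All xrefs in the first document loaded: every later load pushes into it.
    print(count_one_pass_updates(order, {0: {1, 2, 3, 4}}))  # 9, i.e. 2N - 1
    # Four documents that each reference the other three (M = 3).
    full = {d: {o for o in range(4) if o != d} for d in range(4)}
    print(count_one_pass_updates([0, 1, 2, 3], full))  # 10, i.e. N + MN/2
    print(count_two_pass_updates([0, 1, 2, 3]))        # 8, i.e. 2N
```

Under this model the one-pass total is N plus the number of references whose target is loaded after the referring document, which is where the MN/2 term comes from when references are spread evenly.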

I guess the questions are, then:

What's the best way to implement a two-pass processing pipeline in cpf?

Is there some other approach that I haven't thought of?

-Mike