I've been working on a CPF pipeline that needs to resolve
cross-references between a large number of documents. Basically, we
want to merge some text from each referenced document into the document
that refers to it. I wonder if folks would be able to share good ideas
for a strategy here.
The challenge with a multi-threaded loading pipeline like CPF is that
you don't know whether the referenced document is available yet. There
are two main ideas we're working with at the moment:
1) A two-pass strategy, where you load *all* the documents and then
resolve all the references. This is the most straightforward and (in
theory) requires only 2N updates, but I can't see how to trigger the
second pass in CPF. Maybe I'm just being thick, but it seems to me that
at some point you will have to update every document yet again (making
it 3N) just to set its state and trigger phase 2 processing. That seems
to defeat the whole purpose of CPF, since you then need to build a list
of all the documents in order to retrigger phase 2.
2) A one-pass bidirectional strategy, in which each document pulls
content from those of its references that can already be resolved, and
pushes its own content into the documents that reference it. This is
completely order-independent, but in a bad case (i.e., lots of
cross-references) it results in more updates. I think that if the
average number of references per document is M, and the references are
distributed evenly, then you get something like N + MN/2 updates: each
document is loaded once, and roughly half of the MN references point to
a document that hasn't been loaded yet, costing one push each. So if
M > 2, this will result in more updates than strategy 1, potentially a
*lot* more if M is large.
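Spelling out the arithmetic behind that comparison, under the same rough assumptions (one update per document per pass for the two-pass strategy; for the one-pass strategy, about half of the MN references point at a not-yet-loaded document and each of those costs one push):

```latex
\begin{align*}
U_{\text{two-pass}} &= 2N \quad (\text{or } 3N \text{ if every document must be touched again to trigger phase 2}) \\
U_{\text{one-pass}} &\approx N + \tfrac{1}{2}MN \\
U_{\text{one-pass}} > 2N &\iff \tfrac{1}{2}MN > N \iff M > 2 \\
U_{\text{one-pass}} > 3N &\iff \tfrac{1}{2}MN > 2N \iff M > 4
\end{align*}
```

So if the extra retrigger pass really is unavoidable, the break-even point moves from an average of two references per document to four.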
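As an aside: no matter what M is, if all the xrefs are in the last
document, this takes only N updates, because every reference can be
pulled when that document loads. If they're all in the first one, you
get about 2N updates, because every other document has to push its
content back into it.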
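To sanity-check those counts, here is a small Python simulation of the one-pass strategy. It is not CPF code: it walks the documents in one arbitrary load order and assumes (a modeling assumption, not anything CPF dictates) that pulling already-available references happens inside a document's own load update, while each reference to a not-yet-loaded document costs exactly one later push. `one_pass_updates` and the `d0`..`d9` document ids are hypothetical.

```python
from typing import Dict, List, Set


def one_pass_updates(load_order: List[str], refs: Dict[str, Set[str]]) -> int:
    """Count document updates for a one-pass, bidirectional resolver.

    Hypothetical cost model (an assumption, not CPF behavior):
      * loading a document is one update, and any references whose targets
        are already loaded are pulled in during that same update;
      * a reference to a document that is not loaded yet is parked, and costs
        one extra update when the target arrives and pushes its content back.
    """
    loaded: Set[str] = set()
    # target id -> referring documents still waiting for that target's content
    waiting: Dict[str, Set[str]] = {}
    updates = 0
    for doc in load_order:
        updates += 1  # load doc and pull from already-available references
        loaded.add(doc)
        for target in refs.get(doc, set()):
            if target not in loaded:
                waiting.setdefault(target, set()).add(doc)
        # push doc's content into every earlier document that references it
        updates += len(waiting.pop(doc, set()))
    return updates


docs = [f"d{i}" for i in range(10)]  # N = 10, loaded in this order

# All xrefs in the last document: every target is already loaded.
print(one_pass_updates(docs, {"d9": set(docs[:-1])}))  # 10, i.e. N
# All xrefs in the first document: every other document pushes into it.
print(one_pass_updates(docs, {"d0": set(docs[1:])}))  # 19, i.e. 2N - 1
# Every document references every other one (M = 9): half the refs point forward.
print(one_pass_updates(docs, {d: set(docs) - {d} for d in docs}))  # 55 = N + MN/2
```

Under this model the two extreme cases come out at N and 2N - 1, and the fully connected case matches N + MN/2 exactly, since exactly half of its references point forward in any load order.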
I guess the questions, then, are:
What's the best way to implement a two-pass processing pipeline in CPF?
Is there some other approach that I haven't thought of?
-Mike
_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general