(Trying to get the email through the system; it hasn't
appeared in my Epimorphics inbox.)

I posted to Stephen:

Dealing with deletions. We do not yet address the general case.
TextDocProducerTriples simply ignores the deletion case and
we don't change that behaviour in jena-text -- instead we use an
alternative docProducer (which you can see in the Epimorphics
github repository in ppd-text-index). This docProducer deals
with deletions in this way:

- incoming quads are accumulated while the subject remains
  the same. When the subject changes (or we reach finish())
  then we deal with the batch.

- we delete the documents corresponding to the subject.

- however we may have deleted documents that should still
  exist. We make a new document entity and then reach back
  into the dataset and add to the entity all the quads
  that are still in the dataset and about this subject.

- if we added any quads, then we put this new entity back
  into the index.
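
The steps above can be sketched roughly as follows. This is an
illustration only, not the code from ppd-text-index: Quad, ToyIndex
and BatchProducer are hypothetical stand-ins rather than Jena types,
the dataset is modelled as a plain list, and the sketch assumes the
dataset already reflects the changes by the time a batch is flushed.
Because the rebuild reads the dataset, the sketch only remembers the
current subject instead of keeping the accumulated quads.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins throughout; none of this is the Jena API.
final class BatchSketch {
    static final class Quad {
        final String graph, subject, predicate, object;
        Quad(String graph, String subject, String predicate, String object) {
            this.graph = graph; this.subject = subject;
            this.predicate = predicate; this.object = object;
        }
    }

    /** Toy text index: one document (a list of quads) per subject. */
    static final class ToyIndex {
        final Map<String, List<Quad>> docs = new HashMap<>();
    }

    static final class BatchProducer {
        private final List<Quad> dataset; // stands in for the DatasetGraph
        private final ToyIndex index;
        private String currentSubject;    // subject of the batch in progress

        BatchProducer(List<Quad> dataset, ToyIndex index) {
            this.dataset = dataset;
            this.index = index;
        }

        /** Called for each incoming quad, whether added or deleted. */
        void change(Quad q) {
            // A change of subject closes the previous batch.
            if (currentSubject != null && !currentSubject.equals(q.subject)) {
                flush();
            }
            currentSubject = q.subject;
        }

        /** Called at the end of the stream of changes. */
        void finish() {
            flush();
        }

        private void flush() {
            if (currentSubject == null) return;
            // 1. Delete the document for this subject.
            index.docs.remove(currentSubject);
            // 2. Rebuild it from the quads still in the dataset.
            List<Quad> entity = new ArrayList<>();
            for (Quad q : dataset) {
                if (q.subject.equals(currentSubject)) entity.add(q);
            }
            // 3. Re-index only if something survived.
            if (!entity.isEmpty()) index.docs.put(currentSubject, entity);
            currentSubject = null;
        }
    }
}
```

A subject whose quads have all been deleted ends up with no document,
while a subject that lost only some quads gets a fresh document built
from what remains.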

This requires that the producer has access to the dataset,
which is why our TextDocProducerBatch takes the dataset
(graph) as one of its constructor arguments. Our branch's
assembler, when attempting to construct the producer
specified by its classname, checks to see if there's a
two-argument form (DatasetGraph, TextIndex) as well as
the one-argument form (TextIndex).
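
A constructor check of this kind might look roughly like the sketch
below. It is not the assembler code from our branch: the interfaces
are empty stand-ins for Jena's DatasetGraph and TextIndex, the two
producer classes and the create method are hypothetical, and trying
the two-argument form first is an assumption about the order.

```java
import java.lang.reflect.Constructor;

// Hypothetical stand-ins throughout; none of this is the Jena API.
final class ProducerFactorySketch {
    interface DatasetGraph {}
    interface TextIndex {}
    interface DocProducer {}

    /** A producer that only needs the index. */
    static final class OneArgProducer implements DocProducer {
        public OneArgProducer(TextIndex index) {}
    }

    /** A producer that also needs to reach back into the dataset. */
    static final class TwoArgProducer implements DocProducer {
        final DatasetGraph dataset;
        public TwoArgProducer(DatasetGraph dataset, TextIndex index) {
            this.dataset = dataset;
        }
    }

    /** Build the named producer, preferring (DatasetGraph, TextIndex). */
    static DocProducer create(String className, DatasetGraph dataset, TextIndex index) {
        try {
            Class<?> c = Class.forName(className);
            Constructor<?> ctor;
            Object[] args;
            try {
                // Assumed order: look for the two-argument form first.
                ctor = c.getConstructor(DatasetGraph.class, TextIndex.class);
                args = new Object[] { dataset, index };
            } catch (NoSuchMethodException e) {
                // Fall back to the one-argument form.
                ctor = c.getConstructor(TextIndex.class);
                args = new Object[] { index };
            }
            return (DocProducer) ctor.newInstance(args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot construct " + className, e);
        }
    }
}
```

With this shape, existing one-argument producers keep working
unchanged, and only producers that need the dataset declare the
longer constructor.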


-- 
Chris "allusive" Dollin
