My team is creating a new repository connector. The source system has
a delta API that reports all new, modified, and deleted folders and
documents since the last call. Each call to the delta API returns the
changes, along with a token that can be passed on subsequent calls to
get the changes made since that token was issued.

What is the best approach to building a repo connector to a system
that has this type of delta API?
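
For concreteness, the delta API we are working against looks roughly like the
following. This is a hypothetical Java wrapper, not the real remote API; every
name here (`DeltaClient`, `DeltaResult`, the method names) is illustrative:

```java
import java.util.List;

// Hypothetical wrapper around the source system's delta API; all names illustrative.
interface DeltaClient
{
  /** Identifiers of every folder/document currently in the source system (full crawl). */
  List<String> listAllDocumentIds();

  /** The token representing "now", to be handed back on the next delta call. */
  String currentDeltaToken();

  /** All adds, modifications, and deletions since the given token. */
  DeltaResult getChangesSince(String token);
}

// One batch of changes plus the token to use on the following call.
interface DeltaResult
{
  List<String> changedDocumentIds();
  String newToken();
}
```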

Our first design specifies `MODEL_ADD_CHANGE_DELETE` and works as follows:

* In `addSeedDocuments`, on the initial call we seed every document in
the source system. On subsequent calls, we use the delta API to seed
every added, modified, or deleted file. We return the delta API token
as the version value of `addSeedDocuments`, so that it can be used on
subsequent calls (see the sketch after this list).

* In `processDocuments`, we fetch and ingest each document identifier in the usual way.

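A rough sketch of that first design, assuming the ManifoldCF 2.x connector
signatures (where `addSeedDocuments` receives the last seeding version and
returns the new one) and the hypothetical `DeltaClient` above; `getClient()`
is a placeholder for however the connector builds its client:

```java
import org.apache.manifoldcf.core.interfaces.*;
import org.apache.manifoldcf.agents.interfaces.*;
import org.apache.manifoldcf.crawler.interfaces.*;
import org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector;

public abstract class DeltaConnector extends BaseRepositoryConnector
{
  /** Placeholder for however the connector obtains its delta API client. */
  protected abstract DeltaClient getClient();

  @Override
  public int getConnectorModel()
  {
    return MODEL_ADD_CHANGE_DELETE;
  }

  @Override
  public String addSeedDocuments(ISeedingActivity activities, Specification spec,
      String lastSeedVersion, long seedTime, int jobMode)
    throws ManifoldCFException, ServiceInterruption
  {
    DeltaClient client = getClient();
    if (lastSeedVersion == null)
    {
      // Initial crawl: seed every document in the source system.
      for (String id : client.listAllDocumentIds())
        activities.addSeedDocument(id);
      return client.currentDeltaToken();
    }
    // Subsequent crawls: seed every added, modified, or deleted document
    // reported since the stored token.
    DeltaResult delta = client.getChangesSince(lastSeedVersion);
    for (String id : delta.changedDocumentIds())
      activities.addSeedDocument(id);
    // The returned token is stored by the framework and handed back as
    // lastSeedVersion on the next seeding pass.
    return delta.newToken();
  }
}
```
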
In prototyping, this works for new documents, but `processDocuments` is
never triggered for modified or deleted documents.

A second design we are considering is to use
`MODEL_CHAINED_ADD_CHANGE_DELETE` and have `addSeedDocuments` seed only
one "virtual" document, which represents the root of the remote repository.

Then, in `processDocuments`, the virtual "document" is used to determine
all the child documents of that delta call, which are added to the
queue via `activities.addDocumentReference`. To force the virtual seed
to trigger `processDocuments` again after the next call to
`addSeedDocuments`, we also call `activities.deleteDocument(virtualDocId)`
(a sketch follows below).

With this alternative design, the stage 1 seed effectively becomes a
no-op, and is just used as a mechanism to trigger stage 2.
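
A corresponding sketch of this second design (same imports and hypothetical
`DeltaClient` as above). `VIRTUAL_ROOT_ID` and `loadStoredToken()` are
placeholders; in particular, how the delta token is shared between the seeding
and processing phases is still an open part of this design:

```java
public abstract class ChainedDeltaConnector extends BaseRepositoryConnector
{
  /** Illustrative identifier for the single virtual seed document. */
  protected static final String VIRTUAL_ROOT_ID = "virtual-root";

  /** Placeholder for however the connector obtains its delta API client. */
  protected abstract DeltaClient getClient();

  /** Placeholder: how the delta token is persisted/shared is still undecided. */
  protected abstract String loadStoredToken();

  @Override
  public int getConnectorModel()
  {
    return MODEL_CHAINED_ADD_CHANGE_DELETE;
  }

  @Override
  public String addSeedDocuments(ISeedingActivity activities, Specification spec,
      String lastSeedVersion, long seedTime, int jobMode)
    throws ManifoldCFException, ServiceInterruption
  {
    // Stage 1 is effectively a no-op: always (re-)seed the single virtual root.
    activities.addSeedDocument(VIRTUAL_ROOT_ID);
    return "";  // seeding-version handling is an open question in this design
  }

  @Override
  public void processDocuments(String[] documentIdentifiers, IExistingVersions statuses,
      Specification spec, IProcessActivity activities, int jobMode,
      boolean usedDefaultAuthority)
    throws ManifoldCFException, ServiceInterruption
  {
    for (String docId : documentIdentifiers)
    {
      if (VIRTUAL_ROOT_ID.equals(docId))
      {
        // Expand the virtual seed into the children reported by this delta call.
        DeltaResult delta = getClient().getChangesSince(loadStoredToken());
        for (String childId : delta.changedDocumentIds())
          activities.addDocumentReference(childId);
        // Delete the virtual seed so the next addSeedDocuments call re-seeds it
        // and this method fires again.
        activities.deleteDocument(docId);
        continue;
      }
      // Real documents: fetch, version-check, and ingest as usual (omitted here).
    }
  }
}
```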

Thoughts?

Regards,
Raman Gupta
