Hi Karl, 

Thanks for the answer. 

Is your suggestion something like this:

processDocuments(...) {
        if (documentIdentifier.isURI) {
                jsonDocs = getJsonDocsFromURI(documentIdentifier)
                jsonDocs.foreach(jsonDoc -> {
                        String jsonDocID = "jsonDoc+" + jsonDoc.toJsonString();
                        activities.addDocumentReference(jsonDocID);
                })
        } else if (documentIdentifier.isJsonDoc) {
                jsonDoc = getJsonDoc(documentIdentifier)
                jsonDocVersion = jsonDoc.getVersion()
                jsonDocUri = jsonDoc.getUri();
                if (activities.checkDocumentNeedsReindexing(documentIdentifier, jsonDocVersion)) {
                        activities.ingestDocumentWithException(documentIdentifier, jsonDoc, jsonDocUri)
                }
        }
}

?

Julien

-----Original Message-----
From: Karl Wright <daddy...@gmail.com>
Sent: Friday, October 4, 2019 21:07
To: dev <dev@manifoldcf.apache.org>
Subject: Re: Technical question on repo connector dev

Hi Julien,

The checkDocumentNeedsReindexing() method is meant to be used inside
processDocuments() for the specific document you are checking.  So you can
convert your URI to a set of JSON documents if the document identifier is a
URI, but you will probably want to put the actual data for each document in
carrydown information.  You will also need to create some kind of non-URI
document ID.
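
A minimal sketch of this flow in Java, under stated assumptions. It is not
ManifoldCF code: InMemoryActivities, JsonDoc, the "jsonDoc+" prefix, the
carrydown names ("version", "uri", "body") and the simplified method
signatures are all hypothetical stand-ins, and the real ManifoldCF interfaces
are not reproduced here. It only illustrates the two branches: a seed URI
queues one child per JSON object with its data attached, and a child is
version-checked and ingested from that attached data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class CarrydownSketch {
    // Hypothetical prefix marking non-URI child identifiers.
    static final String JSON_PREFIX = "jsonDoc+";

    /** One object from the JSON array behind a seed URI (hypothetical shape). */
    static final class JsonDoc {
        final String id, version, body;
        JsonDoc(String id, String version, String body) {
            this.id = id; this.version = version; this.body = body;
        }
    }

    /** Hypothetical, simplified stand-in for the framework's activities object. */
    static final class InMemoryActivities {
        final Map<String, Map<String, String>> carrydown = new HashMap<>();
        final Map<String, String> indexedVersions = new HashMap<>();
        final List<String> queued = new ArrayList<>();
        final List<String> ingested = new ArrayList<>();

        // Queue a child document and attach data that travels with the reference.
        void addDocumentReference(String childId, Map<String, String> data) {
            carrydown.put(childId, new HashMap<>(data));
            queued.add(childId);
        }

        // Read back one named value attached when the child was queued.
        String retrieveParentData(String childId, String name) {
            Map<String, String> data = carrydown.get(childId);
            return data == null ? null : data.get(name);
        }

        // True when the version differs from the one last ingested.
        boolean checkDocumentNeedsReindexing(String documentId, String version) {
            return !version.equals(indexedVersions.get(documentId));
        }

        void ingest(String documentId, String version, String uri, String body) {
            indexedVersions.put(documentId, version);
            ingested.add(documentId);
        }
    }

    /** Handles one identifier, either a seed URI or a queued JSON child. */
    static void processDocument(String id, InMemoryActivities activities,
                                Function<String, List<JsonDoc>> fetchJsonArray) {
        if (!id.startsWith(JSON_PREFIX)) {
            // Seed URI: fetch the array once and queue each object with its data.
            for (JsonDoc doc : fetchJsonArray.apply(id)) {
                Map<String, String> data = new HashMap<>();
                data.put("version", doc.version);
                data.put("uri", id + "#" + doc.id);
                data.put("body", doc.body);
                activities.addDocumentReference(JSON_PREFIX + doc.id, data);
            }
        } else {
            // JSON child: no second fetch, everything comes from the carrydown data.
            String version = activities.retrieveParentData(id, "version");
            if (activities.checkDocumentNeedsReindexing(id, version)) {
                activities.ingest(id, version,
                        activities.retrieveParentData(id, "uri"),
                        activities.retrieveParentData(id, "body"));
            }
        }
    }

    public static void main(String[] args) {
        InMemoryActivities activities = new InMemoryActivities();
        // Hypothetical fetcher standing in for the HTTP call behind a seed URI.
        Function<String, List<JsonDoc>> fetch = uri -> Arrays.asList(
                new JsonDoc("a", "1", "{\"k\":1}"),
                new JsonDoc("b", "1", "{\"k\":2}"));
        processDocument("http://example.invalid/seed", activities, fetch);
        for (String child : new ArrayList<>(activities.queued)) {
            processDocument(child, activities, fetch);
        }
        System.out.println("ingested: " + activities.ingested);
    }
}
```

In this sketch the children are still handed to processDocument later, but
that call is cheap: it does the version check and ingests from the data that
was carried down, without contacting the source again.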

Karl


On Fri, Oct 4, 2019 at 1:36 PM <julien.massi...@francelabs.com> wrote:

> Hi,
>
> I am facing a simple technical case that I am not sure how to deal
> with, concerning the development of a repository connector.
>
> I want to develop a repo connector using the ADD_CHANGE_DELETE model
> that will normally add seed documents, and each seed document will
> produce several documents.
> The problem is that each produced document from a seed doc is
> instantly ingest-able and does not need to be processed.
>
> The use case here is that the addSeedDocuments method will call an API
> that will provide several URIs (seeds).
>
> In the processDocuments method, each URI provides a JSON array
> containing JSON objects and those JSON objects are meant to become
> repository documents and ingested.
> So the logic would be to use the activities.addDocumentReference for
> each JSON object before I can use the
> activities.checkDocumentNeedsReindexing (each JSON object has an id
> and a version field) and then ingest the document. But by doing this,
> I am afraid that the processDocuments method will be called with those
> newly referenced docs while they do not need to be processed.
>
> Any suggestion about how to deal with this use case is welcome.
>
> Thanks,
> Julien
