Hi Rafa,

The processDocuments() method decides the disposition of each document it is handed. Your connector is expected to call one of several different IProcessActivity methods, depending on what that decision is. See the 1.7 Javadoc for IProcessActivity:
  * The processing flow for a document is expected to go something like this:
  * (1) The connector's processDocuments() method is called with a set of documents to be processed.
  * (2) The connector computes a version string for each document in the set as part of determining
  * whether the document indeed needs to be refetched.
  * (3) For each document processed, there can be one of several dispositions:
  * (a) There is no such document (anymore): deleteDocument() called for the document.
  * (b) The document is (re)indexed: ingestDocumentWithException() is called for the document.
  * (c) The document is determined to be unchanged and no updates are needed: nothing needs to be called
  * for the document.
  * (d) The document is determined to be unchanged BUT the version string needs to be updated: recordDocument()
  * is called for the document.
  * (e) The document is determined to be unindexable BUT it still exists in the repository: noDocument()
  * is called for the document.
  * (f) There was a service interruption: ServiceInterruption is thrown.
  * (4) In order to determine whether a document needs to be reindexed, the method checkDocumentNeedsReindexing()
  * is available to return an opinion on that matter.

This list is not quite complete, because there is also a removeDocument() method available that it does not describe, but you get the idea. So it doesn't make much sense for processDocuments() to also return results; the processDocuments() method already reports each document's outcome through these calls.

As for this question:

>>>>>>
Would be reasonable to also generally extend the Transformation Connector and Output Connector interfaces to allow returning not only a rejection/acceptance code but also a Reason String Message?
<<<<<<

Well, the idea right now behind accept/reject is that it tells the framework whether or not to remove the document from the queue. There's no place to record why it was removed from the queue, since it's no longer in the queue at all.
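The disposition flow from the Javadoc above can be sketched roughly as follows. This is a minimal, self-contained illustration, not ManifoldCF code: `DispositionSink`, `RepoDoc`, `ProcessSketch`, and `RecordingSink` are hypothetical names, the method signatures are simplified stand-ins for the real IProcessActivity calls, and the set of accepted MIME types is an assumed stand-in for asking the downstream pipeline. Dispositions (d) and (f) are left out for brevity. The point it shows is that processDocuments() needs no return value: each outcome is expressed by calling a method, or, for an unchanged document, by calling nothing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-in for the disposition calls described in the
// IProcessActivity Javadoc. These are NOT the real ManifoldCF signatures.
interface DispositionSink {
    void deleteDocument(String id);                    // (a) document no longer exists
    void ingestDocument(String id, String version);    // (b) document is (re)indexed
    void noDocument(String id, String version);        // (e) exists but is unindexable
}

// Hypothetical view of one document as seen in the repository.
final class RepoDoc {
    final String id;
    final boolean exists;
    final String version;   // null when the document no longer exists
    final String mimeType;

    RepoDoc(String id, boolean exists, String version, String mimeType) {
        this.id = id;
        this.exists = exists;
        this.version = version;
        this.mimeType = mimeType;
    }
}

final class ProcessSketch {
    // Returns nothing: each document's disposition is reported through the sink.
    static void processDocuments(List<RepoDoc> docs,
                                 Map<String, String> previousVersions,
                                 Set<String> acceptedMimeTypes,
                                 DispositionSink sink) {
        for (RepoDoc doc : docs) {
            if (!doc.exists) {
                sink.deleteDocument(doc.id);            // (a)
                continue;
            }
            if (doc.version.equals(previousVersions.get(doc.id))) {
                continue;                               // (c) unchanged: no call needed
            }
            if (!acceptedMimeTypes.contains(doc.mimeType)) {
                sink.noDocument(doc.id, doc.version);   // (e) rejected but still in the repository
                continue;
            }
            sink.ingestDocument(doc.id, doc.version);   // (b)
        }
    }
}

// Test double that records which disposition was chosen for each document.
final class RecordingSink implements DispositionSink {
    final List<String> calls = new ArrayList<>();

    public void deleteDocument(String id) { calls.add("delete:" + id); }
    public void ingestDocument(String id, String version) { calls.add("ingest:" + id); }
    public void noDocument(String id, String version) { calls.add("noDocument:" + id); }
}
```

Feeding it one missing document, one unchanged document, one changed document, and one document with an unaccepted MIME type produces a delete, an ingest, and a noDocument call, with no call at all for the unchanged document.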
Instead of returning a reason, your repository, transformation, or output connector can record the basic reason for rejection in the history for the crawl. For example, if it calls checkMimeTypeIndexable() and gets back a false result, it can record that the document was rejected because the mime type of XXX was not accepted by the downstream pipeline. This will tell you what happened to the document, and roughly why.

Later, we could consider having the check methods return a status object rather than a boolean, so that a more detailed message could be provided for history logging or connector log output. If you open a ticket for this, it would probably need to wait until 2.0, though.

Hope this helps.

Karl

On Fri, Aug 8, 2014 at 8:59 AM, Rafa Haro wrote:

> Hi devs,
>
> I have a quick question, more a curiosity than other thing. At
> Transformation Connectors and Output Connectors, we have the possibility to
> return a code like DOCUMENTSTATUS_REJECTED as result of the main
> addDocument method, indicating that the document has been rejected. I
> suppose that this code is recorded by Manifold and later the user can check
> for the rejected documents. I’m facing now a situation in a Repository
> Connector I’m extending where I have enough information about the document
> to decide rejecting it or not. But I have not found any way within the
> Framework to notify this rejection in a Repository Connector.
>
> Couple of questions:
>
> - Would be reasonable to extend current Repository Connector interface for
> allowing returning a rejection or acceptance code in the processDocuments
> method?
>
> - Would be reasonable to also generally extend the Transformation
> Connector and Output Connector interfaces to allow returning not only a
> rejection/acceptance code but also a Reason String Message?
>
> Thanks all!!
