Hi Karl,

Thanks a lot for your response. Now everything is clear. I had the intuition to use the activities object, but honestly I hadn't gone through the documentation. My fault. I will take it into account now.
Cheers,
Rafa

On Friday, August 8, 2014, Karl Wright <[email protected]> wrote:

> Hi Rafa,
>
> The processDocuments() method decides what the disposition of each document it is handed should be. Your connector is expected to call one of several different IProcessActivity methods depending on what the decision is. See the 1.7 Javadoc for IProcessActivity:
>
> * The processing flow for a document is expected to go something like this:
> * (1) The connector's processDocuments() method is called with a set of documents to be processed.
> * (2) The connector computes a version string for each document in the set as part of determining whether the document indeed needs to be refetched.
> * (3) For each document processed, there can be one of several dispositions:
> * (a) There is no such document (anymore): deleteDocument() called for the document.
> * (b) The document is (re)indexed: ingestDocumentWithException() is called for the document.
> * (c) The document is determined to be unchanged and no updates are needed: nothing needs to be called for the document.
> * (d) The document is determined to be unchanged BUT the version string needs to be updated: recordDocument() is called for the document.
> * (e) The document is determined to be unindexable BUT it still exists in the repository: noDocument() is called for the document.
> * (f) There was a service interruption: ServiceInterruption is thrown.
> * (4) In order to determine whether a document needs to be reindexed, the method checkDocumentNeedsReindexing() is available to return an opinion on that matter.
>
> This is not quite complete, because there is also a removeDocument() method available that is not described, but you get the idea. So it doesn't make much sense for processDocuments() to also return results; essentially, the processDocuments() method already has to do that through those calls.
>
> As for this question:
>
> >>>>>>
> Would it be reasonable to also generally extend the Transformation Connector and Output Connector interfaces to allow returning not only a rejection/acceptance code but also a Reason String Message?
> <<<<<<
>
> Well, the idea right now behind accept/reject is that it tells the framework whether or not to remove the document from the queue. There's no place to record why it was removed from the queue, since it's no longer in the queue at all. Instead, your repository, transformation, or output connector can record the basic reason for rejection in the history for the crawl. For example, if it calls checkMimeTypeIndexable() and gets back false, it can record that the document was rejected because the mime type XXX was not accepted by the downstream pipeline. This tells you what happened to the document, and roughly why. Later, we could consider having the check methods return a status object rather than a boolean, so that a more detailed message could be provided for history logging or connector log output. If you open a ticket for this, though, it would probably need to wait until 2.0.
>
> Hope this helps.
> Karl
>
> On Fri, Aug 8, 2014 at 8:59 AM, Rafa Haro <[email protected]> wrote:
>
> > Hi devs,
> >
> > I have a quick question, more a curiosity than anything else. In Transformation Connectors and Output Connectors, we can return a code like DOCUMENTSTATUS_REJECTED as the result of the main addDocument method, indicating that the document has been rejected. I suppose this code is recorded by Manifold so the user can later check for rejected documents. I'm now facing a situation in a Repository Connector I'm extending where I have enough information about the document to decide whether to reject it. But I have not found any way within the framework to signal this rejection from a Repository Connector.
> >
> > Couple of questions:
> >
> > - Would it be reasonable to extend the current Repository Connector interface to allow returning a rejection or acceptance code from the processDocuments method?
> >
> > - Would it be reasonable to also generally extend the Transformation Connector and Output Connector interfaces to allow returning not only a rejection/acceptance code but also a Reason String Message?
> >
> > Thanks all!!
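A minimal Java sketch of the flow Karl describes, under loudly stated assumptions: the `Activities` interface, `RepoDoc`, `DispositionSketch`, and the `recordHistory()` method are hypothetical stand-ins invented for illustration, not the real ManifoldCF `IProcessActivity` API (whose method names and signatures differ, e.g. `ingestDocumentWithException()`). It shows a repository connector reporting each document's disposition through calls on the activities object instead of returning a status code, and recording a rejection reason in the crawl history when a mime-type check fails. The simple version-string comparison stands in for `checkDocumentNeedsReindexing()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// HYPOTHETICAL stand-in for the activities object passed to processDocuments().
// Names loosely echo the IProcessActivity Javadoc; signatures are simplified and
// recordHistory() is invented for this sketch.
interface Activities {
    void deleteDocument(String id);                  // (a) document no longer exists
    void ingestDocument(String id, String version);  // (b) (re)index the document
    void noDocument(String id, String version);      // (e) exists but is unindexable
    void recordHistory(String id, String resultCode, String reason);
}

// HYPOTHETICAL snapshot of what the connector knows about one document.
final class RepoDoc {
    final String id;
    final boolean exists;
    final String version;   // current version string; null if the document is gone
    final String mimeType;

    RepoDoc(String id, boolean exists, String version, String mimeType) {
        this.id = id;
        this.exists = exists;
        this.version = version;
        this.mimeType = mimeType;
    }
}

final class DispositionSketch {
    // Decides one document's disposition and reports it via the activities
    // object; nothing is returned to the framework.
    static void process(RepoDoc doc, String previousVersion,
                        Predicate<String> mimeTypeIndexable, Activities activities) {
        if (!doc.exists) {
            activities.deleteDocument(doc.id);                        // (a)
            return;
        }
        // Simplified stand-in for checkDocumentNeedsReindexing().
        if (doc.version.equals(previousVersion)) {
            return;                                                   // (c) unchanged: no call
        }
        if (!mimeTypeIndexable.test(doc.mimeType)) {
            // Rejection: record the reason in the history first, then mark the
            // document as existing in the repository but not indexable.
            activities.recordHistory(doc.id, "REJECTED",
                "Mime type " + doc.mimeType + " not accepted by the downstream pipeline");
            activities.noDocument(doc.id, doc.version);               // (e)
            return;
        }
        activities.ingestDocument(doc.id, doc.version);               // (b)
    }
}

// Test double that records every call so the dispositions can be inspected.
final class RecordingActivities implements Activities {
    final List<String> calls = new ArrayList<>();
    final List<String> reasons = new ArrayList<>();

    public void deleteDocument(String id) { calls.add("delete:" + id); }
    public void ingestDocument(String id, String version) { calls.add("ingest:" + id + ":" + version); }
    public void noDocument(String id, String version) { calls.add("noDocument:" + id + ":" + version); }
    public void recordHistory(String id, String resultCode, String reason) {
        calls.add("history:" + id + ":" + resultCode);
        reasons.add(reason);
    }
}
```

The point of the design is that the decision is communicated as a side effect on the activities object, which is why an additional return code from processDocuments() would be redundant.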
