Hi Rafa,

The processDocuments() method decides the disposition of every document
it is handed.  Your connector is expected to call one of several different
IProcessActivity methods depending on what the decision is.  See the 1.7
Javadoc for IProcessActivity:

* The processing flow for a document is expected to go something like this:
* (1) The connector's processDocuments() method is called with a set of
*     documents to be processed.
* (2) The connector computes a version string for each document in the set
*     as part of determining whether the document indeed needs to be
*     refetched.
* (3) For each document processed, there can be one of several dispositions:
*   (a) There is no such document (anymore): deleteDocument() called for
*       the document.
*   (b) The document is (re)indexed: ingestDocumentWithException() is
*       called for the document.
*   (c) The document is determined to be unchanged and no updates are
*       needed: nothing needs to be called for the document.
*   (d) The document is determined to be unchanged BUT the version string
*       needs to be updated: recordDocument() is called for the document.
*   (e) The document is determined to be unindexable BUT it still exists in
*       the repository: noDocument() is called for the document.
*   (f) There was a service interruption: ServiceInterruption is thrown.
* (4) In order to determine whether a document needs to be reindexed, the
*     method checkDocumentNeedsReindexing() is available to return an
*     opinion on that matter.

This is not quite complete, because there is also a removeDocument() method
available that is not described, but you get the idea.  So it doesn't make
much sense for processDocuments() to also return results; essentially,
processDocuments() already has to report each document's disposition
through these calls.
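
To make the flow above concrete, here is a minimal, self-contained sketch
of a connector choosing one disposition per document.  The Activities
interface is a hypothetical, simplified stand-in for IProcessActivity (the
real methods take more arguments), and Doc, processOne() and
RecordingActivities are invented for illustration.  The "text/plain" test
is only an example of a connector-side reason to treat a document as
unindexable.

```java
import java.util.ArrayList;
import java.util.List;

public class DispositionSketch {

    /** Hypothetical, simplified stand-in for IProcessActivity (not the real signatures). */
    interface Activities {
        boolean checkDocumentNeedsReindexing(String id, String version);
        void deleteDocument(String id);
        void noDocument(String id, String version);
        void ingestDocument(String id, String version, String content);
    }

    /** Hypothetical repository record; a null Doc means the document no longer exists. */
    static final class Doc {
        final String version;
        final String mimeType;
        final String content;

        Doc(String version, String mimeType, String content) {
            this.version = version;
            this.mimeType = mimeType;
            this.content = content;
        }
    }

    /** Reports at most one disposition for a single document. */
    static void processOne(String id, Doc doc, Activities activities) {
        if (doc == null) {
            activities.deleteDocument(id);            // (a) gone from the repository
        } else if (!activities.checkDocumentNeedsReindexing(id, doc.version)) {
            // (c) unchanged: nothing needs to be called
        } else if (!"text/plain".equals(doc.mimeType)) {
            activities.noDocument(id, doc.version);   // (e) exists but is unindexable
        } else {
            activities.ingestDocument(id, doc.version, doc.content); // (b) (re)indexed
        }
    }

    /** Test double: records calls and treats one fixed version as already indexed. */
    static final class RecordingActivities implements Activities {
        final List<String> calls = new ArrayList<>();
        final String indexedVersion;

        RecordingActivities(String indexedVersion) {
            this.indexedVersion = indexedVersion;
        }

        public boolean checkDocumentNeedsReindexing(String id, String version) {
            return !version.equals(indexedVersion);
        }

        public void deleteDocument(String id) { calls.add("delete:" + id); }

        public void noDocument(String id, String version) { calls.add("noDocument:" + id); }

        public void ingestDocument(String id, String version, String content) {
            calls.add("ingest:" + id);
        }
    }

    public static void main(String[] args) {
        RecordingActivities activities = new RecordingActivities("v1");
        processOne("gone", null, activities);
        processOne("same", new Doc("v1", "text/plain", "hello"), activities);
        processOne("image", new Doc("v2", "image/png", ""), activities);
        processOne("changed", new Doc("v2", "text/plain", "hello again"), activities);
        System.out.println(activities.calls);
    }
}
```

In this sketch the repository connector "rejects" a document simply by
taking the noDocument() branch, so no return value is needed.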

As for this question:
>>>>>>
Would be reasonable to also generally extend  the Transformation Connector
and Output Connector interfaces to allow returning not only a
rejection/acceptance code but also a Reason String Message?
<<<<<<

Well, the idea right now behind accept/reject is that it informs the
framework whether to remove the document from the queue or not.  There's no
place to record why it was removed from the queue, since it's no longer in
the queue at all.  Instead, your repository, transformation, or output
connector can record the basic reason for rejection in the history for the
crawl.  For example, if it calls checkMimeTypeIndexable() and gets back a
false result, it can record that the document was rejected because the mime
type of XXX was not accepted by the downstream pipeline.  This will tell
you what happened to the document, and roughly why.  Later, we could
consider having the check methods return a status object rather than a
boolean, so a more detailed message could be provided for history logging
or connector log output.  If you open a ticket for this, it would probably
need to wait until 2.0, though.
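
For illustration only, a status object of the kind floated above might
look roughly like this.  Nothing here exists in the framework today:
CheckStatus, checkMimeType() and offer() are hypothetical names, the
accepted mime types are made up, and the history is modelled as a plain
list of strings rather than the real history-recording API.

```java
import java.util.ArrayList;
import java.util.List;

public class CheckStatusSketch {

    /** Hypothetical status object; the existing check methods return a plain boolean. */
    static final class CheckStatus {
        final boolean accepted;
        final String reason;

        private CheckStatus(boolean accepted, String reason) {
            this.accepted = accepted;
            this.reason = reason;
        }

        static CheckStatus accept() {
            return new CheckStatus(true, "");
        }

        static CheckStatus reject(String reason) {
            return new CheckStatus(false, reason);
        }
    }

    /** Hypothetical downstream check that explains a rejection instead of returning false. */
    static CheckStatus checkMimeType(String mimeType) {
        if ("text/plain".equals(mimeType) || "text/html".equals(mimeType)) {
            return CheckStatus.accept();
        }
        return CheckStatus.reject(
            "mime type " + mimeType + " was not accepted by the downstream pipeline");
    }

    /** Runs the check and, on rejection, appends the reason to the given history list. */
    static boolean offer(String id, String mimeType, List<String> history) {
        CheckStatus status = checkMimeType(mimeType);
        if (!status.accepted) {
            history.add(id + ": REJECTED: " + status.reason);
        }
        return status.accepted;
    }

    public static void main(String[] args) {
        List<String> history = new ArrayList<>();
        offer("doc1", "text/plain", history);
        offer("doc2", "image/png", history);
        System.out.println(history);
    }
}
```

The caller still gets a yes/no answer for the queue decision; the reason
string travels only as far as the history or the connector log.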

Hope this helps.
Karl



On Fri, Aug 8, 2014 at 8:59 AM, Rafa Haro <[email protected]> wrote:

> Hi devs,
>
> I have a quick question, more a curiosity than other thing. At
> Transformation Connectors and Output Connectors, we have the possibility to
> return a code like DOCUMENTSTATUS_REJECTED as result of the main
> addDocument method, indicating that the document has been rejected. I
> suppose that this code is recorded by Manifold and later the user can check
> for the rejected documents. I’m facing now a situation in a Repository
> Connector I’m extending where I have enough information about the document
> to decide rejecting it or not. But I have not found any way within the
> Framework to notify this rejection in a Repository Connector.
>
> Couple of questions:
>
> - Would be reasonable to extend current Repository Connector interface for
> allowing returning a rejection or acceptance code in the processDocuments
> method?
>
> - Would be reasonable to also generally extend  the Transformation
> Connector and Output Connector interfaces to allow returning not only a
> rejection/acceptance code but also a Reason String Message?
>
> Thanks all!!
>
>
