Prasad Perera created CONNECTORS-1009:
-----------------------------------------

             Summary: Cmis Repository Connector does not handle Document updating properly
                 Key: CONNECTORS-1009
                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1009
             Project: ManifoldCF
          Issue Type: Bug
          Components: CMIS connector
            Reporter: Prasad Perera
            Priority: Minor


As part of the fix for CONNECTORS-1004, it seems that CmisRepositoryConnector
does not handle document updates properly.

Case Scenario:

* Create a continuous crawling job using CmisRepositoryConnector.
* Update a document on the repository side.
* The document keeps being submitted to OutputConnector at each crawling
interval, even though it has not been updated since.

One possible fix is needed in CmisRepositoryConnector:processDocument,

 activities.ingestDocumentWithException(nodeId, version, documentURI, rd);
The documentURI should point to the old document URI. (It now points to the
latest documentURI discovered, which may be confusing the document references?)

Also, in ECM systems such as Alfresco, the document IDs include the version
number as well.
Ex:
workspace://SpacesStore/8e12a887-3fa8-48d6-8516-5bcfad358ba2;1.0 --> version 1.0
workspace://SpacesStore/8e12a887-3fa8-48d6-8516-5bcfad358ba2;1.1 --> version 1.1
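
To make the ID format above concrete, here is a minimal sketch of how such an ID
could be split into a version-independent part and a version label. The class
VersionedNodeId and its parse method are hypothetical names used only for
illustration; they are not part of ManifoldCF or the CMIS connector, and the
sketch assumes the label is everything after the last ';' as in the two
examples.

```java
/** Hypothetical helper (not ManifoldCF API): splits "base;label" node IDs. */
final class VersionedNodeId {
    final String baseId;       // part before the last ';'
    final String versionLabel; // part after the last ';', or "" if there is none

    private VersionedNodeId(String baseId, String versionLabel) {
        this.baseId = baseId;
        this.versionLabel = versionLabel;
    }

    /** Assumes the version label, if present, follows the last ';'. */
    static VersionedNodeId parse(String nodeId) {
        int sep = nodeId.lastIndexOf(';');
        if (sep < 0) {
            // No version suffix: treat the whole ID as the base ID.
            return new VersionedNodeId(nodeId, "");
        }
        return new VersionedNodeId(nodeId.substring(0, sep), nodeId.substring(sep + 1));
    }

    public static void main(String[] args) {
        VersionedNodeId id =
            parse("workspace://SpacesStore/8e12a887-3fa8-48d6-8516-5bcfad358ba2;1.1");
        System.out.println(id.baseId + " -> version " + id.versionLabel);
    }
}
```

With a split like this, the two IDs above share the same baseId and differ only
in versionLabel, which is what makes them look like two separate documents to
the crawler.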

When we set up a query to crawl a repository folder, we discover content by
traversing the child nodes. Because of that, the connector now seems to queue
all the document versions and submit them to OutputConnector, producing
duplicate documents on the output (search) side.
Is there a way to avoid this problem? It would be great if the repository
connector could take just the latest document version and submit it as an
update.
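
As a rough illustration of the "latest version only" behaviour asked for above,
the following sketch filters a list of discovered IDs down to one entry per
document. LatestVersionFilter, latestOnly and compareLabels are hypothetical
names, not connector code, and the sketch assumes version labels are dotted
integers (such as 1.0 or 1.10) appended after the last ';'. A real fix would
probably rely on what the CMIS repository itself reports as the latest version
rather than on parsing IDs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch (not ManifoldCF API): keep one ID per document. */
final class LatestVersionFilter {

    /** Compares dotted-integer labels numerically, so "1.10" sorts after "1.9". */
    static int compareLabels(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        int n = Math.max(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    /** Returns the highest-versioned ID for each base ID, in first-seen order. */
    static List<String> latestOnly(List<String> nodeIds) {
        // base ID -> { full ID, version label } of the best candidate so far
        Map<String, String[]> latest = new LinkedHashMap<>();
        for (String nodeId : nodeIds) {
            int sep = nodeId.lastIndexOf(';');
            String base = sep < 0 ? nodeId : nodeId.substring(0, sep);
            // Assumption: an ID without a suffix is treated as version "0".
            String label = sep < 0 ? "0" : nodeId.substring(sep + 1);
            String[] current = latest.get(base);
            if (current == null || compareLabels(label, current[1]) > 0) {
                latest.put(base, new String[] { nodeId, label });
            }
        }
        List<String> result = new ArrayList<>();
        for (String[] entry : latest.values()) {
            result.add(entry[0]);
        }
        return result;
    }
}
```

Given the two example IDs above, latestOnly would keep only the ;1.1 entry, so
a single document would reach the output side instead of one per version.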



--
This message was sent by Atlassian JIRA
(v6.2#6252)
