Hi,

I am working on integrating ManifoldCF (MCF) with Alfresco CMS as the
repository connector, using CMIS queries, and with Solr as the output
channel where all indexes are stored. This works fine and I can search
documents in the Solr index.

Now, as part of the implementation, I am planning to introduce multiple
repositories such as SharePoint, file systems, etc., so I now have three
document repositories: Alfresco, SharePoint and the file system. I am
planning to have scheduled jobs that run through each repository and crawl
it at particular intervals. But I have the following concerns.

1. Although I am scheduling jobs at frequent intervals, I want to make
sure that MCF jobs pick up only content that is newly added or updated.
Say I have 100 docs during the current job run but 110 at the next job
run; I want that run to process only the 10 new docs, not all 110.
2. As there are relatively few MCF tutorials available, I have no means
of verifying that MCF jobs behave this way. I assume MCF is intelligent
enough to do so, but I have no proof to substantiate it.
3. I want to know more about the MCF job schedule types: "Scan every
document once" / "Rescan documents dynamically". Similarly, I want to know
more about job invocation: complete/minimal. Apologies for the newbie
questions.
4. I am also considering some custom coding to ensure that only
new/updated docs are eligible for processing, but again I would have to go
through the code, as little documentation is available.
5. Is it wise to do custom coding in this case, or does MCF provide all
these features OOTB?
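
To make point 1 concrete, here is a rough sketch of the behaviour I am
hoping for. It is only an illustration: the function name, the document
IDs and the "version string" idea are my own assumptions and are not taken
from the MCF API.

```python
from typing import Dict, List


def select_changed(previous: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Return IDs of docs that are new, or whose version string changed.

    `previous` and `current` map a (hypothetical) document ID to a
    (hypothetical) version string, e.g. a last-modified timestamp.
    """
    return sorted(
        doc_id
        for doc_id, version in current.items()
        if previous.get(doc_id) != version
    )


# First job run: 100 docs, all at version "v1".
first_run = {f"doc-{i}": "v1" for i in range(100)}

# Next job run: the same 100 docs unchanged, plus 10 new ones.
second_run = dict(first_run)
second_run.update({f"doc-{i}": "v1" for i in range(100, 110)})

# Only the 10 new docs should be eligible for processing.
to_process = select_changed(first_run, second_run)
print(len(to_process))
```

If MCF already keeps track of something like this between job runs, then
I would not need the custom code from point 4.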

I would appreciate any response.

Regards,
Lalit Jangra.
