Hi Karl,

Thanks for your response. I've decided to drop my file monitoring approach in favor of what I think is a more elegant solution. I have started to implement my own file system output connector by extending the org.apache.manifoldcf.agents.output.filesystem.FileOutputConnector class.

I have a question about this, however: once I've created my own output connector implementation from the above class, how can I register it with ManifoldCF so that jobs can run properly with it?

I.e., at the moment (before implementing my own connector), when I create an output connection in ManifoldCF, I use the JSON API with "org.apache.manifoldcf.agents.output.filesystem.FileOutputConnector" as my class name. How can I register my own class name (let's say "sample_package.myfileoutputconnector", which extends "org.apache.manifoldcf.agents.output.filesystem.FileOutputConnector") with ManifoldCF, so that jobs can be defined to use my own output connector?

Thanks,
Pranesh Vadhirajan

-----Original Message-----
From: Karl Wright [mailto:[email protected]]
Sent: Monday, September 16, 2013 5:59 PM
To: dev
Subject: Re: File System crawler question

Hi Pranesh,

The API basically allows you to do anything you can do in the UI. In the UI you would use the Document Status report to figure out which documents belonging to a given job were in a particular state, and that's exactly what you will need to do here.

See the "Programmatic Operation" page. Here are some sections of interest:

http://manifoldcf.apache.org/release/trunk/en_US/programmatic-operation.html#Queue+query+parameters

... and the following REST operation:

repositoryconnectionquery/<encoded_connection_name>

Karl

On Mon, Sep 16, 2013 at 5:50 PM, Pranesh Vadhirajan <[email protected]> wrote:
> Hi All,
>
> I am implementing my own Java client to crawl file system resources via the ManifoldCF JSON based API. I have been able to define and run a job to crawl a file system repository and output to a file system destination.
>
> The trouble I'm having currently is to be able to know which documents have been crawled via the ManifoldCF API. I have looked through the API documentation on the ManifoldCF release pages, but I'm unable to find this information. Could someone point me in the right direction?
>
> When I try to use the Java API for file system monitoring (to check the contents of the output folder), I'm having issues with the files being locked during the execution of the job. Therefore, I need to query the ManifoldCF engine to find out which documents have changed in the output area so that I can run my file system monitoring code on a different schedule.
>
> Please let me know if I didn't explain myself well here.
>
> Thanks,
> Pranesh
>
> Pranesh Vadhirajan
