Re: File System crawler question

Karl Wright Tue, 17 Sep 2013 09:19:48 -0700

You can either add your connector's class name to the connectors.xml file,
in which case it will be registered when ManifoldCF is started, or you can
use the command-line command for registering a connector.  See the
Programmatic Operation page for that option.
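
For the connectors.xml route, the entry would look roughly like the sketch
below.  The class name is the one from your question; the name attribute
("My File Output") is only an illustrative label, and this assumes the jar
containing your class is already on ManifoldCF's classpath.

```xml
<connectors>
  <!-- Hypothetical entry: "My File Output" is a placeholder display name;
       the class is the custom connector named in the question. -->
  <outputconnector name="My File Output"
                   class="sample_package.myfileoutputconnector"/>
</connectors>
```

Once registered, that class name can be used in the JSON API wherever you
currently pass the stock FileOutputConnector class name.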


Karl


On Tue, Sep 17, 2013 at 11:47 AM, Pranesh Vadhirajan <
[email protected]> wrote:

> Hi Karl,
>
> Thanks for your response.  I've decided to drop my file monitoring
> approach in favor of what I think is a more elegant solution.  I have
> started to implement my own file system output connector by extending the
> org.apache.manifoldcf.agents.output.filesystem.FileOutputConnector class.
> I have a question about this, however: once I've created my own
> implementation of an output connector from the above class, how can I
> register my output connector with ManifoldCF so that jobs can run properly
> with my own output connector implementation?
>
> I.e., at the moment (before implementing my own connector) when I create an
> output connection in ManifoldCF, I use the JSON API with
> "org.apache.manifoldcf.agents.output.filesystem.FileOutputConnector" as my
> class name.  How can I register my own class name (let's say
> "sample_package.myfileoutputconnector", which extends
> "org.apache.manifoldcf.agents.output.filesystem.FileOutputConnector") with
> ManifoldCF, so that jobs can be defined to use my own output connector?
>
> Thanks,
> Pranesh Vadhirajan
>
> -----Original Message-----
> From: Karl Wright [mailto:[email protected]]
> Sent: Monday, September 16, 2013 5:59 PM
> To: dev
> Subject: Re: File System crawler question
>
> Hi Pranesh,
>
> The API basically allows you to do anything you can do in the UI.  In the
> UI you would use the Document Status report to figure out which documents
> belonging to a given job were in a particular state, and that's exactly
> what you will need to do here.  See the "Programmatic Operation" page.
> Here are some sections of interest:
>
>
> http://manifoldcf.apache.org/release/trunk/en_US/programmatic-operation.html#Queue+query+parameters
>
> ... and the following REST operation:
>
> repositoryconnectionquery/<encoded_connection_name>
>
> Karl
>
>
> On Mon, Sep 16, 2013 at 5:50 PM, Pranesh Vadhirajan <
> [email protected]> wrote:
>
> > Hi All,
> >
> >
> >
> > I am implementing my own Java client to crawl file system resources
> > via the ManifoldCF JSON-based API.  I have been able to define and run
> > a job to crawl a file system repository and output to a file system
> > destination.
> >
> >
> >
> > The trouble I'm currently having is finding out, via the ManifoldCF
> > API, which documents have been crawled.  I have looked through the
> > API documentation on the ManifoldCF release pages, but I'm unable to
> > find this information.  Could someone point me in the right direction?
> >
> >
> >
> > When I try to use the Java API for file system monitoring (to check on
> > the contents of the output folder), I'm having issues with the files
> > being locked during the execution of the job.  Therefore, I need to
> > query the ManifoldCF engine to understand which documents have been
> > changed in the output area so that I can run my file system monitoring
> > code on a different schedule.
> >
> >
> >
> > Please let me know if I didn't explain myself well here.
> >
> >
> >
> > Thanks,
> > Pranesh
> >
> >
> >
> > Pranesh Vadhirajan
> >
> >
> >
> >
>
>
>
