RE: File System crawler question

Pranesh Vadhirajan Tue, 17 Sep 2013 09:15:51 -0700

Hi Karl,

Thanks for your response.  I've decided to drop my file monitoring
approach in favor of what I think is a more elegant solution.  I have started
to implement my own file system output connector by extending the
org.apache.manifoldcf.agents.output.filesystem.FileOutputConnector class.  I
have a question about this, however: once I've created my own output connector
based on that class, how do I register it with ManifoldCF so that jobs can
run with my implementation?

For example, at the moment (before implementing my own connector), when I
create an output connection in ManifoldCF I use the JSON API with
"org.apache.manifoldcf.agents.output.filesystem.FileOutputConnector" as the
class name.  How can I register my own class name (say,
"sample_package.myfileoutputconnector", which extends
"org.apache.manifoldcf.agents.output.filesystem.FileOutputConnector") with
ManifoldCF, so that jobs can be defined to use my own output connector?
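For illustration only, here is a minimal Java sketch of the JSON body one might send to create an output connection that names the custom class instead of the stock FileOutputConnector. It assumes the custom class has already been registered with and deployed to ManifoldCF (the registration step is exactly what is being asked about and is not shown). The wrapper key "outputconnection" and the field names "name", "class_name", "max_connections" and "configuration" are assumptions to be checked against the Programmatic Operation page, and "my-file-output" is a hypothetical connection name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class OutputConnectionJson {

    // Wraps a string as a JSON string literal, escaping backslashes and quotes
    // (sufficient for the simple ASCII values used in this sketch).
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Builds the body for creating an output connection. ASSUMPTION: the
    // wrapper key "outputconnection" and the field names below follow the
    // output connection object on the Programmatic Operation page; verify
    // them (and the connector-specific configuration format) first.
    static String buildOutputConnection(String name, String className, int maxConnections) {
        Map<String, String> fields = new LinkedHashMap<>(); // keeps field order stable
        fields.put("name", quote(name));
        fields.put("class_name", quote(className));
        fields.put("max_connections", Integer.toString(maxConnections));
        // Connector-specific configuration is deliberately left empty here.
        fields.put("configuration", "{}");

        StringBuilder sb = new StringBuilder("{\"outputconnection\":{");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            sb.append(quote(e.getKey())).append(':').append(e.getValue());
            first = false;
        }
        return sb.append("}}").toString();
    }

    public static void main(String[] args) {
        // "my-file-output" is hypothetical; the class name is the example from
        // the question and must already be known to ManifoldCF.
        String body = buildOutputConnection(
            "my-file-output", "sample_package.myfileoutputconnector", 10);
        System.out.println(body);
    }
}
```

The body would presumably be sent with an HTTP PUT to outputconnections/<encoded_connection_name>; confirm the exact resource and method on the Programmatic Operation page before relying on it.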

Thanks,
Pranesh Vadhirajan

-----Original Message-----
From: Karl Wright [mailto:[email protected]] 
Sent: Monday, September 16, 2013 5:59 PM
To: dev
Subject: Re: File System crawler question

Hi Pranesh,

The API basically allows you to do anything you can do in the UI.  In the UI
you would use the Document Status report to find out which documents
belonging to a given job were in a particular state, and that's exactly
what you will need to do here.  See the "Programmatic Operation" page;
here are some sections of interest:

http://manifoldcf.apache.org/release/trunk/en_US/programmatic-operation.html#Queue+query+parameters

... and the following REST operation:

repositoryconnectionquery/<encoded_connection_name>
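A rough Java sketch of calling that REST operation from a client, under several assumptions: the API base URL http://localhost:8345/mcf-api-service/json/ is a guess for a default local install, the parameter name "example_param" is a hypothetical placeholder (the real names are listed under "Queue query parameters"), and plain percent-encoding may not be all the escaping ManifoldCF expects for connection names, so check the Programmatic Operation page.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class QueueQueryClient {

    // ASSUMPTION: base URL of the ManifoldCF JSON API; adjust for your deployment.
    static final String DEFAULT_BASE = "http://localhost:8345/mcf-api-service/json/";

    // Percent-encodes a value. URLEncoder does form encoding (space -> '+'),
    // so '+' is rewritten to %20; a literal '+' was already encoded as %2B.
    static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8).replace("+", "%20");
    }

    // Builds <base>repositoryconnectionquery/<encoded_connection_name>?k=v&...
    static String buildQueryUrl(String base, String connectionName, Map<String, String> params) {
        StringBuilder url = new StringBuilder(base)
            .append("repositoryconnectionquery/")
            .append(encode(connectionName));
        if (!params.isEmpty()) {
            StringJoiner query = new StringJoiner("&", "?", "");
            for (Map.Entry<String, String> e : params.entrySet()) {
                query.add(encode(e.getKey()) + "=" + encode(e.getValue()));
            }
            url.append(query);
        }
        return url.toString();
    }

    // Issues a GET and returns the raw JSON response text.
    static String fetch(String url) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("example_param", "example value"); // hypothetical placeholder
        String url = buildQueryUrl(DEFAULT_BASE, "My File Repo", params);
        System.out.println(url);
        // Against a running instance: System.out.println(fetch(url));
    }
}
```

Keeping URL construction separate from the HTTP call makes it easy to verify the request offline and to run the query on its own schedule, independent of the job.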

Karl

On Mon, Sep 16, 2013 at 5:50 PM, Pranesh Vadhirajan <[email protected]> wrote:

> Hi All,
>
> I am implementing my own Java client to crawl file system resources
> via the ManifoldCF JSON-based API.  I have been able to define and run
> a job that crawls a file system repository and outputs to a file system
> destination.
>
> The trouble I'm having is determining, via the ManifoldCF API, which
> documents have been crawled.  I have looked through the API
> documentation on the ManifoldCF release pages, but I'm unable to
> find this information.  Could someone point me in the right direction?
>
> When I use the Java API for file system monitoring (to check the
> contents of the output folder), I run into files being locked while
> the job is executing.  Therefore, I need to query the ManifoldCF
> engine to find out which documents have changed in the output area,
> so that I can run my file system monitoring code on a different
> schedule.
>
> Please let me know if I didn't explain myself well here.
>
> Thanks,
> Pranesh
>
> Pranesh Vadhirajan
