Re: File System crawler question

Karl Wright Mon, 16 Sep 2013 15:06:46 -0700

Hi Pranesh,

The API basically allows you to do anything you can do in the UI.  In the
UI you would use the Document Status report to figure out which documents
belonging to a given job are in a particular state, and that's exactly
what you will need to do here.  See the "Programmatic Operation" page.
Here are some sections of interest:


http://manifoldcf.apache.org/release/trunk/en_US/programmatic-operation.html#Queue+query+parameters

... and the following REST operation:

repositoryconnectionquery/<encoded_connection_name>
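
As a rough sketch of how a Java client might assemble that request: the
class below only builds the URL and makes no network call. Everything in it
is illustrative rather than taken from ManifoldCF itself. The base URL is
the one I would expect from the example deployment, the job id is made up,
and the parameter names (report, job, statematch) and their values are
assumptions that should be checked against the "Queue query parameters"
section linked above. It also assumes plain percent-encoding is enough for
the connection name; the Programmatic Operation page describes the exact
encoding rules.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class QueueQueryUrl {

    // Percent-encode one value as UTF-8. URLEncoder targets form data and
    // writes a space as "+", so swap that for "%20" to keep it path-safe.
    public static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    // Build the GET URL for the queue query on one repository connection.
    public static String build(String apiBase, String connectionName,
                               Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> p : params.entrySet()) {
            query.add(encode(p.getKey()) + "=" + encode(p.getValue()));
        }
        String url = apiBase + "/repositoryconnectionquery/" + encode(connectionName);
        return params.isEmpty() ? url : url + "?" + query;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("report", "document");                 // assumed parameter name
        params.put("job", "1379365606000");               // hypothetical job id
        params.put("statematch", "previouslyprocessed");  // assumed name and value
        // Assumed base URL of a local example deployment.
        System.out.println(build("http://localhost:8345/mcf-api-service/json",
                "My File System", params));
    }
}
```

The client would then issue a GET against the printed URL and parse the
JSON response to get the list of documents for that job.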

Karl


On Mon, Sep 16, 2013 at 5:50 PM, Pranesh Vadhirajan <
[email protected]> wrote:

> Hi All,
>
>
>
> I am implementing my own Java client to crawl file system resources via the
> ManifoldCF JSON-based API.  I have been able to define and run a job to
> crawl a file system repository and output to a file system destination.
>
>
>
> The trouble I'm having currently is finding out, via the ManifoldCF API,
> which documents have been crawled.  I have looked through the API
> documentation on the ManifoldCF release pages, but I'm unable to find this
> information.  Could someone point me in the right direction?
>
>
>
> When I try to use the Java API for file system monitoring (to check on the
> contents of the output folder), I'm having issues with the files being
> locked during the execution of the job.  Therefore, I need to query the
> ManifoldCF engine to understand which documents have been changed in the
> output area so that I can run my file system monitoring code on a different
> schedule.
>
>
>
> Please let me know if I didn't explain myself well here.
>
>
>
> Thanks,
> Pranesh
>
>
>
> Pranesh Vadhirajan
>
>
>
>
