Re: Index Purged if no new documents are seeded

Rafa Haro Wed, 17 Sep 2014 08:33:59 -0700

Hi Karl, 

As always, thanks for your quick response. Changing the model to 
MODEL_ADD_CHANGE_DELETE did the trick. As for the seeding string, we already 
handle that case.

Thanks a lot. We hope the community will test this connector too :-)

Cheers,
Rafa

On 17 September 2014 at 16:51:18, Karl Wright ([email protected]) 
wrote:

Hi Rafa,  

You probably need to do a few things to get your connector working right.  
First, what connector model are you using? MODEL_ALL is the default, and  
it tells ManifoldCF that your seeding method supplies ALL matching  
documents, and that's probably not right. Maybe you want MODEL_ADD_CHANGE  
instead. Second, please be sure your connector deals properly with the  
situation where the previous seeding string is empty. The seeding string  
is set to empty whenever someone changes the document specification for a  
job. In that case, you should always seed as if from the beginning of time.  
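
To illustrate the second point, here is a minimal sketch in plain Java. It is not the ManifoldCF API: the `Transaction` and `SeedResult` types and the `seed` method are hypothetical, and it assumes the seeding string simply stores the id of the last transaction already seeded. The only thing it is meant to show is that an empty (or missing) seeding string falls back to seeding everything.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SeedingSketch {
    // Hypothetical stand-in for a repository transaction (not a ManifoldCF or Alfresco type).
    static class Transaction {
        final long id;
        final List<String> documentIds;
        Transaction(long id, List<String> documentIds) {
            this.id = id;
            this.documentIds = documentIds;
        }
    }

    // Hypothetical result: the ids to seed plus the seeding string to keep for the next run.
    static class SeedResult {
        final List<String> seededIds;
        final String nextSeedingString;
        SeedResult(List<String> seededIds, String nextSeedingString) {
            this.seededIds = seededIds;
            this.nextSeedingString = nextSeedingString;
        }
    }

    // Seeds the documents touched by transactions newer than the one recorded
    // in lastSeedingString. A null or empty string means "from the beginning of time".
    static SeedResult seed(List<Transaction> transactions, String lastSeedingString) {
        long lastTxn = (lastSeedingString == null || lastSeedingString.isEmpty())
            ? -1L
            : Long.parseLong(lastSeedingString);
        List<String> seeded = new ArrayList<>();
        long newest = lastTxn;
        for (Transaction t : transactions) {
            if (t.id > lastTxn) {
                seeded.addAll(t.documentIds);
                newest = Math.max(newest, t.id);
            }
        }
        // Keep the string empty if no transaction has ever been seen.
        String next = newest < 0 ? "" : Long.toString(newest);
        return new SeedResult(seeded, next);
    }

    public static void main(String[] args) {
        List<Transaction> txns = Arrays.asList(
            new Transaction(1L, Arrays.asList("doc-a", "doc-b")),
            new Transaction(2L, Arrays.asList("doc-c")));

        // Empty seeding string: everything is seeded.
        SeedResult first = seed(txns, "");
        System.out.println(first.seededIds + " next=" + first.nextSeedingString);

        // No new transactions since the last run: nothing is seeded.
        SeedResult second = seed(txns, first.nextSeedingString);
        System.out.println(second.seededIds + " next=" + second.nextSeedingString);
    }
}
```

Note that the second call seeds nothing. That is only safe when the connector model tells ManifoldCF that seeding is incremental rather than a complete list of matching documents, which is the first point above.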

I will not have a chance to review your code for a while due to other  
issues I'm currently looking at, but based on your description of the  
problem, you've probably chosen the wrong seeding model.  

Thanks,  
Karl  

On Wed, Sep 17, 2014 at 10:41 AM, Rafa Haro <[email protected]> wrote:  

> Hi folks,  
>  
> We have been working on an “unofficial” Alfresco connector that is currently  
> more or less working with ManifoldCF 1.7. You can check the code here:  
> https://github.com/rafaharo/alfresco-webscript-manifold-connector. The  
> README.md file is out of date, so please ignore it. Basically, this  
> connector uses a client that consumes a set of Alfresco webscripts for  
> content and metadata crawling. Document seeding is based on  
> Alfresco transactions, so the connector keeps asking Alfresco for a  
> given number of transactions until no new transactions are found. The  
> transaction info indicates, among other things, whether a document has been  
> deleted, so that later, while processing the documents, those documents are  
> marked for deletion.  
>  
> In the first run, all available document identifiers are seeded. In  
> subsequent runs, we intended to seed only those documents affected by new  
> transactions (new documents, changes at any level, or deletions). And  
> this is what is happening right now: for example, if there are no new  
> transactions, no documents are seeded and the whole index is purged (all  
> previously indexed documents are deleted).  
>  
> My question is: is this normal behavior? How can we avoid it? Is there  
> any configuration option for the jobs? We have read about minimal and  
> complete runs, but it is still not clear to us.  
>  
> Thanks a lot!  
> Cheers,  
> Rafa  
>  
>  
>
