My fault, actually.
I was experimenting, so:
I indexed document D directly into Solr,
added a reference to doc D in processDocuments,
and getDocumentVersions() was returning null for doc D,
but the document wasn't removed…
Then I realized that ManifoldCF doesn't remove what it didn't index
itself (not all crawlers behave this way).
So I ran another test, indexing doc D through ManifoldCF, and everything
works as expected.
Hope this helps others.
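
For anyone who hits the same thing, here is a toy model of the behavior as I
understand it. It is not ManifoldCF code: the class DeletionSketch, its
methods and the in-memory "index" are all made up for illustration. The only
assumption taken from the experiment above is that a null version leads to a
delete only for documents the crawler has indexed itself.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical toy model of the observed behavior; not ManifoldCF code. */
class DeletionSketch {
    /** Stand-in for the Solr index: document id -> content. */
    private final Map<String, String> targetIndex = new HashMap<>();
    /** Ids that the crawler itself has sent to the target. */
    private final Set<String> indexedByCrawler = new HashSet<>();

    /** Simulates posting a document to the target behind the crawler's back. */
    public void indexDirectly(String id, String content) {
        targetIndex.put(id, content);
    }

    /**
     * Simulates one crawl pass over a single identifier.
     * A non-null version means "index it"; a null version means "it is gone".
     */
    public void crawl(String id, String version, String content) {
        if (version != null) {
            targetIndex.put(id, content);
            indexedByCrawler.add(id);
        } else if (indexedByCrawler.remove(id)) {
            // The crawler indexed this document earlier, so it deletes it.
            targetIndex.remove(id);
        }
        // Null version for an id the crawler never indexed: nothing is deleted.
    }

    /** True if the target index currently holds the document. */
    public boolean inTarget(String id) {
        return targetIndex.containsKey(id);
    }
}
```

In this model, calling indexDirectly("D", ...) followed by crawl("D", null, null)
leaves D in the target, which is what I saw at first. Calling
crawl("D", "v1", ...) and then crawl("D", null, null) removes it, which matches
my second test.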
--
Matteo Grolla
Sourcesense - making sense of Open Source
http://www.sourcesense.com
On 16 Jun 2014, at 19:11, Karl Wright wrote:
> Hi Matteo,
>
> The document should be deleted from the target repository when you return a
> null document version. Why do you think it does not?
>
> As for your second question, please read up on the various models that the
> crawler supports. They're described pretty thoroughly in ManifoldCF in
> Action.
>
> Karl
>
>
>
> On Mon, Jun 16, 2014 at 12:47 PM, Matteo Grolla <[email protected]>
> wrote:
>
>> Hi,
>> I see that if I return null in getDocumentVersions() (actually,
>> the array values are null),
>> the method processDocuments is not called for the corresponding identifiers.
>> But the document is not deleted from the target repository.
>> I'm using the filesystem connector, so those are my settings for the
>> crawling mode.
>> Supposing that my source repository gives me the list of deleted
>> documents, what should I do to handle the deletion?
>>
>> Cheers
>>
>> --
>> Matteo Grolla
>> Sourcesense - making sense of Open Source
>> http://www.sourcesense.com
>>
>>