[jira] [Commented] (CONNECTORS-1497) Re-index seeded modified documents when the re-crawl interval is infinity and connector model is MODEL_ADD_CHANGE

Karl Wright (JIRA) Mon, 26 Feb 2018 10:26:44 -0800

    [ https://issues.apache.org/jira/browse/CONNECTORS-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16377310#comment-16377310 ]


Karl Wright commented on CONNECTORS-1497:
-----------------------------------------

Changing the status, though, is not a good idea and would cause many other 
things to break.  Specifically, that status is used to determine whether, on 
recovery, the document should be deleted or just requeued.

So if the PENDINGPURGATORY status is blocking the update of the scheduled time, 
the code that needs to change would be in updateExistingRecordInitial().  
Specifically, this method should receive a boolean flag that tells it how to 
handle PENDINGPURGATORY documents, because checking the document status alone 
is insufficient to determine what should be done.

I'll have to trace that back further to see whether this method is called from 
multiple places and, if so, whether we can distinguish between those calls 
based on their context.
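A minimal, hypothetical Java sketch of the flag idea described above follows. The DocStatus enum, the method signature, the reseedModified parameter, and the use of Long.MAX_VALUE to stand in for an infinite re-crawl interval are all assumptions for illustration, not ManifoldCF's actual JobManager code. Every status other than PENDINGPURGATORY simply keeps its existing check time, which is a simplification. What it shows is that the status itself is never rewritten; only the caller-supplied flag decides whether a PENDINGPURGATORY record is rescheduled immediately.

```java
/**
 * Hypothetical sketch only: DocStatus, this updateExistingRecordInitial signature,
 * and the reseedModified flag are illustrative, not ManifoldCF's JobManager code.
 */
public class ReseedSketch {

    // Illustrative subset of queue states; only PENDINGPURGATORY matters here.
    enum DocStatus { PENDING, ACTIVE, COMPLETE, PENDINGPURGATORY }

    // Stands in for "never re-check" when the re-crawl interval is infinity.
    static final long NEVER = Long.MAX_VALUE;

    /**
     * Returns the new scheduled check time for an existing queue record.
     * The status is deliberately left alone, because recovery reads it to
     * decide whether the document should be deleted or just requeued.
     *
     * @param reseedModified caller-supplied flag; true means a PENDINGPURGATORY
     *                       record should be scheduled now instead of keeping its time
     */
    static long updateExistingRecordInitial(DocStatus status, long currentCheckTime,
                                            long now, boolean reseedModified) {
        if (status == DocStatus.PENDINGPURGATORY && reseedModified) {
            // The caller's context says this seed is a modified document:
            // schedule it immediately rather than at the re-crawl time.
            return now;
        }
        // All other cases keep the existing schedule unchanged.
        return currentCheckTime;
    }

    public static void main(String[] args) {
        long now = 1_000L;
        // A seeding caller that knows the document changed passes true.
        System.out.println(updateExistingRecordInitial(DocStatus.PENDINGPURGATORY, NEVER, now, true));
        // Any other caller passes false, so the existing schedule is kept.
        System.out.println(updateExistingRecordInitial(DocStatus.PENDINGPURGATORY, NEVER, now, false));
    }
}
```

Putting the decision in a parameter rather than in a new status keeps the recovery logic untouched; the open question from the comment is which call sites can reliably pass true.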


> Re-index seeded modified documents when the re-crawl interval is infinity and 
>   connector model is MODEL_ADD_CHANGE
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: CONNECTORS-1497
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1497
>             Project: ManifoldCF
>          Issue Type: Improvement
>          Components: Framework agents process
>    Affects Versions: ManifoldCF 2.9.1
>            Reporter: Ahmed Mahfouz
>            Assignee: Karl Wright
>            Priority: Major
>         Attachments: CONNECTORS-1497.patch
>
>
> I am trying to avoid a full scan of all documents, for better efficiency with 
> a large number of documents. I tried many different job settings but could 
> not accomplish that. In particular, when the repository connector model is 
> MODEL_ADD_CHANGE, I expected seeded modified documents to be re-indexed 
> immediately, like new seeds, but I found that the re-crawl time is used as the 
> scheduled time, so they wait for the full scan before being re-indexed. I 
> avoided the full scan by setting the re-crawl interval to infinity, but my 
> modified document seeds were still not getting indexed. After digging into 
> the code for quite some time, I made some modifications to the JobManager and 
> they worked for me. I would like to share the change with you for review, so 
> I opened this ticket.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
