[
https://issues.apache.org/jira/browse/CONNECTORS-567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13497956#comment-13497956
]
Maciej Lizewski commented on CONNECTORS-567:
--------------------------------------------
I would also go with two scenarios to maintain compatibility with the current
model. My point is that there are plenty of cases where listing documents also
gives you information about their versions: a directory listing gives you the
file modification time, an SQL query can return a document ID and its version,
and web interfaces (REST, WebService) often offer a getObjectsList-style call
that returns document IDs and almost always some document information such as
modification time, version, owner, etc., plus a separate method for fetching
the whole document.
Your proposal to have all-in-one is not as good because, as I said earlier,
common interfaces have separate methods for fetching lists and single documents,
so you would have to fetch the list first and then fetch the content of every
document. Another reason is that in the real world documents do not change very
often, so fetching their content every time is unnecessary overhead.
And last but not least, what I mean by "old enough": when you call
addSeedDocuments there are several scenarios, but in most cases this method can
provide new documents, updated documents and often all other documents that
still exist. There are still some documents that were deleted, and
addSeedDocuments mostly will not return them. They are injected into the
reindexing process from the database of previously indexed documents, and when
getDocumentVersions returns null they are removed. That is clear, and this is
what I mainly meant: getDocumentVersions could be used to fetch versions for
documents that are already in our database but that addSeedDocuments did not
return (either because they were deleted or because they were not modified
and addSeedDocuments returns only new and modified documents).
So I was thinking of the following (re)indexing process:
1. mark all already indexed documents for re-indexing
2. call addSeedDocuments, which may or may not provide versions for documents
3. call getDocumentVersions for all documents that were not added by
addSeedDocuments with a version (this means it should also be called for
documents added by addSeedDocuments without a version; this is the backward
compatibility)
4. call processDocuments as usual.
Now, if addSeedDocuments does not provide versions at all, this process is
pretty much the same as how it works now. If addSeedDocuments provides versions
for some (or all) documents, those are excluded from the calls to
getDocumentVersions.
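
The four steps above could be sketched roughly as follows. This is only an
illustration under assumptions: the Repository interface, the seed and
fetchVersion names, and the in-memory map standing in for the database of
indexed documents are all hypothetical and are not the ManifoldCF API.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

public class ReindexSketch {
    /** Hypothetical stand-in for a repository connector; not the ManifoldCF API. */
    public interface Repository {
        /** Seeded IDs mapped to a version, or to null when seeding knows no version. */
        Map<String, String> seed();

        /** Current version of a document, or null if it no longer exists. */
        String fetchVersion(String id);
    }

    /**
     * Runs one (re)indexing pass. "indexed" maps document ID to the last
     * indexed version and is updated in place. Returns the IDs that would
     * be handed to processDocuments.
     */
    public static Set<String> reindex(Map<String, String> indexed, Repository repo) {
        // Step 1: every already indexed document is a re-index candidate.
        Set<String> candidates = new LinkedHashSet<>(indexed.keySet());

        // Step 2: seeding may or may not supply versions.
        Map<String, String> seeded = repo.seed();
        candidates.addAll(seeded.keySet());

        Set<String> toProcess = new LinkedHashSet<>();
        for (String id : candidates) {
            String version = seeded.get(id);

            // Step 3: look the version up only when seeding did not supply it.
            if (version == null) {
                version = repo.fetchVersion(id);
            }

            // A null version means the document was deleted from the repository.
            if (version == null) {
                indexed.remove(id);
                continue;
            }

            // Step 4: only new or changed documents need their content fetched.
            if (!version.equals(indexed.get(id))) {
                toProcess.add(id);
                indexed.put(id, version);
            }
        }
        return toProcess;
    }
}
```

In this sketch a document seeded with a version never triggers a version
lookup, which is where the saving would come from.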
From the connector side, the difference could be just in calling an overloaded
ISeedingActivity::addSeedDocument method with a second argument:
addSeedDocument(idValue) or addSeedDocument(idValue, version)
Of course I understand it means much more hidden work on the other side of this
interface :)
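
To make the overload idea concrete, here is a minimal sketch. The
SeedingActivity interface below is a hypothetical stand-in modelled on the
proposal, not the real ISeedingActivity, and CollectingActivity is an invented
helper that only records what the connector passed in.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SeedingSketch {
    /** Hypothetical interface modelled on the proposal; not the real ISeedingActivity. */
    public interface SeedingActivity {
        /** Proposed overload: version is null when the connector does not know it. */
        void addSeedDocument(String id, String version);

        /** Existing-style call: seeds the document without a version. */
        default void addSeedDocument(String id) {
            addSeedDocument(id, null);
        }
    }

    /** Invented helper: records seeds and reports which ones still lack a version. */
    public static class CollectingActivity implements SeedingActivity {
        private final Map<String, String> seeds = new LinkedHashMap<>();

        @Override
        public void addSeedDocument(String id, String version) {
            seeds.put(id, version);
        }

        /** IDs for which a getDocumentVersions-style lookup would still be needed. */
        public List<String> idsNeedingVersionLookup() {
            List<String> ids = new ArrayList<>();
            for (Map.Entry<String, String> e : seeds.entrySet()) {
                if (e.getValue() == null) {
                    ids.add(e.getKey());
                }
            }
            return ids;
        }

        /** Version supplied at seeding time, or null if none was given. */
        public String versionOf(String id) {
            return seeds.get(id);
        }
    }
}
```

Connectors that keep calling the one-argument form behave as before, while
connectors that know the version at seeding time can pass it along.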
What do you think about it?
> Extended seeding interface which provides document versions
> -----------------------------------------------------------
>
> Key: CONNECTORS-567
> URL: https://issues.apache.org/jira/browse/CONNECTORS-567
> Project: ManifoldCF
> Issue Type: Wish
> Reporter: Maciej Lizewski
>
> There are some cases where the seeding function can provide a document
> version from data it already has.
> The current data flow needs one call to addSeedDocuments, then a call to
> getDocumentVersions, which essentially must fetch the same data, and after
> that one more call to processDocuments. The last one probably needs a
> separate call because it needs to fetch the document body; however, seeding
> and getting versions in many cases work on the very same data (and probably
> duplicate requests to the repository).
> Now, reducing the number of requests to the repository by eliminating the
> getDocumentVersions call for documents whose version was returned by
> addSeedDocuments could significantly reduce load.
> getDocumentVersions would still be called for older documents (not returned
> by addSeedDocuments) to check whether they were modified or deleted.
> This is only a proposal...