[ https://issues.apache.org/jira/browse/CONNECTORS-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242450#comment-14242450 ]
Aeham Abushwashi commented on CONNECTORS-1118:
----------------------------------------------
The profiler trace looks much better now. Thank you!
> Documents processed by the shared drive connector incur an unnecessary synchronisation hit
> ------------------------------------------------------------------------------------------
>
> Key: CONNECTORS-1118
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1118
> Project: ManifoldCF
> Issue Type: Improvement
> Components: Framework core
> Affects Versions: ManifoldCF 1.7.2
> Reporter: Aeham Abushwashi
> Assignee: Karl Wright
> Fix For: ManifoldCF 1.8, ManifoldCF 2.0
>
>
> Each document processed by the shared drive connector is passed through
> SharedDriveConnector#checkInclude to verify whether the document is eligible
> for ingestion. The calls made here to
> WorkerThread$ProcessActivity#checkMimeTypeIndexable and
> WorkerThread$ProcessActivity#checkLengthIndexable are unnecessarily costly as
> they each create a fresh instance of IncrementalIngester$PipelineConnections
> on every call. The IncrementalIngester$PipelineConnections constructor can be
> very expensive because it loads output connection objects, which in turn
> requires locking (via ZK in a distributed environment).
> The other area of inefficiency is in
> WorkerThread$ProcessActivity#processDocumentReferences. This method creates
> new instances of PriorityCalculator using the less efficient 3-arg
> constructor. This can be addressed using the same pattern implemented for
> CONNECTORS-1094.
> To highlight the impact of these calls, I profiled an active worker
> thread for 40 minutes. During that window, it spent ~23 minutes in
> SharedDriveConnector#checkInclude and its callees, plus a further 9 minutes
> creating instances of PriorityCalculator (together ~32 of the 40 minutes,
> or about 80% of the profiled time).
> I've seen these issues with the shared drive connector, but I think other
> connectors could be affected too, depending on how they're implemented.
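The PipelineConnections cost described in the issue comes from rebuilding that object on every checkMimeTypeIndexable/checkLengthIndexable call. The sketch below shows the obvious remedy under stated assumptions: build it once, lazily, and reuse it for every document check. PipelineConnectionsStub and CachingProcessActivity are hypothetical stand-ins, not ManifoldCF classes, and the "indexable" rules are toy logic. The sketch also assumes the activity object is confined to a single worker thread, so the lazily initialised field needs no synchronisation.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for IncrementalIngester$PipelineConnections:
// construction is the expensive step (loading output connections, locking).
final class PipelineConnectionsStub {
    static final AtomicInteger CONSTRUCTED = new AtomicInteger();
    private final String[] outputConnectionNames;

    PipelineConnectionsStub(String[] outputConnectionNames) {
        CONSTRUCTED.incrementAndGet(); // count how often the costly path runs
        this.outputConnectionNames = outputConnectionNames.clone();
    }

    // Toy rule: indexable if there is an output connection and a MIME type.
    boolean checkMimeTypeIndexable(String mimeType) {
        return outputConnectionNames.length > 0 && mimeType != null && !mimeType.isEmpty();
    }

    // Toy rule: indexable if there is an output connection and a sane length.
    boolean checkLengthIndexable(long length) {
        return outputConnectionNames.length > 0 && length >= 0;
    }
}

// Hypothetical activity object that builds the pipeline connections once,
// on first use, and reuses them for every subsequent check.
// Assumes single-thread confinement, so no locking around the lazy field.
final class CachingProcessActivity {
    private final String[] outputConnectionNames;
    private PipelineConnectionsStub pipelineConnections; // null until first use

    CachingProcessActivity(String[] outputConnectionNames) {
        this.outputConnectionNames = outputConnectionNames.clone();
    }

    private PipelineConnectionsStub pipelineConnections() {
        if (pipelineConnections == null) {
            pipelineConnections = new PipelineConnectionsStub(outputConnectionNames);
        }
        return pipelineConnections;
    }

    boolean checkMimeTypeIndexable(String mimeType) {
        return pipelineConnections().checkMimeTypeIndexable(mimeType);
    }

    boolean checkLengthIndexable(long length) {
        return pipelineConnections().checkLengthIndexable(length);
    }
}
```

With this shape, a thousand documents checked through the same activity object trigger one expensive construction instead of two thousand. If the activity object were shared across threads, the lazy field would need proper synchronisation or eager initialisation instead.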
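For the PriorityCalculator point in the issue, the general idea is to resolve whatever the less efficient constructor looks up once per batch, then pass that result to a cheaper constructor for each document. The sketch below illustrates only that idea; PriorityContext and SketchPriorityCalculator are hypothetical, their constructor signatures are not ManifoldCF's, and this is not necessarily the exact pattern used for CONNECTORS-1094.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical state that is costly to resolve (stands in for whatever lookup
// a per-document constructor would otherwise repeat).
final class PriorityContext {
    static final AtomicInteger LOOKUPS = new AtomicInteger();
    private final double baseWeight;

    PriorityContext(String connectionName, String jobId) {
        LOOKUPS.incrementAndGet(); // count how often the costly lookup runs
        this.baseWeight = connectionName.length() + jobId.length(); // toy value
    }

    double baseWeight() {
        return baseWeight;
    }
}

// Hypothetical calculator offering a costly and a cheap constructor.
final class SketchPriorityCalculator {
    private final PriorityContext context;
    private final String documentIdentifier;

    // Costly path: resolves a fresh context for every document.
    SketchPriorityCalculator(String connectionName, String jobId, String documentIdentifier) {
        this(new PriorityContext(connectionName, jobId), documentIdentifier);
    }

    // Cheap path: reuses a context the caller resolved once.
    SketchPriorityCalculator(PriorityContext context, String documentIdentifier) {
        this.context = context;
        this.documentIdentifier = documentIdentifier;
    }

    // Toy priority: base weight plus identifier length.
    double priority() {
        return context.baseWeight() + documentIdentifier.length();
    }
}
```

In a method like processDocumentReferences, the context would be built before the loop over references, so the expensive lookup runs once per batch rather than once per document, while both constructors still yield the same priority.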
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)