[
https://issues.apache.org/jira/browse/CONNECTORS-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15822012#comment-15822012
]
Aeham Abushwashi edited comment on CONNECTORS-1364 at 1/13/17 5:13 PM:
-----------------------------------------------------------------------
Thanks.
Re prioritisation amount: on some systems we observed that document
prioritisation sometimes doesn't happen fast enough to keep the
stuffer/workers fully utilised, especially when a number of jobs are running
concurrently. Prioritising more docs can help in those circumstances.
(As an aside, it seems Jira comment replies don't work any more.)
> Better bin naming in the Shared Drive Connector
> -----------------------------------------------
>
> Key: CONNECTORS-1364
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1364
> Project: ManifoldCF
> Issue Type: Improvement
> Components: JCIFS connector
> Affects Versions: ManifoldCF 1.9
> Reporter: Aeham Abushwashi
> Assignee: Karl Wright
> Fix For: ManifoldCF 2.7
>
> Attachments: CONNECTORS-1364.git.patch, CONNECTORS-1364.git.v2.patch
>
>
> Hello and happy new year!
> Bin naming in the Shared Drive Connector makes assumptions that are not
> always valid.
> As I understand it, Manifold uses bins to prevent overloading data sources.
> In the SDC, the server name is used as the bin name. All jobs created
> against a particular server are treated as one unit when documents are
> prioritised, which can severely disadvantage some jobs (e.g. late starters).
> Moreover, this is incompatible with some common enterprise server topologies.
> In Windows DFS, which is widely used in large enterprises, what the SDC
> thinks of as a server name isn’t actually a physical resource. It’s a
> namespace that can span many servers and shares. In this case, it doesn’t
> make sense to throttle simply on the root ‘server’ name. In other
> environments, a powerful storage server can be more than capable of handling
> a high crawl load; overzealous throttling can end up limiting Manifold’s
> performance there.
> I’m struggling to find a single solution that fits all cases, so I’m leaning
> towards passing some sort of server topology flag or throttling depth flag
> into the repo connection config, as a hint that
> SharedDriveConnector#getBinNames can use to decide whether the bin name
> should be server, server+share or server+share+root_folder. Share and
> root_folder would either need to be passed explicitly in the repo config
> or be extracted from the documentIdentifier arg in getBinNames (assuming
> it's reliable).
> Thoughts?
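
A rough Java sketch of the throttling-depth idea from the description, for
illustration only. It is not the attached patch and does not use ManifoldCF
classes. The class name BinNameSketch, the ThrottleDepth enum and the
binNames helper are all hypothetical. It also assumes that document
identifiers look like smb://server/share/folder/file and that directory
identifiers end with "/", which is exactly the reliability question raised
in the description.

```java
import java.util.Arrays;

// Hypothetical sketch: none of these names come from ManifoldCF.
final class BinNameSketch {

    /** Hypothetical config flag: how deep into the path the bin name reaches. */
    enum ThrottleDepth { SERVER, SERVER_SHARE, SERVER_SHARE_ROOT_FOLDER }

    /**
     * Derives a single bin name from a document identifier.
     * Assumes identifiers of the form smb://server/share/folder/.../file,
     * with directory identifiers ending in "/".
     */
    static String[] binNames(String documentIdentifier, ThrottleDepth depth) {
        String prefix = "smb://";
        String path = documentIdentifier.startsWith(prefix)
            ? documentIdentifier.substring(prefix.length())
            : documentIdentifier;

        // Segments are [server, share, root_folder, ...]; split drops trailing empties.
        String[] segments = path.split("/");

        // Treat the last segment as a file unless the identifier ends with "/",
        // so that individual files never become bins of their own.
        int dirCount = path.endsWith("/") ? segments.length : segments.length - 1;

        // SERVER wants 1 segment, SERVER_SHARE 2, SERVER_SHARE_ROOT_FOLDER 3.
        int wanted = depth.ordinal() + 1;

        // Fall back to a shallower bin if the path is shorter than requested,
        // but always keep at least the server segment.
        int available = Math.max(1, Math.min(wanted, dirCount));

        String bin = String.join("/", Arrays.copyOfRange(segments, 0, available));
        return new String[] { bin };
    }

    public static void main(String[] args) {
        String id = "smb://fs01/docs/projects/2017/plan.doc";
        for (ThrottleDepth depth : ThrottleDepth.values()) {
            System.out.println(depth + " -> " + binNames(id, depth)[0]);
        }
    }
}
```

With a DFS namespace, the deeper settings would spread throttling across
shares or root folders instead of the namespace root, while SERVER keeps the
current behaviour described above.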