Aeham Abushwashi created CONNECTORS-1364:
--------------------------------------------
Summary: Better bin naming in the Shared Drive Connector
Key: CONNECTORS-1364
URL: https://issues.apache.org/jira/browse/CONNECTORS-1364
Project: ManifoldCF
Issue Type: Improvement
Components: JCIFS connector
Affects Versions: ManifoldCF 1.9
Reporter: Aeham Abushwashi
Hello and happy new year!
Bin naming in the Shared Drive Connector makes assumptions that are not always
valid.
As I understand it, Manifold uses bins to prevent overloading data sources. In
the SDC, server name is designated as bin name. All jobs created against a
particular server will be treated as one unit when documents are prioritised,
which can severely disadvantage some jobs (e.g. late starters).
Moreover, this is incompatible with some common enterprise server topologies.
In Windows DFS, which is widely used in large enterprises, what the SDC treats
as a server name isn’t actually a physical resource. It’s a namespace that
can span many servers and shares, so it doesn’t make sense to throttle simply
on the root ‘server’ name. In other environments, a powerful storage server
can be more than capable of handling a high crawl load; overzealous throttling
there can end up limiting Manifold’s performance.
I’m struggling to find a single solution that fits all cases, so I’m leaning
towards passing into the repo connection config some sort of server topology
flag or throttling depth flag, as a hint that SharedDriveConnector#getBinNames
can use to decide whether the bin name should be server, server+share or
server+share+root_folder. Share and root_folder would need to be explicitly
passed in the repo config too.
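
To make the idea concrete, here is a minimal sketch of how a throttling depth
hint could select the bin name. Everything in it is hypothetical: the
BinNameSketch class, the ThrottleDepth enum, the '/' separator and the
lower-casing of names are assumptions made for illustration, not existing
ManifoldCF code or config parameters. How the result would be wired into
getBinNames is left out.

```java
import java.util.Locale;

/** Hypothetical sketch; none of these names exist in ManifoldCF. */
final class BinNameSketch {

    /** Hypothetical "throttling depth" hint read from the repo connection config. */
    enum ThrottleDepth { SERVER, SHARE, ROOT_FOLDER }

    /**
     * Builds one bin name from the configured server, share and root folder.
     * If share or rootFolder is missing, the result falls back to the
     * next coarser level instead of failing.
     */
    static String binName(String server, String share, String rootFolder,
                          ThrottleDepth depth) {
        // Assumption: names are compared case-insensitively, so normalise them.
        StringBuilder bin = new StringBuilder(server.toLowerCase(Locale.ROOT));
        if (depth == ThrottleDepth.SERVER || isBlank(share)) {
            return bin.toString();
        }
        bin.append('/').append(share.toLowerCase(Locale.ROOT));
        if (depth == ThrottleDepth.SHARE || isBlank(rootFolder)) {
            return bin.toString();
        }
        return bin.append('/').append(rootFolder.toLowerCase(Locale.ROOT)).toString();
    }

    private static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }
}
```

With this sketch, two jobs under the same DFS namespace but on different
shares would land in different bins at SHARE depth, while SERVER depth keeps
today's behaviour of one bin per server name.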
Thoughts?