[ https://issues.apache.org/jira/browse/CONNECTORS-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Karl Wright resolved CONNECTORS-1364.
-------------------------------------
Resolution: Fixed
r1778622.
I had to code everything by hand because the patch depended on other changes
that weren't on trunk. I did not include the prioritization document count
parameter, since it is unrelated to this change. If I can get an explanation
of why it needs to be there, we could commit that too.
> Better bin naming in the Shared Drive Connector
> -----------------------------------------------
>
> Key: CONNECTORS-1364
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1364
> Project: ManifoldCF
> Issue Type: Improvement
> Components: JCIFS connector
> Affects Versions: ManifoldCF 1.9
> Reporter: Aeham Abushwashi
> Assignee: Karl Wright
> Fix For: ManifoldCF 2.7
>
> Attachments: CONNECTORS-1364.git.patch, CONNECTORS-1364.git.v2.patch
>
>
> Hello and happy new year!
> Bin naming in the Shared Drive Connector makes assumptions that are not
> always valid.
> As I understand it, Manifold uses bins to prevent overloading data sources.
> In the SDC, the server name is used as the bin name. All jobs created against
> a particular server are treated as one unit when documents are prioritised,
> which can severely disadvantage some jobs (e.g. late starters).
> Moreover, this is incompatible with some common enterprise server topologies.
> In Windows DFS, which is widely used in large enterprises, what the SDC
> thinks of as a server name isn’t actually a physical resource. It’s a
> namespace that can span many servers and shares. In this case, it doesn’t
> make sense to throttle simply on the root ‘server’ name. In other
> environments, a powerful storage server can be more than capable of handling
> high crawl load; overzealous throttling can end up limiting Manifold’s
> performance there.
> I’m struggling to find a single solution that fits all cases, so I’m leaning
> towards passing some sort of server topology flag or throttling depth flag
> in the repo connection config. SharedDriveConnector#getBinNames could use it
> as a hint to decide whether the bin name should be server, server+share or
> server+share+root_folder. Share and root_folder would need to be explicitly
> passed in the repo config too, or extracted from the documentIdentifier arg
> in getBinNames (assuming it's reliable).
> Thoughts?
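
The throttling-depth idea in the description above could be sketched roughly as
follows. This is a standalone illustration, not the code committed in r1778622:
the class BinNameSketch, the Depth enum and the binNames helper are all
hypothetical names, and the sketch assumes that document identifiers look like
smb://server/share/folder/file, with folder identifiers ending in a slash.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of depth-based bin naming; not part of ManifoldCF. */
public class BinNameSketch {

  /** Assumed throttling-depth hint that would come from the repo connection config. */
  public enum Depth { SERVER, SHARE, ROOT_FOLDER }

  /**
   * Derives one bin name from a document identifier.
   * Assumption: identifiers have the form smb://server/share/folder/.../file
   * and folder identifiers end with "/".
   */
  public static String[] binNames(String documentIdentifier, Depth depth) {
    final String prefix = "smb://";
    if (documentIdentifier == null || !documentIdentifier.startsWith(prefix)) {
      // Unrecognised identifier: fall back to a single blank bin (an assumption).
      return new String[] { "" };
    }

    // Collect the non-empty path segments: server, share, folders, file.
    List<String> segments = new ArrayList<>();
    for (String part : documentIdentifier.substring(prefix.length()).split("/")) {
      if (!part.isEmpty()) {
        segments.add(part);
      }
    }
    if (segments.isEmpty()) {
      return new String[] { "" };
    }

    // Without a trailing slash the last segment is taken to be a file name,
    // so it must not be mistaken for the root folder.
    int usable = segments.size();
    if (!documentIdentifier.endsWith("/") && usable > 2) {
      usable--;
    }

    // SERVER keeps 1 segment, SHARE keeps 2, ROOT_FOLDER keeps 3 (when present).
    int count = Math.min(depth.ordinal() + 1, usable);
    return new String[] { String.join("/", segments.subList(0, count)) };
  }
}
```

With this sketch, smb://fs01/projects/alpha/readme.txt would map to the bin
fs01, fs01/projects or fs01/projects/alpha depending on the depth hint, so two
jobs crawling different shares of one DFS namespace would no longer have to
share a bin.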