Aeham Abushwashi updated CONNECTORS-1364:
It’s a fair comment. In my use case, I have a client application that’s talking
to Manifold through the API, so I have to implement this logic either way. I
figured it’d be useful for others too, but perhaps other advanced users would
prefer to use their own bin naming convention.
I could see a future use for the share and root folder being passed into the
repo connector, but I think it’d be better to introduce those as first-class
citizens, rather than optional parameters, should the need for them ever arise.
Here’s an updated patch.
> Better bin naming in the Shared Drive Connector
> Key: CONNECTORS-1364
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1364
> Project: ManifoldCF
> Issue Type: Improvement
> Components: JCIFS connector
> Affects Versions: ManifoldCF 1.9
> Reporter: Aeham Abushwashi
> Assignee: Karl Wright
> Fix For: ManifoldCF 2.7
> Attachments: CONNECTORS-1364.git.patch, CONNECTORS-1364.git.v2.patch
> Hello and happy new year!
> Bin naming in the Shared Drive Connector makes assumptions that are not
> always valid.
> As I understand it, Manifold uses bins to prevent overloading data sources.
> In the SDC, the server name is used as the bin name. All jobs created against
> a particular server are therefore treated as one unit when documents are
> prioritised, which can severely disadvantage some jobs (e.g. late starters).
> Moreover, this is incompatible with some common enterprise server topologies.
> In Windows DFS, which is widely used in large enterprises, what the SDC
> thinks of as a server name isn’t actually a physical resource. It’s a
> namespace that can span many servers and shares. In this case, it doesn’t
> make sense to throttle simply on the root ‘server’ name. In other
> environments, a powerful storage server may be more than capable of handling
> a high crawl load; overzealous throttling can end up hurting Manifold’s
> performance there.
> I’m struggling to find a one-size-fits-all solution, so I’m leaning towards
> passing some sort of server topology flag or throttling depth flag in the
> repo connection config, as a hint that ShareDriveConnector#getBinNames can use
> to decide whether the bin name should be server, server+share or
> server+share+root_folder. Share and root_folder would then need to be passed
> explicitly in the repo config too, or extracted from the documentIdentifier
> argument of getBinNames (assuming it's reliable).
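A minimal sketch of the depth-based bin naming proposed above, written in Java since ManifoldCF is a Java project. This is not the SDC's actual getBinNames code: the class BinNameSketch, the ThrottlingDepth enum, and the assumption that document identifiers look like smb://server/share/root_folder/... are all hypothetical. SERVER reproduces the current behaviour (one bin per server); SHARE and ROOT_FOLDER split documents into finer bins, which could help in a DFS namespace where the "server" segment doesn't map to a single physical machine.

```java
import java.util.Arrays;
import java.util.Locale;

/**
 * Hypothetical sketch only: derives a throttling bin name from a document
 * identifier at a configurable depth. Not ManifoldCF's real implementation.
 */
public class BinNameSketch {

    /** Hypothetical "throttling depth" hint read from the repo connection config. */
    public enum ThrottlingDepth { SERVER, SHARE, ROOT_FOLDER }

    // Assumed identifier prefix; the real SDC identifier format may differ.
    private static final String PREFIX = "smb://";

    /**
     * Returns server, server/share or server/share/root_folder, using fewer
     * segments when the identifier is too short for the requested depth.
     */
    public static String binName(String documentIdentifier, ThrottlingDepth depth) {
        String path = documentIdentifier;
        if (path.regionMatches(true, 0, PREFIX, 0, PREFIX.length())) {
            path = path.substring(PREFIX.length());
        }
        boolean isFolder = path.endsWith("/");

        // Non-empty path segments: [server, share, folder..., file?]
        String[] segments = Arrays.stream(path.split("/"))
                .filter(s -> !s.isEmpty())
                .toArray(String[]::new);
        if (segments.length == 0) {
            throw new IllegalArgumentException("No server in identifier: " + documentIdentifier);
        }

        // A file needs at least server/share/file, so drop a trailing file
        // name to avoid treating it as the root folder.
        if (!isFolder && segments.length >= 3) {
            segments = Arrays.copyOf(segments, segments.length - 1);
        }

        int wanted = depth.ordinal() + 1; // SERVER=1, SHARE=2, ROOT_FOLDER=3
        int count = Math.min(wanted, segments.length);

        // Assumption: Windows server/share/folder names are treated as
        // case-insensitive, so the bin name is normalised to lower case.
        return String.join("/", Arrays.copyOf(segments, count)).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String id = "smb://fileserver/projects/alpha/report.docx";
        for (ThrottlingDepth d : ThrottlingDepth.values()) {
            // SERVER -> fileserver, SHARE -> fileserver/projects,
            // ROOT_FOLDER -> fileserver/projects/alpha
            System.out.println(d + " -> " + binName(id, d));
        }
    }
}
```

The trade-off of a deeper bin is that throttling becomes less aggressive per physical server: that suits DFS namespaces or high-capacity storage, but could overload a small standalone server, which is why a per-connection hint rather than a global change seems the safer choice.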
This message was sent by Atlassian JIRA