[ 
https://issues.apache.org/jira/browse/CONNECTORS-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aeham Abushwashi updated CONNECTORS-1364:
-----------------------------------------
    Attachment: CONNECTORS-1364.git.v2.patch

It’s a fair comment. In my use case, I have a client application that talks 
to ManifoldCF through the API, so I have to implement this logic either way. 
I figured it would be useful for others too, but perhaps other advanced users 
would prefer to use their own bin naming convention. 
I could see a future use for passing the share and root folder in to the repo 
connector, but I think it’d be better to introduce those as first-class 
parameters rather than optional ones, should the need for them ever arise.

Here’s an updated patch.

> Better bin naming in the Shared Drive Connector
> -----------------------------------------------
>
>                 Key: CONNECTORS-1364
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1364
>             Project: ManifoldCF
>          Issue Type: Improvement
>          Components: JCIFS connector
>    Affects Versions: ManifoldCF 1.9
>            Reporter: Aeham Abushwashi
>            Assignee: Karl Wright
>             Fix For: ManifoldCF 2.7
>
>         Attachments: CONNECTORS-1364.git.patch, CONNECTORS-1364.git.v2.patch
>
>
> Hello and happy new year!
> Bin naming in the Shared Drive Connector makes assumptions that are not 
> always valid. 
> As I understand it, Manifold uses bins to prevent overloading data sources. 
> In the SDC, server name is designated as bin name. All jobs created against a 
> particular server will be treated as one unit when documents are prioritised, 
> which can severely disadvantage some jobs (e.g. late starters). 
> Moreover, this is incompatible with some common enterprise server topologies. 
> In Windows DFS, which is widely used in large enterprises, what the SDC 
> thinks of as a server name isn’t actually a physical resource. It’s a 
> namespace that can span many servers and shares. In this case, it doesn’t 
> make sense to throttle simply on the root ‘server’ name. In other 
> environments, a powerful storage server can be more than capable of handling 
> a high crawl load, and overzealous throttling can end up limiting 
> Manifold’s performance there.
> I’m struggling to find a single solution that fits all cases, so I’m leaning 
> towards passing some sort of server topology flag or throttling depth flag 
> in to the repo connection config, as a hint that 
> SharedDriveConnector#getBinNames can use to decide whether the bin name 
> should be server, server+share or server+share+root_folder. Share and 
> root_folder would also need to be passed explicitly in the repo config, or 
> extracted from the documentIdentifier arg in getBinNames (assuming it's 
> reliable).
> Thoughts?
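
For illustration only, the proposal quoted above could be sketched roughly as follows. This is not the attached patch and not the connector's actual code: the BinNameSketch class, the BinDepth enum and the two-argument getBinNames are hypothetical names, and the sketch assumes document identifiers look like smb://server/share/folder/file, with directories ending in a slash.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; not the ManifoldCF Shared Drive Connector implementation. */
final class BinNameSketch {

    /** Hypothetical "throttling depth" hint taken from the repository connection config. */
    enum BinDepth { SERVER, SERVER_SHARE, SERVER_SHARE_ROOT }

    /**
     * Derives one bin name from a document identifier.
     * Assumes identifiers look like smb://server/share/folder/file.txt and that
     * directory identifiers end with '/'. Falls back to a shallower bin when the
     * identifier does not have enough folder segments.
     */
    static String[] getBinNames(String documentIdentifier, BinDepth depth) {
        // Strip the scheme ("smb://") if present.
        String path = documentIdentifier;
        int scheme = path.indexOf("://");
        if (scheme >= 0) {
            path = path.substring(scheme + 3);
        }

        // Collect non-empty path segments: server, share, folders, file.
        List<String> segments = new ArrayList<>();
        for (String s : path.split("/")) {
            if (!s.isEmpty()) {
                segments.add(s);
            }
        }
        if (segments.isEmpty()) {
            return new String[] { "" };
        }

        // Below the share, a last segment without a trailing '/' is treated as a
        // file name, so it is never used as the root folder.
        boolean endsWithFile = !path.endsWith("/") && segments.size() > 2;
        int available = endsWithFile ? segments.size() - 1 : segments.size();

        // SERVER -> 1 segment, SERVER_SHARE -> 2, SERVER_SHARE_ROOT -> 3.
        int wanted = depth.ordinal() + 1;
        int count = Math.min(wanted, available);
        return new String[] { String.join("/", segments.subList(0, count)) };
    }

    public static void main(String[] args) {
        String id = "smb://fs01/projects/alpha/report.docx";
        for (BinDepth depth : BinDepth.values()) {
            System.out.println(depth + " -> " + getBinNames(id, depth)[0]);
        }
    }
}
```

With a DFS namespace, a deeper setting such as SERVER_SHARE would put jobs that crawl different shares into separate bins, while SERVER keeps the existing one-bin-per-server behaviour. Whether the identifier can be parsed this reliably is exactly the open question raised in the description.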



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
