[
https://issues.apache.org/jira/browse/CONNECTORS-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14170987#comment-14170987
]
Antonio David Pérez Morales commented on CONNECTORS-1071:
---------------------------------------------------------
That is OK. I have used the File and URL mappings tab, but not the path. When
you use the path regular expression, which field name is used to store the
result of the (or each) regular expression applied?
Do you lose the original path of the file after the regular expression has been
applied, as with the URL mapping? If so, it does not fit our use case, and we
need a specific field for the parent_url (or container URL).
Maybe another transformer that can copy fields and apply regular expressions
or mapping rules to them to create new ones would be good to have. The problem
is that the user would then have to configure everything, and I like the
simplicity of ManifoldCF configuration, where with only one repository and one
output connector you can start to retrieve and index content.
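
To make the use case concrete, here is a minimal sketch of what we need: the
parent URL is derived with a regular expression and stored in a new field,
while the original URL is left untouched. This is only an illustration and not
ManifoldCF code; the function name, the "url" and "parent_url" keys, and the
sample URL are all assumptions.

```python
import re

# Hypothetical pattern: everything before the last "/" is the parent.
PARENT_PATTERN = re.compile(r"^(.*)/[^/]+$")


def add_parent_url(metadata):
    """Return a copy of metadata with a derived 'parent_url' field.

    The original 'url' value is kept as-is, so it can still be used as
    the document ID or in other mappings.
    """
    result = dict(metadata)  # copy, so the input is not modified
    match = PARENT_PATTERN.match(metadata["url"])
    if match:
        result["parent_url"] = match.group(1)
    return result


# Sample document with an assumed URL layout.
doc = {"url": "file://server/share/reports/q3.pdf"}
enriched = add_parent_url(doc)
```

The point is that the regular expression creates a new field instead of
replacing the value it was applied to.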
> The windows Shares connector needs some improvement in dates and other fields
> management
> ----------------------------------------------------------------------------------------
>
> Key: CONNECTORS-1071
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1071
> Project: ManifoldCF
> Issue Type: Improvement
> Components: JCIFS connector
> Affects Versions: ManifoldCF 1.7
> Reporter: Antonio David Pérez Morales
> Assignee: Karl Wright
> Priority: Minor
> Fix For: ManifoldCF 2.0
>
> Attachments: CONNECTORS-1071.patch
>
>
> Right now the connector overwrites the Tika metadata "creation_date" and
> "last_modification_date" for a document. This happens because at the
> Windows Shares level you have a creation_date and a last_modification_date
> (related to the creation of the document in the Windows Shares filesystem)
> that are different from the creation_date and the last_modification_date
> associated with the original file.
> The metadata names need to change to distinguish between these two
> layers of dates, giving the user the flexibility to use whichever one
> he/she wants with a proper mapping.
> A plus would be to format the dates in the Lucene standard, to be aligned
> with a proper standard.
> - URL metadata:
> It can be useful to extract the URL and store it in a specific metadata field
> (in addition to the ID of the document). This way we can keep it as the ID but
> also use it in other mappings without affecting the ID field.
> - Parent directory path:
> It can be useful to extract the path of the directory that contains the current
> file. Evaluate this carefully, as it could be either a redundancy or an
> improvement.