[
https://issues.apache.org/jira/browse/CONNECTORS-1440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16072992#comment-16072992
]
Karl Wright edited comment on CONNECTORS-1440 at 7/4/17 7:21 AM:
-----------------------------------------------------------------
[~svanschalkwyk]: The problem is that the file system connector doesn't set the
standard creation date at all:
{code}
RepositoryDocument data = new RepositoryDocument();
data.setFileName(fileName);
data.setMimeType(mimeType);
data.setModifiedDate(modifiedDate);
if (convertPath != null) {
  // WGET-compatible input; convert back to external URI
  data.addField("uri",uri);
} else {
  data.addField("uri",file.toString());
}
// MHL for other metadata
// Ingest the document.
{code}
As we've discussed before, the reason for this omission is that the Java
standard IO code (java.io.File) doesn't expose a creation date. Instead, the
creation date is (apparently) coming from fields extracted using Tika. So you
have the following choices:
(1) If you want the creation date from the PDF metadata, you will need to map
the Tika-extracted field to the field name you want using the Metadata Adjuster
transformer.
(2) If you want the file system creation date, and your file system supports
it, we can consider using java.nio, as described here:
https://stackoverflow.com/questions/2723838/determine-file-creation-date-in-java
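For option (2), here is a minimal sketch of what reading the creation time
through java.nio could look like. The class and method names
(CreationDateSketch, readCreationDate) are made up for illustration and are not
part of the connector. How the result would be handed to the
RepositoryDocument is left open here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.Date;

// Hypothetical helper, not part of ManifoldCF.
class CreationDateSketch {

    /**
     * Returns the creation time reported by the file system,
     * or null if the attributes cannot be read.
     */
    static Date readCreationDate(Path path) {
        try {
            BasicFileAttributes attrs =
                Files.readAttributes(path, BasicFileAttributes.class);
            // On file systems without a creation timestamp, the JDK returns an
            // implementation-specific default (typically the last-modified
            // time or the epoch), so this value is only as good as the
            // underlying file system.
            return new Date(attrs.creationTime().toMillis());
        } catch (IOException e) {
            return null;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("creation-date", ".txt");
        try {
            System.out.println("Creation date: " + readCreationDate(tmp));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Returning null on failure is just one possible choice; it would let a caller
skip setting the created date when the file system gives nothing usable.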
Please let me know what you want to do.
> "Created date field name" is not honored for pdf filesystem to ElasticSearch
> ----------------------------------------------------------------------------
>
> Key: CONNECTORS-1440
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1440
> Project: ManifoldCF
> Issue Type: Bug
> Components: Elastic Search connector
> Affects Versions: ManifoldCF 2.7.1
> Environment: Ubuntu 16.10
> ElasticSearch 5.4.1
> Reporter: Steph van Schalkwyk
> Assignee: Karl Wright
> Priority: Minor
> Fix For: ManifoldCF 2.8
>
>
> The "Created date field name" attribute name is not honored for pdf crawls to
> ES.
> The ES field created is "created", not the name entered on the ES parameters
> page, in my case "createdOn". BTW, I have a mapping in the index for
> "createdOn".