[ https://issues.apache.org/jira/browse/CONNECTORS-1440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16072992#comment-16072992 ]

Karl Wright edited comment on CONNECTORS-1440 at 7/4/17 7:21 AM:
-----------------------------------------------------------------

[~svanschalkwyk]: The problem is that the file system connector doesn't set the 
standard creation date at all:

{code}
        RepositoryDocument data = new RepositoryDocument();
        data.setFileName(fileName);
        data.setMimeType(mimeType);
        data.setModifiedDate(modifiedDate);
        // Note: no created date is ever set here, so none reaches the output connector
        if (convertPath != null) {
          // WGET-compatible input; convert back to external URI
          data.addField("uri", uri);
        } else {
          data.addField("uri", file.toString());
        }
        // MHL for other metadata

        // Ingest the document.
{code}

As we've discussed before, the reason for this omission is that the standard 
Java IO API (java.io.File) doesn't expose a file's creation date; it only 
provides the last-modified time.  Instead, the creation date you are seeing 
apparently comes from fields that Tika extracts from the document.  So you have 
the following choices:

(1) If you want the creation date from the PDF metadata, you will need to map 
those Tika-extracted fields to the field names you want using the Metadata 
Adjuster transformer.
(2) If you want the file system creation date, and your file system supports 
it, we can consider using java.nio, as described here: 
https://stackoverflow.com/questions/2723838/determine-file-creation-date-in-java
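
A minimal sketch of option (2), assuming the connector's {{file}} variable is a java.io.File. The class and method names below (CreationDateSketch, creationDate) are hypothetical and not existing ManifoldCF code; the helper reads the creation time through java.nio.file.Files.readAttributes() and BasicFileAttributes.creationTime(), which java.io.File itself does not offer:

{code:java}
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.Date;

// Hypothetical helper for illustration; not part of ManifoldCF.
public class CreationDateSketch {

  // java.io.File only exposes lastModified(); the creation timestamp has to be
  // read through java.nio.file's BasicFileAttributes view instead.
  // Caveat (per the JDK Javadoc): if the file system does not record a creation
  // time, creationTime() returns an implementation-specific default, typically
  // the last-modified time or the epoch.
  public static Date creationDate(File file) throws IOException {
    BasicFileAttributes attrs =
        Files.readAttributes(file.toPath(), BasicFileAttributes.class);
    return new Date(attrs.creationTime().toMillis());
  }

  public static void main(String[] args) throws IOException {
    // Compare the two timestamps for a path given on the command line.
    File file = new File(args.length > 0 ? args[0] : ".");
    System.out.println("modified: " + new Date(file.lastModified()));
    System.out.println("created:  " + creationDate(file));
  }
}
{code}

If something like this were adopted, the connector could presumably call data.setCreatedDate(...) next to setModifiedDate() in the snippet above, assuming RepositoryDocument's created-date setter accepts a java.util.Date. Because of the caveat noted in the comments, the value is only meaningful on file systems that actually track creation time.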

Please let me know what you want to do.


> "Created date field name" is not honored for pdf filesystem to ElasticSearch
> ----------------------------------------------------------------------------
>
>                 Key: CONNECTORS-1440
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1440
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Elastic Search connector
>    Affects Versions: ManifoldCF 2.7.1
>         Environment: Ubuntu 16.10
> ElasticSearch 5.4.1
>            Reporter: Steph van Schalkwyk
>            Assignee: Karl Wright
>            Priority: Minor
>             Fix For: ManifoldCF 2.8
>
>
> The "Created date field name" attribute is not honored for PDF crawls to ES.
> The ES field created is "created", not the name entered on the ES parameters
> page, in my case "createdOn". BTW, I have a mapping in the index for
> "createdOn".



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)