[ https://issues.apache.org/jira/browse/TIKA-930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284925#comment-13284925 ]

Jörg Ehrlich commented on TIKA-930:
-----------------------------------

Some answers to Ray's comments:

Creator:
The DublinCore creator is usually considered the creator of the intellectual
property, not the creator of the file; that is what the "creator tool"
property is for. So we should stick with the "creator" property and not use
"author" or any other additional key.
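To illustrate the distinction, here is a minimal Java sketch that keeps the two kinds of information under separate XMP-style keys: `dc:creator` for the people who created the content and `xmp:CreatorTool` for the application that wrote the file. It assumes a plain string-keyed map as a stand-in for a metadata object; this is not Tika's `Metadata` API, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a metadata container: each key maps to a list of
// values, because dc:creator may hold several authors (an ordered array in XMP).
class CreatorExample {
    static final String DC_CREATOR = "dc:creator";            // who authored the content
    static final String XMP_CREATOR_TOOL = "xmp:CreatorTool"; // which application wrote the file

    static Map<String, List<String>> describe(List<String> authors, String tool) {
        Map<String, List<String>> md = new LinkedHashMap<>();
        // Authors of the intellectual property go under dc:creator only;
        // no separate "author" key is introduced.
        md.put(DC_CREATOR, new ArrayList<>(authors));
        // The producing software is recorded under its own key.
        md.put(XMP_CREATOR_TOOL, List.of(tool));
        return md;
    }

    public static void main(String[] args) {
        Map<String, List<String>> md =
                describe(List.of("Jane Doe", "John Roe"), "ExampleWriter 1.0");
        md.forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```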

Rating:
I think we had better not use anything more generic here. The generic
approaches taken in the past are the reason we have this huge mess of
incompatible applications today. The Metadata Working Group had good reason
to define this property the way it did, and many important applications
understand and use that definition today. And didn't we say we wanted to use
only something that is clearly defined?
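For reference, a minimal validation sketch, assuming the definition in question is the one used for the XMP `xmp:Rating` property: the value is -1 ("rejected") or a real number from 0 to 5, where 0 means "unrated". It also assumes the rating arrives as a string; the `XmpRating` class and `parse` method are hypothetical.

```java
import java.util.OptionalDouble;

// Hypothetical helper: checks a raw rating string against the xmp:Rating rule
// (-1 means "rejected", 0 means "unrated", otherwise a real number in [0, 5]).
class XmpRating {
    static final double REJECTED = -1.0;

    /** Returns the parsed rating, or empty if the value is not a valid xmp:Rating. */
    static OptionalDouble parse(String raw) {
        if (raw == null) {
            return OptionalDouble.empty();
        }
        double value;
        try {
            value = Double.parseDouble(raw.trim());
        } catch (NumberFormatException e) {
            return OptionalDouble.empty();
        }
        // NaN fails both comparisons, so it is rejected as invalid.
        boolean valid = value == REJECTED || (value >= 0.0 && value <= 5.0);
        return valid ? OptionalDouble.of(value) : OptionalDouble.empty();
    }

    public static void main(String[] args) {
        for (String raw : new String[] {"-1", "0", "3", "4.5", "7", "-0.5", "stars"}) {
            OptionalDouble r = parse(raw);
            System.out.println(raw + " -> " + (r.isPresent() ? r.getAsDouble() : "invalid"));
        }
    }
}
```

Rejecting out-of-range values instead of clamping them keeps the stored data within the single, well-defined range that applications already understand.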

Geographic:
Have you found any files or file types that actually use the W3C approach to
store geolocation data? All the ones I have seen so far use Exif to store it :)
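Exif keeps GPS positions in its GPS IFD: `GPSLatitude` and `GPSLongitude` are each three rational values (degrees, minutes, seconds), and `GPSLatitudeRef` (`N`/`S`) and `GPSLongitudeRef` (`E`/`W`) give the hemisphere. A minimal sketch of converting such a value to signed decimal degrees, assuming the rationals have already been read from the file as numerator/denominator pairs (reading them is not shown); the `ExifGps` class is hypothetical.

```java
// Hypothetical converter from an Exif-style GPS coordinate to signed decimal degrees.
class ExifGps {
    /** One Exif RATIONAL value (numerator / denominator). */
    static final class Rational {
        final long numerator;
        final long denominator;

        Rational(long numerator, long denominator) {
            this.numerator = numerator;
            this.denominator = denominator;
        }

        double value() {
            if (denominator == 0) {
                throw new IllegalArgumentException("zero denominator");
            }
            return (double) numerator / denominator;
        }
    }

    /**
     * @param dms three rationals: degrees, minutes, seconds
     * @param ref hemisphere reference: "N", "S", "E" or "W"
     */
    static double toDecimalDegrees(Rational[] dms, String ref) {
        if (dms.length != 3) {
            throw new IllegalArgumentException("expected degrees, minutes, seconds");
        }
        double degrees = dms[0].value() + dms[1].value() / 60.0 + dms[2].value() / 3600.0;
        // Southern latitudes and western longitudes are negative in decimal form.
        return ("S".equals(ref) || "W".equals(ref)) ? -degrees : degrees;
    }

    public static void main(String[] args) {
        Rational[] lat = {new Rational(40, 1), new Rational(26, 1), new Rational(4614, 100)};
        System.out.println(toDecimalDegrees(lat, "N"));
    }
}
```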

> Consolidation of Some Tika Core Properties
> ------------------------------------------
>
>                 Key: TIKA-930
>                 URL: https://issues.apache.org/jira/browse/TIKA-930
>             Project: Tika
>          Issue Type: Improvement
>          Components: metadata
>    Affects Versions: 1.2
>            Reporter: Ray Gauss II
>
> There are a few properties in TikaCoreProperties which overlap, and I think we
> should minimize ambiguity by consolidating each overlapping group into a single
> composite property with the clearest name, with the most general specification
> referenced as its primary property and the others, plus the deprecated
> strings, as its secondaries.
> Here's the proposed pseudo-code for the changes:
>
> Remove TikaCoreProperties.SUBJECT
> TikaCoreProperties.KEYWORDS <- DublinCore.SUBJECT, { Office.KEYWORDS, MSOffice.KEYWORDS, Metadata.SUBJECT }
>
> Remove TikaCoreProperties.DATE
> TikaCoreProperties.CREATION_DATE <- DublinCore.DATE, { Office.CREATION_DATE, MSOffice.CREATION_DATE, Metadata.DATE }
>
> Remove TikaCoreProperties.MODIFIED
> TikaCoreProperties.SAVE_DATE <- DublinCore.MODIFIED, { Office.SAVE_DATE, MSOffice.LAST_SAVED, Metadata.MODIFIED, "Last-Modified" }
>
> And here is an example of the Java changes:
> {code:title=TikaCoreProperties.java *Before*}
>     /**
>      * @see DublinCore#SUBJECT
>      */
>     public static final Property SUBJECT =
>             Property.composite(DublinCore.SUBJECT,
>                     new Property[] { Property.internalText(Metadata.SUBJECT) });
>
>     /**
>      * @see Office#KEYWORDS
>      */
>     public static final Property KEYWORDS =
>             Property.composite(Office.KEYWORDS,
>                     new Property[] { Property.internalTextBag(MSOffice.KEYWORDS) });
> {code}
> would become
> {code:title=TikaCoreProperties.java *After*}
>     /**
>      * @see DublinCore#SUBJECT
>      * @see Office#KEYWORDS
>      */
>     public static final Property KEYWORDS =
>             Property.composite(DublinCore.SUBJECT,
>                     new Property[] {
>                             Office.KEYWORDS,
>                             Property.internalTextBag(MSOffice.KEYWORDS),
>                             Property.internalText(Metadata.SUBJECT)
>                     });
> {code}
> Since this would require a bit of refactoring in parsers that use the
> properties being removed, I thought it best to get some feedback before
> working up a full patch.
> Does this seem like a reasonable approach?
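The proposal relies on composite properties: one primary key plus secondary keys that all receive the same value. Below is a simplified, self-contained model of that idea, not Tika's `Property` or `Metadata` classes; `ConsolidationSketch`, `Composite` and `set` are hypothetical, values are single strings for simplicity, and the keys are the constant names from the pseudo-code used as plain strings rather than the real metadata key strings. It encodes the three proposed consolidations and, as an assumption inferred from the pseudo-code, redirects each removed property to the consolidated property that now carries its primary.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of composite properties (hypothetical; not Tika's API).
class ConsolidationSketch {
    static final class Composite {
        final String name;
        final String primary;
        final List<String> secondaries;

        Composite(String name, String primary, String... secondaries) {
            this.name = name;
            this.primary = primary;
            this.secondaries = List.of(secondaries);
        }

        /** Every key that receives the value: the primary first, then the secondaries. */
        List<String> allKeys() {
            List<String> keys = new ArrayList<>();
            keys.add(primary);
            keys.addAll(secondaries);
            return keys;
        }
    }

    // The three consolidations from the proposed pseudo-code.
    static final Composite KEYWORDS = new Composite("KEYWORDS", "DublinCore.SUBJECT",
            "Office.KEYWORDS", "MSOffice.KEYWORDS", "Metadata.SUBJECT");
    static final Composite CREATION_DATE = new Composite("CREATION_DATE", "DublinCore.DATE",
            "Office.CREATION_DATE", "MSOffice.CREATION_DATE", "Metadata.DATE");
    static final Composite SAVE_DATE = new Composite("SAVE_DATE", "DublinCore.MODIFIED",
            "Office.SAVE_DATE", "MSOffice.LAST_SAVED", "Metadata.MODIFIED", "Last-Modified");

    // Assumed migration for parsers: removed property -> its consolidated replacement.
    static final Map<String, Composite> REPLACEMENTS = Map.of(
            "SUBJECT", KEYWORDS,
            "DATE", CREATION_DATE,
            "MODIFIED", SAVE_DATE);

    /** Writes the value under the primary key and under every secondary key. */
    static void set(Map<String, String> metadata, Composite property, String value) {
        for (String key : property.allKeys()) {
            metadata.put(key, value);
        }
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new LinkedHashMap<>();
        // A parser that previously set TikaCoreProperties.MODIFIED is redirected to SAVE_DATE.
        set(metadata, REPLACEMENTS.get("MODIFIED"), "2012-05-29T10:00:00Z");
        metadata.forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

In this model, a parser only has to change which property it sets; the fan-out to the older keys is handled by the composite definition.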

--
This message is automatically generated by JIRA.