[ 
https://issues.apache.org/jira/browse/TIKA-930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Gauss II resolved TIKA-930.
-------------------------------

       Resolution: Fixed
    Fix Version/s: 1.2

Fixed in r1356560.

This ended up being a fairly large commit.  Feel free to revert or re-open this 
issue if I've messed something up.

I've included the commit message here as it describes the majority of the 
changes:

   - Added the Dublin Core Terms namespace and prefix

   - Changed DublinCore.CREATOR to multi-valued property
   - Consolidated TikaCoreProperties.AUTHOR to TikaCoreProperties.CREATOR
   - Removed TikaCoreProperties.LAST_AUTHOR and added
     TikaCoreProperties.MODIFIER

   - Added DublinCore.CREATED
   - Consolidated TikaCoreProperties.DATE and TikaCoreProperties.CREATION_DATE
     to TikaCoreProperties.CREATED
   - Consolidated TikaCoreProperties.SAVE_DATE to TikaCoreProperties.MODIFIED

   - Updated DublinCore.MODIFIED to correct terms namespace

   - Added OfficeOpenXMLCore.SUBJECT
   - Consolidated TikaCoreProperties.SUBJECT to TikaCoreProperties.KEYWORDS
   - Added several temporary transition properties to TikaCoreProperties to
     ease migrating previous uses of 'subject' to more specific properties
     while maintaining backwards compatibility
       * For most mail-related parsers/handlers, transition subject to dc:title
       * For most office-related parsers/handlers, transition subject to
         Office Open XML cp:subject

   - Added TikaCoreProperties.CREATOR_TOOL

   - Added TikaCoreProperties.METADATA_DATE

   - Added TikaCoreProperties.RATING

   - Changed XMP to use common namespace delimiter

   - Added Office Open XML word processing namespace and prefix to
     OfficeOpenXMLExtended
   - Added OfficeOpenXMLExtended.COMMENTS
   - Added TikaCoreProperties.COMMENTS which is a composite of
     OfficeOpenXMLExtended.COMMENTS, ClimateForecast.COMMENT and
     MSOffice.COMMENTS
   - Deprecated MSOffice.COMMENTS

   - Changed OpenDocumentMetaParser to accommodate TikaCoreProperties since the
     XML it processes treats dc:date and dc:subject differently than DcXMLParser

   - Changed nextMetadata in TextExtractor to a Property rather than String key

   - Changed DcXMLParser to use namespace already defined in DublinCore

   - Updated parsers to reflect TikaCoreProperties changes
   - Updated tika-xmp to reflect TikaCoreProperties changes
   - Registered dcterms namespace in XMPMetadataTest
   - Updated tests to reflect new changes and added some tests for backwards
     compatibility
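
To illustrate the 'subject' transition listed in the commit message, here is a minimal, self-contained sketch. It does not use Tika's classes: SubjectTransitionSketch, Family, targetKey and migrate are hypothetical names made up for this example, and the fallback to dc:subject for other parsers is an assumption based on KEYWORDS being consolidated onto DublinCore.SUBJECT.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; none of these types exist in Tika.
final class SubjectTransitionSketch {

    // Rough parser families, invented for this sketch.
    enum Family { MAIL, OFFICE, OTHER }

    // Picks the more specific key for a value that older code stored under 'subject'.
    static String targetKey(Family family) {
        switch (family) {
            case MAIL:
                return "dc:title";    // mail-related parsers/handlers
            case OFFICE:
                return "cp:subject";  // office-related parsers/handlers
            default:
                return "dc:subject";  // assumption: everything else maps to keywords
        }
    }

    // Writes the value under the new key and keeps the legacy 'subject' key
    // so that existing callers still find it.
    static Map<String, String> migrate(Family family, String subject) {
        Map<String, String> out = new LinkedHashMap<>();
        out.put(targetKey(family), subject);
        out.put("subject", subject);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(migrate(Family.MAIL, "Quarterly report"));
        System.out.println(migrate(Family.OFFICE, "Quarterly report"));
    }
}
```

Keeping the legacy key alongside the new one is what lets the transition properties be removed later without breaking callers in the meantime.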

                
> Consolidation of Some Tika Core Properties
> ------------------------------------------
>
>                 Key: TIKA-930
>                 URL: https://issues.apache.org/jira/browse/TIKA-930
>             Project: Tika
>          Issue Type: Improvement
>          Components: metadata
>    Affects Versions: 1.2
>            Reporter: Ray Gauss II
>             Fix For: 1.2
>
>
> There are a few properties in TikaCoreProperties which overlap, and I think we 
> should minimize ambiguity by consolidating each overlapping group into a single 
> composite property with the clearest name, the most general specification 
> referenced as its primary property, and the other properties and deprecated 
> strings as its secondaries.
>
> Here's the proposed pseudo-code for the changes:
>
>   Remove TikaCoreProperties.SUBJECT
>   TikaCoreProperties.KEYWORDS <- DublinCore.SUBJECT, { Office.KEYWORDS, 
>     MSOffice.KEYWORDS, Metadata.SUBJECT }
>
>   Remove TikaCoreProperties.DATE
>   TikaCoreProperties.CREATION_DATE <- DublinCore.DATE, { Office.CREATION_DATE, 
>     MSOffice.CREATION_DATE, Metadata.DATE }
>
>   Remove TikaCoreProperties.MODIFIED
>   TikaCoreProperties.SAVE_DATE <- DublinCore.MODIFIED, { Office.SAVE_DATE, 
>     MSOffice.LAST_SAVED, Metadata.MODIFIED, "Last-Modified" }
>
> and an example of the Java changes:
> {code:title=TikaCoreProperties.java *Before*}
>     /**
>      * @see DublinCore#SUBJECT
>      */
>     public static final Property SUBJECT =
>             Property.composite(DublinCore.SUBJECT,
>                     new Property[] { Property.internalText(Metadata.SUBJECT) });
>
>     /**
>      * @see Office#KEYWORDS
>      */
>     public static final Property KEYWORDS =
>             Property.composite(Office.KEYWORDS,
>                     new Property[] { Property.internalTextBag(MSOffice.KEYWORDS) });
> {code}
> would become
> {code:title=TikaCoreProperties.java *After*}
>     /**
>      * @see DublinCore#SUBJECT
>      * @see Office#KEYWORDS
>      */
>     public static final Property KEYWORDS =
>             Property.composite(DublinCore.SUBJECT,
>                     new Property[] {
>                             Office.KEYWORDS,
>                             Property.internalTextBag(MSOffice.KEYWORDS),
>                             Property.internalText(Metadata.SUBJECT)
>                     });
> {code}
> Since this would require a bit of refactoring for parsers that use the 
> properties being removed, I thought it best to get some feedback before 
> working up a full patch.
> Does this seem like a reasonable approach?
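
The composite arrangement proposed in the quoted issue (one primary property plus secondaries) can be sketched without any Tika dependency. This is a rough model under stated assumptions, not Tika's implementation: CompositeKey, SketchMetadata and CompositeSketch are hypothetical stand-ins, the key strings are illustrative, and the sketch assumes a write through the composite is mirrored to every secondary key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a composite property: a primary key plus legacy secondaries.
final class CompositeKey {
    final String primary;
    final List<String> secondaries;

    CompositeKey(String primary, String... secondaries) {
        this.primary = primary;
        this.secondaries = Arrays.asList(secondaries);
    }
}

// Hypothetical multi-valued metadata bag; not Tika's Metadata class.
final class SketchMetadata {
    private final Map<String, List<String>> values = new LinkedHashMap<>();

    // Appends a value under a plain string key.
    void add(String key, String value) {
        values.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    // Assumption: a composite write goes to the primary key and to each secondary key.
    void add(CompositeKey key, String value) {
        add(key.primary, value);
        for (String secondary : key.secondaries) {
            add(secondary, value);
        }
    }

    // Returns all values stored under the key, or an empty list if there are none.
    List<String> get(String key) {
        return values.getOrDefault(key, new ArrayList<>());
    }
}

final class CompositeSketch {
    // Illustrative key names for a multi-valued creator with two legacy aliases.
    static final CompositeKey CREATOR =
            new CompositeKey("dc:creator", "meta:author", "Author");

    public static void main(String[] args) {
        SketchMetadata metadata = new SketchMetadata();
        metadata.add(CREATOR, "Alice");
        metadata.add(CREATOR, "Bob");
        System.out.println(metadata.get("dc:creator"));
        System.out.println(metadata.get("Author"));
    }
}
```

Under this model, code that still reads a legacy key sees the same values as code that reads the primary key, which is the backwards-compatibility property the consolidation relies on.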

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
