Dear Wiki user, You have subscribed to a wiki page or wiki category on "Tika Wiki" for change notification.
The "MetadataRoadmap" page has been changed by JoergEhrlich:
http://wiki.apache.org/tika/MetadataRoadmap?action=diff&rev1=2&rev2=3

  Properties that are currently not connected to a namespace (like the properties from the MSOffice interface) would also be moved to an appropriate namespace interface.<<BR>> To avoid having to prefix each namespace property, the namespace interfaces should be removed from the Metadata class, and aliases should be added to the class to preserve backwards compatibility.<<BR>><<BR>> No parser or client has to be changed.<<BR>><<BR>>
- I. '''Improve XMP output utilizing XMPCore library'''<<BR>><<BR>>
- Add XMPCore library from Maven.org to Tika and use it in XMPContentHandler to replace the current string concatenation and create XMP output.<<BR>><<BR>>
+ I. '''Move XMP output to an extra XMP module of Tika'''<<BR>><<BR>>
+ Add an extra Tika module that takes the metadata map from Tika-core and transforms it into XMP. Add the XMPCore library from Maven.org to this module and use it to create XMP output.<<BR>><<BR>>
  Add a static Tika-to-XMP mapping table for the common set of properties and file formats to have a first working version of XMP output.<<BR>><<BR>>
  I. '''Correct parsers where necessary'''<<BR>><<BR>>
  Adjust the parsers to map metadata not only to the current mappings but also to the correct set of common properties and namespaces (i.e. the DublinCore and XMP ones), and possibly add file-format-specific properties.<<BR>><<BR>> Declare current mappings deprecated if needed.<<BR>> Still no client changes needed.<<BR>><<BR>>
- I. '''Use XMP instead of Hashmap in Metadata class'''<<BR>><<BR>>
- The idea is to have just one data model which is able to faithfully store all metadata information. The XMP data model provides that. The Metadata API will be kept as is, just the internal representation of the data will be moved to XMP data model. To be able to map from the API to the internal data model, the static mapping table that has been introduced in step 2 will be used. (see picture 1)<<BR>><<BR>>
+ I. '''Add support for structured data to metadata class'''<<BR>><<BR>>
+ A data model is needed that can faithfully store all metadata information, including structured data. There are several potential ways to solve this: either the current HashMap is extended to also support structured properties, or another structured data model such as XMP is used as the internal representation. The Metadata API will be kept as is; only the internal representation of the data will be changed. Additional APIs should be introduced to manipulate structured properties.<<BR>><<BR>>
- Any client provided data that cannot be mapped to existing namespaces, will be stored in a special Tika namespace in XMP.<<BR>>
+ Any client-provided data that cannot be mapped to existing namespaces will be stored in a special Tika namespace.<<BR>>
- Add an access API to the internal XMP object to the Metadata API for clients or parsers who want to directly work on the XMP data model. The alternative would be to add a complete XMP API to the Metadata class, but it is the question whether that is feasible or worth the effort.<<BR>><<BR>>
- The XMP output handler can be declared deprecated.<<BR>><<BR>>
  Still no client has to change.<<BR>><<BR>>
  I. '''Introduce versioning scheme for metadata mappings'''<<BR>><<BR>>
  This is very useful if mappings of metadata properties need to be changed in the future. Such changes are versioned, and clients can then pass the mapping version they are interested in through the parsing context. This ensures backwards compatibility while still allowing for changes and improvements.<<BR>><<BR>>
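
The static Tika-to-XMP mapping table, the fallback to a special Tika namespace, and the versioning scheme described in the roadmap could be sketched roughly as follows. This is only an illustration under stated assumptions, not Tika code: the class `XmpMappingSketch`, its methods, the choice of legacy keys, the contents of each mapping version, and the fallback namespace URI are all hypothetical. Only the DublinCore and XMP basic namespace URIs are real. Target properties are written in `{namespace}localName` form to keep the sketch free of external libraries such as XMPCore.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of a static, versioned Tika-to-XMP mapping table. Not part of Tika. */
final class XmpMappingSketch {

    static final String DC_NS = "http://purl.org/dc/elements/1.1/";
    static final String XMP_NS = "http://ns.adobe.com/xap/1.0/";
    // Placeholder for the "special Tika namespace"; the roadmap does not give a URI.
    static final String TIKA_NS = "http://example.org/tika-fallback/";

    // mapping version -> (legacy metadata key -> namespace-qualified property)
    private static final Map<Integer, Map<String, String>> TABLES = new LinkedHashMap<>();

    static {
        // Version 1: an illustrative first set of common properties.
        Map<String, String> v1 = new LinkedHashMap<>();
        v1.put("title", qname(DC_NS, "title"));
        v1.put("Author", qname(DC_NS, "creator"));
        v1.put("Creation-Date", qname(XMP_NS, "CreateDate"));
        TABLES.put(1, v1);

        // Version 2: same as version 1 plus one additional (assumed) mapping.
        Map<String, String> v2 = new LinkedHashMap<>(v1);
        v2.put("Last-Modified", qname(XMP_NS, "ModifyDate"));
        TABLES.put(2, v2);
    }

    /** Builds a "{namespaceUri}localName" string for a property. */
    static String qname(String namespaceUri, String localName) {
        return "{" + namespaceUri + "}" + localName;
    }

    /** Resolves a legacy key for the requested mapping version; unmapped keys go to the fallback namespace. */
    static String resolve(String legacyKey, int version) {
        Map<String, String> table = TABLES.get(version);
        if (table == null) {
            throw new IllegalArgumentException("Unknown mapping version: " + version);
        }
        String mapped = table.get(legacyKey);
        return mapped != null ? mapped : qname(TIKA_NS, legacyKey);
    }

    /** Transforms a flat metadata map into namespace-qualified properties. */
    static Map<String, String> transform(Map<String, String> metadata, int version) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : metadata.entrySet()) {
            out.put(resolve(entry.getKey(), version), entry.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("Author", "Jane Doe");
        metadata.put("X-Custom", "client-provided value");
        // Print each resolved property with its value, using mapping version 1.
        transform(metadata, 1).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

In this sketch a client that asks for version 1 keeps seeing the old behaviour even after version 2 adds a mapping, which is the backwards-compatibility property the versioning step aims for. How the version would actually travel through the parsing context is left open here.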
