The "MetadataRoadmap" page has been changed by JoergEhrlich:
http://wiki.apache.org/tika/MetadataRoadmap?action=diff&rev1=2&rev2=3

   Properties which are currently not connected to a namespace (like the 
properties from MSOffice interface) would also be moved to an appropriate 
namespace interface.<<BR>>
   So that each namespace property does not have to be prefixed, the namespace 
interfaces should be removed from the Metadata class and aliases added to the 
class to keep backwards compatibility.<<BR>><<BR>>
   No parser or client has to be changed.<<BR>><<BR>>
-  I. '''Improve XMP output utilizing XMPCore library'''<<BR>><<BR>>
-  Add XMPCore library from Maven.org to Tika and use it in XMPContentHandler 
to replace the current string concatenation and create XMP output.<<BR>><<BR>>
+  I. '''Move XMP output to an extra XMP module of Tika'''<<BR>><<BR>>
+  Add an extra Tika module that takes the metadata map from Tika-core and 
transforms it into XMP. Add the XMPCore library from Maven.org to this module 
and use it to create XMP output.<<BR>><<BR>>
   Add a static Tika-to-XMP mapping table for the common set of properties and 
file formats to have a first working version of XMP output.<<BR>><<BR>>
   I. '''Correct parsers where necessary'''<<BR>><<BR>>
   Adjust the parsers to map metadata not only to the current mappings but also 
to the correct set of common properties and namespaces (i.e. DublinCore and XMP 
ones) and maybe add file format specific properties.<<BR>><<BR>>
   Declare current mappings deprecated if needed.<<BR>>
   Still no client changes needed.<<BR>><<BR>>
-  I. '''Use XMP instead of Hashmap in Metadata class'''<<BR>><<BR>>
-  The idea is to have just one data model which is able to faithfully store 
all metadata information. The XMP data model provides that. The Metadata API 
will be kept as is, just the internal representation of the data will be moved 
to XMP data model. To be able to map from the API to the internal data model, 
the static mapping table that has been introduced in step 2 will be used. (see 
picture 1)<<BR>><<BR>>
+  I. '''Add support for structured data to metadata class'''<<BR>><<BR>>
+  A data model is needed that can faithfully store all metadata information, 
including structured metadata. There are several potential ways to achieve 
this: either the current HashMap is extended to also support structured 
properties, or another structured data model such as XMP is used as the 
internal representation. The Metadata API will be kept as is; only the internal 
representation of the data will be changed. Additional APIs should be 
introduced to manipulate structured properties.<<BR>><<BR>>
-  Any client provided data that cannot be mapped to existing namespaces, will 
be stored in a special Tika namespace in XMP.<<BR>>
+  Any client-provided data that cannot be mapped to existing namespaces will 
be stored in a special Tika namespace.<<BR>>
-  Add an access API to the internal XMP object to the Metadata API for clients 
or parsers who want to directly work on the XMP data model. The alternative 
would be to add a complete XMP API to the Metadata class, but it is the 
question whether that is feasible or worth the effort.<<BR>><<BR>>
-  The XMP output handler can be declared deprecated.<<BR>><<BR>>
   Still no client has to change.<<BR>><<BR>>
   I. '''Introduce versioning scheme for metadata mappings'''<<BR>><<BR>>
   This is useful if mappings of metadata properties need to be changed in 
the future. Such changes are versioned, and clients can then pass the mapping 
version they are interested in through the parsing context. This ensures 
backwards compatibility while allowing for changes and improvements.<<BR>><<BR>>
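
The static mapping table, the fallback to a special Tika namespace, and the mapping versions described in the roadmap steps above could fit together roughly as in the following Java sketch. It is not Tika code: the class `MappingTableSketch`, the method `resolve`, the `tika:` prefix, the version numbers, and the specific key-to-property pairs are all assumptions made up for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

/** Illustrative sketch only; none of these names exist in Tika. */
public final class MappingTableSketch {

    // Hypothetical versioned tables: mapping version -> (legacy key -> qualified property).
    private static final TreeMap<Integer, Map<String, String>> TABLES = new TreeMap<>();

    static {
        // Version 1: a first static table for a small common set of properties.
        TABLES.put(1, Map.of(
                "Author", "dc:creator",
                "title", "dc:title"));
        // Version 2: a later revision that adds one more mapping.
        TABLES.put(2, Map.of(
                "Author", "dc:creator",
                "title", "dc:title",
                "Last-Modified", "xmp:ModifyDate"));
    }

    // Assumed prefix for the "special Tika namespace" that holds unmapped keys.
    private static final String FALLBACK_PREFIX = "tika:";

    private MappingTableSketch() {
    }

    /**
     * Resolves a legacy key using the newest table whose version is not
     * greater than the requested one. Keys without a mapping fall back to
     * the assumed Tika namespace.
     */
    public static String resolve(int requestedVersion, String legacyKey) {
        Map.Entry<Integer, Map<String, String>> entry = TABLES.floorEntry(requestedVersion);
        if (entry == null) {
            throw new IllegalArgumentException("No mapping table for version " + requestedVersion);
        }
        String mapped = entry.getValue().get(legacyKey);
        return mapped != null ? mapped : FALLBACK_PREFIX + legacyKey;
    }
}
```

In this sketch a client that asks for version 1 keeps seeing the version 1 mappings even after version 2 is added, which is the backwards-compatibility property the versioning step is after. How the version would actually travel through the parsing context is left open here.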