Hi Nick,

On Tue, 22 May 2012, Joerg Ehrlich wrote:
>> Thanks, this looks like a great step forward. It definitely helps to 
>> clean up the current metadata usage. But I still have no real idea how 
>> to represent structured properties with the current Property/Metadata 
>> setup going forward.
>
>The only thing the current setup won't support is Structured Properties. 
>(That hasn't changed). That will need more work, but hopefully it'll be easier 
>now we've moved more things to be Property based.
>
>Are you able to come up with a good, simple example for using a structured 
>property? That'd provide us with something to ponder, and to use when testing 
>out possible solutions

Sure, there are plenty of examples. One comes from EXIF, which would be easy 
to map onto a flat property list:
<exif:Flash rdf:parseType="Resource">
    <exif:Fired>False</exif:Fired>
    <exif:Return>0</exif:Return>
    <exif:Mode>0</exif:Mode>
    <exif:Function>False</exif:Function>
    <exif:RedEyeMode>False</exif:RedEyeMode>
</exif:Flash>
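
To make "flat property list" a bit more concrete, here is a rough sketch of 
what I have in mind. It is not existing Tika code: the class name, the 
flatten() helper and the "struct/field" key format are all assumptions made 
up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Tika: turns one struct into flat keys.
class FlashFlattener {

    // Prefixes every field name with the struct name, e.g. "exif:Flash/exif:Fired".
    // The "/" separator is an assumed convention, chosen only for this sketch.
    public static Map<String, String> flatten(String structName, Map<String, String> fields) {
        Map<String, String> flat = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            flat.put(structName + "/" + field.getKey(), field.getValue());
        }
        return flat;
    }

    public static void main(String[] args) {
        // The values from the exif:Flash example above.
        Map<String, String> flash = new LinkedHashMap<>();
        flash.put("exif:Fired", "False");
        flash.put("exif:Return", "0");
        flash.put("exif:Mode", "0");
        flash.put("exif:Function", "False");
        flash.put("exif:RedEyeMode", "False");

        flatten("exif:Flash", flash)
                .forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Since the struct has a fixed set of simple fields, each flat key could in 
principle be backed by one ordinary Property.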

Face detection regions, by contrast, would be more complicated:
<mwg-rs:Regions rdf:parseType="Resource">
      <mwg-rs:AppliedToDimensions stDim:w="4288" stDim:h="2848" stDim:unit="pixel"/>
      <mwg-rs:RegionList>
        <rdf:Bag>
          <rdf:li rdf:parseType="Resource">
            <mwg-rs:Area stArea:x="0.5" stArea:y="0.5" stArea:w="0.06" stArea:h="0.09" stArea:unit="normalized"/>
            <mwg-rs:Type>Face</mwg-rs:Type>
            <mwg-rs:Title>John Doe</mwg-rs:Title>
          </rdf:li>
        ...
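
The difficulty here is that structs are nested inside an array inside another 
struct. One conceivable way to still end up with flat keys is to encode the 
nesting in the key itself. The sketch below does that with nested Map/List 
objects standing in for the parsed XMP. Everything in it (the class name, the 
"/" separator, the 1-based "[n]" index) is an assumption for illustration, 
not something Tika offers today.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: flattens nested structs (Map) and arrays (List) into path keys.
class RegionFlattener {

    public static void flatten(String path, Object node, Map<String, String> out) {
        if (node instanceof Map) {
            // Struct: descend into each field, appending "/fieldName" to the path.
            for (Map.Entry<?, ?> field : ((Map<?, ?>) node).entrySet()) {
                flatten(path + "/" + field.getKey(), field.getValue(), out);
            }
        } else if (node instanceof List) {
            // Array: descend into each item, appending a 1-based index.
            List<?> items = (List<?>) node;
            for (int i = 0; i < items.size(); i++) {
                flatten(path + "[" + (i + 1) + "]", items.get(i), out);
            }
        } else {
            // Simple value: store it under the accumulated path.
            out.put(path, String.valueOf(node));
        }
    }

    public static void main(String[] args) {
        Map<String, Object> area = new LinkedHashMap<>();
        area.put("stArea:x", "0.5");
        area.put("stArea:y", "0.5");
        area.put("stArea:w", "0.06");
        area.put("stArea:h", "0.09");
        area.put("stArea:unit", "normalized");

        Map<String, Object> region = new LinkedHashMap<>();
        region.put("mwg-rs:Area", area);
        region.put("mwg-rs:Type", "Face");
        region.put("mwg-rs:Title", "John Doe");

        List<Object> regionList = new ArrayList<>();
        regionList.add(region);

        Map<String, Object> regions = new LinkedHashMap<>();
        regions.put("mwg-rs:RegionList", regionList);

        Map<String, String> flat = new LinkedHashMap<>();
        flatten("mwg-rs:Regions", regions, flat);
        flat.forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

The drawback is that the set of keys is no longer known in advance, and since 
an rdf:Bag is unordered the index carries no meaning beyond grouping the 
fields of one region. So this probably does not fit a fixed list of 
predefined properties.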

Properties that offer language alternatives are also interesting. They are 
arrays, but each item is qualified with a language.
For example, the title property as defined by IPTC:
<dc:title>
    <rdf:Alt>
        <rdf:li xml:lang="en-us">title</rdf:li>
        <rdf:li xml:lang="de-de">titel</rdf:li>
    </rdf:Alt>
</dc:title>
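
On the client side, such a property behaves more like a small map from 
language to value than like a plain array. A minimal sketch of that idea 
follows. The LangAlt class and its methods are hypothetical, and falling back 
to the first item when the requested language is missing is my assumption 
(XMP itself usually marks the default with an "x-default" item).

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical value type for a language-alternative property such as dc:title.
class LangAlt {

    // Insertion order is kept so that "first item" is well defined.
    private final Map<String, String> items = new LinkedHashMap<>();

    // Language tags are compared case-insensitively, so they are stored in lower case.
    public LangAlt add(String lang, String value) {
        items.put(lang.toLowerCase(Locale.ROOT), value);
        return this;
    }

    // Returns the exact match if present, otherwise the first item, otherwise null.
    public String get(String lang) {
        String exact = items.get(lang.toLowerCase(Locale.ROOT));
        if (exact != null) {
            return exact;
        }
        return items.isEmpty() ? null : items.values().iterator().next();
    }

    public static void main(String[] args) {
        LangAlt title = new LangAlt().add("en-us", "title").add("de-de", "titel");
        System.out.println(title.get("de-DE"));
        System.out.println(title.get("fr-fr"));
    }
}
```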

As soon as Tika starts reading more metadata from assets (such as XMP) and 
maps more than the current simple properties, it will have to deal with such 
structured information. For XMP data, Tika could of course also just pass it 
through as a blob without parsing it and let the client deal with it :)

Regards
Jörg
