>What would be the approach for more richly typed values? Would they be an 
>extension of the current model, or a second model existing in parallel with 
>the first one?
On TIKA-1607, there are two (and a half) proposals:
1) move everything to DOM with helper classes for common elements
2) use POJOs as metadata values
c) ;) keep the current setup, perhaps add binary values, and use DOM input 
streams for things that already have standards (e.g. Dublin Core). This could 
be a transitional step to option 1 in Tika 2.0.

If we went with 1) or c), we could support ISO 19115 in one of two ways: embed 
the info within the DOM, or add a separate ISO DOM stream that would carry 
this information.
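
To make option c) a bit more concrete, here is a rough sketch of what "current 
setup plus a DOM stream per standard" might look like. It uses only JDK 
classes; TransitionalMetadata, setXml, openXml and parseXml are hypothetical 
names, not Tika's Metadata API, and the XML fragment in the usage note is a 
simplified stand-in, not the real ISO 19115 schema.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical container: flat String[] values as today, plus one
// serialized XML document per metadata standard.
public class TransitionalMetadata {
    private final Map<String, String[]> flat = new HashMap<>();
    private final Map<String, byte[]> xmlStreams = new HashMap<>();

    // Flat key -> values, the shape clients already know.
    public void set(String key, String... values) {
        flat.put(key, values);
    }

    public String[] getValues(String key) {
        return flat.get(key);
    }

    // Store a structured record under the name of its standard.
    public void setXml(String standard, String xml) {
        xmlStreams.put(standard, xml.getBytes(StandardCharsets.UTF_8));
    }

    // Returns a fresh stream on each call, or null if nothing is stored.
    public InputStream openXml(String standard) {
        byte[] bytes = xmlStreams.get(standard);
        return bytes == null ? null : new ByteArrayInputStream(bytes);
    }

    // Convenience: parse the stored stream into a DOM Document.
    public Document parseXml(String standard) {
        InputStream in = openXml(standard);
        if (in == null) {
            return null;
        }
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(in);
        } catch (Exception e) {
            throw new IllegalStateException("Stored XML could not be parsed", e);
        }
    }
}
```

A parser could then call something like setXml("iso19115", 
"<MD_Metadata><fileIdentifier>abc-123</fileIdentifier></MD_Metadata>") while 
still filling in the flat keys, so existing clients keep working and only 
clients that care about the tree need to touch the DOM.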


>Thanks for the link. TIKA-1607 seems to be about associating arbitrary 
>java.lang.Object values with property keys. But isn't that a little bit 
>opaque? I mean, if a user gets an instance of a class that he doesn't know, 
>how can he extract information from it?

I agree with this, up to a point. However, once we move beyond 
Map<String, String[]>, the user will need some knowledge of the metadata 
structure to extract information, whether that's a POJO, a DOM or a 
Map<String, Node>.
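
As an illustration of that point, here is a small sketch that reads the same 
value (the western edge of a bounding box) from each of the three shapes. It 
uses only JDK classes; the key "geo:west", the BoundingBox class and the 
/extent/box/west path are made up for the example and are not taken from Tika 
or from ISO 19115.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class MetadataShapes {

    // Hypothetical POJO a parser might attach as a metadata value.
    public static final class BoundingBox {
        public final double west, east, south, north;

        public BoundingBox(double west, double east, double south, double north) {
            this.west = west;
            this.east = east;
            this.south = south;
            this.north = north;
        }
    }

    // Flat model: the caller must know the key name.
    public static String fromFlat(Map<String, String[]> metadata) {
        return metadata.get("geo:west")[0];
    }

    // POJO model: the caller must know the class and its fields.
    public static double fromPojo(Object value) {
        if (value instanceof BoundingBox) {
            return ((BoundingBox) value).west;
        }
        throw new IllegalArgumentException("Unknown metadata type: " + value.getClass());
    }

    // DOM model: the caller must know the element path.
    public static String fromDom(Document doc) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate("/extent/box/west", doc);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Helper that parses an XML string into a DOM Document.
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In all three cases the caller needs out-of-band knowledge: a key name, a class, 
or a path. The POJO is not uniquely opaque in that respect, although a DOM can 
at least be walked generically by code that has never seen the schema.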


>Regarding ISO 19115 support, what seems the main question to me is how to 
>handle a tree structure? 
Right, that's the crux of TIKA-1607.

On your interest in ISO 19115, to echo Nick, what specifically do you need? 
What document formats do you see populating this information?


