> What would be the approach for more richly typed values? Would they be an
> extension of the current model, or a second model existing in parallel with
> the first one?

On TIKA-1607, there are two (and a half) proposals:

1) move everything to DOM, with helper classes for common elements
2) use POJOs as metadata values
c) ;) keep the current setup, perhaps add binary values, and use DOM input streams for things that already have standards (e.g. Dublin Core). This could be a transitional step to option 1 in Tika 2.0.
If we went with 1 or c), we could support ISO 19115 in either of two ways: embed the information within the DOM, or add a separate ISO DOM stream that would carry it.

> Thanks for the link. TIKA-1607 seems to be about associating arbitrary
> java.lang.Object to property keys. But isn't it a little bit opaque? I mean, if a
> user gets an instance of a class that he doesn't know, how to extract
> information from it?

I agree with this, on the one hand. On the other, once we move beyond Map<String, String[]>, the user is going to need some knowledge of the metadata structure to extract information, whether that structure is a POJO, a DOM, or a Map<String, Node>.

> Regarding ISO 19115 support, what seems the main question to me is how to
> handle a tree structure?

Right, that's the crux of TIKA-1607. As for your interest in ISO 19115, to echo Nick: what specifically do you need? Which document formats do you expect to populate this information?
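To make the tree-structure question a little more concrete, here is a minimal Java sketch (JDK only, not Tika code) contrasting a flat Map<String, String[]> with a DOM tree. The element names are a simplified, hypothetical stand-in loosely modeled on ISO 19115 contact information, not the real schema, and the "/"-separated flat keys are an invented convention. With two contacts, the flat map can only pair each name with its role through parallel array positions, while the tree keeps them together; in both cases, though, the caller has to know the structure (the key names or the XPath) to get a value out.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class MetadataShapes {

    // Flat model, in the spirit of Map<String, String[]>. The "/"-separated keys are a
    // hypothetical convention for encoding nesting; they are not a Tika API.
    static Map<String, String[]> flat() {
        Map<String, String[]> m = new LinkedHashMap<>();
        m.put("title", new String[] {"Survey report"});
        m.put("contact/individualName", new String[] {"Jane Doe", "John Roe"});
        m.put("contact/role", new String[] {"author", "publisher"});
        return m;
    }

    // The name/role pairing survives only through matching array indices.
    static String nameForRoleFlat(Map<String, String[]> m, String role) {
        String[] names = m.get("contact/individualName");
        String[] roles = m.get("contact/role");
        for (int i = 0; i < roles.length && i < names.length; i++) {
            if (roles[i].equals(role)) {
                return names[i];
            }
        }
        return null; // no contact with that role
    }

    // Tree model: hypothetical element names loosely inspired by ISO 19115 contacts.
    static final String XML =
        "<metadata><title>Survey report</title>"
            + "<contact><individualName>Jane Doe</individualName><role>author</role></contact>"
            + "<contact><individualName>John Roe</individualName><role>publisher</role></contact>"
            + "</metadata>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse metadata XML", e);
        }
    }

    // Each name stays attached to its role, but the caller must know the path.
    // Returns "" when no contact matches (the XPath string value of an empty node-set).
    // Assumes role contains no single quote, since it is spliced into the expression.
    static String nameForRole(Document doc, String role) {
        try {
            return XPathFactory.newInstance().newXPath()
                .evaluate("/metadata/contact[role='" + role + "']/individualName", doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(nameForRoleFlat(flat(), "publisher")); // prints John Roe
        System.out.println(nameForRole(parse(XML), "publisher")); // prints John Roe
    }
}
```

Swapping the DOM for POJOs or a Map<String, Node> would change the access syntax, not the need to know the structure up front.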
