[ https://issues.apache.org/jira/browse/TIKA-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201379#comment-14201379 ]

Martin Desruisseaux commented on TIKA-443:
------------------------------------------

For the Tika to ISO 19115 direction, I see these choices:

* Some core Tika classes could implement some {{org.opengis.metadata}} 
interfaces. For example, if there is a Tika class somewhere that contains the 
(latitude, longitude) coordinates of a rectangle, that class could implement 
the {{GeographicBoundingBox}} interface. All the {{org.opengis.metadata}} 
interfaces follow the ISO 19115 model, so this is not a purely arbitrary API.
* Alternatively, if Tika prefers not to modify its core classes, the data 
could be copied from the Tika class to a separate {{GeographicBoundingBox}} 
implementation just before marshalling. That separate implementation could be 
the SIS one or another one if the Tika group prefers. However, using the SIS one 
would avoid an extra copy, since SIS needs to copy the data into its own 
implementation before marshalling anyway (because of the way JAXB works).
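
To make the two options above concrete, here is a minimal sketch in which all Tika-side class names ({{TikaGeoBox}}, {{PlainTikaBox}}, {{CopiedBoundingBox}}) are hypothetical. So that it compiles without the GeoAPI jar, it declares a local stand-in interface {{BoundingBoxView}} with the same four getters as GeoAPI's {{org.opengis.metadata.extent.GeographicBoundingBox}}; real code would implement the GeoAPI interface itself, which also inherits {{getInclusion()}} from {{GeographicExtent}}.

{code:java}
/**
 * Stand-in (assumption) with the same four getters as GeoAPI's
 * org.opengis.metadata.extent.GeographicBoundingBox, declared locally only
 * so that this sketch compiles without the GeoAPI jar.
 */
interface BoundingBoxView {
    double getWestBoundLongitude();
    double getEastBoundLongitude();
    double getSouthBoundLatitude();
    double getNorthBoundLatitude();
}

/** Option 1: a hypothetical core Tika class implements the interface directly (no copy). */
final class TikaGeoBox implements BoundingBoxView {
    private final double minLat, maxLat, minLon, maxLon;

    TikaGeoBox(double minLat, double maxLat, double minLon, double maxLon) {
        this.minLat = minLat;
        this.maxLat = maxLat;
        this.minLon = minLon;
        this.maxLon = maxLon;
    }

    @Override public double getWestBoundLongitude() { return minLon; }
    @Override public double getEastBoundLongitude() { return maxLon; }
    @Override public double getSouthBoundLatitude() { return minLat; }
    @Override public double getNorthBoundLatitude() { return maxLat; }
}

/** Option 2: a hypothetical Tika class left untouched, unaware of the interface... */
final class PlainTikaBox {
    final double minLat, maxLat, minLon, maxLon;

    PlainTikaBox(double minLat, double maxLat, double minLon, double maxLon) {
        this.minLat = minLat;
        this.maxLat = maxLat;
        this.minLon = minLon;
        this.maxLon = maxLon;
    }
}

/**
 * ...whose values are copied into a separate implementation just before
 * marshalling. With SIS, this role could be played by
 * new DefaultGeographicBoundingBox(west, east, south, north).
 */
final class CopiedBoundingBox implements BoundingBoxView {
    private final double west, east, south, north;

    CopiedBoundingBox(double west, double east, double south, double north) {
        this.west = west;
        this.east = east;
        this.south = south;
        this.north = north;
    }

    /** Copies the Tika values, reordering them into ISO's west/east/south/north order. */
    static CopiedBoundingBox from(PlainTikaBox box) {
        return new CopiedBoundingBox(box.minLon, box.maxLon, box.minLat, box.maxLat);
    }

    @Override public double getWestBoundLongitude() { return west; }
    @Override public double getEastBoundLongitude() { return east; }
    @Override public double getSouthBoundLatitude() { return south; }
    @Override public double getNorthBoundLatitude() { return north; }
}
{code}

Option 1 lets the Tika object itself be handed to the marshaller; option 2 costs one copy, which SIS would make anyway before marshalling.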

Once Tika has identified the information of interest to it 
({{GeographicBoundingBox}}, maybe {{DataIdentification}}, etc.), that data 
needs to be assembled into an {{org.opengis.metadata.Metadata}} 
implementation, which is usually the root of the ISO 19115 hierarchy. Again, it 
can be either a core Tika class implementing {{Metadata}} or a separate 
implementation such as the SIS one, at your choice.

Once you have a {{Metadata}} instance, the easiest way to marshal it is with 
{{org.apache.sis.xml.XML}}. This convenience class provides several {{marshal}} 
methods, so you can pick the most convenient one. An easy one for testing 
purposes is:

{code:java}
System.out.println(XML.marshal(metadata));
{code}

For the reverse operation (ISO 19115 to Tika), the starting point could be:

{code:java}
Metadata md = (Metadata) XML.unmarshal(inputStream);
{code}

but the next question is how to use that {{Metadata}} information. Again, I see 
two choices:

* Tika may copy the information into its own internal structure.
* Alternatively, some Tika APIs could be designed to accept {{Metadata}}, 
{{GeographicBoundingBox}}, etc. arguments. Again, these are GeoAPI interfaces, 
so the arguments are not necessarily SIS implementations. If Tika implemented 
those interfaces as a result of the discussion above, the modified API would 
work with Tika classes too.
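
A minimal sketch of both choices, assuming a hypothetical {{IsoBoundingBox}} stand-in that mirrors the getters of GeoAPI's {{GeographicBoundingBox}} (declared locally so the sketch compiles without GeoAPI). The {{geo:*}} keys and the {{Map}} used as Tika's internal structure are also hypothetical, not Tika's actual metadata keys.

{code:java}
import java.util.HashMap;
import java.util.Map;

/** Stand-in (assumption) for the getters of GeoAPI's GeographicBoundingBox. */
interface IsoBoundingBox {
    double getWestBoundLongitude();
    double getEastBoundLongitude();
    double getSouthBoundLatitude();
    double getNorthBoundLatitude();
}

final class ReverseMapping {
    private ReverseMapping() {}

    /** Choice 1: copy the values into Tika's own structure (hypothetical keys). */
    static Map<String, String> copyToTikaFields(IsoBoundingBox box) {
        Map<String, String> fields = new HashMap<>();
        fields.put("geo:west", Double.toString(box.getWestBoundLongitude()));
        fields.put("geo:east", Double.toString(box.getEastBoundLongitude()));
        fields.put("geo:south", Double.toString(box.getSouthBoundLatitude()));
        fields.put("geo:north", Double.toString(box.getNorthBoundLatitude()));
        return fields;
    }

    /**
     * Choice 2: a Tika-side API typed on the interface, so that any
     * implementation is accepted without copying. Simplification: boxes
     * crossing the antimeridian (west > east) are not handled.
     */
    static boolean contains(IsoBoundingBox box, double latitude, double longitude) {
        return latitude >= box.getSouthBoundLatitude()
            && latitude <= box.getNorthBoundLatitude()
            && longitude >= box.getWestBoundLongitude()
            && longitude <= box.getEastBoundLongitude();
    }
}
{code}

With the real GeoAPI interface in place of the stand-in, a method like {{contains}} would accept an SIS object obtained from {{XML.unmarshal}} as well as a Tika class implementing the interface.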


> Geographic Information Parser
> -----------------------------
>
>                 Key: TIKA-443
>                 URL: https://issues.apache.org/jira/browse/TIKA-443
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>            Reporter: Arturo Beltran
>            Assignee: Chris A. Mattmann
>         Attachments: getFDOMetadata.xml
>
>
> I'm working on the automatic description of geospatial resources, and I think 
> it might be interesting to incorporate new parser/s into Tika in order to 
> manage and describe some geo-formats. These geo-formats include files, 
> services and databases.
> If anyone is interested in this issue or wants to collaborate, do not hesitate 
> to contact me. Any help is welcome.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
