Hello everyone,
Sorry for the late answer; I wasn't registered on this mailing list yet.
Here is a quick introduction, since Martin already mentioned me:
I'm Johann Sorel, from the same company, and I also work on the Geotoolkit
project. I mainly work on data readers/writers, rendering engines and
Swing user interfaces, but also a bit on everything:
metadata, coverage, security, web services.
I have been looking at the Tika project. I have never used it, so correct me
if I say something wrong.
From what I see, it is limited to reading metadata only, and only from
files.
Writing is also something the Apache SIS project should provide, so I
believe SIS should have a higher-level API that Tika could implement.
Regarding data sources, I propose a different approach: the Java Content
Repository (JCR) specification, version 2 (JSR 283, which follows JSR 170).
A possible implementation is Apache Jackrabbit:
http://jackrabbit.apache.org
While Tika might be interesting for metadata, the JCR specification
defines APIs for reading, writing and querying.
Besides, the community using JCR is far larger than Tika's or GDAL's; to name
a few: Liferay, eXo Platform, Oracle Beehive, Hippo CMS, ...
Reusing the same or a similar model would simplify the integration of
the SIS model into existing applications,
and we would benefit from the expertise already invested in this specification.
The JCR model is very similar to features: it has Nodes and NodeTypes,
which I believe might be usable for metadata too.
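To make the Node/NodeType analogy concrete, here is a minimal in-memory sketch in Java. It deliberately does not use the javax.jcr API; `SimpleNode` and the type names `sis:FeatureCollection`, `sis:Feature` and `sis:Metadata` are hypothetical, chosen only to show how a feature and its metadata could both be typed nodes in one tree:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, JCR-inspired tree of typed nodes (not the javax.jcr API). */
public class NodeSketch {

    /** A node has a name, a node type, properties and ordered children. */
    static final class SimpleNode {
        final String name;
        final String nodeType;
        final Map<String, Object> properties = new LinkedHashMap<>();
        final List<SimpleNode> children = new ArrayList<>();

        SimpleNode(String name, String nodeType) {
            this.name = name;
            this.nodeType = nodeType;
        }

        /** Creates a child node, in the spirit of JCR's Node.addNode(name, type). */
        SimpleNode addNode(String childName, String childType) {
            SimpleNode child = new SimpleNode(childName, childType);
            children.add(child);
            return child;
        }

        /** Sets a property and returns this node, so calls can be chained. */
        SimpleNode setProperty(String key, Object value) {
            properties.put(key, value);
            return this;
        }

        List<SimpleNode> getChildren() {
            return Collections.unmodifiableList(children);
        }
    }

    /** Builds a feature node that carries a nested metadata node. */
    static SimpleNode buildExample() {
        SimpleNode root = new SimpleNode("roads", "sis:FeatureCollection");
        SimpleNode road = root.addNode("road-1", "sis:Feature")
                .setProperty("name", "Main Street")
                .setProperty("lanes", 2);
        road.addNode("metadata", "sis:Metadata")
                .setProperty("title", "Road network");
        return root;
    }

    public static void main(String[] args) {
        SimpleNode root = buildExample();
        for (SimpleNode child : root.getChildren()) {
            System.out.println(child.name + " : " + child.nodeType + " " + child.properties);
        }
    }
}
```

In a real JCR repository the equivalent calls would go through a `Session` and `Node.addNode`, and custom node types would be registered with the repository; this sketch skips all of that.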
Filter would be placed just before data source, since the data source
should have a query API which uses filters.
To give a global view of the solution we have so far
(I won't talk about referencing; Martin has much more knowledge than me
on this topic):
1) We have 3 base storage atoms: Metadata, Feature (and underneath,
Geometry), Coverage
--> defined by several OGC/ISO specifications
2) To interrogate them we can use: Filter, Expression, Query
--> Filter and Expression are defined by OGC (they exist in geoapi-pending)
--> Query is defined in JCR
3) To manage/query/analyze them: Repository/DataSource/DataStore
--> can be based on the JCR, GDAL or Tika models, or a mix
4) To render the data: style model, map model
--> the style model can be OGC SLD/SE (exists in geoapi-pending), or could
also be some kind of CSS
--> the map model could be OGC WMC, but this spec is limited to the web; it
would require some improvements.
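The layering in points 2) and 3) can be sketched as follows, assuming hypothetical interfaces (`Feature`, `Expression`, `Filter`, `Query`, `DataSource`) that are not the geoapi-pending or JCR types. The point is the dependency direction argued above: the data source query API takes a query built from filters and expressions, so those must live in an earlier module than the data source:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Hypothetical sketch of layers 2) and 3): filters feed the data source query API. */
public class LayerSketch {

    /** A feature reduced to named property values, for illustration only. */
    interface Feature {
        Object getProperty(String name);
    }

    /** Layer 2: an expression computes a value from a feature. */
    interface Expression extends Function<Feature, Object> {
        static Expression property(String name) {
            return feature -> feature.getProperty(name);
        }

        static Expression literal(Object value) {
            return feature -> value;
        }
    }

    /** Layer 2: a filter is a boolean test built from expressions. */
    interface Filter extends Predicate<Feature> {
        static Filter equal(Expression left, Expression right) {
            return feature -> Objects.equals(left.apply(feature), right.apply(feature));
        }
    }

    /** Layer 2: a query combines a filter with the properties to return. */
    static final class Query {
        final Filter filter;
        final List<String> propertyNames;

        Query(Filter filter, List<String> propertyNames) {
            this.filter = filter;
            this.propertyNames = propertyNames;
        }
    }

    /** Layer 3: a data source answers queries; it depends on layer 2, not the reverse. */
    interface DataSource {
        List<Map<String, Object>> query(Query query);
    }

    /** In-memory stand-in for a JCR-, GDAL- or Tika-backed data source. */
    static DataSource inMemory(List<Map<String, Object>> rows) {
        return query -> rows.stream()
                .filter(row -> query.filter.test(row::get))
                .map(row -> project(row, query.propertyNames))
                .collect(Collectors.toList());
    }

    /** Keeps only the requested properties, preserving their order. */
    private static Map<String, Object> project(Map<String, Object> row, List<String> names) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (String name : names) {
            result.put(name, row.get(name));
        }
        return result;
    }

    public static void main(String[] args) {
        DataSource roads = inMemory(List.of(
                Map.<String, Object>of("name", "Main Street", "type", "primary"),
                Map.<String, Object>of("name", "Elm Lane", "type", "residential")));
        Query primaryRoads = new Query(
                Filter.equal(Expression.property("type"), Expression.literal("primary")),
                List.of("name"));
        System.out.println(roads.query(primaryRoads)); // [{name=Main Street}]
    }
}
```

Under this split, a JCR-, GDAL- or Tika-backed data source would only need to implement `query`; the filter and expression types stay unchanged.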
Some of those solutions are already implemented and have been properly
separated into interfaces (geoapi-pending) and implementations
(geotoolkit-pending), so they could be used as a starting point.
Johann Sorel
Geomatys
-------------------------------------------------------------------------------
Hey Martin,
On 1/18/13 12:12 PM, "Martin Desruisseaux" wrote:
>On 18/01/13 11:31, Adam Estrada wrote:
>> Spot on with Tika being an SIS dependency, Martin! The idea is to be able
>> to extract content from as many file formats as possible based on their
>> MIME types. GDAL provides the interface to a lot more geospatial formats.
>
>We have the notion of a "data source" interface (not yet committed), and
>Tika or GDAL can be one of them. GeoTIFF, NetCDF, etc. are other data
>sources (we have some extra flexibility if we read NetCDF files directly
>rather than through GDAL, for instance, but we would do that only for the
>most important formats rather than duplicating the whole of GDAL).
>However, "data sources" appear downstream relative to metadata and other
>basic modules. A list of modules in approximate dependency order could be:
>
> - utility
> - metadata
> - referencing
> - geometry
> - feature
> - coverage
> - data source <-- Tika/GDAL can be plugged here
> - styles
> - renderer
+1, that makes sense to me.
Note that I also believe there is another dependency, from Tika to SIS
(especially for WKT parsing).
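The approximate order in Martin's list can be read as a layering rule. The sketch below assumes the simplest such rule, that a module may only depend on modules listed before it; the enum and method names are hypothetical and not part of SIS:

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical encoding of the approximate module order proposed on this thread. */
public class ModuleOrder {

    /** Declaration order follows the list above: earlier modules are more basic. */
    enum SisModule {
        UTILITY, METADATA, REFERENCING, GEOMETRY, FEATURE,
        COVERAGE, DATA_SOURCE, STYLES, RENDERER;

        /** Assumed rule: a module may only depend on modules listed before it. */
        boolean mayDependOn(SisModule other) {
            return other.ordinal() < this.ordinal();
        }

        /** All modules this one is allowed to depend on under that rule. */
        Set<SisModule> allowedDependencies() {
            Set<SisModule> result = EnumSet.noneOf(SisModule.class);
            for (SisModule m : values()) {
                if (mayDependOn(m)) {
                    result.add(m);
                }
            }
            return result;
        }
    }

    public static void main(String[] args) {
        // A Tika- or GDAL-backed data source may use the metadata module...
        System.out.println(SisModule.DATA_SOURCE.mayDependOn(SisModule.METADATA)); // true
        // ...but the metadata module must not depend on data sources.
        System.out.println(SisModule.METADATA.mayDependOn(SisModule.DATA_SOURCE)); // false
        System.out.println(SisModule.GEOMETRY.allowedDependencies()); // [UTILITY, METADATA, REFERENCING]
    }
}
```

Keeping the rule in a single ordinal comparison means that settling the open question of where "filter" goes would only require inserting one enum constant at the right position.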
>
>I'm not sure if "filter" would come before or after "data source"; Johann
>Sorel would know better (I think he is watching this list, even if he
>hasn't sent any emails yet).
Come on Johann, come out and say hi! :)
>
>Actually, the "sis-metadata" module being built is not about arbitrary
>metadata, but rather about the "lingua franca" to be used in SIS for
>metadata. Many metadata models could be chosen for this purpose, but the
>proposed SIS approach is to select ISO standards as the lingua franca.
>All other sources of metadata would need to be converted to ISO 19115
>before being used in a source-independent way by all SIS modules. This
>is the purpose, for instance, of the NetCDF-to-ISO mapping mentioned in a
>previous email. This explains why "data source", which is where
>input/output happens, is so far away from metadata in the above
>dependency chain; all preceding modules define the models which will
>represent the data read by the data sources.
It would be great to use Tika to convert *insert format here* to ISO 19115
if possible.
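The "lingua franca" approach Martin describes means each source format gets its own converter into the ISO model, and downstream modules only ever see the ISO side. A minimal sketch, assuming hypothetical names (`IsoIdentification`, `MetadataConverter`, `NETCDF`) that are not the SIS API; the NetCDF attribute names `title` and `summary` follow the ACDD convention, and mapping `summary` to the ISO abstract is an assumption made for illustration:

```java
import java.util.Map;

/** Hypothetical sketch: every metadata source is converted to one common model. */
public class LinguaFrancaSketch {

    /** Much-simplified stand-in for an ISO 19115 identification record (not the SIS API). */
    static final class IsoIdentification {
        final String title;
        final String abstractText;

        IsoIdentification(String title, String abstractText) {
            this.title = title;
            this.abstractText = abstractText;
        }

        @Override
        public String toString() {
            return "IsoIdentification[title=" + title + ", abstract=" + abstractText + "]";
        }
    }

    /** One converter per source format; downstream modules only see IsoIdentification. */
    interface MetadataConverter<S> {
        IsoIdentification toIso(S source);
    }

    /** Assumed mapping from NetCDF global attributes named after the ACDD convention. */
    static final MetadataConverter<Map<String, String>> NETCDF = attributes ->
            new IsoIdentification(
                    attributes.getOrDefault("title", "(untitled)"),
                    attributes.getOrDefault("summary", ""));

    public static void main(String[] args) {
        Map<String, String> globalAttributes = Map.of(
                "title", "Sea surface temperature",
                "summary", "Daily SST analysis");
        System.out.println(NETCDF.toIso(globalAttributes));
        // prints IsoIdentification[title=Sea surface temperature, abstract=Daily SST analysis]
    }
}
```

A Tika-based converter, as suggested above, would simply be another `MetadataConverter` implementation over whatever Tika extracts.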
>
>Obviously the XML (un)marshalling is an exception to what I just said,
>since it is defined directly in the core metadata module rather than as
>a data source. But we should have (I hope) few such exceptions. This
>exception exists for two reasons: 1) as a side effect of the way JAXB
>works (annotations directly in the source code), and 2) because while
>ISO 19115 would be the "lingua franca" for the conceptual model, XML is
>the "lingua franca" for the file format, at least at OGC/ISO/INSPIRE, so
>maybe it deserves that special treatment...
+1.
Cheers,
Chris
>
> Martin
>