Hi Johann,

First off, welcome to the list!! :)

More comments below:

On 1/21/13 2:13 AM, "johann sorel" <[email protected]> wrote:

>Hello everyone,
>
>Sorry for the late answer, I wasn't yet registered on this mailing list.
>Here is a quick introduction, since Martin already talked about me:
>I'm Johann Sorel from the same company and working on the geotoolkit
>project too. I mainly work on data readers/writers, rendering engines and
>Swing user interfaces, but also a bit on everything:
>metadata, coverage, security, web services.
>
>I have been looking at the Tika project. I have never used it, so correct me
>if I say something wrong.
>From what I see, it is limited to metadata reading only and restricted to
>file types.
>Writing is also something the Apache SIS project should provide, so I
>believe SIS should have a higher-level API that Tika could implement.

Yep indeed. Tika doesn't provide facilities for writing files -- it's an
API and implementation for:

* automatic file identification/detection and classification based on IANA
standard MIME types and others
* a MIME repository
* integration of existing parsing toolkits to extract metadata and/or text
* language identification
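
To make the detection bullet a bit more concrete, here is a rough sketch
in plain Java of what signature-based ("magic bytes") type detection looks
like. This is not Tika's API: the class name MagicSniffer and its detect
method are made up for illustration, and the three signatures are only a
tiny sample of what a real MIME repository would hold.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration only; not part of Tika or SIS. */
final class MagicSniffer {
    private MagicSniffer() {}

    /**
     * Guesses a MIME type from the leading bytes of a file.
     * Falls back to application/octet-stream when nothing matches.
     */
    static String detect(byte[] head) {
        // TIFF (and therefore GeoTIFF): "II*\0" little-endian, "MM\0*" big-endian.
        if (startsWith(head, new byte[] {'I', 'I', 42, 0})
                || startsWith(head, new byte[] {'M', 'M', 0, 42})) {
            return "image/tiff";
        }
        // NetCDF classic and 64-bit offset formats: "CDF" followed by 1 or 2.
        if (startsWith(head, new byte[] {'C', 'D', 'F', 1})
                || startsWith(head, new byte[] {'C', 'D', 'F', 2})) {
            return "application/x-netcdf";
        }
        // PDF files start with the ASCII text "%PDF-".
        if (startsWith(head, "%PDF-".getBytes(StandardCharsets.US_ASCII))) {
            return "application/pdf";
        }
        return "application/octet-stream";
    }

    /** Returns true if data begins with every byte of prefix, in order. */
    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data == null || data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

A caller would read the first few bytes of a file and pass them to
MagicSniffer.detect(...). A real detector also looks at file names and
container contents, which this sketch ignores.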

>
>About data sources, I propose a different approach: the Java Content
>Repository (JCR) specification (JSR 170, and JSR 283 for version 2).
>A possible implementation is Apache Jackrabbit:
>http://jackrabbit.apache.org

Yep, I know Jukka, who used to be their VP, and I have followed their
development. It's a great product.

>While Tika might be interesting for metadata, the JCR specification
>defines APIs for reading, writing and queries.
>Besides, the community using JCR is far larger than that of Tika or GDAL.
>To name some of them: Liferay, Exoplatform, Oracle Beehive, Hippo CMS, ...

I have to say I'm not sure that the community using JCR is larger than
that of Tika or GDAL -- which themselves have pretty wide adoption,
including in some of those same systems.

>Reusing the same or a similar model would simplify the integration of
>the SIS model in existing applications,
>and we would benefit from the expertise already invested in this
>specification.
>The JCR model is very similar to features: it has Nodes and NodeTypes,
>which I believe might be usable for metadata too.

Using the JCR may help us integrate better into some of the applications
we want to target, for sure.

>
>Filter would be placed just before data source, since the data source
>should have a query API which uses filters.
>
>If I can give a global view of the solution we have so far
>(I won't talk about referencing, Martin has much more knowledge than me
>on this topic):
>
>1) we have 3 base storage atoms: Metadata, Feature (and underneath
>Geometry), Coverage
>   --> defined by several OGC/ISO specifications
>2) to interrogate them we can use: Filter, Expression, Query
>   --> defined by OGC (exist in geoapi-pending); Query --> defined in
>JCR
>3) to manage/query/analyze them: Repository/DataSource/DataStore
>   --> can be based on JCR, GDAL, Tika models or a mix
>4) to render the data: style model, map model
>   --> can be OGC SLD/SE (exist in geoapi-pending), could also be some
>kind of CSS
>   --> the map model could be OGC WMC, but this spec is limited to the
>web; it would require some improvements.

This sounds great to me. I'd be happy also to figure out where Tika fits
-- probably in the Metadata model.
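
Just to check my understanding of points 2 and 3, here is a minimal sketch
in plain Java of a data source that is queried through a filter. All names
are hypothetical (SketchDataSource, InMemoryDataSource); this is not the
GeoAPI, JCR or SIS API. A java.util.function.Predicate stands in for an
OGC Filter, and a Map of attributes stands in for a Feature.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical data source contract: the filter is applied by the source. */
interface SketchDataSource {
    /** Returns the features (attribute maps) accepted by the given filter. */
    List<Map<String, Object>> query(Predicate<Map<String, Object>> filter);
}

/** Toy implementation backed by a list held in memory. */
final class InMemoryDataSource implements SketchDataSource {
    private final List<Map<String, Object>> features;

    InMemoryDataSource(List<Map<String, Object>> features) {
        // Defensive copy so later changes to the caller's list are not seen.
        this.features = new ArrayList<>(features);
    }

    @Override
    public List<Map<String, Object>> query(Predicate<Map<String, Object>> filter) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> feature : features) {
            if (filter.test(feature)) {
                result.add(feature);
            }
        }
        return result;
    }
}
```

A backend based on JCR, GDAL or Tika would implement the same interface
and could push the filter down to its own query mechanism instead of
looping in memory. That is the reason the filter model has to come before
the data source in the dependency order.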

>
>Some of those solutions are already implemented and have been properly
>separated
>into interfaces (geoapi-pending) and implementations (geotoolkit-pending),
>so they could be used as a starting point.

Great, looking forward to it! Please feel free to file some JIRA issues,
and to get started! We'd welcome you here in the SIS community!

Cheers,
Chris
>
>
>Johann Sorel
>Geomatys
>
>
>
>
>
>-------------------------------------------------------------------------
>Hey Martin,
>
>On 1/18/13 12:12 PM, "Martin Desruisseaux"
><[email protected]> wrote:
>
> >Le 18/01/13 11:31, Adam Estrada a écrit :
> >> Spot on with Tika being an SIS dependency, Martin! The idea is to be
> >> able to extract content from as many file formats as possible based on
> >> their MIME types. GDAL provides the interface to a lot more geospatial
> >> formats.
> >
> >We have the notion of "data source" interface (not yet committed), and
> >Tika or GDAL can be one of them. GeoTIFF, NetCDF, etc. are other data
> >sources (we have some extra flexibility if we read NetCDF files directly
> >rather than through GDAL for instance, but we would do that only for the
> >most important formats rather than duplicating the totality of GDAL).
> >However "data sources" appear downstream relative to metadata and other
> >basic modules. A list of modules in approximate dependency order can be:
> >
> >  - utility
> >  - metadata
> >  - referencing
> >  - geometry
> >  - feature
> >  - coverage
> >  - data source   <-- Tika/GDAL can be plugged here
> >  - styles
> >  - renderer
>
>+1 that makes sense to me.
>
>Note I also believe there is another dependency from Tika to SIS
>(especially for the WKT parsing).
>
> >
> >I'm not sure if "filter" would be before or after "data source" - Johann
> >Sorel would know better (I think he is watching this list, even if he
> >hasn't sent emails yet).
>
>Come on Johann, come out and say hi! :)
>
> >
> >Actually the "sis-metadata" module being built is not about arbitrary
> >metadata, but rather about the "lingua franca" to be used in SIS for
> >metadata. Many metadata models could be chosen for this purpose, but the
> >proposed SIS approach is to select ISO standards as the lingua franca.
> >All other sources of metadata would need to be converted to ISO 19115
> >before being used in a source-independent way by all SIS modules. This
> >is the purpose, for instance, of the NetCDF - ISO mapping mentioned in
> >a previous email. This explains why "data source", which is where
> >input/output happens, is so far away from metadata in the above
> >dependency chain; all preceding modules define the models which will
> >represent the data read by the data sources.
>
>It would be great to use Tika to convert *insert format here* to ISO 19115
>if possible.
>
> >
> >Obviously the XML (un)marshalling is an exception to what I just said,
> >since it is defined straight in the core metadata module rather than as
> >a data source. But we should have (I hope) few such exceptions. This
> >exception exists for two reasons: 1) as a side effect of the way JAXB
> >works (annotations straight in the source code), and 2) because while
> >ISO 19115 would be the "lingua franca" for the conceptual model, XML is
> >the "lingua franca" for the file format at least at OGC/ISO/INSPIRE, so
> >maybe it deserves that special treatment...
>
>+1.
>
>Cheers,
>Chris
>
> >
> >     Martin
> >
>
