Spot on with Tika being an SIS dependency, Martin! The idea is to be able
to extract content from as many file formats as possible based on their MIME
types. GDAL provides access to many more geospatial formats.
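
A minimal sketch of the "pick an extractor from the MIME type" idea, in Java since both SIS and Tika are Java projects. The `MimeDispatch` class, its registry and the "image/tiff" placeholder extractor are hypothetical and are not Tika's API. Tika does its own detection and dispatch (for example through its AutoDetectParser), and a real geospatial extractor would delegate to a library such as GDAL.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch: choose a content extractor based on a MIME type.
 * This is NOT Tika's API; it only illustrates the dispatch idea.
 */
public class MimeDispatch {
    // Registry of extractors keyed by base MIME type (illustrative entries only).
    private static final Map<String, Function<byte[], String>> EXTRACTORS = new HashMap<>();

    static {
        // Plain text: decode the bytes as UTF-8 (simplification: any charset parameter is ignored).
        EXTRACTORS.put("text/plain", bytes -> new String(bytes, StandardCharsets.UTF_8));
        // Placeholder for a geospatial format; a real extractor would delegate to e.g. GDAL.
        EXTRACTORS.put("image/tiff", bytes -> "[raster data: " + bytes.length + " bytes]");
    }

    /** Returns the extracted content, or null when no extractor is registered for the type. */
    public static String extract(String mimeType, byte[] content) {
        // Strip parameters such as "; charset=UTF-8" and normalize case.
        String baseType = mimeType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        Function<byte[], String> extractor = EXTRACTORS.get(baseType);
        return (extractor == null) ? null : extractor.apply(content);
    }

    public static void main(String[] args) {
        // "\u6F22\u5B57" is the Kanji word 漢字, encoded here as UTF-8 bytes.
        byte[] text = "\u6F22\u5B57".getBytes(StandardCharsets.UTF_8);
        System.out.println(extract("text/plain; charset=UTF-8", text));
        System.out.println(extract("image/tiff", new byte[16]));
        System.out.println(extract("application/x-unknown", text));
    }
}
```

Unknown types simply return null here; a fuller design would fall back to content sniffing (as Tika does) rather than trusting the declared MIME type alone.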

Adam


On Fri, Jan 18, 2013 at 2:23 PM, Martin Desruisseaux <
[email protected]> wrote:

> Hello Chris and Adam
>
> On 17/01/13 22:17, Mattmann, Chris A (388J) wrote:
>
>>> There was a discussion about character encoding. I will try to make
>>> sure that the standard accepts Kanji, Hiragana and similar characters.
>>>
>> For this we can leverage Apache Tika for encoding detection and language
>> detection. Thoughts?
>>
> Given that all Java strings are UTF-16 internally, I think that the encoding
> is not an issue for SIS, at least in the most basic modules. It is rather
> an issue for SIS users, or for higher-level modules that need to read
> and write text files. The question of specifying an encoding in the
> standard was raised because OGC is not only about Java, but also about
> C/C++, XML, etc., and the encoding may be an issue for some of those targets.
>
>
>>> The Sensor Web group reported various experiments. Not surprisingly,
>>> parsing of inefficient file formats was identified as one of the most
>>> important causes of CPU and battery consumption.
>>>
>> That is crazy -- they should be looking at Apache Tika ;) Thanks for the
>> detailed report, my friend.
>>
> Thanks for your encouragement :-)
>
>
> On 18/01/13 04:46, Adam Estrada wrote:
>
>> 1. All the metadata "formats" still seem to favor XML. Was there any
>> mention of serializing to JSON instead?
>>
> There was mention of JSON in other standards (I don't remember which ones
> exactly), but I have not seen any mention of it for metadata. However, I don't
> think there would be any objection if someone pushed for it. There is already a
> NetCDF-to-ISO metadata mapping defined by NOAA, and I'm pretty sure other
> mappings would be accepted. But we would probably first need to identify whether
> there is any existing common practice for metadata in JSON.
>
>
>> 2. Apache Tika is really good stuff and there is some ongoing work to get
>> Tika working with GDAL via @jwhite. Any thoughts on that integration?
>>
> I presume that you mean integration with SIS? Since OGC is about standards
> that are implementation-independent, we cannot really "integrate" Tika into
> OGC. About SIS, the Tika description on the home page ("The Apache Tika
> toolkit detects and extracts metadata and structured text content from
> various documents using existing parser libraries") gives me the impression
> that SIS could be a Tika dependency rather than the other way around. Is
> that right?
>
>     Martin
>
>
