Re: OGC meeting (continuing)

Martin Desruisseaux Fri, 18 Jan 2013 11:23:34 -0800

Hello Chris and Adam

On 17/01/13 22:17, Mattmann, Chris A (388J) wrote:

There was a discussion about character encoding. I will try to make
sure that the standard accepts Kanji, Hiragana and similar characters.

For this we can leverage Apache Tika for encoding detection and language
detection. Thoughts?

Given that in Java everything is UTF-16 anyway, I think that the
encoding is not an issue for SIS, at least in the most basic modules. I
think it is rather an issue for SIS users or for higher-level modules
that need to read and write text files. The question about specifying an
encoding in the standard has been raised because OGC is not only about
Java, but also about C/C++, XML, etc., and the encoding may be an issue
for some of those targets.
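
To illustrate where the encoding starts to matter, here is a minimal
sketch of the boundary between in-memory strings and bytes. The class
name EncodingDemo and the helper roundTrip are made up for this example;
they are not part of SIS or of any OGC standard. The sample text is
written with Unicode escapes so that the source file itself does not
depend on an encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not SIS API.
class EncodingDemo {
    /**
     * Encodes the text to bytes with the given charset, then decodes it back.
     * This mimics writing a text file and reading it again.
     */
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);   // the encoding matters here
        return new String(bytes, charset);       // and here
    }

    public static void main(String[] args) {
        // Two Kanji, a space and one Hiragana character.
        String text = "\u6F22\u5B57 \u3042";

        // UTF-8 can represent every character, so the round trip is lossless.
        System.out.println(text.equals(roundTrip(text, StandardCharsets.UTF_8)));

        // ISO-8859-1 cannot represent Kanji or Hiragana: getBytes substitutes
        // a replacement byte, so the decoded text differs from the original.
        System.out.println(text.equals(roundTrip(text, StandardCharsets.ISO_8859_1)));
    }
}
```

Inside the JVM the string is the same in both cases; only the step that
turns it into bytes (a file, a socket, an XML document) needs an agreed
encoding, which is why the question comes up at the standard level.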

The Sensor Web group reported various experiments. Not surprisingly,
parsing of inefficient file formats was identified as one of the most
important causes of CPU and battery consumption.

That is crazy -- they should be looking at Apache Tika ;) Thanks for the
detailed report, my friend.

Thanks for your encouragement :-)


On 18/01/13 04:46, Adam Estrada wrote:

1. All the metadata "formats" still seem to favor XML. Was there any mention of 
serializing to JSON instead?

JSON was mentioned in other standards (I don't remember exactly which
one), but I have not seen any mention of it around metadata. However, I
don't think there would be any objection if someone pushed for it. There
is already a NetCDF - ISO metadata mapping defined by NOAA, and I'm
pretty sure they would accept other mappings. But we would probably
first need to identify whether there is any existing common practice for
metadata with JSON.

2. Apache Tika is really good stuff and there is some ongoing work to get Tika 
working with GDAL via @jwhite. Any thoughts on that integration?

I presume that you mean integration with SIS? Since OGC is about
standards in an implementation-independent way, we cannot really
"integrate" Tika in OGC. About SIS, the Tika description on the home
page ("The Apache Tika toolkit detects and extracts metadata and
structured text content from various documents using existing parser
libraries") gives me the impression that SIS could be a Tika dependency
rather than the other way around. Is that right?


    Martin
