Hello Chris and Adam
On 17/01/13 22:17, Mattmann, Chris A (388J) wrote:
There was a discussion about character encoding. I will try to make
sure that the standard accepts Kanji, Hiragana and similar characters.
For this we can leverage Apache Tika for encoding detection and language
detection. Thoughts?
Given that Java strings are UTF-16 internally anyway, I think that the
encoding is not an issue for SIS, at least in the most basic modules. I
think it is rather an issue for SIS users or for higher-level modules
that need to read and write text files. The question about specifying an
encoding in the standard has been raised because OGC is not only about
Java, but also about C/C++, XML, etc. and the encoding may be an issue
for some of those targets.
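
To illustrate where the encoding does matter on the Java side, here is a
minimal sketch, assuming UTF-8 as the external encoding. The class and
method names are hypothetical and are not part of SIS or of any OGC
standard. The in-memory String needs no encoding decision, but every
conversion to or from bytes does:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not SIS API.
public class EncodingDemo {
    // Encode text for storage or transmission; the charset is explicit.
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decode bytes that are known to be UTF-8.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Two Kanji followed by Hiragana, written as escapes so that the
        // source file encoding does not matter.
        String text = "\u6f22\u5b57 \u3072\u3089\u304c\u306a";
        byte[] utf8 = encode(text);

        // Decoding with the same charset restores the original text.
        String roundTrip = decode(utf8);
        // Decoding with a different charset silently yields other text.
        String misread = new String(utf8, StandardCharsets.ISO_8859_1);

        System.out.println("round trip ok: " + text.equals(roundTrip));
        System.out.println("misread equal: " + text.equals(misread));
    }
}
```

Relying on the platform default charset instead of naming one is what
usually causes trouble when files move between systems.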
The Sensor Web group reported various experiments. Not surprisingly,
parsing of inefficient file formats was identified as one of the most
important causes of CPU and battery consumption.
That is crazy -- they should be looking at Apache Tika ;) Thanks for the
detailed report, my friend.
Thanks for your encouragement :-)
On 18/01/13 04:46, Adam Estrada wrote:
1. All the metadata "formats" still seem to favor XML. Was there any mention of
serializing to JSON instead?
There was mention of JSON in other standards (I don't remember which one
exactly), but I have not seen any mention around metadata. However, I
don't think there would be any objection if someone pushed for it. There
is already a NetCDF - ISO metadata mapping defined by NOAA, so I'm pretty
sure they would accept other mappings. But we would probably need to identify
first if there is any existing common practice for metadata with JSON.
2. Apache Tika is really good stuff and there is some ongoing work to get Tika
working with GDAL via @jwhite. Any thoughts on that integration?
I presume that you mean integration with SIS? Since OGC is about
standards in an implementation-independent way, we cannot really
"integrate" Tika in OGC. About SIS, the Tika description on the home
page ("The Apache Tika toolkit detects and extracts metadata and
structured text content from various documents using existing parser
libraries") gives me the impression that SIS could be a Tika dependency
rather than the other way around. Is that right?
Martin