I had a go at this, but just getting a reproducible test was not easy. I think the current behavior is OK. The issues I noticed can be attributed to JVM misconfiguration. JSON data on disk should arguably be stored in UTF-8 (since it's JSON); it's just that the JVM on Windows assumes everything is windows-1252 unless told otherwise, hence our problem.
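
For anyone following along, here is a minimal sketch of the mismatch, assuming a JVM that ships the windows-1252 charset. This is not GeoTools code: the class and method names (EncodingSketch, encodeAsIfDefault, isValidUtf8) are made up for illustration, and the default charset is passed in explicitly because it cannot be changed after JVM startup.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingSketch {
    // Stand-in for the platform default charset of a Western European Windows JVM.
    static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    // Encodes text the way the no-argument String.getBytes() would
    // if 'platformDefault' were the JVM default charset (hypothetical helper).
    static byte[] encodeAsIfDefault(String text, Charset platformDefault) {
        return text.getBytes(platformDefault);
    }

    // Returns true only if the bytes form well-formed UTF-8.
    // A fresh decoder reports malformed input, so decode() throws on bad bytes.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The Unicode escape stands for o-umlaut and keeps this file pure ASCII.
        String json = "{\"name\":\"G\u00f6teborg\"}";

        byte[] viaDefault = encodeAsIfDefault(json, WINDOWS_1252);
        byte[] viaUtf8 = json.getBytes(StandardCharsets.UTF_8);

        System.out.println("windows-1252 bytes valid as UTF-8: " + isValidUtf8(viaDefault));
        System.out.println("explicit UTF-8 bytes valid as UTF-8: " + isValidUtf8(viaUtf8));
    }
}
```

With pure ASCII input both encodings produce identical bytes, which is probably part of why a reproducible test is hard to come by unless the test data contains non-ASCII characters.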
-Tobias

From: Ian Turton <ijtur...@gmail.com>
Sent: 10 May 2022 19:58
To: Tobias Gerdin <tobias.ger...@havochvatten.se>
Cc: Geotools-Devel list <geotools-devel@lists.sourceforge.net>
Subject: Re: [Geotools-devel] gt-geojsondatastore GeoJSONReader should specify encoding as UTF-8?

I just assumed that everything was going to be UTF-8. Happy to review a pull request with a test.

Ian

On Tue, 10 May 2022, 13:41 Tobias Gerdin, <tobias.ger...@havochvatten.se> wrote:

Hello,

I was puzzled by the behaviour of org.geotools.data.geojson.GeoJSONReader when I was using it to read a feature collection containing non-ASCII strings. It complains that the JSON string contains invalid UTF-8 data.

Due to client mandate I need to develop on a Windows 11 machine. The default platform encoding is windows-1252 (for archaeological reasons, I guess), not UTF-8. I noticed that GeoJSONReader uses plain String.getBytes() to read the JSON data (https://github.com/geotools/geotools/blob/f416fcc3763b2db020c54a9323601fbdd49388e7/modules/unsupported/geojson-core/src/main/java/org/geotools/data/geojson/GeoJSONReader.java#L179).

When I change the JVM charset encoding (which needs to be done at startup) using -Dfile.encoding=UTF-8 my code works, but I would rather not have to do this. I am not an expert on JSON, but I recall the spec mandates that JSON data is encoded in UTF-8.
So I believe that the line linked above should call jsonString.getBytes(StandardCharsets.UTF_8) (and likewise in all other locations where JSON data is read). Apparently Java is slated to go UTF-8 by default in the future, but until then we need to deal with this mess, I guess.

Tobias Gerdin
System Developer, Consultant
Systems Development Unit
Gullbergs Strandgata 15, 411 04 Göteborg
Box 11930, SE-404 39 Göteborg
tobias.ger...@havochvatten.se
www.havochvatten.se

SwAM (Havs- och vattenmyndigheten) processes your personal data in accordance with the General Data Protection Regulation (GDPR) and our Data Protection Policy, see www.havochvatten.se/sa-behandlar-hav-dina-personuppgifter

_______________________________________________
GeoTools-Devel mailing list
GeoTools-Devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geotools-devel