Hi Gerd Here is typCodePage_v4 that uses try () in both CharsetProbe and compile, getting rid of Utils.closeFile().
The patch includes the change in typCodePage-test.patch My javac doesn't seem to have an option to detect unused imports, but when I run it with -Xlint I get a variety of errors - I've attached the log. Ticker On Fri, 2020-01-17 at 11:13 +0000, Gerd Petermann wrote: > Hi Ticker, > > I use Eclipse with customized settings in Preferences -> Java > ->Compiler-> Error/Warnings as well as the SonarLint plugin. > > My understanding is that the InputStream is only closed if everything > goes well. The nature of unit tests is that they produce special > cases". > The try-with-ressources was introduced to handle this. > Maybe you can post a v4 which uses try-with-ressources in class > CharsetProbe (as the unpatched version does)? > > Gerd > > > ________________________________________ > Von: Ticker Berkin <rwb-mkg...@jagit.co.uk> > Gesendet: Freitag, 17. Januar 2020 11:54 > An: Gerd Petermann > Betreff: Re: AW: AW: [mkgmap-dev] TYP files and character encoding > > Hi Gerd > > I have another patch almost ready for the StandardCharset utf8 / try > (with-resources) etc, but this is quite wide-ranging and unrelated to > the need for the typCodePage patch. > > I wanted to get the typCodePage patches committed first so that I can > get on with mapnik.txt patches and also improve some bits of > TypCompiler as the last part of the utf8 patch. > > My compilation system doesn't warn about unused imports - what > options/tool do you use for this? > > Concerning the close, the FileInputSteam is closed, which should > release any OS file handle; it's just the InputSteamReader & > BufferedReader that arn't, but these are just java data structures. > > Ticker > > On Fri, 2020-01-17 at 10:27 +0000, Gerd Petermann wrote: > > Hi Ticker, > > > > ah, seems I got another post wrong. I thought you'd work on a > > typCodePage_v4.patch which would use try-with-ressources. > > With typCodePage_v3.patch and typCodePage-test.patch I still see > > some warnings for TypCompiler: > > - unused imports > > - br is not closed (line 232) > > > > Gerd > > > > ________________________________________ > > Von: Ticker Berkin <rwb-mkg...@jagit.co.uk> > > Gesendet: Freitag, 17. Januar 2020 10:17 > > An: Gerd Petermann > > Betreff: Re: AW: [mkgmap-dev] TYP files and character encoding > > > > Hi Gerd > > > > Yes. Sorry - I didn't explain it at all well. This needs to be > > applied > > at the same time as typCodePage_v3.patch from 14-Jan > > > > Ticker > > > > On Fri, 2020-01-17 at 08:48 +0000, Gerd Petermann wrote: > > > Hi Ticker, > > > > > > I don't understand this patch. Do I have to use it in combination > > > with another one? > > > > > > Gerd > > > > > > ________________________________________ > > > Von: mkgmap-dev <mkgmap-dev-boun...@lists.mkgmap.org.uk> im > > > Auftrag > > > von Ticker Berkin <rwb-mkg...@jagit.co.uk> > > > Gesendet: Freitag, 17. Januar 2020 00:02 > > > An: Development list for mkgmap > > > Betreff: Re: [mkgmap-dev] TYP files and character encoding > > > > > > Hi Gerd > > > > > > I've just noticed that a change to a function profile stopped a > > > test > > > from compiling, so here is a patch for that > > > > > > Ticker
[javac] Compiling 502 source files to /norbert/svn/trunk/build/classes [javac] warning: [path] bad path element "/norbert/svn/trunk/lib/optional/jai_core-1.1.3.jar": no such file or directory [javac] warning: [path] Unexpected file on path: /norbert/svn/trunk/lib/optional/license-1.1.3.txt [javac] warning: [path] bad path element "/norbert/svn/trunk/lib/optional/lib/fastutil-6.5.15-mkg.1b.jar": no such file or directory [javac] warning: [path] bad path element "/norbert/svn/trunk/lib/optional/lib/osmpbf-1.3.3.jar": no such file or directory [javac] warning: [path] bad path element "/norbert/svn/trunk/lib/optional/lib/protobuf-java-2.5.0.jar": no such file or directory [javac] warning: [path] bad path element "/norbert/svn/trunk/lib/optional/lib/xpp3-1.1.4c.jar": no such file or directory [javac] /norbert/svn/trunk/src/uk/me/parabola/imgfmt/ExitException.java:25: warning: [serial] serializable class ExitException has no definition of serialVersionUID [javac] public class ExitException extends RuntimeException { [javac] ^ [javac] /norbert/svn/trunk/src/uk/me/parabola/imgfmt/FileExistsException.java:26: warning: [serial] serializable class FileExistsException has no definition of serialVersionUID [javac] public class FileExistsException extends IOException { [javac] ^ [javac] /norbert/svn/trunk/src/uk/me/parabola/imgfmt/FileNotWritableException.java:25: warning: [serial] serializable class FileNotWritableException has no definition of serialVersionUID [javac] public class FileNotWritableException extends IOException { [javac] ^ [javac] /norbert/svn/trunk/src/uk/me/parabola/imgfmt/FormatException.java:24: warning: [serial] serializable class FormatException has no definition of serialVersionUID [javac] public class FormatException extends RuntimeException { [javac] ^ [javac] /norbert/svn/trunk/src/uk/me/parabola/imgfmt/MapFailedException.java:28: warning: [serial] serializable class MapFailedException has no definition of serialVersionUID [javac] public class MapFailedException extends RuntimeException { [javac] ^ [javac] /norbert/svn/trunk/src/uk/me/parabola/imgfmt/ReadFailedException.java:22: warning: [serial] serializable class ReadFailedException has no definition of serialVersionUID [javac] public class ReadFailedException extends RuntimeException { [javac] ^ [javac] /norbert/svn/trunk/src/uk/me/parabola/imgfmt/Utils.java:390: warning: [cast] redundant cast to long [javac] return (long)(latHp & 0xffffffffL) << 32 | (lonHp & 0xffffffffL); [javac] ^ [javac] /norbert/svn/trunk/src/uk/me/parabola/util/EnhancedProperties.java:28: warning: [serial] serializable class EnhancedProperties has no definition of serialVersionUID [javac] public class EnhancedProperties extends Properties { [javac] ^ [javac] /norbert/svn/trunk/src/uk/me/parabola/imgfmt/app/net/RouteArc.java:319: warning: [cast] redundant cast to int [javac] log.debug("val is", Integer.toHexString((int)val)); [javac] ^ [javac] /norbert/svn/trunk/src/uk/me/parabola/imgfmt/app/mdr/Mdr7.java:366: warning: [cast] redundant cast to ArrayList<Mdr7Record> [javac] return Collections.unmodifiableList((ArrayList<Mdr7Record>) allStreets); [javac] ^ [javac] /norbert/svn/trunk/src/uk/me/parabola/imgfmt/app/mdr/Mdr10.java:37: warning: [rawtypes] found raw type: ArrayList [javac] private List<Mdr10Record>[] poiTypes = new ArrayList[MAX_GROUP_NUMBER+1]; [javac] ^ [javac] missing type arguments for generic class ArrayList<E> [javac] where E is a type-variable: [javac] E extends Object declared in class ArrayList [javac] /norbert/svn/trunk/src/uk/me/parabola/imgfmt/app/net/NumberPreparer.java:874: warning: [serial] serializable class Abandon has no definition of serialVersionUID [javac] class Abandon extends RuntimeException { [javac] ^ [javac] /norbert/svn/trunk/src/uk/me/parabola/util/MultiHashMap.java:22: warning: [serial] serializable class MultiHashMap has no definition of serialVersionUID [javac] public class MultiHashMap<K,V> extends HashMap<K,List<V>> { [javac] ^ [javac] /norbert/svn/trunk/src/uk/me/parabola/imgfmt/app/typ/TypLabelException.java:21: warning: [serial] serializable class TypLabelException has no definition of serialVersionUID [javac] public class TypLabelException extends RuntimeException { [javac] ^ [javac] /norbert/svn/trunk/src/uk/me/parabola/io/EndOfFileException.java:26: warning: [serial] serializable class EndOfFileException has no definition of serialVersionUID [javac] public class EndOfFileException extends IOException { [javac] ^ [javac] /norbert/svn/trunk/src/uk/me/parabola/mkgmap/combiners/MdrBuilder.java:183: warning: [cast] redundant cast to int [javac] countryMap.put((int) c.getIndex(), record); [javac] ^ [javac] /norbert/svn/trunk/src/uk/me/parabola/mkgmap/combiners/MdrBuilder.java:195: warning: [cast] redundant cast to int [javac] Mdr14Record mdr14 = maps.countries.get((int) region.getCountry().getIndex()); [javac] ^ [javac] /norbert/svn/trunk/src/uk/me/parabola/mkgmap/combiners/MdrBuilder.java:197: warning: [cast] redundant cast to int [javac] regionMap.put((int) region.getIndex(), record); [javac] ^ [javac] /norbert/svn/trunk/src/uk/me/parabola/mkgmap/osmstyle/PrefixSuffixFilter.java:170: warning: [fallthrough] possible fall-through into case [javac] default: [javac] ^ [javac] /norbert/svn/trunk/src/uk/me/parabola/mkgmap/scan/SyntaxException.java:25: warning: [serial] serializable class SyntaxException has no definition of serialVersionUID [javac] public class SyntaxException extends RuntimeException { [javac] ^ [javac] /norbert/svn/trunk/src/uk/me/parabola/util/MultiIdentityHashMap.java:23: warning: [serial] serializable class MultiIdentityHashMap has no definition of serialVersionUID [javac] public class MultiIdentityHashMap<K,V> extends IdentityHashMap<K,List<V>> { [javac] ^ [javac] /norbert/svn/trunk/src/uk/me/parabola/mkgmap/reader/osm/boundary/BoundarySaver.java:419: warning: [fallthrough] possible fall-through into case [javac] case PathIterator.SEG_MOVETO: [javac] ^ [javac] /norbert/svn/trunk/src/uk/me/parabola/mkgmap/reader/osm/boundary/BoundaryMerger.java:45: warning: [try] explicit call to close() on an auto-closeable resource [javac] in.close(); [javac] ^ [javac] /norbert/svn/trunk/src/uk/me/parabola/mkgmap/reader/osm/boundary/BoundaryMerger.java:46: warning: [try] explicit call to close() on an auto-closeable resource [javac] out.close(); [javac] ^ [javac] 30 warnings
Index: src/uk/me/parabola/imgfmt/app/typ/TYPFile.java =================================================================== --- src/uk/me/parabola/imgfmt/app/typ/TYPFile.java (revision 4422) +++ src/uk/me/parabola/imgfmt/app/typ/TYPFile.java (working copy) @@ -121,12 +121,13 @@ // If we succeeded then note offsets for indexes strToType.put(off, type); typeToStr.put(type, off); - + writer.put1u(0); } catch (CharacterCodingException ignore) { + //ignore.printStackTrace(); String name = encoder.charset().name(); - throw new TypLabelException(name); + log.warn("Cannot represent icon String", label, "in CodePage", name); + //throw new TypLabelException(name); } - writer.put1u(0); } } Utils.closeFile(writer); Index: src/uk/me/parabola/imgfmt/app/typ/TypData.java =================================================================== --- src/uk/me/parabola/imgfmt/app/typ/TypData.java (revision 4422) +++ src/uk/me/parabola/imgfmt/app/typ/TypData.java (working copy) @@ -17,6 +17,7 @@ import java.util.List; import uk.me.parabola.imgfmt.app.srt.Sort; +import uk.me.parabola.log.Logger; /** * Holds all the data for a typ file. @@ -24,6 +25,8 @@ * @author Steve Ratcliffe */ public class TypData { + private static final Logger log = Logger.getLogger(TypData.class); + private final ShapeStacking stacking = new ShapeStacking(); private final TypParam param = new TypParam(); private final List<TypPolygon> polygons = new ArrayList<TypPolygon>(); @@ -51,10 +54,11 @@ if (origCodepage != 0) { if (origCodepage != sort.getCodepage()) { // This is just a warning, not a definite problem - System.out.println("WARNING: SortCode in TYP txt file different from" + - " command line setting"); + // and is to be expected if have general UTF-8 TYP.txt + log.warn("CodePage in TYP txt file:", sort.getCodepage(), "different from --code-page:", origCodepage); } } + return; // want to use the command line one } this.sort = sort; encoder = sort.getCharset().newEncoder(); Index: src/uk/me/parabola/imgfmt/app/typ/TypElement.java =================================================================== --- src/uk/me/parabola/imgfmt/app/typ/TypElement.java (revision 4422) +++ src/uk/me/parabola/imgfmt/app/typ/TypElement.java (working copy) @@ -20,6 +20,7 @@ import java.util.List; import uk.me.parabola.imgfmt.app.ImgFileWriter; +import uk.me.parabola.log.Logger; /** * Base routines and data used by points, lines and polygons. @@ -30,6 +31,8 @@ * @author Steve Ratcliffe */ public abstract class TypElement implements Comparable<TypElement> { + private static final Logger log = Logger.getLogger(TypElement.class); + private int type; private int subType; @@ -124,17 +127,19 @@ protected ByteBuffer makeLabelBlock(CharsetEncoder encoder) { ByteBuffer out = ByteBuffer.allocate(256 * labels.size()); for (TypLabel tl : labels) { - out.put((byte) tl.getLang()); CharBuffer cb = CharBuffer.wrap(tl.getText()); try { ByteBuffer buffer = encoder.encode(cb); + out.put((byte) tl.getLang()); out.put(buffer); + out.put((byte) 0); } catch (CharacterCodingException ignore) { + //ignore.printStackTrace(); String name = encoder.charset().name(); //System.out.println("cs " + name); - throw new TypLabelException(name); + log.warn("Cannot represent String", tl.getText(), "for language", tl.getLang(), "in CodePage", name); + //throw new TypLabelException(name); } - out.put((byte) 0); } return out; Index: src/uk/me/parabola/mkgmap/main/TypCompiler.java =================================================================== --- src/uk/me/parabola/mkgmap/main/TypCompiler.java (revision 4422) +++ src/uk/me/parabola/mkgmap/main/TypCompiler.java (working copy) @@ -17,21 +17,20 @@ import java.io.FileInputStream; import java.io.FileNotFoundException; import java.io.IOException; -import java.io.InputStream; import java.io.InputStreamReader; import java.io.Reader; import java.io.UnsupportedEncodingException; -import java.nio.CharBuffer; +import java.nio.ByteBuffer; +//import java.nio.CharBuffer; import java.nio.channels.FileChannel; import java.nio.charset.CharacterCodingException; import java.nio.charset.Charset; -import java.nio.charset.CharsetEncoder; +import java.nio.charset.CharsetDecoder; import java.nio.charset.StandardCharsets; import java.nio.file.StandardOpenOption; import uk.me.parabola.imgfmt.ExitException; import uk.me.parabola.imgfmt.MapFailedException; -import uk.me.parabola.imgfmt.Utils; import uk.me.parabola.imgfmt.app.srt.Sort; import uk.me.parabola.imgfmt.app.typ.TYPFile; import uk.me.parabola.imgfmt.app.typ.TypData; @@ -85,7 +84,7 @@ param.setFamilyId(family); if (product != -1) param.setProductId(product); - if (cp != -1 && param.getCodePage() == 0) + if (cp != -1) param.setCodePage(cp); File outFile = new File(filename); @@ -131,16 +130,13 @@ TypData data = tr.getData(); data.setSort(sort); - try { - Reader r = new BufferedReader(new InputStreamReader(new FileInputStream(filename), charset)); - try { - tr.read(filename, r); - } finally { - Utils.closeFile(r); - } + try (Reader r = new BufferedReader(new InputStreamReader(new FileInputStream(filename), charset))) { + tr.read(filename, r, charset); } catch (UnsupportedEncodingException e) { // Not likely to happen as we should have already used this character set! throw new MapFailedException("Unsupported character set", e); + } catch (IOException e) { + throw new ExitException("Unable to read/close file " + filename); } return tr.getData(); @@ -204,79 +200,88 @@ class CharsetProbe { - private String codePage; - private CharsetEncoder encoder; + // TODO: this should could be moved to somewhere like util and used on other text files + // except looking for Codepage is particular to Typ files + // and want to have ability to return default environment decoder + // (ie inputStream without 2nd parameter) - public CharsetProbe() { - setCodePage("latin1"); - } - - private void setCodePage(String codePage) { - if ("cp65001".equalsIgnoreCase(codePage)) { - this.codePage = "utf-8"; - this.encoder = StandardCharsets.UTF_8.newEncoder(); - } else { - this.codePage = codePage; - this.encoder = Charset.forName(codePage).newEncoder(); - } - } - private String probeCharset(String file) { - String readingCharset = "utf-8"; - try { - tryCharset(file, readingCharset); - return readingCharset; - } catch (TypLabelException e) { - try { - readingCharset = e.getCharsetName(); - tryCharset(file, readingCharset); - } catch (Exception e1) { - return "utf-8"; - } - } + final String BOM_UTF_8 = "\u00EF\u00BB\u00BF"; + final String BOM_UTF_16LE = "\u00FF\u00FE"; + final String BOM_UTF_16BE = "\u00FE\u00FF"; + final String BOM_UTF_32LE = "\u00FF\u00FE\u0000\u0000"; + final String BOM_UTF_32BE = "\u0000\u0000\u00FE\u00FF"; - return readingCharset; - } + final Charset byteCharNoMap = StandardCharsets.ISO_8859_1; // byteVal == charVal + final CharsetDecoder utf8Decoder = StandardCharsets.UTF_8.newDecoder(); - private void tryCharset(String file, String readingCharset) { - - try (InputStream is = new FileInputStream(file); BufferedReader br = new BufferedReader(new InputStreamReader(is, readingCharset))) { - + String charset = null; + boolean validUTF8 = true; + try (BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(file), byteCharNoMap))) { String line; - while ((line = br.readLine()) != null) { + int lineNo = 0; + do { + line = br.readLine(); + if (line == null) + break; + ++lineNo; if (line.isEmpty()) continue; + if (lineNo <= 2) { // only check the first few lines for these + if (line.contains(BOM_UTF_8)) + charset = "UTF-8"; + else if (line.contains(BOM_UTF_32LE)) // must test _32 before _16 + charset = "UTF-32LE"; + else if (line.contains(BOM_UTF_32BE)) + charset = "UTF-32BE"; + else if (line.contains(BOM_UTF_16LE)) + charset = "UTF-16LE"; + else if (line.contains(BOM_UTF_16BE)) + charset = "UTF-16BE"; + if (charset != null) + break; - // This is a giveaway the file is in utf-something, so ignore anything else - if (line.charAt(0) == 0xfeff) - return; + int strInx = line.indexOf("-*- coding:"); // be lax about start/end + if (strInx >= 0) { + charset = line.substring(strInx+11).trim(); + strInx = charset.indexOf(' '); + if (strInx >= 0) + charset = charset.substring(0, strInx); + break; + } + } + // special for TypFile; to be compatible with possible old usage if (line.startsWith("CodePage=")) { - String[] split = line.split("="); + charset = line.substring(9).trim(); try { - if (split.length > 1) - setCodePage("cp" + Integer.decode(split[1].trim())); + int codePage = Integer.decode(charset); + if (codePage == 65001) + charset = "UTF-8"; + else + charset = "cp" + codePage; } catch (NumberFormatException e) { - setCodePage("cp1252"); } + break; } - if (line.startsWith("String")) { - CharBuffer cb = CharBuffer.wrap(line); - if (encoder != null) - encoder.encode(cb); + if (validUTF8) { // test the line for being valid UTF-8 + ByteBuffer asBytes = byteCharNoMap.encode(line); + try { // arbitary sequences of bytes > 127 tend not to be UTF8 + /*CharBuffer asChars =*/ utf8Decoder.decode(asBytes); + } catch (CharacterCodingException e) { + validUTF8 = false; + // don't stop as might still get coding directive + } } - } - } catch (UnsupportedEncodingException | CharacterCodingException e) { - throw new TypLabelException(codePage); - + } while (true); } catch (FileNotFoundException e) { throw new ExitException("File not found " + file); - } catch (IOException e) { - throw new ExitException("Could not read file " + file); + throw new ExitException("Unable to read file " + file); } + return charset != null ? charset : (validUTF8 ? "UTF-8" : "cp1252"); } } } Index: src/uk/me/parabola/mkgmap/scan/TokenScanner.java =================================================================== --- src/uk/me/parabola/mkgmap/scan/TokenScanner.java (revision 4422) +++ src/uk/me/parabola/mkgmap/scan/TokenScanner.java (working copy) @@ -28,6 +28,7 @@ */ public class TokenScanner { private static final int NO_PUSHBACK = 0; + private String charset = "utf-8"; // Reading state private final Reader reader; @@ -53,6 +54,10 @@ fileName = filename; } + public void setCharset(String charset) { + this.charset = charset; + } + /** * Peek and return the first token. It is not consumed. */ @@ -236,7 +241,7 @@ try { c = reader.read(); if (c == 0xfffd) - throw new SyntaxException(this, "Bad character in input, file probably not in utf-8"); + throw new SyntaxException(this, "Bad character in input, file probably not in " + charset); } catch (IOException e) { isEOF = true; c = -1; Index: src/uk/me/parabola/mkgmap/typ/IdSection.java =================================================================== --- src/uk/me/parabola/mkgmap/typ/IdSection.java (revision 4422) +++ src/uk/me/parabola/mkgmap/typ/IdSection.java (working copy) @@ -42,7 +42,8 @@ } else if (name.equalsIgnoreCase("ProductCode")) { data.setProductId(ival); } else if (name.equalsIgnoreCase("CodePage")) { - data.setSort(SrtTextReader.sortForCodepage(ival)); + if (data.getSort() == null) // ignore if --code-page + data.setSort(SrtTextReader.sortForCodepage(ival)); } else { throw new SyntaxException(scanner, "Unrecognised keyword in id section: " + name); } Index: src/uk/me/parabola/mkgmap/typ/TypTextReader.java =================================================================== --- src/uk/me/parabola/mkgmap/typ/TypTextReader.java (revision 4422) +++ src/uk/me/parabola/mkgmap/typ/TypTextReader.java (working copy) @@ -32,9 +32,10 @@ // As the file is read in, the information is saved into this data structure. private final TypData data = new TypData(); - public void read(String filename, Reader r) { + public void read(String filename, Reader r, String charset) { TokenScanner scanner = new TokenScanner(filename, r); scanner.setCommentChar(null); // the '#' comment character is not appropriate for this file + scanner.setCharset(charset); ProcessSection currentSection = null; Index: test/uk/me/parabola/mkgmap/typ/TypTextReaderTest.java =================================================================== --- test/uk/me/parabola/mkgmap/typ/TypTextReaderTest.java (revision 4422) +++ test/uk/me/parabola/mkgmap/typ/TypTextReaderTest.java (working copy) @@ -20,6 +20,7 @@ import java.io.RandomAccessFile; import java.io.Reader; import java.io.StringReader; +import java.nio.charset.Charset; import java.nio.channels.FileChannel; import java.util.List; @@ -269,7 +270,7 @@ public void testFromFile() throws IOException, InterruptedException { Reader r = new BufferedReader(new FileReader("test/resources/typ/test.txt")); tr = new TypTextReader(); - tr.read("test.typ", r); + tr.read("test.typ", r, Charset.defaultCharset().name()); TestUtils.registerFile("ts__test.typ"); RandomAccessFile raf = new RandomAccessFile("ts__test.typ", "rw"); @@ -312,7 +313,7 @@ Reader r = new StringReader(in); TypTextReader tr = new TypTextReader(); - tr.read("string", r); + tr.read("string", r, "N/A"); if (tr.getData().getSort() == null) tr.getData().setSort(SrtTextReader.sortForCodepage(1252)); return tr;
_______________________________________________ mkgmap-dev mailing list mkgmap-dev@lists.mkgmap.org.uk http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev