Thanks Johann, this sounds like quite a reasonable approach too. It shouldn't
be too hard to add into the splitter (e.g. by tacking the info onto the density
map, or by holding on to the is_in data until the areas are known and then
calculating the names as a separate step). Looking at a few OSM files, though,
the is_in tags and their values seem inconsistent at best, so I don't know how
easy it would be to get sensible, consistent data from them. Another possible
issue is that OSM files without these tags wouldn't work at all.
Yes, that's true. As far as I can remember, the is_in tags were very
inconsistent. But I had hoped my algorithm would be flexible enough to cope
with this inconsistency. The idea was to extract every name at every level of
the is_in tag. For example, given is_in =
Germany,Bavaria,Munich,suburb,street name,... I count the frequency of each
of these names. Statistically, Germany will be the most frequent name in this
tile.
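A minimal sketch of this counting step in Java, assuming one counter per tile. The class name `IsInCounter` is hypothetical and not part of the splitter. Like the patch's `handleRegionName`, it splits an is_in value on commas or semicolons and trims each part; unlike the patch, it skips empty parts (the patch would count `"a,,b"` as containing the name `""`). Requires Java 8 or later.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: counts the names found in is_in values for one tile.
public class IsInCounter {
    // Name -> number of occurrences, kept in first-seen order.
    private final Map<String, Integer> counts = new LinkedHashMap<>();

    // Split an is_in value on ',' or ';', trim each part and count it.
    // Empty parts (e.g. from "a,,b") are skipped instead of being counted as "".
    public void add(String isIn) {
        for (String part : isIn.split("[,;]")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                counts.merge(name, 1, Integer::sum);
            }
        }
    }

    public Map<String, Integer> counts() {
        return counts;
    }

    public static void main(String[] args) {
        IsInCounter tile = new IsInCounter();
        tile.add("Germany,Bavaria,Munich");
        tile.add("Germany; Bavaria, Augsburg");
        tile.add("Germany,Bavaria,Munich,Schwabing");
        System.out.println(tile.counts());
        // prints {Germany=3, Bavaria=3, Munich=2, Augsburg=1, Schwabing=1}
    }
}
```

Names near the top of the hierarchy (Germany, Bavaria) collect the highest counts, which is why frequency alone is not enough and a uniqueness check across tiles is needed afterwards.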
Afterwards I try to find unique names for the tiles. The name Germany
will occur in nearly all tiles, so it is not unique and will not be used.
The region Bavaria will also be in more than one tile and will not be used.
If the city Munich is fully contained in one tile, its name is taken;
otherwise I go down to the next level. This way I get the most frequently
used name that is unique to the tile.
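The uniqueness step, read strictly as described above (a name counts as unique only if no other tile mentions it), could be sketched as follows. `UniqueTileNames` and its input layout, a map from tile id to that tile's name counts, are assumptions for illustration. The attached patch is more lenient: it also accepts a name that occurs in up to two other subareas and joins up to three such names. This sketch uses `Map.of`, so it needs Java 9 or later.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: picks, per tile, the most frequent name found in no other tile.
public class UniqueTileNames {
    public static Map<Integer, String> pick(Map<Integer, Map<String, Integer>> tiles) {
        // Number of tiles that mention each name at least once.
        Map<String, Integer> tilesPerName = new HashMap<>();
        for (Map<String, Integer> counts : tiles.values()) {
            for (String name : counts.keySet()) {
                tilesPerName.merge(name, 1, Integer::sum);
            }
        }

        Map<Integer, String> result = new TreeMap<>();
        for (Map.Entry<Integer, Map<String, Integer>> tile : tiles.entrySet()) {
            List<Map.Entry<String, Integer>> byFreq = new ArrayList<>(tile.getValue().entrySet());
            // Most frequent first; ties broken alphabetically so the result is deterministic.
            byFreq.sort(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder())
                    .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
            for (Map.Entry<String, Integer> e : byFreq) {
                if (tilesPerName.get(e.getKey()) == 1) {
                    // First (i.e. most frequent) name that no other tile contains.
                    result.put(tile.getKey(), e.getKey());
                    break;
                }
            }
        }
        return result; // tiles without any unique name are left out
    }

    public static void main(String[] args) {
        Map<Integer, Map<String, Integer>> tiles = new TreeMap<>();
        tiles.put(1, Map.of("Germany", 40, "Bavaria", 35, "Munich", 30));
        tiles.put(2, Map.of("Germany", 50, "Bavaria", 20, "Augsburg", 15, "Ulm", 5));
        tiles.put(3, Map.of("Germany", 25, "Bavaria", 10));
        System.out.println(pick(tiles));
        // prints {1=Munich, 2=Augsburg}
    }
}
```

Tile 3 gets no name here because every name it contains also appears in another tile; such tiles would need some fallback, for example a looser threshold like the one in the patch.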
If you do have some
code that deals with filtering/sanitising the is_in data, I'd be interested
to see it, however, as it sounds like it would be worth investigating further.
Attached is a patch which works against the rather outdated R37.
I tried to update it to the current splitter, but it won't work; there
were some structural changes from SubArea to Area.
Regards,
Johann
Index: src/uk/me/parabola/splitter/SubArea.java
===================================================================
--- src/uk/me/parabola/splitter/SubArea.java (revision 37)
+++ src/uk/me/parabola/splitter/SubArea.java (working copy)
@@ -22,7 +22,10 @@
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
+import java.util.Comparator;
import java.util.Formatter;
+import java.util.HashMap;
+import java.util.TreeMap;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
@@ -186,6 +189,9 @@
while (it.hasNext()) {
Map.Entry<String,String> entry = it.next();
writer.append("<tag k='");
+//Test:
+if (entry.getKey().equals("is_in"))
+ handleRegionName(entry.getValue());
writeAttribute(entry.getKey());
writer.append("' v='");
writeAttribute(entry.getValue());
@@ -209,4 +215,69 @@
void setMapid(int mapid) {
this.mapid = mapid;
}
+
+//Test:
+ private HashMap<String, Integer> nameMap = new HashMap<String, Integer>();
+
+ // Remember all region, city and country names and count their frequency.
+ private void handleRegionName(String name) {
+ String[] names = name.split("[,;]");
+ for (String n : names) {
+ n = n.trim();
+ if (nameMap.containsKey(n)) {
+ Integer count = nameMap.get(n);
+ nameMap.put(n,count+1);
+ }
+ else
+ nameMap.put(n,1);
+ }
+ }
+
+ public boolean containsName(String name) {
+ return nameMap.containsKey(name);
+ }
+
+
+ public String getRegionName(AreaList areas) {
+ // First sort the name list by frequency.
+ TreeMap<Integer, String> sortedMap = new TreeMap<Integer,String>(new ReverseComparator());
+ for (Map.Entry<String,Integer> entry : nameMap.entrySet())
+ sortedMap.put(entry.getValue(), entry.getKey());
+
+ // Find the most frequently used unique names and return them.
+ // This should scale well over all tile sizes.
+ StringBuilder region = new StringBuilder();
+ int nameCount = 3;
+ for (Map.Entry<Integer,String> entry : sortedMap.entrySet()) {
+ String name = entry.getValue();
+ // A 'unique' name may still appear in up to 2 other subareas.
+ int isUnique = 2;
+ for (SubArea a : areas) {
+ if (a!=this && a.containsName(name)) {
+ if (isUnique-- < 0)
+ break;
+ }
+ }
+ if (isUnique >= 0) {
+ // Separate names with commas, without leaving a trailing comma.
+ if (region.length() > 0)
+ region.append(",");
+ region.append(name);
+ if (--nameCount <= 0)
+ break;
+ }
+ }
+ return region.toString();
+ }
+
+ // Sorts Integers in descending order (largest first).
+ private class ReverseComparator implements Comparator<Integer> {
+ public int compare (Integer a, Integer b) {
+ return -a.compareTo(b);
+ }
+
+ public boolean equals(Integer a, Integer b) {
+ return a.equals(b);
+ }
+ }
}
Index: src/uk/me/parabola/splitter/Main.java
===================================================================
--- src/uk/me/parabola/splitter/Main.java (revision 37)
+++ src/uk/me/parabola/splitter/Main.java (working copy)
@@ -244,6 +244,7 @@
for (SubArea a : areaList) {
w.println();
w.format("mapname: %d\n", a.getMapid());
+ w.format("area-name: %s\n", a.getRegionName(areaList));
w.println("description: OSM Map");
w.format("input-file: %d.osm.gz\n", a.getMapid());
}
Property changes on: src/org/apache/tools/bzip2/BZip2Constants.java
___________________________________________________________________
Added: svn:executable
+ *
Property changes on: src/org/apache/tools/bzip2/CBZip2InputStream.java
___________________________________________________________________
Added: svn:executable
+ *
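One caveat with `getRegionName` in the patch: it builds a `TreeMap<Integer, String>` keyed by frequency, so when two names have the same count, the later `put` replaces the earlier one and that name is never considered. A minimal alternative, sketched under the assumption of Java 8 or later (the class `RegionNameOrder` is hypothetical), sorts a list of entries instead, so ties survive, and joins the picked names with `String.join`, which never produces a trailing comma. The `accept` predicate is a stand-in for the patch's check against the other subareas.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

// Hypothetical helper: frequency ordering that keeps names with equal counts.
public class RegionNameOrder {
    // Names by descending count (ties alphabetical); equal counts do not overwrite each other.
    public static List<String> byFrequency(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder())
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries) {
            names.add(e.getKey());
        }
        return names;
    }

    // Join up to maxNames accepted names with commas.
    public static String regionName(Map<String, Integer> counts, Predicate<String> accept, int maxNames) {
        List<String> picked = new ArrayList<>();
        for (String name : byFrequency(counts)) {
            if (picked.size() == maxNames) break;
            if (accept.test(name)) picked.add(name);
        }
        return String.join(",", picked);
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("Munich", 7);
        counts.put("Freising", 7);
        counts.put("Dachau", 3);

        // Keying by count, as the patch does: one of the two names with count 7 is lost.
        TreeMap<Integer, String> byCount = new TreeMap<>(Comparator.<Integer>reverseOrder());
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            byCount.put(e.getValue(), e.getKey());
        }
        System.out.println(byCount.size()); // prints 2

        System.out.println(regionName(counts, name -> true, 3));
        // prints Freising,Munich,Dachau
    }
}
```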
_______________________________________________
mkgmap-dev mailing list
http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev