Re: [Talk-us] Admin boundaries tied to roads

2010-04-28 Thread Val Kartchner
On Tue, 2010-04-27 at 10:57 -0700, am12 wrote:
  I'm saying that abbreviations are part of everyday life, and locals know
  what to abbreviate and what not to.
 
 Sure, according to their local usage, which will be inconsistent with local
 usage in other places.  What one local thinks is an obvious abbreviation
 usage because everyone knows it will not be obvious to a map user from
 elsewhere.
 
  How does commercial text-2-speech handle this? 
 
 Unabbreviated, better-structured data.

So, does that mean that street names like 40th Street, for instance
should be expanded to Fortieth Street?

- Val -


___
Talk-us mailing list
Talk-us@openstreetmap.org
http://lists.openstreetmap.org/listinfo/talk-us


Re: [Talk-us] Admin boundaries tied to roads

2010-04-27 Thread am12

I understand that this is a collaborative project, where standards are as
much defined by what somebody decides to do as anything else.  Neither the
wiki pages nor mailing list opinions (or votes) are definitive mandates. 
Given that, I'll toss my opinion out here.

 I'm saying that abbreviations are part of everyday life, and locals know
 what to abbreviate and what not to.

Sure, according to their local usage, which will be inconsistent with local
usage in other places.  What one local thinks is an obvious abbreviation
"because everyone knows it" will not be obvious to a map user from
elsewhere.

 How does commercial text-2-speech handle this? 

Unabbreviated, better-structured data.

 Can we agree for now that, with appropriate local knowledge, it will be
 acceptable to strip just these prefixes out of the name tag into another tag?

Supplemental tags are great, but don't remove the prefix from the name tag.
Accepted OSM usage is that the name tag holds the complete, full name.  There
are other variations like local_name or alt_name for the shortened version.

 There would have to be both a Something XYZ and a Something ABC in the
 same general area for you to get lost.

Apparently you don't have many of these in your local area so you don't
seem too concerned about it.  My local area?  I have them, and it's a pain.

 Multiply this by the already small percentage of both ABC and XYZ being
 uncommon abbreviations, and you have a really small set.

And keeping unabbreviated data still eliminates this problem completely.

To me, it's pretty simple: you can go from more data to less easily (full
to abbreviated), but when you extrapolate backwards from less to more you
will lose somewhere.
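
To make that concrete, here is a minimal Python sketch. The abbreviation table is made up for illustration and is not any official list; the function names are hypothetical too. Going from full to abbreviated is a plain lookup, while going back yields more than one candidate:

```python
# Hypothetical abbreviation table, for illustration only.
ABBREVIATIONS = {
    "Street": "St",
    "Saint": "St",
    "Court": "Ct",
    "Circle": "Cir",
}


def abbreviate(name):
    """Full to abbreviated: a plain word-by-word lookup."""
    return " ".join(ABBREVIATIONS.get(word, word) for word in name.split())


def candidates(abbr):
    """Abbreviated to full: every full word that maps to this abbreviation."""
    return sorted(full for full, short in ABBREVIATIONS.items() if short == abbr)


print(abbreviate("Saint Charles Street"))  # St Charles St
print(candidates("St"))                    # ['Saint', 'Street']
```

The first call loses nothing that a renderer needs; the second shows that a consumer starting from "St" has to guess.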

Remember the mantra about "don't tag for the renderer"?  It's there for a
reason.  OSM, in philosophy, is not about creating a pretty map.  It is
about creating an underlying map data set, and creating a pretty map is one
of the key uses of it, but not the only one.  Printing abbreviations is a
job for the renderer.  

I understand the feeling that "I can't change the renderer myself, but I
can change the data entry myself, so that's the right thing for me to do."
But it still doesn't make it the best solution.  Let's make the data as
clear and unambiguous as possible, and if the renderer needs fixing, work
on it there.

That's my free opinion, worth every penny :-)

- Alan Millar





Re: [Talk-us] Admin boundaries tied to roads

2010-04-26 Thread Val Kartchner
On Mon, 2010-04-26 at 16:31 -0700, Alan Mintz wrote:
 Good. We also need to settle on a set of component tags to make best use of 
 the information present in those edits -  particularly to separate out 
 cardinal directions from those that are really part of the name. Can we 
 agree for now that, with appropriate local knowledge, it will be acceptable 
 to strip just these prefixes out of the name tag into another tag? Should I 
 propose a set of component tags for a (hopefully quick) vote? The suffixes 
 and root tags could then be populated at the same time (without stripping 
 them from the name).

I second you proposing this.  We need to separate out the prefix, suffix
and root.  Though you need to remember these things when you make your
proposal: http://vidthekid.info/misc/osm-abbr.html
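
For what it's worth, a rough Python sketch of what such a split might look like. The keys prefix, root and suffix and the two word lists are placeholders of mine, not proposed tag names, and the prefix_is_directional flag stands in for the local knowledge Alan mentions:

```python
# Hypothetical word lists; real ones would be much longer.
DIRECTIONS = {"N", "S", "E", "W", "North", "South", "East", "West"}
SUFFIXES = {"St", "Street", "Ave", "Avenue", "Blvd", "Boulevard"}


def split_name(name, prefix_is_directional):
    """Split a street name into prefix, root and suffix.

    prefix_is_directional must come from local knowledge: a leading
    "North" may be a directional prefix or part of the root name.
    """
    words = name.split()
    parts = {"prefix": "", "root": "", "suffix": ""}
    if prefix_is_directional and len(words) > 1 and words[0] in DIRECTIONS:
        parts["prefix"] = words.pop(0)
    if len(words) > 1 and words[-1] in SUFFIXES:
        parts["suffix"] = words.pop()
    parts["root"] = " ".join(words)
    return parts


print(split_name("N Main St", True))
print(split_name("North Shore Avenue", False))
```

The name tag itself is left alone here; the function only computes the components.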

- Val -




Re: [Talk-us] Admin boundaries tied to roads

2010-04-25 Thread andrzej zaborowski
Hi Alan,

On 24 April 2010 06:33, Alan Mintz alan_mintz+...@earthlink.net wrote:
 At 2010-04-22 13:09, andrzej zaborowski wrote:
  On 22 April 2010 04:24, Alan Mintz alan_mintz+...@earthlink.net wrote:
   At 2010-04-21 17:12, andrzej zaborowski wrote:
  On 22 April 2010 01:18, Apollinaris Schoell ascho...@gmail.com wrote:
    On Wed, Apr 21, 2010 at 3:36 PM, andrzej zaborowski balr...@gmail.com
    wrote:
    Where's damage in that -- is it in that you can now read the name out
    without checking the documentation for what that funny string means in
    that particular database that is TIGER?
  
   I just had a machine crash as I was trying to find stats, but I'll bet that
   at least 90% of the cases are St, Ave/Av, and Blvd/Bl, with the
   occasional Ln and Cir/Cr thrown in. When there's a lone N, S, E, or W
   as a prefix to a street name, it's clear to everyone what that means. These
   are the same abbreviations that _everyone_ uses every day - children,
   adults, businesses, governments, etc.
  
  Well, you just gave examples of the obvious ones, I'm not claiming any of
  these are not known.  But the list has 672 different forms.

 My point, though, was that we were going to a lot of trouble for a small
 percentage of real-world cases that _might_ (see below) present a problem
 for someone to understand.

Right, but we don't want to be inconsistent or we again have to keep
lists of exceptions to the normal rules in every tool.  Even if we
just wanted to document that on the wiki (or elsewhere, really doesn't
need to be wiki) for new mappers, then it would have to say something
like "Don't use abbreviations in name=, except final St in English
speaking countries and Foo in Bar speaking countries and... and... and
so on..."  Let's just avoid this area completely.



  (but even the easy ones are hard for non-human consumers because St has
 at least three possible meanings, all three quite popular across the db).

 I'm sorry, but as a suffix (i.e. for the regex / St$/), what else does St
 mean but Street?

Sure you can have a regex for every allowed abbreviation, perhaps a
few regexes for some of the more complicated ones like St before names
of saints, and then for every language and every source of data, at
which point you start having to look at the source= tag or other tags
before you can fully interpret name=, because in TIGER data Stra at
the end is for Stravenue while in other places (nominatim's current
list of abbreviations) Stra at the end is for Straight.
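
A rough Python sketch of the point, with made-up tables: the two source labels and the few entries below are my own illustration, not the real TIGER or nominatim lists. The same trailing token expands differently depending on which table you pick, so the expander needs to know the source before it can read name=:

```python
import re

# Hypothetical, deliberately tiny tables; the real lists are far longer.
SUFFIX_TABLES = {
    "tiger": {"St": "Street", "Ave": "Avenue", "Stra": "Stravenue"},
    "other": {"St": "Street", "Ave": "Avenue", "Stra": "Straight"},
}


def expand_suffix(name, source):
    """Expand a trailing abbreviation using the table for the given source."""
    table = SUFFIX_TABLES[source]
    match = re.search(r" (\w+)$", name)
    if match and match.group(1) in table:
        return name[:match.start(1)] + table[match.group(1)]
    return name


print(expand_suffix("Main St", "tiger"))        # Main Street
print(expand_suffix("Cherry Stra", "tiger"))    # Cherry Stravenue
print(expand_suffix("Cherry Stra", "other"))    # Cherry Straight
print(expand_suffix("St Charles St", "tiger"))  # St Charles Street
```

Note that the leading St in the last example is left untouched; handling it would take yet another rule.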



   And I will do so again. My problem is mostly that this was done without a
   safety net. You clobbered existing data with no easy way to walk it
 back...
  
  Well, the way to walk it back is pretty easy, all the names can be
 taken from version-1 or reassembled from the tiger tags, so no worries there.

 This doesn't work for streets that were edited by users. Again, my problem
 is that, in thousands of edits, I specifically only expanded, for example,
 the prefix N to North when it is logically part of the root name. When
 it is logically a housenumber suffix, as it is in the majority of southern
 CA, I left the prefix alone. The road name may have been otherwise edited,
 though (to correct spelling, rename completely, etc.) This was to be used
 in the future when we could agree on a way to correctly separate these
 component parts of the name, as they are and must be in any database to be
 used with routing and street addressing in the real world. To walk it
 back, we will have to query the history of the way and find the version
 before the bot, to see what was done. It's not just v1, or TIGER, because
 it may have been otherwise edited. It's not even v[last-1] any more because
 there may have been other edits since the bot (I've done many myself).

Well I can provide you a list of the original names before I touched
them with the script along with their id's and versions so you can
check if the name has been edited afterwards, if you need to revert
these edits.  Note the edits also contain hundreds if not thousands of
my manual fixes for some frequent typos in TIGER and for some cases of
wrong segmentation into direction_prefix, base_name etc.
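
The check could look roughly like this in Python. This is only a sketch: the history is assumed to be a list of dicts already fetched and sorted by version, and the keys and user names ("version", "user", "name", "bot", "mapper") are hypothetical, not an actual API response format:

```python
def name_before_bot(history, bot_user):
    """Return the name from the last version before the bot's first edit."""
    previous = None
    for version in history:
        if version["user"] == bot_user:
            return previous["name"] if previous else None
        previous = version
    return None  # the bot never touched this way


def edited_since_bot(history, bot_user):
    """True if the name changed in any version after the bot's last edit."""
    bot_versions = [i for i, v in enumerate(history) if v["user"] == bot_user]
    if not bot_versions:
        return False
    last = bot_versions[-1]
    return any(v["name"] != history[last]["name"] for v in history[last + 1:])


history = [
    {"version": 1, "user": "tiger_import", "name": "N Main St"},
    {"version": 2, "user": "mapper", "name": "N Maine St"},
    {"version": 3, "user": "bot", "name": "North Maine Street"},
    {"version": 4, "user": "mapper", "name": "North Maine Street"},
]
print(name_before_bot(history, "bot"))   # N Maine St
print(edited_since_bot(history, "bot"))  # False
```

In the example the name to restore comes from version 2, not version 1, which is exactly the case Alan describes.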

 I don't understand. Why do I have to remember them? Am I not capable of
 inferring their meaning? Do I have to infer anything anyway, since they are
 likely to be similar/identical to signage?

You have to if you want to give the name to somebody on the phone or
find a name someone gave you on the phone.

Cheers



Re: [Talk-us] Admin boundaries tied to roads

2010-04-23 Thread Lord-Castillo, Brett






-----Original Message-----
From: Apollinaris Schoell [mailto:ascho...@gmail.com] 
Sent: Friday, April 23, 2010 9:47 AM
To: Lord-Castillo, Brett
Cc: 'talk-us@openstreetmap.org'
Subject: Re: [Talk-us] Admin boundaries tied to roads


On 23 Apr 2010, at 7:13 , Lord-Castillo, Brett wrote:

 On 19 Apr 2010, at 20:24, Apollinaris Schoell wrote:
 On 19 Apr 2010, at 20:07 , Alan Mintz wrote:
 Not to mention that merging them will result in the inability to hide these
 boundaries. When doing a bunch of editing on a road that follows one, in
 the past, I've taken the time to verify that the boundary doesn't share any
 nodes with anything and then remove it from my local OSM file manually so I
 don't have to constantly deal with it. If it shares nodes with anything
 else, this is no longer possible.
 
 fully agree, the good thing is these boundaries are tiger data and bad data
 anyway and should be replaced with better boundaries
 
 While I understand the mantra of TIGER=Bad because of the state of the road
 data, this is not true for the boundary data. Most of the boundary data
 comes directly from recorded surveys (something not available for roads)
 and is not bad data for most of the United States. The rural areas would be
 the one exception (mostly because they did not have surveys converted to
 digital layers in 2000), but rural areas are also highly likely to have
 realigned boundary roads that no longer correspond to the original
 boundaries.
 
 I can tell for sure that they are completely wrong in California. They are
 not even close to USGS 24k, don't align with official county borders from
 official sources and don't align with natural features, fences which are
 sometimes visible on Yahoo.


Yes, California is one of the well-known exceptions. Their LUCA program fell 
apart (and this time around has been split into two separate regions as a 
result). If you take the Midwest states though, like Iowa, Minnesota, Missouri 
with their 300+ counties between them, the TIGER lines are directly from 
official sources, especially the 2009 updates.

Brett Lord-Castillo
Information Systems Designer/GIS Programmer
St. Louis County Police
Office of Emergency Management
14847 Ladue Bluffs Crossing Drive
Chesterfield, MO 63017
Office: 314-628-5400 Fax: 314-628-5508 Direct: 314-628-5407



Re: [Talk-us] Admin boundaries tied to roads

2010-04-23 Thread Brad Neuhauser
I'd agree with Brett on the boundaries.  The Census data is not
perfect by any means, but it's pretty good, at least in my
area--Minnesota.  (and orders of magnitude better than it was in
2000!)  And if it's not good in your area, you should talk to your
local government and make sure they're participating in the Census'
yearly Boundary and Annexation Survey.
http://www.census.gov/geo/www/bas/bashome.html

I can tell for sure that they are completely wrong in California. They are not 
even close to USGS 24k, don't align with official county borders from official 
sources and don't align with natural features, fences which are sometimes 
visible on Yahoo.

To further respond to this, there is no claim by the Census that it has
survey accuracy, or that it aligns with other data.  Fundamentally, it
is created by the Census for internal purposes, and all TIGER boundary
data is relative to the other TIGER data (just like a lot of traced
OSM data is relative to the Yahoo imagery).  Everybody gets access to
it for free and you can use it when it's good, ignore it when it's bad,
or modify it when it's in between.  The bigger issue with it being
imported into OSM is the currency, because municipal boundaries are
always changing, and as has been mentioned, boundaries are not usually
something that is easily verifiable on the ground.

Cheers,
Brad



Re: [Talk-us] Admin boundaries tied to roads

2010-04-23 Thread Anthony
On Fri, Apr 23, 2010 at 11:01 AM, Brad Neuhauser
brad.neuhau...@gmail.com wrote:

 The bigger issue with it being
 imported into OSM is the currency, because municipal boundaries are
 always changing, and as has been mentioned, boundaries are not usually
 something that is easily verifiable on the ground


I'd say the biggest issue is the fact that, when the census bureau couldn't
find data on municipalities, they decided to just make shit up.  They
picked some arbitrary boundary which had roughly the right number of people
in it, and then named it after an actual place which happened to be nearby.

The CDPs are horrible when used for any purpose other than interpreting
census data.  I really wish the census bureau had named them CDP 1283,
CDP 1284, CDP 1285, etc.


Re: [Talk-us] Admin boundaries tied to roads

2010-04-23 Thread Alan Mintz

At 2010-04-23 18:11, Anthony wrote:

  A navi system is more useful if the instructions and signs match.

 Depends on your purpose. If you're trying to navigate to the
 missigned street (e.g. "California Street", where the sign
 reads "Carolina Street"), you don't want to get a response of
 "street not found". For most other purposes you'd rather
 have the incorrect name (at least until it gets fixed).

Yeah - this is always a quandary. In my experience, the street sign
usually ends up being right anyway, so I'm usually asking the
responsible authority to fix their GIS and/or the source map (yes, even
tract maps that are decades old :) ). I don't really consider this as
original research, since it's really a matter of reconciling
sources, but it's admittedly time consuming and requires additional
research that many mappers (understandably) may not want to do. Still, I
think it's value that I can add, not only to OSM, but also for my fellow
citizens.
When the sign is wrong, I notify the signing authority and, if it seems
that they intend to fix it soon (the usual case), I put the correct value
in the name tag and the signed value in the alt_name tag, with a note tag
describing the situation. If there is no easy contact with the authority,
or it seems they may not fix it soon, I reverse the tagging. Either way,
there are notes/FIXMEs there to remind me (or others) to survey again in
the future.
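
As a rough Python sketch of that tagging rule (the function name and the note wording are hypothetical; name, alt_name and note are the tags mentioned above):

```python
def tags_for_missigned_street(correct, signed, fix_expected_soon):
    """Choose name/alt_name for a street whose sign disagrees with the records.

    If the authority is expected to fix the sign soon, the correct value
    goes in name and the signed value in alt_name; otherwise the reverse.
    """
    if fix_expected_soon:
        name, alt_name = correct, signed
    else:
        name, alt_name = signed, correct
    return {
        "name": name,
        "alt_name": alt_name,
        "note": f"Signed as {signed}; records say {correct}. Resurvey.",
    }


print(tags_for_missigned_street("California Street", "Carolina Street", True))
print(tags_for_missigned_street("California Street", "Carolina Street", False))
```

Either way the note stays on the way, so whoever surveys next knows why the two values differ.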
BTW, technically, I would call surveying/photographing, and then mapping
based on it, original research :)
P.S.
http://www.openstreetmap.org/browse/way/56123368
is one of those strange cases where it's been signed and likely known wrong according to the cited docs, because the signed name is more logical in context. I name'd it as signed and put the recorded name in the official_name tag instead. If there's anyone nearby that would like to have a look, It'd be useful to know how it's signed at the intersection with Outer Traffic Circle here: http://www.openstreetmap.org/browse/node/122696036 .

--
Alan Mintz alan_mintz+...@earthlink.net





Re: [Talk-us] Admin boundaries tied to roads

2010-04-23 Thread Alan Mintz
At 2010-04-23 07:47, Apollinaris Schoell wrote:
  While I understand the mantra of TIGER=Bad because of the state of the 
 road data, this is not true for the boundary data. Most of the boundary 
 data comes directly from recorded surveys (something not available for 
 roads) and is not bad data for most of the United States. The rural 
 areas would be the one exception (mostly because they did not have 
 surveys converted to digital layers in 2000), but rural areas are also 
 highly likely to have realigned boundary roads that no longer correspond 
 to the original boundaries.
 

I can tell for sure that they are completely wrong in California. They are 
not even close to USGS 24k, don't align with official county borders from 
official sources and don't align with natural features, fences which are 
sometimes visible on Yahoo.

I don't know about completely. The parts of the Kern/LA/Orange/San 
Bernardino/Riverside/San Diego borders that I have surveyed are at least 
close to the signage at important points (admittedly a weak standard), but 
I've also gone hunting for detail in law in some spots and found that the 
borders were right as of their date of creation in the source data. I 
remember manually fixing a little bit of the OC/LA border in La Habra from
some sort of change description - maybe something out of the BAS project. What
a pain that was.

Is anyone working on borders currently? Is the BAS a reasonable source?

--
Alan Mintz alan_mintz+...@earthlink.net




Re: [Talk-us] Admin boundaries tied to roads

2010-04-23 Thread Alan Mintz
At 2010-04-22 13:33, andrzej zaborowski wrote:
 On 22 April 2010 17:40, Apollinaris Schoell ascho...@gmail.com wrote:
  On 21 Apr 2010, at 17:12 , andrzej zaborowski wrote:
  The signs are posted there by authorities so this is similar to having
  access to a tiny piece of a map or database made by these authorities.
  For maps people usually agreed on this list that we don't trust them.
 
 
  are you saying authorities are wrong and we should correct what they are
  doing and follow tiger or USPS standards instead?
 
 I'm saying we should name the objects what they're called, not what it is
 written as in somebody's database.

"What they're called", though, may indeed be from "somebody's database",
when that database is the county recorder's or assessor's. The recorder, in
particular, should be the truth by definition, except when you can see that
there's an obvious mistake and can confirm it with them.

--
Alan Mintz alan_mintz+...@earthlink.net




Re: [Talk-us] Admin boundaries tied to roads

2010-04-23 Thread Apollinaris Schoell

On 23 Apr 2010, at 19:46 , Alan Mintz wrote:

 At 2010-04-23 07:47, Apollinaris Schoell wrote:
 
 I don't know about completely. The parts of the Kern/LA/Orange/San 
 Bernardino/Riverside/San Diego borders that I have surveyed are at least 
 close to the signage at important points (admittedly a weak standard), but 
 I've also gone hunting for detail in law in some spots and found that the 
 borders were right as of their date of creation in the source data. I 
 remember manually fixing a little bit of the OC/LA border in La Habra from
 some sort of change description - maybe something out of the BAS project. What
 a pain that was.
 

It depends on the definition; for me a difference of 100-200 m is too much. Any
GPS trace or verbal description is better if matched with Yahoo. In some corners
it is even worse: complex edges have been entirely clipped.
USGS is pretty good and matches county borders. County borders are from
official state data and are high accuracy. Satellite imagery also matches well
when borders follow natural features.
USGS tracing is very difficult because borders are often hard to identify among
other features.


 Is anyone working on borders currently? Is the BAS a reasonable source?

what is BAS? any better source will be useful

 
 --
 Alan Mintz alan_mintz+...@earthlink.net
 
 




Re: [Talk-us] Admin boundaries tied to roads

2010-04-23 Thread Alan Mintz
At 2010-04-22 13:09, andrzej zaborowski wrote:
 On 22 April 2010 04:24, Alan Mintz alan_mintz+...@earthlink.net wrote:
  At 2010-04-21 17:12, andrzej zaborowski wrote:
 On 22 April 2010 01:18, Apollinaris Schoell ascho...@gmail.com wrote:
   On Wed, Apr 21, 2010 at 3:36 PM, andrzej zaborowski balr...@gmail.com
   wrote:
   Where's damage in that -- is it in that you can now read the name out
   without checking the documentation for what that funny string means in
   that particular database that is TIGER?
 
  I just had a machine crash as I was trying to find stats, but I'll bet that
  at least 90% of the cases are St, Ave/Av, and Blvd/Bl, with the
  occasional Ln and Cir/Cr thrown in. When there's a lone N, S, E, or W
  as a prefix to a street name, it's clear to everyone what that means. These
  are the same abbreviations that _everyone_ uses every day - children,
  adults, businesses, governments, etc.
 
 Well, you just gave examples of the obvious ones, I'm not claiming any of
 these are not known.  But the list has 672 different forms.

My point, though, was that we were going to a lot of trouble for a small 
percentage of real-world cases that _might_ (see below) present a problem 
for someone to understand.


 (but even the easy ones are hard for non-human consumers because St has
 at least three possible meanings, all three quite popular across the db).

I'm sorry, but as a suffix (i.e. for the regex / St$/), what else does St 
mean but Street?


  And I will do so again. My problem is mostly that this was done without a
  safety net. You clobbered existing data with no easy way to walk it back...
 
 Well, the way to walk it back is pretty easy, all the names can be
 taken from version-1 or reassembled from the tiger tags, so no worries there.

This doesn't work for streets that were edited by users. Again, my problem 
is that, in thousands of edits, I specifically only expanded, for example, 
the prefix N to North when it is logically part of the root name. When 
it is logically a housenumber suffix, as it is in the majority of southern 
CA, I left the prefix alone. The road name may have been otherwise edited, 
though (to correct spelling, rename completely, etc.) This was to be used 
in the future when we could agree on a way to correctly separate these 
component parts of the name, as they are and must be in any database to be 
used with routing and street addressing in the real world. To walk it 
back, we will have to query the history of the way and find the version 
before the bot, to see what was done. It's not just v1, or TIGER, because 
it may have been otherwise edited. It's not even v[last-1] any more because 
there may have been other edits since the bot (I've done many myself).


 ...Then TIGER also includes Spanish names and the
 list has abbreviations for those too, which rarely anyone in US can
 read, while they can cope with unabbreviated ok.
 
  I don't agree. Much of the US speaks Spanish. Many more possess the
  tremendous brainpower and enough grade-school Spanish required to know that
  Cl. in front of a street name might mean Calle or Cam. might mean Camino,
  or that S means Sur and N means Norte.
 
 But do you remember the 600 abbreviations used in tiger?  It's neither
 practical nor useful, nor does it help anyone; they're much like numerical
 codes.  The one single thing they may be good for is for rendering at lower
 zoom levels.

I don't understand. Why do I have to remember them? Am I not capable of 
inferring their meaning? Do I have to infer anything anyway, since they are 
likely to be similar/identical to signage? Also, to me lower zoom levels 
is almost any level at which I want to see a map. Anything more than a 
small neighborhood, and it's all we can do just to fit the root of the name 
in - we don't need any _more_ characters.


  name: The pre-balrog name
 
 In 99% of the cases this was an arbitrary version of name, taken
 from a database which was chosen only on the basis of its license, not
 because it was more correct or anything.  So I don't see any reason to hang
 on to it.

If I understand you correctly, I disagree completely. In my experience in 
southern CA, 90% of the time, TIGER is correct with the exception of the 
presence of the directional prefix. The real problem was the geometry[1].


  In the Los Angeles area, I rarely saw expanded names (which is why I
  continue to abbreviate), except for those rare instances where someone drew
  a street from scratch before TIGER (apparently), and not even all of those.

BTW, from my previously cited data chunk (35988 unique names in about 4400 
sq mi (11000 sq km) of southern CA) , I can now say that only ~0.2% of 
suffixes were present in their expanded form (i.e. Street, Avenue, etc.).


 You could surely change the wiki but it's a conclusion that a lot of
 people individually seem to come to so I'm sure you wouldn't even need
 a bot before someone would add a phrase to that effect.
 
  I don't know 

Re: [Talk-us] Admin boundaries tied to roads

2010-04-22 Thread andrzej zaborowski
On 22 April 2010 04:24, Alan Mintz alan_mintz+...@earthlink.net wrote:
 At 2010-04-21 17:12, andrzej zaborowski wrote:
On 22 April 2010 01:18, Apollinaris Schoell ascho...@gmail.com wrote:
  On Wed, Apr 21, 2010 at 3:36 PM, andrzej zaborowski balr...@gmail.com
  wrote:
  Where's damage in that -- is it in that you can now read the name out
  without checking the documentation for what that funny string means in
  that particular database that is TIGER?

 I just had a machine crash as I was trying to find stats, but I'll bet that
 at least 90% of the cases are St, Ave/Av, and Blvd/Bl, with the
 occasional Ln and Cir/Cr thrown in. When there's a lone N, S, E, or W
 as a prefix to a street name, it's clear to everyone what that means. These
 are the same abbreviations that _everyone_ uses every day - children,
 adults, businesses, governments, etc.

Well, you just gave examples of the obvious ones, I'm not claiming any
of these are not known.  But the list has 672 different forms.
(but even the easy ones are hard for non-human consumers because St
has at least three possible meanings, all three quite popular across
the db).

 And I will do so again. My problem is mostly that this was done without a
 safety net. You clobbered existing data with no easy way to walk it back.
 The existing name value should have been put in a foo_name tag so we could
 at least see what used to be. I would at least encourage that a bot be run
 to find these edits, find the previous version in history, and do this, if
 we can't soon agree on a better schema to split the name up into components
 at the same time.
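
 A minimal Python sketch of that safety net, for illustration only: the
 function name is hypothetical, and "foo_name" is just the placeholder
 used above, not an agreed tag.

```python
def expand_with_backup(tags, expanded_name, backup_key="foo_name"):
    """Return a copy of tags with name expanded and the old value preserved.

    backup_key defaults to the placeholder "foo_name"; no agreed tag
    exists for this purpose.
    """
    new_tags = dict(tags)
    old_name = tags.get("name")
    if old_name is not None and old_name != expanded_name:
        new_tags[backup_key] = old_name
        new_tags["name"] = expanded_name
    return new_tags


before = {"highway": "residential", "name": "N Main St"}
print(expand_with_backup(before, "North Main Street"))
```

 The original dict is left unchanged, so the caller can still compare
 before and after.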

Well, the way to walk it back is pretty easy, all the names can be
taken from version-1 or reassembled from the tiger tags, so no worries
there.


I don't know who defined the ones used in TIGER but this is not the
only way to abbreviate the names, that is proven by USPS having their
own list that is not identical.  The most popular words will be the
same in both lists but some are really cryptic and arbitrary, could as
well be numeric codes.  Then TIGER also includes Spanish names and the
list has abbreviations for those too, which rarely anyone in US can
read, while they can cope with unabbreviated ok.

 I don't agree. Much of the US speaks Spanish. Many more possess the
 tremendous brainpower and enough grade-school Spanish required to know that
 Cl. in front of a street name might mean Calle or Cam. might mean Camino,
 or that S means Sur and N means Norte.

But do you remember the 600 abbreviations used in tiger?  It's neither
practical nor useful, nor does it help anyone; they're much like numerical
codes.  The one single thing they may be good for is for rendering at
lower zoom levels.




 name: The pre-balrog name

In 99% of the cases this was an arbitrary version of name, taken
from a database which was chosen only on the basis of its license, not
because it was more correct or anything.  So I don't see any reason to
hang on to it.


  The reason it was done with a script is that doing it manually was
  taking a lot of time and mappers were spending that time doing this
  instead of going out mapping.  And it's always been on the wiki about
  not using abbreviated names, even when the original import was done,
  ignoring this.

 So what most newbies, including myself, did, was to follow the style of the
 majority of the data, instead of the often-outdated, incomplete, and
 inaccurate wiki, which is often not even self-consistent.

The majority of the data in this case was an imported dataset that
hasn't even been fully reviewed by a human, so while I agree learning
by example is a good way to make a quick start, following the example
doesn't make it the only correct way to go.
I'm not using wiki as an argument to tell you what you should do, but
I think it's a good way to see what others were thinking.  I have
never edited the Key:name page, and I had never read it before
noticing that using abbreviations in a dataset that is supposed to be
parseable is a recipe for problems.



 In the Los Angeles area, I rarely saw expanded names (which is why I
 continue to abbreviate), except for those rare instances where someone drew
 a street from scratch before TIGER (apparently), and not even all of those.


You could surely change the wiki but it's a conclusion that a lot of
people individually seem to come to so I'm sure you wouldn't even need
a bot before someone would add a phrase to that effect.

 I don't know about a lot. I mostly just hear people regurgitate the
 don't abbreviate mantra without justification. Admittedly, maybe it's
 because it's already been hashed out to death and I'm late to the party.
 Regardless, maybe I'm not alone, and it deserves some re-thinking.

 Do people that are actually mapping (not bulk-importers) really want to
 type in North Martin Luther King, Junior Boulevard Southwest and then
 proofread that to make sure they didn't typo anything?

It completely depends on what 

Re: [Talk-us] Admin boundaries tied to roads

2010-04-22 Thread andrzej zaborowski
On 22 April 2010 17:40, Apollinaris Schoell ascho...@gmail.com wrote:
 On 21 Apr 2010, at 17:12 , andrzej zaborowski wrote:
 The signs are posted there by authorities so this is similar to having
 access to a tiny piece of a map or database made by these authorities.
 For maps people usually agreed on this list that we don't trust them.


 are you saying authorities are wrong and we should correct what they are 
 doing and follow tiger or USPS standards instead?

I'm saying we should name the objects what they're called, not what they
are written as in somebody's database.


 Is the wiki any better as a reference than what is in the osm DB? I could
 change the wiki and then will someone write a bot to reverse it? Is the wiki
 written with the situation in US in mind?

 Well one good rule is if there should be any rules then they should be 
 global.


 No, not at all. The US is very different in many aspects and has to be
 handled differently. Several countries don't use abbreviated names on maps
 or in addresses. Most street names don't even have a st/ave/blvd/ct …
 postfix at all, so there is no reason to even discuss this topic. And where
 they do abbreviate, it's only when there is a need to shorten; all official
 use will be expanded. But in the US it looks very much like the opposite:
 the abbreviation is the standard use model and the expanded name is the
 exception.

Seriously?  I can't think of a single place in Europe where the
street part is not commonly abbreviated just like what you describe
(maybe Germany, but I wouldn't know).  Just look at some paper maps or
postal addresses, or google, you will very rarely find the names
spelled out in full.  In the UK it's pretty much like in the US with
regard to the feature type suffix (St/Ave...) ([1]) but people have
been fixing it in OSM for some time, in Germany I think they use Str.
though not sure how commonly.  In all the Slavic countries Street is
abbreviated as ul. prefix and Avenue as al. practically always
(just look at Belarus in OSM), in Hungary it's a Ut. prefix, in
Spain C/ (although the OSM community there agreed to not go with the
popular forms and spell everything out and put in any optional
articles someone might possibly squeeze in when referring to the
street -- basically use the longest form, to avoid ambiguity.  So you
won't find C/ in OSM even though it's on the signs), in Turkey it's
Sk. for sokak, in Greece it's something like Od, I don't remember
exactly.  Someone on IRC yesterday asked whether they should put the
Greek names in all caps because the street signs are in all caps.  I
guess your answer would be yes, they should?

Cheers

1. http://osm.org/go/erdGBcIdM-



Re: [Talk-us] Admin boundaries tied to roads

2010-04-21 Thread andrzej zaborowski
On 20 April 2010 05:24, Apollinaris Schoell ascho...@gmail.com wrote:
 Sounds a lot like the IMO ill-considered road name expansion that was
 apparently agreed upon by a small group of people without input from the
 majority of active mappers whose work has been damaged.

 agreed, no idea why this was done. it's a change without much benefit but
 lots of damage.

Where's the damage in that -- is it in that you can now read the name out
without checking the documentation for what that funny string means in
that particular database that is TIGER?  You can now also write an
intelligent search engine that will understand both forms, you can
pipe the names through text-to-speech and do a lot more.
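
As a rough illustration of a search that understands both forms, here is a minimal sketch that normalizes a query and a stored name to the expanded form before comparing them. The table is a tiny, made-up subset (the TIGER and USPS lists are far longer), and the function names are hypothetical.

```python
# Tiny, illustrative expansion table; not the TIGER or USPS list.
EXPANSIONS = {
    "st": "street",
    "ave": "avenue",
    "av": "avenue",
    "blvd": "boulevard",
    "ln": "lane",
    "n": "north",
    "s": "south",
    "e": "east",
    "w": "west",
}

def normalize(name):
    """Lower-case, drop trailing dots, and expand known abbreviations."""
    words = [w.rstrip(".").lower() for w in name.split()]
    return " ".join(EXPANSIONS.get(w, w) for w in words)

def same_street(a, b):
    """Compare two names after normalizing both to the expanded form."""
    return normalize(a) == normalize(b)

print(same_street("N Main St", "North Main Street"))
```

Going from the full name to an abbreviation is a simple lookup like this one, while the reverse is ambiguous (St can be Street or Saint), which is one argument for storing the expanded form.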

The reason it was done with a script is that doing it manually was
taking a lot of time and mappers were spending that time doing this
instead of going out mapping.  And it's always been on the wiki about
not using abbreviated names, even when the original import was done,
ignoring this.

Cheers



Re: [Talk-us] Admin boundaries tied to roads

2010-04-21 Thread Alan Mintz
At 2010-04-21 17:12, andrzej zaborowski wrote:
On 22 April 2010 01:18, Apollinaris Schoell ascho...@gmail.com wrote:
  On Wed, Apr 21, 2010 at 3:36 PM, andrzej zaborowski balr...@gmail.com
  wrote:
  Where's damage in that -- is it in that you can now read the name out
  without checking the documentation for what that funny string means in
  that particular database that is TIGER?

I just had a machine crash as I was trying to find stats, but I'll bet that 
at least 90% of the cases are St, Ave/Av, and Blvd/Bl, with the 
occasional Ln and Cir/Cr thrown in. When there's a lone N, S, E, or W 
as a prefix to a street name, it's clear to everyone what that means. These 
are the same abbreviations that _everyone_ uses every day - children, 
adults, businesses, governments, etc.

Even when travelling to another country, it takes me very little time to 
understand what common abbreviations are used for in addresses.


  there is damage by doing it wrong, others have pointed to it already.

And I will do so again. My problem is mostly that this was done without a 
safety net. You clobbered existing data with no easy way to walk it back. 
The existing name value should have been put in a foo_name tag so we could 
at least see what used to be. I would at least encourage that a bot be run 
to find these edits, find the previous version in history, and do this, if 
we can't soon agree on a better schema to split the name up into components 
at the same time.
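
A minimal sketch of what such a clean-up pass might do for one way, assuming its version history has already been fetched into a list of dicts. The history structure, the user names and the default tag key (foo_name, the placeholder used above) are all assumptions for illustration, not an existing tool.

```python
def backup_previous_name(history, bot_user, backup_key="foo_name"):
    """Return the latest tags with the pre-bot name copied into backup_key.

    history: list of dicts ordered by version, each with "user" and "tags".
    The structure and the bot_user value are assumptions for this sketch.
    """
    current = dict(history[-1]["tags"])
    for prev, this in zip(history, history[1:]):
        old = prev["tags"].get("name")
        new = this["tags"].get("name")
        # Only act on the version where the bot actually changed the name.
        if this["user"] == bot_user and old and old != new:
            current.setdefault(backup_key, old)
            break
    return current

# Hypothetical two-version history of one way.
history = [
    {"user": "tiger_import", "tags": {"name": "N Main St"}},
    {"user": "expansion_bot", "tags": {"name": "North Main Street"}},
]
print(backup_previous_name(history, "expansion_bot"))
```

Because it uses setdefault, an existing value under the backup key is never overwritten, so the pass could be re-run safely.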


  I am not deep enough into the history of the abbreviations used and who
  defined them. But I am pretty sure there are a lot of errors.

Errors that I, and a lot of other mappers, painstakingly fixed by hand, 
based on ground surveys and research into public records. In particular, 
I'm worried about the cases where I spelled out North because it was 
actually part of the name, as opposed to a cardinal direction related to 
addresses, which I left alone, hoping to later move the latter directions 
to an addr:direction_prefix tag, while leaving the former alone. I can no 
longer distinguish between the two.



I don't know who defined the ones used in TIGER but this is not the
only way to abbreviate the names, that is proven by USPS having their
own list that is not identical.  The most popular words will be the
same in both lists but some are really cryptic and arbitrary, could as
well be numeric codes.  Then TIGER also includes Spanish names and the
list has abbreviations for those too, which rarely anyone in US can
read, while they can cope with unabbreviated ok.

I don't agree. Much of the US speaks Spanish. Many more possess the 
tremendous brainpower and enough grade-school Spanish required to know that 
Cl. in front of a street name might mean Calle or Cam. might mean Camino, 
or that S means Sur and N means Norte.


  - in the city I live there is no street sign with street, avenue, 
 boulevard,
   and even more surprising there are no abbreviations either. osm
  principle is to map what's on the ground. So tiger import is definitely
  wrong and expanding the names is also wrong. on the other hand postal
  address usually use it in one or the other form so it's not completely
  fiction.

Exactly. Many places in Orange County have the bad habit of leaving the 
suffix off the large street signs at intersections, perhaps as a way of 
saving space to reduce sign size and cost. Just because the big sign says 
just Orange doesn't mean that the street's real name isn't Orange Street, nor 
that it shouldn't be entered into any reasonable database or map that way. 
"Map what's on the ground" is the wrong thing to do so often that I don't 
really understand why it was decided upon, nor why people continue to hold it 
up on a pedestal, despite continuing problems with it.


For the record street signs on different ends of the same street often
use different forms and you'll sometimes find really strange
conventions, so while I agree mapping what's on the ground is good
because stuff can be confirmed, in this case it's not a solution.  In
many places you'll find the names are all caps on the signs but in a
local newspaper they're capitalized the usual way.

And the signs are sometimes wrong. In the thousands of streets I've 
photographed and mapped, I've corrected hundreds of signage 
errors/inconsistencies, often requiring substantial research into records, 
and resulting in notification of the appropriate authority to fix the 
records and/or signs (for free :( ).


  - many geocoding engines do not find expanded names. even google doesn't in
  many cases. To me it looks like hardly anyone uses the expanded name
  at all. So my question is: is the expanded name really the correct name?

Exactly! Sounds like its only useful purpose is text-2-speech. Here's what 
I'd like to see:

name: The pre-balrog name
name_direction_prefix: The 1-2 char cardinal direction before the root
use_name_direction_prefix: {yes|no} Yes indicates that the 
name_direction_prefix 

Re: [Talk-us] Admin boundaries tied to roads

2010-04-20 Thread Richard Welty
On 4/20/10 3:44 AM, Frederik Ramm wrote:
 Hi,

 Alan Mintz wrote:

 At 2010-04-19 10:45, Mike N. wrote:
  
I see that the separate VS tangled argument has been settled in the US by
 the Duplicate Node attack bots, who have blindly merged all duplicate
 nodes.

 http://www.openstreetmap.org/browse/way/38855677

 Is this really happening? Can someone describe exactly what criteria are
 being used, and just how it was decided that this was a good idea?
  
 It seems that someone is, more or less blindly, using the JOSM validator
 de-duplication. Doesn't look like a bot but, as Richard said, has
 similar results.

given the way that it is currently set up, i'll wager that a lot of less
experienced josm users are doing this, because the validator, in its
current form, leads them down this path.

richard




Re: [Talk-us] Admin boundaries tied to roads

2010-04-19 Thread Ian Dees
On Mon, Apr 19, 2010 at 12:45 PM, Mike N. nice...@att.net wrote:

 From an old message:

  I take the point that 'road realignment' may require the boundary also
  to move, but the word is MAY and so whatever happens to the road, the
  location of the boundary needs to be checked separately! It is quite
  surprising in the UK how many roads are being moved, but that does not
  also move the original boundary.

  I see that the separate VS tangled argument has been settled in the US by
 the Duplicate Node attack bots, who have blindly merged all duplicate
 nodes.

 http://www.openstreetmap.org/browse/way/38855677


When I imported GNIS last year, a fairly significant portion of the data
(2-5%) had POIs with coordinates exactly the same as another POI (e.g. a post
office inside a town hall building). I wonder what these duplicate node bots
are doing with those nodes...
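
To make the concern concrete, here is a minimal sketch of a duplicate-node check that groups nodes by exact coordinates but refuses to call a group mergeable when more than one node carries tags. This is not what the JOSM validator does; the rule, the data layout and the sample nodes are assumptions for illustration.

```python
from collections import defaultdict

def find_duplicate_groups(nodes):
    """Group nodes by exact (lat, lon); keep groups with 2+ nodes."""
    by_coord = defaultdict(list)
    for node in nodes:
        by_coord[(node["lat"], node["lon"])].append(node)
    return [group for group in by_coord.values() if len(group) > 1]

def safe_to_merge(group):
    """Treat a group as mergeable only if at most one node carries tags."""
    tagged = [n for n in group if n.get("tags")]
    return len(tagged) <= 1

# Hypothetical nodes: two distinct POIs at the same spot, one unrelated node.
nodes = [
    {"id": 1, "lat": 41.0, "lon": -93.0, "tags": {"amenity": "townhall"}},
    {"id": 2, "lat": 41.0, "lon": -93.0, "tags": {"amenity": "post_office"}},
    {"id": 3, "lat": 42.0, "lon": -94.0, "tags": {}},
]
for group in find_duplicate_groups(nodes):
    print([n["id"] for n in group], safe_to_merge(group))
```

A merge driven by coordinates alone would collapse the town hall and the post office into one node, which is exactly the case described above.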


Re: [Talk-us] Admin boundaries tied to roads

2010-04-19 Thread Richard Welty
On 4/19/10 1:45 PM, Mike N. wrote:
  From an old message:


 I take the point that 'road realignment' may require the boundary also
 to move, but the word is MAY and so whatever happens to the road, the
 location of the boundary needs to be checked separately! It is quite
 surprising in the UK how many roads are being moved, but that does not
 also move the original boundary.
  
I see that the separate VS tangled argument has been settled in the US by
 the Duplicate Node attack bots, who have blindly merged all duplicate
 nodes.

 http://www.openstreetmap.org/browse/way/38855677

i don't know if settled is the word for it, the debate is still open,
but currently the josm validator reports duplicate nodes as errors, and
provides a fix button that merges them. it's not fully automated like a
bot, but the result is effectively the same.

richard




Re: [Talk-us] Admin boundaries tied to roads

2010-04-19 Thread Alan Mintz
At 2010-04-19 10:45, Mike N. wrote:
   I see that the separate VS tangled argument has been settled in the US by
the Duplicate Node attack bots, who have blindly merged all duplicate
nodes.

http://www.openstreetmap.org/browse/way/38855677

Is this really happening? Can someone describe exactly what criteria are 
being used, and just how it was decided that this was a good idea? Seems 
like the wrong thing to do - city and county boundaries are often defined 
in law, or by survey, and do not necessarily keep up with changes in road 
alignment. I have resisted editing most of these boundaries until/unless I 
take the time to research the true definition of the boundary.

Not to mention that merging them will result in the inability to hide these 
boundaries. When doing a bunch of editing on a road that follows one, in 
the past, I've taken the time to verify that the boundary doesn't share any 
nodes with anything and then remove it from my local OSM file manually so I 
don't have to constantly deal with it. If it shares nodes with anything 
else, this is no longer possible.
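
The "doesn't share any nodes with anything" check can be sketched as a set intersection over node ids. The dict below is an assumed, simplified stand-in for a parsed local OSM file, and the function name and way ids are hypothetical.

```python
def ways_sharing_nodes(boundary_id, ways):
    """Return ids of ways that share at least one node with the boundary.

    ways: dict mapping way id -> list of node ids (an assumed, simplified
    stand-in for parsed OSM data).
    """
    boundary_nodes = set(ways[boundary_id])
    return sorted(
        way_id
        for way_id, node_ids in ways.items()
        if way_id != boundary_id and boundary_nodes & set(node_ids)
    )

# Hypothetical data for illustration.
ways = {
    100: [1, 2, 3, 4],   # boundary
    200: [10, 11, 12],   # road with its own nodes
    300: [3, 20, 21],    # road merged with the boundary at node 3
}
print(ways_sharing_nodes(100, ways))
```

An empty result means the boundary can be dropped from the local file without touching anything else; any id in the list is a way that would be damaged.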

Sounds a lot like the IMO ill-considered road name expansion that was 
apparently agreed upon by a small group of people without input from the 
majority of active mappers whose work has been damaged.

--
Alan Mintz alan_mintz+...@earthlink.net

