Hi all,

In French, off the top of my head, I can think of:

Rue, Ruelle, Avenue, Boulevard, Quai, Chaussée, Route, Cour, Cours, Cité, 
Chemin, Place, Esplanade, Passage, Allée, Carrefour, Sentier, Square, Villa.

This list is certainly not complete, but it should cover more than 95% of 
named addresses in France.

They should only be left out of the index when they are the first word of the 
name and are followed by something else.
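A minimal sketch of that rule, assuming names are split on whitespace and matched case-insensitively. The function name index_words is hypothetical and this is not mkgmap's actual indexing code:

```python
# Hypothetical sketch: decide which words of a street name go into a per-word
# index. The stopword set is the French list above; the function name and the
# case-insensitive matching are assumptions for illustration only.

FRENCH_STREET_TYPES = {
    w.casefold()
    for w in [
        "Rue", "Ruelle", "Avenue", "Boulevard", "Quai", "Chaussée", "Route",
        "Cour", "Cours", "Cité", "Chemin", "Place", "Esplanade", "Passage",
        "Allée", "Carrefour", "Sentier", "Square", "Villa",
    ]
}


def index_words(name: str, stopwords: set[str] = FRENCH_STREET_TYPES) -> list[str]:
    """Return the words of `name` that should be added to the index.

    A street-type word is skipped only when it is the first word and at
    least one other word follows it; anywhere else it is kept.
    """
    words = name.split()
    if len(words) > 1 and words[0].casefold() in stopwords:
        return words[1:]
    return words


print(index_words("Rue de la Paix"))  # ['de', 'la', 'Paix']
print(index_words("Avenue"))          # ['Avenue'] (nothing follows, so kept)
print(index_words("Grande Rue"))      # ['Grande', 'Rue'] (not first, so kept)
```

Keeping a lone "Avenue" or a trailing "Rue" means such names can still be found by searching for the street type itself.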


Cheers,
Paco

On 14 Feb 2015, at 08:50, Marko Mäkelä <[email protected]> wrote:

> On Thu, Feb 12, 2015 at 01:24:29PM +0000, Steve Ratcliffe wrote:
>> So finally I will merge the mixed index branch.
> 
> I believe that the database terminology for this is 'inverted index' or 
> 'fulltext index'.
> 
>> I think it would be best to selectively enable it per country along with 
>> lists of names to avoid. This would be best done by people from or familiar 
>> with the countries in question.
> 
> In fulltext search, these are called 'stopwords'.
> 
> It might not be necessary to do anything for countries where street names 
> are commonly written as a single word. Example: "Main Street" would be 
> "Hauptstrasse" in German, "Huvudgatan" in Swedish and "Päätie" in Finnish. 
> Only when the first part of the street name is a proper name, such as a 
> person's name, might the second part be written as a separate word, 
> separated by a space or dash.
> 
> That said, I guess it would still make sense to introduce some stopwords. 
> Words that I can think of:
> 
> Swedish: gata, gatan, gränd, gränden, stig, stigen, (stråk, stråket)
> Finnish: tie, katu, polku, kuja, (raitti, taival)
> German: Straße, Strasse, Weg, Allee, Chaussee
> Estonian: mnt, maantee, tn, tänav, pst, puiestee
> 
> In Estonia, it seems to be common to write the tn, mnt or pst as a separate 
> word.
> 
> I could be missing some stopwords in Estonian and for German-speaking 
> countries. Also, it could be that the French loan words Allee and Chaussee 
> are sometimes accented.
> 
> The Finnish and Swedish words that I have put in parentheses should be very 
> rare, typically used for ways for non-motorized traffic. I don't think that 
> including them would pollute the index much. You might in fact want to search 
> for such a name when you are looking for a nice walking or cycling route 
> (i.e., you expect there to exist some random-famous-person-name-stråket, but 
> you do not know the random name).
> 
>       Marko
> _______________________________________________
> mkgmap-dev mailing list
> [email protected]
> http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev
