Hi all,

wouldn't it be easier to let mkgmap report the words that appear in more
than n (e.g. 20) road names, and use that report as the basis for a
user-defined list of stop-words?
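A minimal Python sketch of that report, assuming one name per road and
whitespace-separated words. `frequent_words` is a hypothetical helper, not an
existing mkgmap option (mkgmap itself is written in Java). Its output would
only be a starting point to review by hand before it becomes a stop-word list.

```python
from collections import Counter


def frequent_words(road_names, n=20):
    """Return the words that occur in more than n road names.

    Hypothetical helper for illustration; assumes one name per road.
    """
    counts = Counter()
    for name in road_names:
        # Count each word at most once per road, case-insensitively.
        counts.update(set(name.lower().split()))
    return sorted(word for word, c in counts.items() if c > n)


roads = ["Rue de la Paix", "Rue Victor Hugo", "Avenue Foch", "Rue du Bac"]
print(frequent_words(roads, n=2))  # ['rue']
```

Counting each word once per road keeps a name like "Rue Rue" from
inflating the count.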

Gerd


> From: [email protected]
> Date: Sat, 14 Feb 2015 15:06:16 +0100
> To: [email protected]
> Subject: Re: [mkgmap-dev] mixed index branch merge
> 
> Hi all,
> 
> In French, off the top of my head, I can think of:
> 
> Rue, Ruelle, Avenue, Boulevard, Quai, Chaussée, Route, Cour, Cours, Cité, 
> Chemin, Place, Esplanade, Passage, Allée, Carrefour, Sentier, Square, Villa.
> 
> This list is certainly not complete, but it should cover more than 95% of
> named addresses in France.
> 
> They should be left out of the index only if they are the first word of the
> name and are followed by something else.
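A minimal sketch of that rule, assuming names are split on whitespace and
compared case-insensitively. `index_words` and `FRENCH_STREET_TYPES` are
hypothetical names, not part of mkgmap; the word list is Paco's.

```python
# Paco's list of French street types, lower-cased for comparison.
FRENCH_STREET_TYPES = {
    "rue", "ruelle", "avenue", "boulevard", "quai", "chaussée", "route",
    "cour", "cours", "cité", "chemin", "place", "esplanade", "passage",
    "allée", "carrefour", "sentier", "square", "villa",
}


def index_words(name, stopwords=FRENCH_STREET_TYPES):
    """Words of `name` to index: a leading street type is dropped only
    when at least one other word follows it (hypothetical helper)."""
    words = name.split()
    if len(words) > 1 and words[0].lower() in stopwords:
        return words[1:]
    return words


print(index_words("Rue Victor Hugo"))  # ['Victor', 'Hugo']
```

A name made up of only a street type, such as "Place", keeps that word, and
"Grande Rue" is left untouched because "Rue" is not the first word.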
> 
> 
> Cheers,
> Paco
> 
> On 14 Feb 2015, at 08:50, Marko Mäkelä <[email protected]> wrote:
> 
> > On Thu, Feb 12, 2015 at 01:24:29PM +0000, Steve Ratcliffe wrote:
> >> So finally I will merge the mixed index branch.
> > 
> > I believe that the database terminology for this is 'inverted index' or 
> > 'fulltext index'.
> > 
> >> I think it would be best to selectively enable it per country along with 
> >> lists of names to avoid. This would be best done by people from or 
> >> familiar with the countries in question.
> > 
> > In fulltext search, these are called 'stopwords'.
> > 
> > It might not be necessary to do anything for countries where street
> > names are commonly written as a single word. Example: "Main Street" would
> > be "Hauptstrasse" in German, "Huvudgatan" in Swedish and "Päätie" in
> > Finnish. Only when the first part of the street name is a proper name, such
> > as a person's name, might the second part be written as a separate word,
> > separated by a space or a dash.
> > 
> > That said, I guess it would still make sense to introduce some stopwords. 
> > Words that I can think of:
> > 
> > Swedish: gata, gatan, gränd, gränden, stig, stigen, (stråk, stråket)
> > Finnish: tie, katu, polku, kuja, (raitti, taival)
> > German: Straße, Strasse, Weg, Allee, Chaussee
> > Estonian: mnt, maantee, tn, tänav, pst, puiestee
> > 
> > In Estonia, it seems to be common to write the tn, mnt or pst as a separate 
> > word.
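A sketch of per-language stop-word tables built from Marko's lists (the rare
words in parentheses left out), assuming stop-words are dropped wherever they
occur as separate words, whether separated by a space or a dash. `STOPWORDS`
and `index_tokens` are hypothetical, not an mkgmap format; `str.casefold()`
makes "Straße" and "Strasse" compare equal.

```python
import re

# Marko's stop-word lists per language code (hypothetical structure).
STOPWORDS = {
    "sv": {"gata", "gatan", "gränd", "gränden", "stig", "stigen"},
    "fi": {"tie", "katu", "polku", "kuja"},
    "de": {"Straße", "Strasse", "Weg", "Allee", "Chaussee"},
    "et": {"mnt", "maantee", "tn", "tänav", "pst", "puiestee"},
}


def index_tokens(name, lang):
    """Words of a street name to index, skipping stand-alone stop-words."""
    stop = {w.casefold() for w in STOPWORDS.get(lang, ())}
    # Split on spaces and dashes, as in "Fritz-Reuter-Straße".
    words = re.split(r"[\s-]+", name.strip())
    kept = [w for w in words if w.casefold() not in stop]
    # A name consisting only of stop-words is kept as it is.
    return kept or words


print(index_tokens("Pärnu mnt", "et"))  # ['Pärnu']
```

Compound names such as "Hauptstrasse" pass through unchanged, which matches
the point that single-word street names need no special handling.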
> > 
> > I could be missing some stopwords in Estonian and for German-speaking 
> > countries. Also, it could be that the French loan words Allee and Chaussee 
> > are sometimes accented.
> > 
> > The Finnish and Swedish words that I have put in parentheses should be very
> > rare, typically used for ways for non-motorized traffic. I don't think
> > that including them would pollute the index much. You might in fact want to 
> > search for such a name when you are looking for a nice walking or cycling 
> > route (i.e., you expect there to exist some 
> > random-famous-person-name-stråket, but you do not know the random name).
> > 
> >     Marko
> > _______________________________________________
> > mkgmap-dev mailing list
> > [email protected]
> > http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev
