Yes, I thought of High Street and Victoria Street shortly after sending the
email. But you could drop High, Victoria and Street as separate words from the
index and still keep the full name in it. That would work in English, but not
very well in languages where the word for street or avenue comes at the
beginning of the name.

Hmm, it is probably better to have a per-country exclusion list, so that the
likes of Victoria and High are not discarded in the UK if the counting
algorithm is used.
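
Roughly what I have in mind, as a Python sketch rather than actual mkgmap
code. The names build_word_index, KEEP_WORDS and max_count are made up for
illustration, and the keep-list entries are only examples:

```python
from collections import Counter, defaultdict

# Hypothetical per-country lists of words that must stay in the index
# even when they are frequent (illustrative values only).
KEEP_WORDS = {"GB": {"high", "victoria"}}


def build_word_index(names, max_count=10000, country=None):
    """Map each indexed word to the full street names that contain it."""
    keep = KEEP_WORDS.get(country, set())
    tokens = [(name, set(name.lower().split())) for name in names]

    # Pass 1: count how many names each word occurs in.
    counts = Counter(word for _, words in tokens for word in words)

    # Pass 2: index a word only if it is rare enough or protected.
    # The full name is always stored as the value, never shortened.
    index = defaultdict(list)
    for name, words in tokens:
        for word in sorted(words):
            if counts[word] < max_count or word in keep:
                index[word].append(name)
    return dict(index)


names = ["High Street", "Victoria Street", "Mill Street", "High Road"]
plain = build_word_index(names, max_count=2)
uk = build_word_index(names, max_count=2, country="GB")
```

With max_count=2 on those four names, 'street' and 'high' are dropped from
plain, while uk still has 'high' because of the keep list. A name whose words
are all frequent and unprotected ends up with no entry at all, so this does
not solve that case by itself.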

Geoff.

Steve Ratcliffe <[email protected]> wrote:
>On 06/08/13 18:57, Geoff Sherlock wrote:
>> Hi Steve,
>>
>> When you collect the data for the index you could also increment a count
>> for each word. Then only add the word to the index if the count is less
>> than an optional value (default say 10000). This should work for most
>> languages and reduce the size of the index, although it will require
>> more memory for compiling the map.
>
>I was looking into doing something like that. Turns out though that it
>is not as easy as it sounds. So for example, in English, the words
>'the' and 'square' are top words that could be removed. Yet there are
>names such as 'The Square' and there are a whole bunch of similar
>problems.
>
>Ideally we need methods that fail in a safe way by only rejecting a
>word if it is (reasonably) certain that it should not be there. At
>the moment I am thinking that this will probably require language 
>specific rules.
>
>..Steve
>
>_______________________________________________
>mkgmap-dev mailing list
>[email protected]
>http://lists.mkgmap.org.uk/mailman/listinfo/mkgmap-dev
