On 10/06/2014 18:40, Ted Dunning wrote:
> On Tue, Jun 10, 2014 at 8:08 AM, Lee Goddard <lee...@gmail.com> wrote:
>
> Is it possible to weight the individual initials as words?
>
> Would you recommend employing a stemmer?
>
>
> Yes, it is definitely possible. But don't just use any stemmer. You
> need to adapt something so that you preserve initial letters, and you
> will likely need heuristics such as preserving case.
Am I going to have to write a parser in Java for that, or is it a matter
of combining what is in the box? I've previously created indexes of photos
(with my own parser) and indexes of documents, but indexing a single
company name is quite a new idea to me.
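
To make the question concrete, here is a rough sketch of the sort of token handling being described: initials and all-caps abbreviations keep their case and stay as separate tokens, while ordinary words are lowercased and nothing is stemmed. It is written in plain Python only for brevity; the function name `tokenize_company_name` and the rules it applies are assumptions for illustration, not anything taken from an existing analyzer.

```python
import re


def tokenize_company_name(name):
    """Split a company name into tokens, preserving initials and acronyms.

    Hypothetical rules, for illustration only:
      * single characters (initials) are kept as-is
      * all-caps tokens (likely abbreviations) are kept as-is
      * everything else is lowercased, with no stemming applied
    """
    tokens = []
    for raw in re.findall(r"[A-Za-z0-9]+", name):
        if len(raw) == 1 or raw.isupper():
            tokens.append(raw)          # preserve case for initials/acronyms
        else:
            tokens.append(raw.lower())  # ordinary word: normalise case
    return tokens


print(tokenize_company_name("J. P. Morgan Ltd"))
print(tokenize_company_name("IBM UK Limited"))
```

Whether the same rules should run at query time as well as at index time is a separate decision; the sketch only shows the splitting step.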
> You will also probably want to include alternative forms in other
> fields. These would include nicknames, stock symbols and
> abbreviations.
Not in this case. It's simply an interface to find information held by
the state on the affairs of a company, so the only alternative forms are
those of the final element of the registered company name: it might be
'Limited' but people may search for 'ltd'; it may be 'SE' but people may
search for 'european'.
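
A minimal sketch of that last-element handling, assuming a small hand-maintained lookup table: the table `SUFFIX_FORMS` and the function `normalise_suffix` are hypothetical names, and the only entries are the two pairs mentioned above. The idea would be to apply the same mapping to both the indexed name and the search string so that either form matches.

```python
# Hypothetical table: each variant of the final element maps to one
# canonical form. Only the two pairs mentioned above are included.
SUFFIX_FORMS = {
    "limited": "limited",
    "ltd": "limited",
    "se": "se",
    "european": "se",
}


def normalise_suffix(name):
    """Replace the final word of a company name with its canonical form.

    Names whose last word is not in the table are returned unchanged.
    """
    words = name.split()
    if not words:
        return name
    key = words[-1].strip(".,").lower()
    canonical = SUFFIX_FORMS.get(key)
    if canonical is not None:
        words[-1] = canonical
    return " ".join(words)


print(normalise_suffix("Acme Widgets Ltd."))
print(normalise_suffix("Acme Widgets Limited"))
```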
TIA
Lee