You could theoretically use Solr synonyms to expand the actual sharp
character (♯) to BOTH "#" AND "sharp" at index time. At query time, I
guess you'd need to expand it to just one or the other; I think
expanding to two things at query time is going to be a mess. I haven't
tried this myself (using Solr synonyms to expand one term into more
than one alternative). I think the Solr synonym filter supports it, but
based on what I know of how Solr works, I expect there will be some
gotchas.
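To make the index-time idea concrete, here is a toy Python model (not Solr code) of what such expansion does. The expanded terms are stacked at the same token position, so a phrase query using any single one of them lines up, and the query side never has to expand anything. It assumes a hypothetical tokenizer that emits ♯ as its own token; real Solr tokenizers may split or drop that character differently, which is exactly one of the gotchas worth checking.

```python
# Toy model of index-time synonym expansion (NOT Solr code).
# Assumption: the tokenizer emits "♯" as a separate token, e.g.
# "F♯ minor" -> ["f", "♯", "minor"]. Real Solr tokenizers may differ.

# Hypothetical rule, similar in spirit to a synonyms.txt line "♯ => ♯, #, sharp"
INDEX_EXPANSIONS = {"♯": {"♯", "#", "sharp"}}


def index_tokens(tokens):
    """Return one set of terms per position; synonyms share a position."""
    return [INDEX_EXPANSIONS.get(t, {t}) for t in tokens]


def phrase_matches(indexed, query_tokens):
    """True if the (unexpanded) query tokens appear at consecutive positions."""
    n = len(query_tokens)
    for start in range(len(indexed) - n + 1):
        if all(q in indexed[start + i] for i, q in enumerate(query_tokens)):
            return True
    return False


doc = index_tokens(["concerto", "in", "f", "♯", "minor"])
for query in (["f", "♯", "minor"], ["f", "#", "minor"], ["f", "sharp", "minor"]):
    print(" ".join(query), phrase_matches(doc, query))
```

In this model all three queries match the same indexed title, even though each query term is looked up exactly as typed.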
I _do_ notice actual sharp and flat symbols in my library's MARC data for
musical pieces; catalogers apparently do enter them sometimes. Since most
users probably don't know how to (or won't think to) type sharp and
flat characters directly, if it's important that these titles be
findable by their sharp/flat part, something has to be done. But I
haven't gotten to it yet. (Unless maybe all these records already list
alternate titles in the 246 field or wherever using plain ASCII of some
kind; I don't know.)
In general, I've been able to avoid expanding to multiple synonyms,
but I don't think I can avoid it with ♯, #, and 'sharp', precisely
because '#' is not always a sharp sign. It can mean other things too,
so you don't want to collapse all three into a single term.
Wait, maybe just map ♯ to "#" at both query and index time? Then users
can't search for "F sharp", but they can search for either "F♯" or "F#",
and both will match the original source "F♯". That seems like the simplest
solution. Still, it would be neat to play around with synonym expansion
to see whether "F sharp" at query time could be made to match too.
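A toy Python model of the "just map ♯ to #" idea, for illustration only: the same normalization is applied to the document and to the query, roughly what a character-mapping filter (for example Solr's MappingCharFilterFactory) configured in both the index and query analyzers would do. The mapping table and the whitespace tokenizer here are simplifying assumptions, not a Solr configuration.

```python
# Toy model of the "just map ♯ to #" idea (NOT Solr code).
# The same normalization runs on documents and on queries.

CHAR_MAP = {"♯": "#"}  # hypothetical mapping table; a flat sign could be added too


def normalize(text):
    """Apply the character mapping, lowercase, and split on whitespace."""
    for src, dst in CHAR_MAP.items():
        text = text.replace(src, dst)
    return text.lower().split()


def matches(doc_text, query_text):
    """True if the normalized query appears as a contiguous phrase in the doc."""
    doc, query = normalize(doc_text), normalize(query_text)
    n = len(query)
    return any(doc[i:i + n] == query for i in range(len(doc) - n + 1))


title = "Concerto in F♯ minor"
for q in ("F♯ minor", "F# minor", "F sharp minor"):
    print(q, "->", matches(title, q))
```

The trade-off shows up in the last query: "F sharp minor" no longer matches, which is the case synonym expansion would still have to cover.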
On 5/31/2011 12:05 PM, Thomas Dowling wrote:
Many thanks.
I like the idea of catching the sharp and flat symbols; the only problem
is that lazy music students tend to use "#" and "b" ("Concerto in F#
minor for Bb Bass Clarinet").
Thomas
On 05/31/2011 11:59 AM, Jonathan Rochkind wrote:
Multi-word synonyms are tricky.
You probably want to make sure this synonym is only expanded at index
time, and not at search time. See some background in the
SynonymFilterFactory section of
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
I think the synonym approach is a fine way to make greek letters
searchable by name. It's possible some of the new Unicode support in
Solr 3.1 expands greek letters too, but I suspect it doesn't (because you
don't necessarily want that in the general case), so synonyms are
probably your best bet. (The same goes for things like expanding the
musical sharp or flat glyph to "sharp" or "flat", which I've considered.)