Welcome to the list.

This is hard, with no quick and easy answers.  For a similar index, of
books rather than music, I index author and title separately into two
fields, author and title combined into a third field, and author, title,
blurb and everything else combined into a catch-all field.  Each
search generates a complex BooleanQuery against all the indexed fields,
with a high boost for the separate title and author fields, a medium
boost for title + author, and a low or no boost for the catch-all field.
There are also boosted span queries to try to make multiple words that
appear close together in the right order score higher.
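
As a rough illustration of that layering, here is a minimal sketch that
builds a query string in Lucene's classic query parser syntax.  The field
names (artist, title, artist_title, all) and the boost values are made up
for the example, and a sloppy phrase clause stands in for the span
queries, so treat it as a starting point rather than what my index
actually does.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class BoostedQuerySketch {

    // Hypothetical field names and boosts; tune these against real data.
    static final Map<String, Integer> FIELD_BOOSTS = new LinkedHashMap<>();
    static {
        FIELD_BOOSTS.put("artist", 8);        // separate fields: high boost
        FIELD_BOOSTS.put("title", 8);
        FIELD_BOOSTS.put("artist_title", 4);  // combined field: medium boost
        FIELD_BOOSTS.put("all", 1);           // catch-all field: no extra boost
    }

    // Lower-case the input and keep only runs of letters and digits, so no
    // query-syntax characters from the user reach the query string.
    static List<String> tokens(String userInput) {
        List<String> out = new ArrayList<>();
        for (String t : userInput.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // One boosted clause per field, plus a boosted sloppy phrase clause
    // when there is more than one word.
    static String build(String userInput) {
        List<String> terms = tokens(userInput);
        if (terms.isEmpty()) {
            return "";
        }
        String words = String.join(" ", terms);
        List<String> clauses = new ArrayList<>();
        for (Map.Entry<String, Integer> e : FIELD_BOOSTS.entrySet()) {
            clauses.add(e.getKey() + ":(" + words + ")^" + e.getValue());
        }
        if (terms.size() > 1) {
            // Slop of 2 tolerates small gaps; in-order matches need fewer moves.
            clauses.add("artist_title:\"" + words + "\"~2^6");
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        System.out.println(build("Queen Bohemian Rhapsody"));
    }
}
```

The resulting string could be handed to a query parser, or the same
clauses could be assembled programmatically as a BooleanQuery.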

Then there is custom (de)boosting of individual docs based on things
like publication date, format, language, rating and more.

Thinking about a couple of your examples along these lines:

"Queen".  A high-boost search on artist might be enough, since Lucene's
length normalization will score artist: Queen higher than artist: Stupid
Queen Tribute Band.  Because artist names are typically short and well
known, I might also consider a case-normalized exact match (Keyword)
search with an even higher boost.
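
For the exact-match idea, a minimal sketch of the normalization step
might look like the following.  The helper name exactKey and the idea of
storing its result in a separate untokenized field (say artist_exact)
are assumptions for illustration; the point is only that the same
normalization runs at index time and at query time, so a TermQuery on
that field matches whole names and nothing else.

```java
import java.util.Locale;

final class ExactKeySketch {

    // Trim, lower-case and collapse internal whitespace so that
    // "  QUEEN " and "Queen" produce the same key.
    static String exactKey(String name) {
        return name.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String indexed = exactKey("Queen");
        String tribute = exactKey("Stupid Queen  Tribute Band");
        String typed = exactKey("  QUEEN ");

        // The user's input equals the key for Queen but not the tribute band's key.
        System.out.println(typed.equals(indexed));
        System.out.println(typed.equals(tribute));
    }
}
```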

"queen bohemian rhapsody" is harder.  Do you have other data fields
you can use to differentiate between the original and the cover
version?  You could build some sort of relevance field at indexing
time, where known artists count positively and words like "tribute" or
"cover" count negatively, and build that into the scoring.
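
A relevance value like that could be computed along these lines.  This
is only a sketch: the known-artist list, the penalty words and the
weights (+2 and -1) are placeholders I made up, and how the number feeds
into scoring (a document boost, a stored numeric field, or something
else) is left open.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class RelevanceSketch {

    // Hypothetical lists; in practice these would come from your own data.
    static final Set<String> KNOWN_ARTISTS = new HashSet<>(Arrays.asList("queen"));
    static final Set<String> NEGATIVE_WORDS = new HashSet<>(Arrays.asList("tribute", "cover"));

    // +2 if the whole artist name is a known artist,
    // -1 for each penalty word found in the artist or song name.
    static int relevance(String artistName, String songName) {
        int score = 0;
        if (KNOWN_ARTISTS.contains(artistName.trim().toLowerCase(Locale.ROOT))) {
            score += 2;
        }
        String text = (artistName + " " + songName).toLowerCase(Locale.ROOT);
        for (String word : text.split("[^\\p{L}\\p{N}]+")) {
            if (NEGATIVE_WORDS.contains(word)) {
                score -= 1;
            }
        }
        return score;
    }

    public static void main(String[] args) {
        System.out.println(relevance("Queen", "Bohemian Rhapsody"));
        System.out.println(relevance("Stupid Queen Tribute Band",
                                     "Bohemian Rhapsody (Queen Cover)"));
    }
}
```

With these placeholder weights the original scores 2 and the tribute
band's cover scores -2, which gives the scoring something to separate
them on even though both match all three query words.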

Hope that helps.  Good luck!


--
Ian.


On Sun, Jan 15, 2012 at 6:19 AM, Johnny Marnell <johnnymarn...@gmail.com> wrote:
> hi all,
>
> short of it: i want "queen bohemian rhapsody" to return that song named
> "Bohemian Rhapsody" by the artist named "Queen", rather than songs with
> titles like "Bohemian Rhapsody (Queen Cover)".
>
> i'm indexing a catalog of music with these types of docs and their fields:
>
> artist (artistName), album (albumName, artistName), and song (songName,
> albumName, artistName).
>
> the client is one search box, and i'm having trouble handling searching
> over multiple multifields and weighting their exactness.  when a user types
> "queen", i want the artist Queen to be the first hit, and then albums &
> songs titled "queen".
>
> if "queen bohemian rhapsody" is searched, i want to return that song, but
> instead i'm getting songs like "Bohemian Rhapsody (Queen Cover)" by "Stupid
> Queen Tribute Band" because all three terms are in the songName, i'm
> guessing.  what kind of query do i need?
>
> i'm indexing all of these fields as multi-fields with ngram, shingle (i
> think this might be really useful for my use case?), keyword, and standard.
>  that appears to be working, but i'm not sure how to combine all of this
> together over multiple multi-fields.
>
> if anyone has good links to broadly summarized use cases of Indexing and
> Querying, that would be great - i would think this would be a common
> situation but i can't find any good resources on the web.  and i'm having
> trouble understanding scoring and boosting.
>
> this was my first post, hope i did it right, thanks so much!
>
> -j
