related application

Mathieu Lecarme Fri, 15 Jun 2007 14:53:32 -0700

Compass use a trick to manage father-son indexation.

If you index "collection", with a fields Date, wich are the newestpicture inside, and putting all picture's keyword to it collection?

Then, with a keyword search, you will find the collection with themost tag occurence number and date score.It could be quick and clean, each night, you recompute score with agefactor, your magic bonus and th random stuff. Iterating over only 300things.The problem is if your photograph knows how it works and cheatingwith uploading every day a new silly photo, just for boosting alltheir collection.

If age of picture in a same collection are not too differents, thistrick can be magic, if new pictures are uploaded often, it willgenerate silly score. Maybe with small pack of picture with near dateand collection score indexed and merging the firts same collectionpart after sorting?


M.

Le 15 juin 07 à 22:33, Antoine Baudoux a écrit :

Well maybe i didnt explain my problem very well. I have a databasewith over 3 million images, with each image belonging to one out of300 possible collections. A query could return more than 100.000images (for example if they search for a popular image keyword).
I want to sort my result with a combination of image date/collection scoring : Recent images with high-score collections comefirst, then recent image with lower collection. As the usernavigate through the images, less recent images start to appear,also sorted by collection score.
So imagine i make a query for "nature". This query would return100.000 images. Your sugestion is that i dont make any attempt tosort the images in Lucene.i just make a query with no sort. I wouldthen need to load the 100.000 rows from my database, then sortthose 100.000 rows with my custom-defined ordering. Are-you surethis method would be faster than custom Query, or a ValueSourcequery? Since lucene already indexes and caches field values, I'mnot very sure.
Walt explain differently what I said.
Lucene can be efficiently use for selecting objects, withoutsorting or scoring anything, then, with id stored in Lucene, youcan sort yourself with a simple Sortable implementation.The only limit is that lucene gives you not too much results, withyour 300 maximal responses, you can play with it easily.
M.
Le 15 juin 07 à 19:07, Walt Stoneburner a écrit :
Antoine Baudoux writes:
I want to be able to give a score to each collection.
Keep in mind, Lucene is computing a score based on quite a number of
things from how often a term is used in a document, how often it
appears in the collection of documents, how long the query is, etc.
If your concept of a document's score changes, then I'd beinclined tothink you're possibly using Lucene in a manner it wasn't designedfor.
That said, I have two thoughts.

THOUGHT ONE
Use Lucene to locate "records" for you --- what you really are
interested in getting back _from Lucene_ is the primary key.  Then,
use this key to do a lookup in your database of the score of the day
and sort accordingly.  The idea is that Lucene finds, your table
scores, and because of that you won't need to re-index whensomething
changes.

THOUGHT TWO
Use boosting.  COLLECTION_ONE^5  COLLECTION_THREE^10  etc.  That way
/if/ the Lucene document appears in the collection, it's score is
weighted according to your preferences.  You're free to change the
boosts on a query-by-query basis without having to re-index.
I can use a Very big ... query ... I am afraid that it will beslow.
Try it. I think you'll find Lucene is _fast_. We do some prettyHUGE
and complicated queries and Lucene just screams.
I can add another field to each document, containing a computed
custom score, then i could sort on that field. But i want to avoid
this solution at all costs, since it would mean re-indexing all the
documents each time the collection scores change.
Or, use indirection - instead of keeping the score, keep the primary
key of a score table.  Then in a database, where speed won't be the
issue, perform the look up.   Honestly, if you're only got 300
categories, you could keep that simple table in memory using less
space than a small text file.
I would also like to implement random-sorting. ... Is it a goodsolution?
Is there another way to do it?
This really, really, really feels like you're force fittingLucene to
do some business logic piece of a larger application.  May I be so
bold as to ask what's the _actual_ problem you're trying to solve.
("I'm trying to make a hole in a piece of oak" as opposed to "What's
the best way to sharpen a Phillips screwdriver enough to cut wood?")
Keep in mind that the forum is for Lucene, so parts of yourquestions
may be answered outside of the forum.

-wls
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: Several questions about scoring/sorting + random sorting in an image/related application

Reply via email to