RE: Monolithic documentation

Mick Semb Wever Tue, 07 Apr 2009 03:16:10 -0700

> OK, can we do it this way then?:
> 
> - how can I boost (say) one data supplier over the others (based on a
> properties switch, or database value)?


In Sesat-Kernel one single search request relates to a "mode". 
One mode delegates out to an unlimited number of "search-commands".
Each search command can be manipulated in various ways.
After they have all been run the mode can be manipulated in various
ways, for example federating the search command results together,
removing duplicates, etc.

The search commands are of different types: fast4, fast5, yahoo, solr,
etc etc; each with various specific attributes. There is also general
configuration for all command types on how to translate the query to the
index (query-builder), how to further customise the query
(query-transformers), and post processing of results (result-handlers).
This is all defined declaratively in the modes.xml file.

As mentioned, there is also post-processing on a mode. This is how the
search federation occurs: results from various commands can be blended
together into one pseudo-command, making it look like all results
actually came from one index (see FederatorRunHandler.java).

To answer your question, boosting one supplier over another is the
algorithm you use in the blending of results in the federation.
Basic attributes for basic blending can be found in
FederatorRunHandlerConfig.java.
For now there are also three basic blending approaches, random, round
robin, and sequel.
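
To illustrate what a round robin blend does, here is a minimal sketch in
Python. It is not the FederatorRunHandler code, and the function name
and the sample hits are made up for the example; it only assumes that
each search command hands back an ordered list of results.

```python
from itertools import zip_longest


def round_robin_blend(*result_lists):
    """Interleave results: first hit of every command, then second, etc."""
    gap = object()  # sentinel marking a command that has run out of hits
    blended = []
    for tier in zip_longest(*result_lists, fillvalue=gap):
        blended.extend(hit for hit in tier if hit is not gap)
    return blended


# Hypothetical results from two search commands.
games = ["g1", "g2", "g3"]
music = ["m1"]
blended = round_robin_blend(games, music)
```

Commands with fewer results simply drop out of the rotation once they
are exhausted.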

It would be easy to add a blend algorithm that blended the results
together based on a "score" field from the index or database.
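
Such a score-based blend does not exist in Sesat as far as this mail
states, so the following is only a sketch of the idea in Python. It
assumes every result carries a numeric "score" field; the function name
and the sample data are hypothetical.

```python
def score_blend(*result_lists):
    """Merge results from all commands, highest score first."""
    merged = [hit for results in result_lists for hit in results]
    # sorted() is stable, so hits with equal scores keep command order.
    return sorted(merged, key=lambda hit: hit["score"], reverse=True)


# Hypothetical results from two search commands.
games = [{"id": "g1", "score": 0.9}, {"id": "g2", "score": 0.4}]
music = [{"id": "m1", "score": 0.7}]
ranked_ids = [hit["id"] for hit in score_blend(games, music)]
```

This only gives a sensible order if the scores from the different
indexes are comparable to begin with.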

Boosting one index's score over another would be done with the
NumberOperationHandler (a result handler) on each result coming out of
the index, to manipulate the value in the score field. For example,
multiply the score field by some number. Any mathematical expression
supported by the JEP library can be used here, so this can become as
complex as your needs require.
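
The effect of such a result handler can be sketched as follows. This is
plain Python, not the NumberOperationHandler configuration (which takes
a JEP expression and is not shown here); the function name, the field
name and the factor are assumptions for the example.

```python
def boost_scores(results, factor, field="score"):
    """Return copies of the results with the score field multiplied."""
    return [{**hit, field: hit[field] * factor} for hit in results]


# Hypothetical results from the index of the supplier to be boosted.
supplier_a = [{"id": "a1", "score": 2.0}, {"id": "a2", "score": 1.0}]
boosted = boost_scores(supplier_a, 1.5)
```

Run this on one command's results before the blend and that supplier
moves up relative to the others in a score-based blend.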

This presumes that the different suppliers' results are in different
indexes.
If everything is in one index then you could solve this by using the
index's rank profile (fast) or the DisMaxRequestHandler's query fields
(solr). For Solr you can read up on
http://wiki.apache.org/solr/DisMaxRequestHandler which also covers the
boost query and boost functions.
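
As a rough illustration of the Solr side, the request parameters for a
dismax query with a boost query might be built like this. The parameter
names (defType, qf, bq) are the ones documented on the wiki page above;
the field names "title", "description" and "supplier", and the supplier
value, are hypothetical and depend entirely on your schema.

```python
from urllib.parse import urlencode


def build_dismax_query(user_query, boosted_supplier, boost=5.0):
    """Build a query string that favours one supplier within one index."""
    params = {
        "q": user_query,
        "defType": "dismax",
        # Fields to search, with a per-field weight (assumed field names).
        "qf": "title^2.0 description",
        # Boost query: documents matching it are ranked higher.
        "bq": f"supplier:{boosted_supplier}^{boost}",
    }
    return urlencode(params)


query_string = build_dismax_query("guitar", "acme")
```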

> - how could I do absolute boosting always displaying a certain supplier,
> also soft-switchable?

Using the NumberOperationHandler you would multiply the value in the
score field of results from that particular supplier by a very high
value, to ensure they come first in the blending process.
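
A sketch of that idea with a soft switch, again in plain Python rather
than Sesat configuration. The "supplier" field, the factor and the
"enabled" flag (which could be read from a properties file or database
value) are assumptions for the example.

```python
PIN_FACTOR = 1_000_000.0  # assumed to dwarf any real score


def pin_supplier(results, supplier, enabled=True, factor=PIN_FACTOR):
    """Multiply the pinned supplier's scores so they sort first."""
    if not enabled:
        return list(results)
    # Note: a hit with a score of zero stays at zero and is not lifted.
    return [
        {**hit, "score": hit["score"] * factor}
        if hit["supplier"] == supplier else hit
        for hit in results
    ]


def by_score(results):
    return sorted(results, key=lambda hit: hit["score"], reverse=True)


# Hypothetical blended results.
hits = [
    {"id": "a1", "supplier": "a", "score": 0.9},
    {"id": "b1", "supplier": "b", "score": 0.1},
]
pinned_order = [h["id"] for h in by_score(pin_supplier(hits, "b"))]
normal_order = [h["id"] for h in by_score(pin_supplier(hits, "b", enabled=False))]
```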

> - what is the relevancy algorithm and can I create and apply my own
> relevancy algorithms without changing the base code? I might need a
> different one for games suppliers than I need for music suppliers for
> example.

The relevancy algorithm is usually applied within the index.
For example in Solr you can use different DisMaxRequestHandlers that
will give you different ranking algorithms, or you can use the sort
parameter to ask for a particular type of sorting (eg by date, by size,
etc). Sorting works when there is one field in the index that can be
sorted on; ranking algorithms are useful when the relevance depends on
several fields in each result.

Sesat-Kernel provides a nice navigation model around this to make it
easy to design the navigators on the webpage.

> - Do you support the Yahoo-style concepts, aka query direction, aka
> query steering, which identifies a "zone" of suppliers (named group)
> based on a query term? If so how?

Not that I know of. Is this part of Yahoo's Index Data Protocol (IDP)?

> - My suppliers will return their own interpretation of relevancy scores.
> How can I rebase them so I can compare one supplier's relevancy with
> another and prevent suppliers boosting their own stuff unfairly.

In this situation I would put each supplier into its own index, and use
the result handling and then federation approach as described in the
answer to the first question above.
This would also make sense since each index can be re-fed and optimised
to the supplier's needs. There is also isolation for security reasons.
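
One possible way to rebase scores in the result handling step is to
rescale each supplier's scores to the range 0..1 before blending. This
is an assumption about how you might do it, not something Sesat is
stated to provide; the function name and sample data are hypothetical.

```python
def rebase_scores(results):
    """Min-max rescale one supplier's scores to the range 0..1."""
    if not results:
        return []
    scores = [hit["score"] for hit in results]
    low, high = min(scores), max(scores)
    span = high - low
    return [
        # If all scores are equal there is nothing to rescale: use 1.0.
        {**hit, "score": (hit["score"] - low) / span if span else 1.0}
        for hit in results
    ]


# Hypothetical supplier that reports inflated scores.
inflated = [{"id": "x", "score": 10}, {"id": "y", "score": 30},
            {"id": "z", "score": 20}]
rebased = [hit["score"] for hit in rebase_scores(inflated)]
```

With this approach no supplier can win simply by reporting bigger
numbers, although every supplier's best hit ends up with the same score,
so you may still want a per-supplier weight on top.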

> - Corporate animals like me don't have the time, inclination or
> equipment to build their own version. It looks like I would need to
> install Maven, JDK etc. Could you not offer prospective users a
> pre-built version with release notes, one for LINUX one for Windows? I
> think this would increase takeup, particularly in larger companies. FAST
> do this with Unity.

Sesat is not a commercial product, and no company offers support for
it, although you could enquire with T-Rank AS.

It is purely a code framework and suite of libraries. Even the
declarative side of things means using a JDK and Maven to build a sesat
skin, and deploying it alongside the sesat-kernel in a java web
container like tomcat or jboss. So at some point you need a developer to
evaluate Sesat as a solution to your needs.

FAST Unity is not so different: fewer features overall and less
flexibility in the long run, but a greater "wow" factor off the starting
block. There you'll be paying instead to have the FAST consultants do
the equivalent development behind the scenes. This generally leads to a
greater and greater dependence on the consultants and ends up being a
far more expensive route than just having your own developers in charge.

~mck

-- 
"When prosperity comes, do not use all of it." Confucius 
| semb.wever.org | sesat.no | sesam.no |


_______________________________________________
Kernel-development mailing list
[email protected]
http://sesat.no/mailman/listinfo/kernel-development
