> -----Original Message-----
> From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
>
> As for what hyperbolicTf is trying to do ... it creates a hyperbolic function
> letting you specify a hard max no matter how many terms there are.

A picture (or, more precisely, a graph) would be worth a thousand words. As it says in issue 577, "a hyperbolic tf function which is best explained by graphing the equation". That's great, but I couldn't find "Mark [Bennet's] nifty graph [...] (linked from his email)." Can anyone help locate what sounds like a useful resource?

The JavaDoc (which Chris probably also wrote way back when) says the function is a hyperbolic TANGENT (http://www.dplot.com/fct_tanh.htm). At least that clarifies the basic shape, even if I (and apparently others, judging from the yearly questions on the Lucene list) have yet to work out the full impact of all the parameters, or how the hyperbolic tangent compares to the sqrt( freq + C ) of the baseline, which I believe, if used with the defaults, degenerates to the DefaultSimilarity.tf formula.

Another problem mentioned in the e-mail thread Chris linked is "people who know the 'sweetspot' of their data.", but I have yet to find a definition of what is meant by "sweetspot", so I couldn't say whether I know my data's sweet spot or not.

Another question is how to choose the tf_hyper_offset parameter. It appears to be the inflexion point of the tanh equation, but what term count should a caller consider centering there (or consider to be the approximate area where the graph is "mostly" level)? Or, more simply, why 10? Any thoughts from anyone?

I also note that the JavaDoc says the default tf_hyper_base ("the base value to be used in the exponential for the hyperbolic function") value is e. But checking the code, the default is actually 1.3 (less than half of e). Should I file a doc bug?

To summarize: does anyone have any resources along the lines of graphs of these (or any other) tf functions, general discussion of a document collection's sweet spot, and any insight into the parameters of this class (hyperbolic tangent or otherwise)?

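In the meantime, a minimal Python sketch for tabulating the curves may be a starting point for anyone who wants to graph them. It is an assumption-laden reading of the JavaDoc, not a copy of the Lucene source: the function names (default_tf, baseline_tf, hyperbolic_tf) are made up for illustration, the hyperbolic form is assumed to be min + (max - min) / 2 * (tanh(x * ln(base)) + 1) with x = freq - offset, and the min = 0 and max = 2 defaults are guesses. Only the 1.3 base and the offset of 10 come from the discussion above.

```python
import math


def default_tf(freq):
    # DefaultSimilarity-style tf: plain square root of the term frequency.
    return math.sqrt(freq)


def baseline_tf(freq, base=0.0, tf_min=0.0):
    # Assumed baseline form: flat at `base` up to `tf_min`, then
    # sqrt(freq + base^2 - tf_min). With base = 0 and tf_min = 0 this
    # reduces to sqrt(freq), i.e. the default tf.
    if freq == 0:
        return 0.0
    if freq <= tf_min:
        return base
    return math.sqrt(freq + base * base - tf_min)


def hyperbolic_tf(freq, lo=0.0, hi=2.0, base=1.3, xoffset=10.0):
    # Assumed hyperbolic form. (b^x - b^-x) / (b^x + b^-x) is the same
    # thing as tanh(x * ln b), so `base` only changes the steepness.
    # lo and hi are guessed defaults (hypothetical), not confirmed values.
    if freq == 0:
        return 0.0
    x = freq - xoffset
    return lo + (hi - lo) / 2.0 * (math.tanh(x * math.log(base)) + 1.0)


# Print a small table that can be pasted into gnuplot or a spreadsheet.
print("freq  default  baseline  hyperbolic")
for freq in (1, 2, 5, 10, 15, 20, 50, 100):
    print(f"{freq:4d}  {default_tf(freq):7.3f}  {baseline_tf(freq):8.3f}  "
          f"{hyperbolic_tf(freq):10.3f}")
```

If the assumed form is right, the curve passes through the midpoint of min and max exactly at freq = offset, flattens toward max above it, and a base of e would give the textbook tanh; a base of 1.3 simply stretches the transition over a wider range of term counts.
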
-Paul

> : > And I am aware that SweetSpotSimilarity resulted from this paper
> : >
> : > http://trec.nist.gov/pubs/trec16/papers/ibm-haifa.mq.final.pdf
>
> For the record, that paper did not result in SSS -- I wrote SSS ~Dec 2005 and
> contributed it to Apache a few months later on behalf of CNET Networks where
> I developed it to solve some specific problems we had with product data...
>
> https://issues.apache.org/jira/browse/LUCENE-577
> http://mail-archives.apache.org/mod_mbox/lucene-dev/200605.mbox/%3CF9F270C4-FA1E-460F-A54F-E2E56AAD0286%40rectangular.com%3E
> (and subsequent replies)
>
> ...Doron wrote the paper later, although you'll note lots of discussions
> around that time on the mailing list about customizing Similarity based on
> domain specific data -- the concepts certainly weren't novel.
>
> : > However, I was wondering if there was a resource that explained (and gave
> : > examples) of how SSS works and what each parameter (hyperbolic, etc) means.
> : > I know this is a Lucene list but I am actually
>
> The functions are pretty clearly spelled out in the javadocs -- you just set
> the options on the class to control the constant values of the functions.
> The easiest way to understand them is probably to use something like gnuplot
> to graph them using various values for the constants, and then compare to
> graphs of the corresponding functions from DefaultSimilarity.
>
> -Hoss

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org