Christoph, Good to see you.
Can you look at the implementation of LogLikelihood and try to figure out
how your new code could integrate with that? The log likelihood ratio test
that is used in Mahout is very closely related to mutual information and
anything you add should point to and have pointers from that implementation.

Also, can you suggest use cases for these additional functions?

On Wed, Jun 29, 2011 at 2:33 AM, Christoph Nagel (JIRA) <[email protected]> wrote:

> Entropy implementation in Map/Reduce
> ------------------------------------
>
>                 Key: MAHOUT-747
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-747
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Math
>    Affects Versions: 0.6
>            Reporter: Christoph Nagel
>
>
> Hi again,
>
> because I got much to work with entropy and information gain ratio, I want
> to implement the following distributed algorithms:
> * Entropy (https://secure.wikimedia.org/wikipedia/en/wiki/Entropy_%28information_theory%29)
> * Conditional Entropy (https://secure.wikimedia.org/wikipedia/en/wiki/Conditional_entropy)
> * Information Gain
> * Information Gain Ratio (https://secure.wikimedia.org/wikipedia/en/wiki/Information_gain_ratio)
>
> This issue is at first only for entropy.
>
> Some questions:
> * In which package do the classes belong. I put them first at
> 'org.apache.mahout.math.stats', don't know if this is right, because they
> are components of information retrieval.
> * Entropy only reads a set of elements. As input i took a sequence file
> with keys of type Text and values anyone, because I only work with the keys.
> Is this the best practise?
> * Is there a generic solution, so that the type of keys can be anything
> inherited from Writable?
>
> In Hadoop is a TokenCounterMapper, which emits each value with an
> IntWritable(1). I added a KeyCounterMapper into
> 'org.apache.mahout.common.mapreduce' which does the same with the keys.
>
> Will append my patch soon.
>
> Regards, Christoph.
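
To make the connection between the log likelihood ratio and mutual information concrete, here is a minimal, non-distributed sketch in plain Java. It assumes a 2x2 contingency table of counts (k11, k12, k21, k22) and uses the standard identity G^2 = 2 * N * I(X;Y), with entropies measured in nats. The class and method names (EntropySketch, entropy, mutualInformation, logLikelihoodRatio) are made up for illustration only; this is not Mahout's LogLikelihood code and not the Map/Reduce job from the patch.

```java
final class EntropySketch {

  private EntropySketch() {}

  /** Shannon entropy, in nats, of the distribution implied by raw counts. */
  static double entropy(long... counts) {
    long total = 0;
    for (long c : counts) {
      total += c;
    }
    if (total == 0) {
      return 0.0;
    }
    double h = 0.0;
    for (long c : counts) {
      if (c > 0) { // treat 0 * log(0) as 0
        double p = (double) c / total;
        h -= p * Math.log(p);
      }
    }
    return h;
  }

  /** Mutual information I(X;Y) = H(rows) + H(columns) - H(cells), in nats. */
  static double mutualInformation(long k11, long k12, long k21, long k22) {
    double rowEntropy = entropy(k11 + k12, k21 + k22);
    double columnEntropy = entropy(k11 + k21, k12 + k22);
    double cellEntropy = entropy(k11, k12, k21, k22);
    return rowEntropy + columnEntropy - cellEntropy;
  }

  /** Log likelihood ratio statistic G^2 = 2 * N * I(X;Y). */
  static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
    long n = k11 + k12 + k21 + k22;
    return 2.0 * n * mutualInformation(k11, k12, k21, k22);
  }

  public static void main(String[] args) {
    // Independent rows and columns: mutual information is (numerically) zero.
    System.out.println(mutualInformation(10, 10, 10, 10));
    // Perfectly dependent rows and columns: a large statistic.
    System.out.println(logLikelihoodRatio(5, 0, 0, 5));
  }
}
```

In a distributed setting, the counts passed to entropy() would come from a counting step such as the KeyCounterMapper described in the quoted issue, so an entropy job and a log likelihood ratio computation could in principle share the same counting and entropy code.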
