[
https://issues.apache.org/jira/browse/MAHOUT-747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059600#comment-13059600
]
Sean Owen commented on MAHOUT-747:
----------------------------------
Yes, that's what I was getting at originally: you can't really distribute this
entirely. The counting is distributable, though.
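
To illustrate that split, here is a minimal plain-Java sketch, not the code
from the attached patch. It assumes the input is just a list of string keys;
the class name EntropySketch and its methods are hypothetical. The counting
step is the part a mapper/combiner/reducer could do in parallel. The final
step uses the identity H = log2(N) - (1/N) * sum(c * log2(c)), which needs
the total N and the sum over all counts in one place.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of Mahout or of MAHOUT-747.patch.
public class EntropySketch {

  // Step 1 (distributable): count how often each key occurs.
  // In Map/Reduce this would be a key-counting mapper plus a summing reducer.
  public static Map<String, Long> countKeys(List<String> keys) {
    Map<String, Long> counts = new HashMap<>();
    for (String key : keys) {
      counts.merge(key, 1L, Long::sum);
    }
    return counts;
  }

  // Step 2 (single aggregation): combine all counts into one entropy value.
  // H = -sum((c/N) * log2(c/N)) = log2(N) - (1/N) * sum(c * log2(c))
  public static double entropy(Map<String, Long> counts) {
    long total = 0;
    double sumCLogC = 0.0;
    for (long c : counts.values()) {
      total += c;
      sumCLogC += c * log2(c);
    }
    if (total == 0) {
      return 0.0; // empty input: define entropy as 0
    }
    return log2(total) - sumCLogC / total;
  }

  private static double log2(double x) {
    return Math.log(x) / Math.log(2.0);
  }

  public static void main(String[] args) {
    List<String> keys = Arrays.asList("a", "a", "b", "c");
    System.out.println(entropy(countKeys(keys)));
  }
}
```

Both N and sum(c * log2(c)) are plain sums over the counts, so partial sums
could be combined early, but the last subtraction and division still have to
happen in a single place.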
I don't think there's a strong convention for writing null,value or value,null
when outputting a single value. I think that since it saves you writing another
class, you can just go with null,value.
There are some other items here I'd like to tweak in the code, but they are
small. For example, I may want to move or merge some of these classes. But yes, I
can take a look soon and put it in. It's a simple job that does something
useful, so it seems OK to add.
> Entropy implementation in Map/Reduce
> ------------------------------------
>
> Key: MAHOUT-747
> URL: https://issues.apache.org/jira/browse/MAHOUT-747
> Project: Mahout
> Issue Type: New Feature
> Components: Math
> Affects Versions: 0.6
> Reporter: Christoph Nagel
> Attachments: MAHOUT-747.patch
>
>
> Hi again,
> because I work a lot with entropy and information gain ratio, I want to
> implement the following distributed algorithms:
> * Entropy
> (https://secure.wikimedia.org/wikipedia/en/wiki/Entropy_%28information_theory%29)
> * Conditional Entropy
> (https://secure.wikimedia.org/wikipedia/en/wiki/Conditional_entropy)
> * Information Gain
> * Information Gain Ratio
> (https://secure.wikimedia.org/wikipedia/en/wiki/Information_gain_ratio)
> This issue initially covers only entropy.
> Some questions:
> * In which package do the classes belong? I put them in
> 'org.apache.mahout.math.stats' for now, but I don't know if this is right,
> because they are components of information retrieval.
> * Entropy only reads a set of elements. As input I took a sequence file with
> keys of type Text and values of any type, because I only work with the keys.
> Is this the best practice?
> * Is there a generic solution, so that the type of keys can be anything
> inherited from Writable?
> Hadoop has a TokenCounterMapper, which emits each value with an
> IntWritable(1). I added a KeyCounterMapper to
> 'org.apache.mahout.common.mapreduce', which does the same with the keys.
> I will attach my patch soon.
> Regards, Christoph.