Hi,
Your problem is similar to the Mahout naive Bayes example for Wikipedia:
https://cwiki.apache.org/confluence/display/MAHOUT/Wikipedia+Bayes+Example
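
If the <doc> elements really are one per line, one way to cap the number of
lines per output file is Hadoop's MultipleOutputs, switching to a new base
file name every N lines. Below is a rough, untested sketch against the
org.apache.hadoop.mapreduce API; the class name ChunkingReducer and the
MAX_LINES constant are just placeholders, not anything from the Mahout
example itself:

import java.io.IOException;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

// Sketch: roll over to a new output file every MAX_LINES lines.
// ChunkingReducer and MAX_LINES are illustrative names only.
public class ChunkingReducer
        extends Reducer<Text, Text, NullWritable, Text> {

    private static final int MAX_LINES = 10000; // lines per chunk; pick your own
    private MultipleOutputs<NullWritable, Text> mos;
    private long linesWritten = 0;
    private int chunkIndex = 0;

    @Override
    protected void setup(Context context) {
        mos = new MultipleOutputs<NullWritable, Text>(context);
    }

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        for (Text value : values) {
            // Once the current chunk is full, start writing to the next one.
            if (linesWritten > 0 && linesWritten % MAX_LINES == 0) {
                chunkIndex++;
            }
            // The baseOutputPath variant of write() needs no named-output setup.
            mos.write(NullWritable.get(), value, "chunk" + chunkIndex);
            linesWritten++;
        }
    }

    @Override
    protected void cleanup(Context context)
            throws IOException, InterruptedException {
        mos.close(); // flush all the per-chunk writers
    }
}

With this, each reducer produces files named like chunk0-r-00000,
chunk1-r-00000, and so on, each holding at most MAX_LINES lines.
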
Hope this helps.
Thanks,
prashant
On 08/03/2011 03:23 AM, crookeddy wrote:
Hello all,
We are trying to split a large XML file into chunks with an equal number of
elements. To be specific, we have a very large file with a root element and a
few large <doc></doc> elements. The goal is to split the large file into many
smaller files, each containing the same number of <doc> elements. Each <doc>
element is on its own line, so I guess the specific question is: how do we
tell a reducer to stop writing to a file once it reaches a certain number of
lines?
Thanks for any help.
Oleg