Hi,
Your problem is similar to the Mahout naive Bayes example for Wikipedia:
https://cwiki.apache.org/confluence/display/MAHOUT/Wikipedia+Bayes+Example
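For your specific question below (stopping a reducer's output file after a
certain number of lines), one common pattern is to use Hadoop's
MultipleOutputs in the reducer and rotate the base output path every N
records. Here is a rough, untested sketch of that idea, assuming a Hadoop
version whose new (mapreduce) API includes MultipleOutputs; the key/value
types, the chunk size, and the "chunk-" naming are just placeholders:

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class ChunkingReducer
    extends Reducer<LongWritable, Text, NullWritable, Text> {

  // Placeholder: number of <doc> lines per output file.
  private static final long DOCS_PER_FILE = 1000;

  private MultipleOutputs<NullWritable, Text> mos;
  private long written = 0;  // lines written to the current chunk
  private int chunk = 0;     // index of the current output file

  @Override
  protected void setup(Context context) {
    mos = new MultipleOutputs<NullWritable, Text>(context);
  }

  @Override
  protected void reduce(LongWritable key, Iterable<Text> values,
      Context context) throws IOException, InterruptedException {
    for (Text line : values) {
      // Write to a chunk-specific base path instead of the default output.
      mos.write(NullWritable.get(), line, "chunk-" + chunk);
      if (++written >= DOCS_PER_FILE) {
        // Reached the per-file limit: start a new output file.
        written = 0;
        chunk++;
      }
    }
  }

  @Override
  protected void cleanup(Context context)
      throws IOException, InterruptedException {
    mos.close();  // flush and close all chunk files
  }
}

Each reducer then produces files named chunk-0-r-00000, chunk-1-r-00000,
and so on; you may also want LazyOutputFormat in the driver so the default
(empty) part files are not created. If the mapper keys each <doc> line by
its byte offset, the sort phase keeps the lines in order within a reducer.

Hope this helps.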

thanks
prashant
On 08/03/2011 03:23 AM, crookeddy wrote:
Hello all,

We are trying to split up a large XML file into chunks with an equal number
of elements in each. To be specific, we have a very large file with a root
element and a few large <doc></doc> elements. The goal is to split the
large file into many smaller files, each containing the same number of
<doc> elements.

Each <doc> element is on its own line, so I guess the specific question is:
how do we tell a reducer to stop writing to a file once it reaches a
certain number of lines?

Thanks for any help.

Oleg
