While reviewing the Decision Forest code, I noticed that computing the
"out of bag" (OOB) error of the forest while training it makes the
implementation really messy. I made a lot of assumptions about the way
Hadoop works internally (especially the way it splits the data). This has
proven to be buggy many times: with each new version of Hadoop I had to
"tweak" the code to make it run.

So I am asking users and developers alike: is computing the OOB error
really necessary? If yes, I will spend the time to figure out a better way
to compute it. If not, I will just get rid of it for now and leave a JIRA
issue about bringing it back if someone actually needs it.
