Did you try CBayes? It's supposed to negate the class imbalance effect to some extent.
On Thu, Jul 23, 2009 at 5:02 AM, Ted Dunning <[email protected]> wrote:

> Some learning algorithms deal with this better than others. The problem is
> particularly bad in information retrieval (negative examples include almost
> the entire corpus, positives are a tiny fraction) and fraud (less than 1% of
> the training data is typically fraud).
>
> Down-sampling the over-represented case is the simplest answer where you
> have lots of data. It doesn't help much to have more than 3x more data for
> one case as another anyway (at least in binary decisions).
>
> Another aspect of this is the cost of different errors. For instance, in
> fraud, verifying a transaction with a customer has low cost (but not
> non-zero) while not detecting a fraud in progress can be very, very bad.
> False negatives are thus more of a problem than false positives and the
> models are tuned accordingly.
>
> On Wed, Jul 22, 2009 at 4:03 PM, Miles Osborne <[email protected]> wrote:
>
>> this is the class imbalance problem (ie you have many more instances for
>> one class than another one).
>>
>> in this case, you could ensure that the training set was balanced (50:50);
>> more interestingly, you can have a prior which corrects for this. or, you
>> could over-sample or even under-sample the training set, etc etc.
>>
>
>
> --
> Ted Dunning, CTO
> DeepDyve
>
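For what it's worth, here is a minimal sketch of the down-sampling idea Ted describes: keep all minority-class examples and randomly drop majority-class examples until the ratio is at most roughly 3:1. The function name and parameters are just illustrative, not part of any Mahout API.

```python
import random

def downsample_majority(examples, labels, majority_label, max_ratio=3.0, seed=42):
    """Down-sample the over-represented class so it holds at most
    `max_ratio` times as many examples as the rest of the data.
    (Hypothetical helper for illustration only.)"""
    rng = random.Random(seed)
    majority = [(x, y) for x, y in zip(examples, labels) if y == majority_label]
    minority = [(x, y) for x, y in zip(examples, labels) if y != majority_label]

    # Keep at most max_ratio majority examples per minority example.
    keep = min(len(majority), int(max_ratio * len(minority)))
    sampled = rng.sample(majority, keep)

    combined = minority + sampled
    rng.shuffle(combined)
    xs, ys = zip(*combined)
    return list(xs), list(ys)
```

Over-sampling the minority class or adjusting the class prior (as Miles suggests) are alternatives when you can't afford to throw training data away.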
