[R] Confused - better empirical results with error in data

2009-09-07 Thread Noah Silverman
Hi, I have a strange one for the group. We have a system that predicts probabilities using a fairly standard SVM (e1071). We are looking at probabilities of a binary outcome. The input data is generated by a Perl script that calculates a bunch of things, fetches data from a database, etc.

Re: [R] Confused - better empirical results with error in data

2009-09-07 Thread S Ellison
Predicting whilst confused is unlikely to produce sound predictions... my vote is for finding out why before believing anything. Noah Silverman n...@smartmediacorp.com 09/07/09 8:33 PM Hi, I have a strange one for the group. We have a system that predicts probabilities using a fairly standard

Re: [R] Confused - better empirical results with error in data

2009-09-07 Thread Mark Knecht
On Mon, Sep 7, 2009 at 12:33 PM, Noah Silverman n...@smartmediacorp.com wrote: SNIP So, this is really a philosophical question. Do we: 1) Shrug and say, who cares, the SVM figured it out and likes that bad data item for some inexplicable reason 2) Tear into the math and try to figure

Re: [R] Confused - better empirical results with error in data

2009-09-07 Thread Noah Silverman
You both make good points. Ideally, it would be nice to know WHY it works. Without digging into too much verbiage, the system is designed to predict the outcome of certain events. The broken model predicts outcomes correctly much more frequently than one with the broken data withheld. So,

Re: [R] Confused - better empirical results with error in data

2009-09-07 Thread Mark Knecht
On Mon, Sep 7, 2009 at 1:22 PM, Noah Silverman n...@smartmediacorp.com wrote: SNIP The data is listed in our CSV file from newest to oldest. We are supposed to calculate a value that is an average of some items. We loop through some queries to our database and increment two variables -

Re: [R] Confused - better empirical results with error in data

2009-09-07 Thread Noah Silverman
Interesting point. Our data is NOT continuous. Sure, some of the test examples are older than others, but there is no relationship between them. (More Markov-like in behavior.) When creating a specific record, we actually account for this in our SQL queries, which tend to be along the lines