Hi everybody -
I've formed a Kaggle team to tackle the Higgs Boson problem discussed
earlier on these forums. There are still two slots open if anyone else
would like to join my team - first come, first served. I myself may be
somewhat busy with other things over the next few weeks as I've a trip
planned to England and I'll be going to the J conference on July 24-25.
Presumably I'll have finished crafting a talk by then as well.
If anyone wishes to form a competing team, please do. In the interests of
promoting a good J turn-out, I'm making available the trivial code I've
put together so far for the basic grunt work of reading in the input
files and making a file to submit (below).
Good luck!
Devon
--
Devon McCormick, CFA
---
require 'tables/dsv'
NB.* getTrainingData: break apart .csv file of training data into useful
NB. arrays.
getTrainingData=: 3 : 0
'trntit trn'=. split (',';'') readdsv y NB. Split title row from data
trn=. }:"1 trn [ lbls=. ;_1{"1 trn NB. Separate character labels from numbers
trn=. }."1 trn [ evid=. 0{"1 trn NB. Pull off event IDs as vec
trn=. ".&>trn NB. Data as simple numeric mat
trn=. }:"1 trn [ wts=. _1{"1 trn NB. Pull off weights column as vec
trntit;lbls;wts;evid;<trn
NB.EG 'trntit lbls wts evid trn'=. getTrainingData 'Data/training.csv'
)
NB.* getTestData: read .csv file of test data into useful arrays.
getTestData=: 3 : 0
'tsttit tst'=. split (',';'') readdsv y NB. Split title row from data
tst=. }."1 tst [ evidt=. 0{"1 tst NB. Pull off event IDs as vec
tst=. ".&>tst NB. Test data as simple numeric mat
tsttit;evidt;<tst
NB.EG 'tsttit evidt tst'=. getTestData 'Data/test.csv'
)
NB.* calcAMS: calculate AMS metric based on R code in AMSscore.R.
calcAMS=: 3 : 0
'wts actual guesses'=. y NB. Weights vector, actual values, predicted values.
NB. Sum signal weights according to predicted label 's' or 'b'.
's b'=. guesses +//. wts*actual='s'
's b'=. ('s'~:{.guesses)|.s;b NB. Correct order from guess order
br=. 10 NB. Regularization term to reduce variance of measure.
%:+:s-~(s+b+br)*^.>:s%b+br
)
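NB. Sketch only, not part of the code above. calcAMS sums signal weights
NB. only (wts*actual='s'), so its b is the signal weight guessed as 'b'.
NB. If the intended measure instead takes b as the background weight among
NB. events guessed 's', a variant might look like the following. The name
NB. calcAMSfp and that reading of b are assumptions; br=. 10 is kept from
NB. calcAMS.
NB. AMS = sqrt(2*((s+b+br)*ln(1+s%(b+br)) - s))
calcAMSfp=: 3 : 0
'wts actual guesses'=. y
sel=. guesses='s'              NB. Events guessed as signal
s=. +/wts*sel*.actual='s'      NB. Signal weight among those guesses
b=. +/wts*sel*.actual='b'      NB. Background weight among those guesses
br=. 10                        NB. Same regularization term as calcAMS
%:+:s-~(s+b+br)*^.>:s%b+br
NB.EG calcAMSfp wts;lbls;guess1
)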
NB.** Example of using the above code to create a submission file:
NB. 1!:44 pp=. {path to code}
'trntit lbls wts evid trn'=. getTrainingData 'Data/training.csv'
'tsttit evidt tst'=. getTestData 'Data/test.csv'
NB. Initial attempt: simple regression.
coeffs=. (lbls='s')%.trn NB. Regress s=1 on factors
ests=. trn+/ . * coeffs NB. Estimates based on regression
's'+/ . = lbls NB. Number of signals
85667
lbls #/. lbls NB. # 's' vs. 'b'
85667 164333
('s'={.lbls)|.lbls #/. lbls NB. Put 'b' # first
164333 85667
threshold=. (/:~ests){~-'s'+/ . = lbls NB. Guess threshold for correct # of each signal
(ests>:threshold) #/. ests NB. Verify correct number 'b' vs. 's'
164333 85667
guess1=. 'bs'{~ests>:threshold NB. Estimates to labels: 's' signal, 'b' background.
calcAMS wts;lbls;guess1 NB. Measure of goodness is 19.8468 (higher is better)
19.8468
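NB. Build submission file from estimates on the test data.
testests=. tst+/ . * coeffs NB. Apply regression coefficients to test data
guesst=. 'bs'{~testests>:threshold NB. Test estimates to labels
submission=. ('EventId';'RankOrder';'Class'),evidt,.(":&.>>:/:/:testests),.<"0 guesst
submission writedsv 'Data/NEJedi0.csv';',';''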
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm