Thank you as always for the thoughtful and detailed reply. I'm still working through the examples.

On Sun, Mar 30, 2014 at 8:21 PM, Raul Miller <[email protected]> wrote:

> You've a 9% reduction in cost and a 25% reduction in cars, that means
> you've a 21% increase in cost per car, which implies a significant increase
> from your gas guzzler which was compensated for by removing a car.

Yes, but I don't think it's correct to spread the cost across all the cars
for what I'm trying to achieve.

> If you factored in the costs of the cars themselves you'd probably have a
> very different picture (you'd have a penalty for disposal costs or a gain
> from sales or some mixed bag from marketing and accounting plans - but in
> any event your costs change based on how you judge them).

I agree, that could be an interesting way to look at it.

> colN=:3 :0
>   {.y&{"1`''
> )
> '`Period Car Hours Miles TotalCost'=: colN"0 i.5

In the back of my head I was hoping there was a way to select columns by
name without having to define a verb for each. This is a really neat
approach. Thank you for sharing.

>    ((constant,.speed,.sqspeed,.mpg,.distance)%.&normalize TotalCost) aggregate
> 0.997644 1.00445 1.01226 1.00182 1.00445

It took me a little bit to figure out what this was doing. There are many
ways to describe it, but my simple explanation is that it's testing the
relationship between each variable's % of its column total and the
corresponding % of total cost.
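As a cross-check on that reading, here is a minimal Python sketch (my own translation, not Raul's J). It assumes that normalize means "divide each value by the column total" and that the table named aggregate holds the seven rows from my R data frame below; both are assumptions on my part. With one predictor and no intercept, the least squares slope is sum(x*y)/sum(y*y), which is what the hypothetical helper slope_through_origin computes. The mpg column is left out because the fuel figures are not in this message.

```python
# Sketch only. Assumptions (mine): normalize(x) = x / sum(x), and the seven
# rows below are the data that the J expression is applied to.

def normalize(xs):
    """Express each value as a fraction of its column total."""
    total = sum(xs)
    return [x / total for x in xs]

def slope_through_origin(dependent, predictor):
    """Least squares slope b for dependent ~ b * predictor, no intercept."""
    num = sum(d * p for d, p in zip(dependent, predictor))
    den = sum(p * p for p in predictor)
    return num / den

hours = [0.5] * 7
miles = [30, 30, 30, 15, 30, 25, 40]
total_cost = [2.75, 2.75, 2.75, 2.75, 2.75, 2.5, 3]

columns = {
    "constant": [1] * 7,
    "speed": [m / h for m, h in zip(miles, hours)],
    "sqspeed": [(m / h) ** 2 for m, h in zip(miles, hours)],
    "distance": miles,
}

# Each normalized column is the dependent variable; normalized cost is the predictor.
cost_share = normalize(total_cost)
coefficients = {
    name: slope_through_origin(normalize(values), cost_share)
    for name, values in columns.items()
}

for name, b in coefficients.items():
    print(f"{name:9s} {b:.6f}")
```

Under those assumptions the four values round to 0.997644, 1.00445, 1.01226 and 1.00445, which agrees with the quoted J output for constant, speed, sqspeed and distance.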
Raul's expression is doing an ordinary least squares fit (linear regression)
on each of the inputs, after each has been transformed to be the % of its
column's total.

For example, the speed column above was 1.00445, which can be calculated on
its own as:

   ] (normalize speed aggregate)%.(normalize TotalCost aggregate)
1.00445

In R terms, that would be:

    > coefficients(lm(z$Speed~z$TotalCost+0))
    z$TotalCost
       1.004446

Where z was defined as:

    z<-lapply(df, function(x) { x/sum(x) })

And df is a data.frame of the values:

    df <- data.frame(Period=c(rep(1,4), rep(2,3)),
                     Car=c(0,1,2,3,0,1,99),
                     Hours=rep(0.5,7),
                     Miles=c(30,30,30,15,30,25,40),
                     TotalCost=c(rep(2.75,4),2.75,2.5, 3)
    )
    df$Speed <- df$Miles / df$Hours

I am still getting my head wrapped around why it isn't

   ] (normalize TotalCost aggregate)%.(normalize speed aggregate)

I would have thought that it was the ratio of TotalCost as a function of the
ratio of Speed, where TotalCost is the dependent variable. I will have to
think this through more.

> A perfect match would give us a 1. Smaller than 1 indicates an element of
> negative correlation while greater than 1 indicates an element of positive
> correlation. So these are all pretty close. So let's go with occam's razor
> (aka "pick the stupidest er... I mean simplest... thing that could possibly
> work") and say that the constant contribution is something a given and we
> want to focus on the changes which remain after removing that.

Makes sense.

>    ((constant,.speed,.sqspeed,.mpg,.distance)%.&normalize TotalCost -&normalize constant) aggregate
> |NaN error
>
> Ouch.
>
> Looking at the underlying data:
>
>    (TotalCost -&normalize constant) aggregate
> 0 0 0 0 0 _0.012987 0.012987

Now it makes sense why constant is included. It makes it simple to remove
the constant contribution. Said a different way, I think it lets us test
whether each variable changes at the same rate relative to Total Cost. More
thought needed here.

> Our total cost is almost constant.
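To see where those seven numbers come from, here is a small Python sketch of the difference, again assuming normalize is "divide by the column total" (my assumption, taken from my R translation, not something Raul stated). Both normalized columns sum to 1, so their difference sums to zero up to rounding. Normalizing that difference a second time would mean dividing by roughly zero, which may be related to the NaN error, though I have not confirmed that in J.

```python
# Sketch only; normalize(x) = x / sum(x) is my assumption about the J verb.

def normalize(xs):
    """Express each value as a fraction of its column total."""
    total = sum(xs)
    return [x / total for x in xs]

total_cost = [2.75, 2.75, 2.75, 2.75, 2.75, 2.5, 3]
constant = [1] * 7

# Elementwise difference of the two normalized columns.
difference = [t - c for t, c in zip(normalize(total_cost), normalize(constant))]

print([round(d, 6) for d in difference])
print("sum of differences:", sum(difference))
```

The last two entries should come out near -0.012987 and 0.012987, matching the J output, and the sum is zero apart from floating-point rounding.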
>
> Let's try blaming the square of the speed instead, just for comparison
> purposes:
>
>    ((constant,.speed,.sqspeed,.mpg,.distance)%.&normalize TotalCost -&normalize sqspeed) aggregate
> 4.31408e_32 4.90329e_17 8.94067e_17 4.17142e_17 4.90329e_17
>
> Almost nothing left, but at least it's not so close to the kernel that we
> get an error.
>
> Basically there's very little variation in this data, and almost any
> decision we make about assigning overall blame seems equally good (or
> almost equally bad). But maybe we knew that already when we noticed we had
> more models than months.

I'll work up a few more examples and test out this concept more. It looks
like it has some potential.

Thanks
