*Please* take this with a few grains of salt. If you're flattening the data
and then subtracting the average, you always wind up in the kernel.
It's a perfect prediction if NaN is your idea of the perfect answer. (NaN
basically means any number you want is valid there, which is tautologically
correct but way too boring for most people.) Put differently, you'll have to
introduce some observer bias into your system if you want to use anything
like this approach, and that observer bias is going to influence the results
you get back out.

I'll stop here, for now.

Thanks,

--
Raul

On Sun, Mar 30, 2014 at 8:21 PM, Raul Miller <[email protected]> wrote:
> On Sun, Mar 30, 2014 at 5:12 PM, Joe Bogner <[email protected]> wrote:
>
>> > Not to mention a 25% reduction in cars...
>>
>> Yes, I don't think the reduction in cars matters here though
>
> Oh?
>
> You've a 9% reduction in cost and a 25% reduction in cars; that means
> you've a 21% increase in cost per car, which implies a significant
> increase from your gas guzzler which was compensated for by removing a
> car.
>
> If you factored in the costs of the cars themselves you'd probably have a
> very different picture (you'd have a penalty for disposal costs, or a gain
> from sales, or some mixed bag from marketing and accounting plans - but in
> any event your costs change based on how you judge them).
>
>> > I'd be tempted to set up a variety of simple models for cost, assume a
>> > linear correlation and then use %. to see what kinds of numbers I get
>> > from those.
>> >
>> > Models might be:
>> >
>> > constant cost
>> > linear cost based on speed
>> > cost based on square of speed
>> > cost based on distance driven
>> > cost based on mpg
>>
>> This is somewhat similar to the path I was starting to go down. I'm not
>> exactly sure how to make your suggestions actionable yet in terms of a
>> model. If it's relatively simple to explain, I would be very interested
>> (and others may be too).
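>
> (As a quick sanity check of that arithmetic in J - 0.91 and 0.75 being the
> fractions of cost and of cars that remain after the 9% and 25% reductions:
>
>    0.91 % 0.75   NB. remaining cost divided by remaining cars
> 1.21333
>
> i.e. roughly the 21% increase in cost per car.)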
>
> colN=:3 :0
>  {.y&{"1`''
> )
> '`Period Car Hours Miles TotalCost'=: colN"0 i.5
> constant=: 1"1
> speed=: Miles % Hours
> sqspeed=: speed ^ 2:
> mpg=: Miles % TotalCost  NB. assume cost is proportional to fuel consumed
> distance=: Miles
>
> Now, I've got a problem: I've got five models and I've only two months. So
> I'm not going to be able to do a good correlation of models to underlying
> data if I limit myself to monthly averages or totals or something like
> that. Instead, I'll have to consider the monthly data in aggregate.
>
> Month1=. (4 # 1),.(i. 4),.(4 # 0.5),.(30 30 30 15),.(4 # 2.75)
>
> Month2=. (3 # 2),.(0 1 99),.(3 # 0.5),.(30, 25, 40),.(2.75, 2.5, 3.0)
>
> aggregate=: Month1, Month2
>
> Now, let's say I want to see how total cost correlates to each of these
> models. I think we're just concerned with trends here, so we should
> probably normalize magnitudes.
>
> normalize=: %"1 +/
>
> Now let's look at the data.
>
>    ((constant,.speed,.sqspeed,.mpg,.distance) %.&normalize TotalCost) aggregate
> 0.997644 1.00445 1.01226 1.00182 1.00445
>
> A perfect match would give us a 1. Smaller than 1 indicates an element of
> negative correlation while greater than 1 indicates an element of positive
> correlation. So these are all pretty close. So let's go with Occam's razor
> (aka "pick the stupidest, er... I mean simplest... thing that could
> possibly work") and say that the constant contribution is something of a
> given, and we want to focus on the changes which remain after removing
> that.
>
>    ((constant,.speed,.sqspeed,.mpg,.distance) %.&normalize TotalCost -&normalize constant) aggregate
> |NaN error
>
> Ouch.
>
> Looking at the underlying data:
>
>    (TotalCost -&normalize constant) aggregate
> 0 0 0 0 0 _0.012987 0.012987
>
> Our total cost is almost constant.
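>
> (To see where that NaN comes from: the residual above sums to zero, so
> normalize ends up dividing by zero. A tiny made-up example:
>
>    normalize=: %"1 +/
>    normalize 0 0 _1 1
> 0 0 __ _
>
> In J, 0%0 is 0, but the nonzero entries become infinities, and handing
> infinities to %. is what signals the NaN error.)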
>
> Let's try blaming the square of the speed instead, just for comparison
> purposes:
>
>    ((constant,.speed,.sqspeed,.mpg,.distance) %.&normalize TotalCost -&normalize sqspeed) aggregate
> 4.31408e_32 4.90329e_17 8.94067e_17 4.17142e_17 4.90329e_17
>
> Almost nothing left, but at least it's not so close to the kernel that we
> get an error.
>
> Basically there's very little variation in this data, and almost any
> decision we make about assigning overall blame seems equally good (or
> almost equally bad). But maybe we knew that already when we noticed we had
> more models than months.
>
> I sort of cooked up this analysis mechanism on the fly - it's a bit
> opportunistic in character (you have to stumble across a well-fitting
> model; it really only tells you whether models, or maybe some combinations
> of models, are better or worse than each other). But hopefully it at least
> shows you what I was trying to put into words.
>
>> I was either going to:
>> 1. Calculate the would-be cost by holding each variable constant.
>> Example: calculate the cost if the miles were the same and the speed
>> were the same, and then changing one at a time.
>
> For that you need a model that takes you from miles to cost. But if you
> had that you might not need to do any analysis to see where it's going.
>
> That said, if you had enough data you might be able to approximate by
> selecting subsets of the data (based on one column) and pretending that
> your selection function is your model function.
>
>> 2. Calculate the impact by the ratio of each change -- assuming each is
>> linear and on the same scale. A 10% reduction in miles should be a 10%
>> reduction in cost, assuming MPH is held constant... Something like that.
>
> Same issue here, I imagine.
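>
> (In case %. is unfamiliar: dyadic %. is a linear solve, and when there are
> more rows than columns it gives the least-squares fit, which is all the
> model comparisons above are doing. A minimal sketch with invented numbers:
>
>    3 7 %. 2 2 $ 1 0 1 1   NB. solve x=3 and x+y=7
> 3 4
>
> The same verb applied to the taller model matrices does the fitting.)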
>
>> I'll keep thinking and welcome all other ideas.
>>
>> I have some crude code started here too that's using inverted tables:
>> https://gist.github.com/joebo/fd61043076beafeace30 , just to make it
>> more concrete.
>
> Keep in mind that ultimately you'll have to verify your math by converting
> what it says back into something more concrete. Ultimately it's someone's
> understanding and effort which makes the difference, and math is just a
> tool to give you alternate perspectives.
>
> Thanks,
>
> --
> Raul
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
