The situation is just the opposite :-) Today we develop simple, static, let's say Newtonian-type software systems: well defined (without contradictions) and fully specified from the beginning. I mean mainstream software development, particularly IT systems. But to reflect / simulate the nature of modern business (dynamic, chaotic in some aspects, and constantly evolving) without over-simplifications and lots of assumptions :-), we need to learn to build soft systems with the same characteristics. (I am afraid that today we just offer business some technological models without estimating their adequacy for particular cases. How about creating a theory-like business analysis? :-) ) These are different models. Another level! Dynamic, self-adaptive (without any refactoring :-) ), self-evolving, and complex software. For example, the global IT system as a whole for a big, globally distributed company (including all transactions :-) ). As we grow, our perception of the world, like a data-collecting system, changes constantly and... we cope with this fact! And we make better and better decisions. So, each evolving system...
--Mikhail

----- Original Message -----
From: "Jack Stafurik" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Saturday, March 03, 2007 3:58 PM
Subject: Re: [FRIAM] Subtle problem with BI

> These are issues I (and many others) have grappled with for many years. I have strong opinions that deftly straddle both sides. So - I can't be wrong!
>
> To address the points Mikhail raised, I'll use the context of using data to predict sales.
>
> 1) "we assume that our data reflect adequately business issues (customer behavior)"
>
> The question here is what is "adequate", and what is "customer behavior". Defining these precisely is very important to developing an accurate, useful prediction system. Understanding what is "adequate" is tough. For the client, it initially means "better than what I do now." Later, it evolves into something like "error = 5-10%." For sales prediction, this is an impossible standard, so in the end the client will be unhappy!
>
> Several problems cause this. First, the customers are not homogeneous. Different groups respond differently to the same stimuli, and the groupings of similarly behaving customers you can develop for one product are not the same as for another. I.e., knowing how a customer responds to a Coke promotion doesn't necessarily tell you how he/she will respond to a Tide promotion. Second, you don't always have the most important data you need. Normally for sales, you will have price and volume data for the item of interest and its competitors (identifying competitors is another problem...). But many important data pieces that have major effects on sales (or stock prices, inventory levels, etc.) are not what I call "observable" in the data the client can give you.
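Jack's first problem, that a customer segmentation learned for one product does not carry over to another, can be illustrated with a toy sketch. All customer names and response numbers below are invented for illustration; they are not data from the thread:

```python
# Hypothetical promotion-response lift per customer and product
# (all values invented for illustration).
lift = {
    "alice": {"coke": 0.40, "tide": 0.02},
    "bob":   {"coke": 0.05, "tide": 0.35},
    "carol": {"coke": 0.45, "tide": 0.30},
}

def responders(product, threshold=0.20):
    """Group customers whose lift for a product clears a threshold."""
    return {name for name, p in lift.items() if p[product] >= threshold}

coke_group = responders("coke")  # alice and carol respond to the Coke promotion
tide_group = responders("tide")  # bob and carol respond to the Tide promotion

# The grouping learned from Coke behavior does not transfer to Tide.
assert coke_group != tide_group
```

The same thresholding rule partitions the same customers differently per product, which is exactly why knowing the Coke segmentation tells you little about the Tide one.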
> This "unobservable" data can include a major sale on the item by the WalMart across the street from a store, a major snowstorm that keeps people out of the stores, errors in the shelf price tag, stockouts in the distribution chain, local population changes due to holidays, etc. While this "unobservable" data can sometimes be obtained, doing so takes a lot of work and is very expensive. Third, even though you may have what you think is lots of data (typical retail data sets hold tens of billions of transactions), it isn't enough! By the time you develop a model you think has all the important variables/features (e.g., price, time of day, day of week, day of month, month of year, prices of major competitive items in the store, etc.), and develop a reasonable number of values for each that lead to different behavior, you find you have a very large multidimensional matrix in which many of the elements have only a few (0-10) observations. Theoretically, you need 20+ observations per element to give you statistically valid results. Fourth, the data you get is often "dirty", with e.g. price errors, unidentified replacement products, and so on. We have found that anywhere from 30-80% of the time required for an analysis/model-development task is needed to understand and clean the data the client provides.
>
> There are of course other problems, but the ones above tend to be the most significant.
>
> 2) "we update (patch) our data-collecting software very often."
>
> I don't understand why this is a problem. Normally, data-collection software for business (e.g., point-of-sale cash register data) is pretty robust. I assume he means that as new types of data (e.g., new variables/features) are discovered or developed, and as dirty data is cleaned, the models you develop will change. This should be done.
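Jack's third problem, the sparse multidimensional matrix, can be made concrete with rough arithmetic. The feature cardinalities below are illustrative assumptions (loosely following the features he lists), not figures from the thread:

```python
# Illustrative cardinalities for the kinds of features Jack lists
# (the counts are invented for this sketch).
dims = {
    "price_level": 20,
    "time_of_day": 24,
    "day_of_week": 7,
    "day_of_month": 31,
    "month_of_year": 12,
    "competitor_a_price_level": 20,
    "competitor_b_price_level": 20,
}

cells = 1
for cardinality in dims.values():
    cells *= cardinality  # total elements in the multidimensional matrix

transactions = 10_000_000_000  # "tens of billions", rounded to ten billion
avg_obs_per_cell = transactions / cells

print(f"matrix cells: {cells:,}")                            # ~500 million
print(f"average observations per cell: {avg_obs_per_cell:.1f}")  # ~20

# Even ten billion transactions average only ~20 observations per cell,
# the bare statistical minimum; since real data is heavily skewed, most
# cells land in the 0-10 range Jack describes.
```

The point of the sketch is that the cell count grows multiplicatively with each added feature, so "lots of data" divided across the matrix still leaves most elements statistically starved.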
> The process we use to develop statistical BI models is: a) clean the data; b) examine it to understand it as much as possible and identify important features/variables; c) talk to experts to develop "domain knowledge"; d) develop desired performance specifications with the client; e) develop and test a model; f) figure out why the results are so bad; g) modify algorithms, add or subtract data types; h) repeat until the results are "good enough", the money runs out, the client gets antsy, etc.
>
> I think that changing your data structures and models is usually an important and necessary part of developing a model that will meet your client's accuracy requirements.
>
> Nuff said.
>
> Jack Stafurik
>
>> Message: 1
>> Date: Sat, 03 Mar 2007 11:23:20 -0500
>> From: "Phil Henshaw" <[EMAIL PROTECTED]>
>> Subject: Re: [FRIAM] Subtle problem with BI
>> To: "'The Friday Morning Applied Complexity Coffee Group'" <[email protected]>
>> Message-ID: <[EMAIL PROTECTED]>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> I don't quite understand the details, but it sounds like a kind of 'aha' observation about both natural systems in operation and the self-reference dilemma of theory. My rule is to try never to change the definition of your measures. It's sort of like maintaining software compatibility. If you arbitrarily change the structure of the data you collect, you can't compare the old and new system structures they reflect, nor how your old and new questions relate to each other. It's such a huge temptation to change your measures to fit your constantly evolving questions, but basically..., don't do it. :)
>>
>> Phil Henshaw
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> 680 Ft.
>> Washington Ave
>> NY NY 10040
>> tel: 212-795-4844
>> e-mail: [EMAIL PROTECTED]
>> explorations: www.synapse9.com
>>
>> -----Original Message-----
>> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Mikhail Gorelkin
>> Sent: Tuesday, February 27, 2007 5:06 PM
>> To: FRIAM
>> Subject: [FRIAM] Subtle problem with BI
>>
>> Hello all,
>>
>> It seems there is a subtle problem with BI (data mining, data visualization, etc.). Usually we assume that our data adequately reflect business issues (customer behavior), and at the same time we update (patch) our data-collecting software very often, which reflects the very fact of its (more or less) inadequacy! So our data also carry that inadequacy, but we never try to estimate it 1) to improve our software or 2) to make our business decisions more accurate. It looks like both our data-collecting software and BI are linked together, forming a business (and cybernetic!) model.
>>
>> Any comments?
>>
>> Mikhail
>>
>> ------------------------------
>>
>> _______________________________________________
>> Friam mailing list
>> [email protected]
>> http://redfish.com/mailman/listinfo/friam_redfish.com
>>
>> End of Friam Digest, Vol 45, Issue 3
>> ************************************
>
> ============================================================
> FRIAM Applied Complexity Group listserv
> Meets Fridays 9a-11:30 at cafe at St.
John's College
> lectures, archives, unsubscribe, maps at http://www.friam.org
> ============================================================
