The situation is just the opposite :-) Today we develop simple, static, let's say Newtonian-type software systems: well defined (without contradictions) and fully specified from the beginning. I mean mainstream software development, particularly IT systems. But to reflect / simulate the nature of modern business (dynamic, chaotic in some aspects, and constantly evolving) without over-simplifications and lots of assumptions :-), we need to learn to build soft systems with the same characteristics. (I am afraid that today we just offer business some technological models without estimating their adequacy for particular cases. How about creating a theory-like business analysis? :-) ) These are different models. Another level! Dynamic, self-adaptive (without any refactoring :-) ), self-evolving, and complex software. For example, the global IT system as a whole for a big, globally distributed company (including all transactions :-) ). As we grow, our perception of the world, like a data-collecting system, changes constantly and... we cope with this fact! And we make better and better decisions. So, each evolving system...
--Mikhail

----- Original Message -----
From: "Jack Stafurik" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Saturday, March 03, 2007 3:58 PM
Subject: Re: [FRIAM] Subtle problem with BI

> These are issues I (and many others) have grappled with for many years. I have strong opinions that deftly straddle both sides. So - I can't be wrong!
>
> To address the points Mikhail raised, I'll use the context of using data to predict sales.
>
> 1) "we assume that our data reflect adequately business issues (customer behavior)"
>
> The question here is what is "adequate", and what is "customer behavior". Defining these precisely is very important to developing an accurate, useful prediction system. Understanding what is "adequate" is tough. For the client, it initially means "better than what I do now." Later, it evolves into something like "error = 5-10%." For sales prediction, this is an impossible standard, so in the end the client will be unhappy!
>
> Several problems cause this. First, the customers are not homogeneous. Different groups respond differently to the same stimuli, and the groupings of similarly behaving customers you can develop for one product are not the same as for another. I.e., knowing how a customer responds to a Coke promotion doesn't necessarily tell you how he/she will respond to a Tide promotion. Second, you don't always have the most important data you need. Normally for sales, you will have price and volume data for the item of interest and its competitors (identifying competitors is another problem...). But many important data pieces that have major effects on sales (or stock prices, inventory levels, etc.) are not what I call "observable" in the data the client can give you.
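Jack's first problem, that a customer segmentation learned for one product does not carry over to another, can be illustrated with a toy sketch. All customer names and response numbers below are invented for illustration; they are not data from the thread:

```python
# Hypothetical promotion-response lift per customer and product
# (all values invented for illustration).
lift = {
    "alice": {"coke": 0.40, "tide": 0.02},
    "bob":   {"coke": 0.05, "tide": 0.35},
    "carol": {"coke": 0.45, "tide": 0.30},
}

def responders(product, threshold=0.20):
    """Group customers whose lift for a product clears a threshold."""
    return {name for name, p in lift.items() if p[product] >= threshold}

coke_group = responders("coke")  # alice and carol respond to the Coke promotion
tide_group = responders("tide")  # bob and carol respond to the Tide promotion

# The grouping learned from Coke behavior does not transfer to Tide.
assert coke_group != tide_group
```

The same thresholding rule partitions the same customers differently per product, which is exactly why knowing the Coke segmentation tells you little about the Tide one.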
> This "unobservable" data can include a major sale on the item by the WalMart across the street from a store, a major snowstorm that keeps people out of the stores, errors in the shelf price tag, stockouts in the distribution chain, local population changes due to holidays, etc. While this "unobservable" data can sometimes be obtained, doing so takes a lot of work and is very expensive. Third, even though you may have what you think is lots of data (typical retail data sets hold tens of billions of transactions), it isn't enough! By the time you develop a model you think has all the important variables/features (e.g., price, time of day, day of week, day of month, month of year, prices of major competitive items in the store, etc.), and develop a reasonable number of values for each that lead to different behavior, you find you have a very large multidimensional matrix in which many of the elements have only a few (0-10) observations. Theoretically, you need 20+ observations per element to give you statistically valid results. Fourth, the data you get is often "dirty", with e.g. price errors, unidentified replacement products, and so on. We have found that anywhere from 30-80% of the time required for an analysis/model-development task is needed to understand and clean the data the client provides.
>
> There are of course other problems, but the ones above tend to be the most significant.
>
> 2) "we update (patch) our data-collecting software very often."
>
> I don't understand why this is a problem. Normally, data-collection software for business (e.g., point-of-sale cash register data) is pretty robust. I assume he means that as new types of data (e.g., new variables/features) are discovered or developed, and as dirty data is cleaned, the models you develop will change. This should be done.
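Jack's third problem, the sparse multidimensional matrix, can be made concrete with rough arithmetic. The feature cardinalities below are illustrative assumptions (loosely following the features he lists), not figures from the thread:

```python
# Illustrative cardinalities for the kinds of features Jack lists
# (the counts are invented for this sketch).
dims = {
    "price_level": 20,
    "time_of_day": 24,
    "day_of_week": 7,
    "day_of_month": 31,
    "month_of_year": 12,
    "competitor_a_price_level": 20,
    "competitor_b_price_level": 20,
}

cells = 1
for cardinality in dims.values():
    cells *= cardinality  # total elements in the multidimensional matrix

transactions = 10_000_000_000  # "tens of billions", rounded to ten billion
avg_obs_per_cell = transactions / cells

print(f"matrix cells: {cells:,}")                            # ~500 million
print(f"average observations per cell: {avg_obs_per_cell:.1f}")  # ~20

# Even ten billion transactions average only ~20 observations per cell,
# the bare statistical minimum; since real data is heavily skewed, most
# cells land in the 0-10 range Jack describes.
```

The point of the sketch is that the cell count grows multiplicatively with each added feature, so "lots of data" divided across the matrix still leaves most elements statistically starved.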
> The process we use to develop statistical BI models is: a) clean the data; b) examine it to understand it as much as possible and identify important features/variables; c) talk to experts to develop "domain knowledge"; d) develop desired performance specifications with the client; e) develop and test a model; f) figure out why the results are so bad; g) modify algorithms, add or subtract data types; h) repeat until the results are "good enough", the money runs out, the client gets antsy, etc.
>
> I think that changing your data structures and models is usually an important and necessary part of developing a model that will meet your client's accuracy requirements.
>
> Nuff said.
>
> Jack Stafurik
>
>> Message: 1
>> Date: Sat, 03 Mar 2007 11:23:20 -0500
>> From: "Phil Henshaw" <[EMAIL PROTECTED]>
>> Subject: Re: [FRIAM] Subtle problem with BI
>> To: "'The Friday Morning Applied Complexity Coffee Group'" <[email protected]>
>> Message-ID: <[EMAIL PROTECTED]>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> I don't quite understand the details, but it sounds like a kind of 'aha' observation about both natural systems in operation and the self-reference dilemma of theory. My rule is to try never to change the definition of your measures. It's sort of like maintaining software compatibility. If you arbitrarily change the structure of the data you collect, you can't compare the old and new system structures they reflect, nor how your old and new questions relate to each other. It's such a huge temptation to change your measures to fit your constantly evolving questions, but basically..., don't do it. :)
>>
>> Phil Henshaw
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> 680 Ft.
>> Washington Ave
>> NY NY 10040
>> tel: 212-795-4844
>> e-mail: [EMAIL PROTECTED]
>> explorations: www.synapse9.com
>>
>> -----Original Message-----
>> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Mikhail Gorelkin
>> Sent: Tuesday, February 27, 2007 5:06 PM
>> To: FRIAM
>> Subject: [FRIAM] Subtle problem with BI
>>
>> Hello all,
>>
>> It seems there is a subtle problem with BI (data mining, data visualization, etc.). Usually we assume that our data adequately reflect business issues (customer behavior), and at the same time we update (patch) our data-collecting software very often, which reflects the very fact of its (more or less) inadequacy! So our data also carry that inadequacy, but we never try to estimate it 1) to improve our software or 2) to make our business decisions more accurate. It looks like both our data-collecting software and BI are linked together, forming a business (and cybernetic!) model.
>>
>> Any comments?
>>
>> Mikhail
>>
>> ------------------------------
>>
>> _______________________________________________
>> Friam mailing list
>> [email protected]
>> http://redfish.com/mailman/listinfo/friam_redfish.com
>>
>> End of Friam Digest, Vol 45, Issue 3
>> ************************************
>
> ============================================================
> FRIAM Applied Complexity Group listserv
> Meets Fridays 9a-11:30 at cafe at St.
John's College
> lectures, archives, unsubscribe, maps at http://www.friam.org
> ============================================================
