I don't quite follow the "listener" approach. Perhaps spending some time
talking about it in the meeting would be a good idea. But I am convinced
that merely using a DB without major architectural change won't solve
our problem. -Cedric
Philip Johnson wrote:
Hi Cedric,
You wrote:
Why do we have to put them together? Why not have two types of daily
project objects, one for telemetry-style analysis and one for daily
project details-style analysis? My 2 cents.
This is an interesting question. We should look into this further and
figure out whether:
(a) DailyProjectDetails and Telemetry cannot, in principle, share
caching mechanisms, and the code would therefore be clearer as two
distinct classes.
(b) The current separation is an artifact of different people working
on this code at different times, and a redesign might reveal
opportunities to share code and infrastructure.
(c) Our recent performance issues indicate that we need to revisit
our overall approach to caching in order to eliminate the looping
"silos" that cause the same raw data to be traversed over and over.
Such a redesign of our analysis approach might change things such that
what is now (a) becomes (b), or what is now (b) becomes (a)!
I have actually been contemplating a rather radical thought: what if
the dailyprojectdata objects defined "listeners" that are passed a
sensor data instance, and the caching infrastructure, instead of
repeatedly looping through the sensor data for a day, did exactly one
pass through the sensor data, calling each defined "listener" on each
instance in turn? Then an analysis like dailyprojectsummary would loop
through all defined dailyprojectdata objects, collect a list of the
relevant listeners, and do a single pass through the sensor data for
the day.
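To make this concrete, here is roughly the shape I have in mind. This
is only a sketch, and all of the names (SensorData, SensorDataListener,
SinglePassDispatcher) are made up for illustration, not proposals for
the actual API:

  import java.util.ArrayList;
  import java.util.List;

  /** Placeholder for our existing sensor data type. */
  interface SensorData {
  }

  /** Hypothetical callback implemented by each dailyprojectdata object. */
  interface SensorDataListener {
    /** Called once for each sensor data instance in the day's stream. */
    void process(SensorData data);
  }

  /** Traverses the day's sensor data exactly once, notifying every listener. */
  class SinglePassDispatcher {
    private final List<SensorDataListener> listeners =
        new ArrayList<SensorDataListener>();

    void addListener(SensorDataListener listener) {
      listeners.add(listener);
    }

    /** The single pass: each instance is visited once; each listener sees it. */
    void dispatch(Iterable<SensorData> dayData) {
      for (SensorData data : dayData) {
        for (SensorDataListener listener : listeners) {
          listener.process(data);
        }
      }
    }
  }

The payoff would be that the cost of walking the raw sensor data is
paid once per day regardless of how many dailyprojectdata analyses are
registered, rather than once per analysis.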
This would work well if we had a "fast" way of doing workspace
comparison, and if each sensor data instance cached the workspace
instance associated with its path. I think. :-) There are probably
other issues I haven't thought of yet.
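For the workspace-caching piece, even something as simple as memoizing
the path-to-workspace mapping might do. Again, Workspace and
WorkspaceCache are made-up names, purely for illustration:

  import java.util.HashMap;
  import java.util.Map;

  /** Hypothetical workspace type; the real one would carry more structure. */
  class Workspace {
    private final String root;

    Workspace(String root) {
      this.root = root;
    }
  }

  /** Caches the workspace instance associated with each sensor data path. */
  class WorkspaceCache {
    private final Map<String, Workspace> cache =
        new HashMap<String, Workspace>();

    /** Returns the workspace for a path, computing it at most once per path. */
    Workspace getWorkspace(String path) {
      Workspace workspace = cache.get(path);
      if (workspace == null) {
        workspace = computeWorkspace(path); // the currently expensive step
        cache.put(path, workspace);
      }
      return workspace;
    }

    /** Stand-in for the real (expensive) workspace computation. */
    private Workspace computeWorkspace(String path) {
      return new Workspace(path);
    }
  }

A nice side effect: two sensor data instances with the same path would
then share the same Workspace object, so at least those workspace
comparisons could become cheap reference-equality tests instead of
recomputations.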
As always, I have been wondering whether the performance problem stems
from not having a fast backend relational database for storing the
sensor data. It doesn't look like such a change would help us here,
since the slowdown seems to come from workspace computation and
comparison, processing that occurs after the sensor data is retrieved
from the repository, and/or from algorithms that are exponential in
the number of top-level workspaces.
I've also been wondering whether there is something intrinsically
wrong in our conceptualization of workspaces or projects: are there
alternative ways to organize the data that would achieve the same ends
without incurring these kinds of problems? So far, I haven't been able
to come up with anything obviously better.
To put this in perspective, we've been increasing the functionality
and expressiveness of the system for around two years now with
relatively little effort put into performance issues. We're now
dealing with a single project that generates tens of thousands of
sensor data instances per day, and we're performing many very
different kinds of analyses on that data stream. It's not unreasonable
to discover that some of our "simplistic" implementations are not
scaling well. I am hopeful that, just as
Hongbing identified and removed a problem in DailyProjectUnitTest in
three days of work, we can work together to think through the broader
issues in a relatively short period of time.
Comments?
Cheers,
Philip