Hi Cedric,

> Why do we have to put them together? Why not have two types of daily
> project objects, one for telemetry-style analysis and one for daily
> project details-style analysis? My 2 cents.

This is an interesting question. We should look into this further, and figure out whether:

(a) DailyProjectDetails and Telemetry have no way in principle to share caching mechanisms, and the code would be clearer as two distinct classes.

(b) The current separation is an artifact of different people working on this code at different times, and a redesign of this class might show opportunities to share code/infrastructure.

(c) Our recent performance issues might indicate that we need to revisit our overall approach to caching in order to avoid looping "silos" that result in the same raw data being revisited over and over. Such a redesign of our analysis approach might change things such that what is now (a) becomes (b), or what is now (b) becomes (a)!

I have actually been contemplating a rather radical thought: what if the dailyprojectdata objects define "listeners" that are passed a sensor data instance, and instead of repeatedly looping through the sensor data for a day, the caching infrastructure instead does exactly one pass through the sensor data, calling each defined "listener" on each sensor data instance in turn? Then an analysis like dailyprojectsummary would loop through all defined dailyprojectdata objects, create a list of the relevant listeners, and then do one pass through the sensor data for the day.
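To make the idea concrete, here is a minimal sketch of the one-pass/many-listeners design. All names here (SensorDataListener, UnitTestCounter, etc.) are illustrative stand-ins, not actual classes in our codebase:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical listener interface: each dailyprojectdata object would
// register one of these instead of looping over the data itself.
interface SensorDataListener {
  void process(SensorData data);
}

// Simplified stand-in for a sensor data instance.
class SensorData {
  final String type;
  final String path;
  SensorData(String type, String path) { this.type = type; this.path = path; }
}

// Two example listeners, each accumulating its own analysis state.
class UnitTestCounter implements SensorDataListener {
  int count = 0;
  public void process(SensorData data) {
    if ("UnitTest".equals(data.type)) { count++; }
  }
}

class CommitCounter implements SensorDataListener {
  int count = 0;
  public void process(SensorData data) {
    if ("Commit".equals(data.type)) { count++; }
  }
}

public class OnePassDemo {
  // Exactly one pass over the day's data; every listener sees every instance.
  static void runOnePass(List<SensorData> dayData, List<SensorDataListener> listeners) {
    for (SensorData data : dayData) {
      for (SensorDataListener listener : listeners) {
        listener.process(data);
      }
    }
  }

  public static void main(String[] args) {
    List<SensorData> day = new ArrayList<>();
    day.add(new SensorData("UnitTest", "/proj/Foo.java"));
    day.add(new SensorData("Commit", "/proj/Foo.java"));
    day.add(new SensorData("UnitTest", "/proj/Bar.java"));

    UnitTestCounter tests = new UnitTestCounter();
    CommitCounter commits = new CommitCounter();
    List<SensorDataListener> listeners = new ArrayList<>();
    listeners.add(tests);
    listeners.add(commits);

    runOnePass(day, listeners);
    System.out.println(tests.count + " " + commits.count); // prints "2 1"
  }
}
```

The point is that the cost of the loop becomes O(data * listeners) in cheap dispatch calls, rather than O(data) repeated once per analysis "silo."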

This would work well if we had a "fast" way of doing workspace comparison, and if each sensor data instance cached the workspace instance associated with its path. I think. :-) There are probably other issues I haven't realized yet.
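On the "each sensor data instance caches the workspace instance associated with its path" part, the simplest version is a memoized path-to-workspace lookup, so the (potentially slow) workspace computation runs at most once per distinct path. Again, a hypothetical sketch with made-up names, not our actual workspace code:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: memoize the workspace associated with each path.
public class WorkspaceCache {
  private final Map<String, String> pathToWorkspace = new HashMap<>();

  // Illustrative stand-in for the real (and possibly expensive)
  // workspace computation; here it just takes the parent directory.
  private String computeWorkspace(String path) {
    int slash = path.lastIndexOf('/');
    return (slash <= 0) ? "/" : path.substring(0, slash);
  }

  // Computes on first request for a path, returns the cached value after.
  public String getWorkspace(String path) {
    return pathToWorkspace.computeIfAbsent(path, this::computeWorkspace);
  }
}
```

With tens of thousands of sensor data instances per day but far fewer distinct paths, this kind of cache plus a fast workspace equality check is what would make the single-pass design pay off.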

As always, I have been wondering whether the performance problem has to do with not having a fast backend relational database for storing the sensor data. But it doesn't look like such a change would help us here: the slowdown seems related to workspace computation and comparison, which happens after the sensor data is retrieved from the repository, and/or to algorithms that are exponential in the number of top-level workspaces.

I've also been wondering whether there is something intrinsically wrong in our conceptualization of workspaces or projects---are there alternative ways to organize the data that would achieve the same ends without incurring these kinds of problems? So far, I haven't been able to come up with anything obviously better.

To put this in perspective, we've been increasing the functionality and expressiveness of the system for around two years now with relatively little effort put into performance issues. We're now dealing with a single project that generates tens of thousands of sensor data instances per day, and performing many very different kinds of analyses on that data stream. It's not unreasonable that we are now discovering that some of our "simplistic" implementations are not scaling well. I am hopeful that, just as Hongbing identified and removed a problem in DailyProjectUnitTest in three days of work, we can work together to think through the broader issues in a relatively short period of time.

Comments?

Cheers,
Philip
