After sleeping on it...

> Now set
> all Advanced Weather settime() to 0.0 and retest with METAR. Wow!!!
> Improvement, not as good as Basic Weather , but much better. Worst fps is
> stable at 24, av. is still unstable.  

You had a worst of 27 in your list 
http://dl.dropbox.com/u/57645542/stagger-data.htm
running everything and were unhappy. Now a stable worst of 24 makes you happy? 
It'd be good to test this not with METAR but with high pressure core.

Basically, if you clock all loops at 0.0, as far as Nasal is concerned you make 
the average frame duration come down to the worst frame duration. Which would 
indeed trade framerate for smoothness, except... on my system, this runs into 
the garbage problem, so I get a really bad worst frame delay when I run all 
stuff at all times. 

So the lesson I learned is as long as garbage collection is a problem, avoid 
running stuff on a per-frame basis whenever you can. This looks to be very 
system specific, though.

> Right now, with Advanced Weather we have a weather simulator with a
> FlightSim attached. We're spending 10 (yes 10!) times as long in the Events
> Sub-module with Advanced Weather than in Basic, and 5 times as long as we
> spend in Flight.

> For loops are bad - in C++ as much as in Nasal.  For loops hold up
> the  execution of the main loop, even when they do very little. 

I'm not quite sure you actually appreciate the task being done, so let me 
expand a bit.

First of all, in terms of raw floating point operations, the water shader beats 
flight (and weather) probably by a few orders of magnitude. So what we really 
have is a water reflection simulator with a few percent of the remaining 
performance dedicated to environment and flight. What saves our ass in 
performance lists is just that water reflection can run on the GPU and hence 
doesn't even show up in CPU load comparison - it slows you down nevertheless in 
practice, dramatically so. 

Flight has the task of solving equations of motion for a single object, based 
on a series of coefficients (to be determined by functions or interpolation 
tables). At the core is a set of differential equations. We know how to solve 
them: discretize, replace derivatives by differences, apply some error 
corrections and cleverness, and integrate forward in time. I know the problem 
reasonably well; I could write code doing it if needed.

Advanced Weather has to solve equations of motion for all clouds in the scene 
(on a nice summer afternoon, say we have about 5000 Cumulus clouds, 1000 of 
these have a significant thermal underneath). If the equations of motion were 
as complex as for an airplane, we'd have a framerate (if we start out having 
60 fps without clouds) of 60/6000 = 0.01 fps. That wouldn't be good.

So we use very simple equations of motion which give plausible behaviour. Say 
these are a factor 100 faster, which gives us a framerate of 1 fps. Still not 
good. This is where the brute-force approach ends.
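
To make the arithmetic above explicit, here is a minimal Python sketch. It is 
not FlightGear or Nasal code; the function name is made up for illustration, 
and it assumes the same crude model as the text (frame time scales linearly 
with the number of objects to solve).

```python
# Back-of-the-envelope framerate estimate, using the numbers from the text.
BASE_FPS = 60.0           # framerate without any clouds
N_OBJECTS = 5000 + 1000   # cumulus clouds plus thermals
SPEEDUP = 100.0           # assumed gain from the simplified equations of motion


def brute_force_fps(base_fps, n_objects, speedup=1.0):
    """Framerate if every object costs one aircraft-sized solve per frame,
    made cheaper by the given speedup factor (hypothetical helper)."""
    return base_fps * speedup / n_objects


full_model_fps = brute_force_fps(BASE_FPS, N_OBJECTS)
simple_model_fps = brute_force_fps(BASE_FPS, N_OBJECTS, SPEEDUP)
print(full_model_fps, simple_model_fps)
```

Even with the factor 100, the brute-force estimate stays around 1 fps, which 
is why per-frame treatment of every object is not an option.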

Next, we realize that not all objects need priority attention. Nearby objects 
like the thermal we're in are more relevant than distant objects (like the 
thermal we're not in). Visible objects like clouds need more attention than 
invisible objects like thermals - it's more acceptable for a thermal to have a 
'jumpy' motion than for a cloud nearby. 

Thus, we do not need to solve every equation of motion per graphical frame. 
It's sufficient to solve them at a slower rate: we can visit every object in 
the scene only once in a while. 

But it shouldn't be too jumpy - a 10 m discontinuous motion can't be felt in a 
glider, but a 100 m jump can (I tested these and compared with my real-life 
experience). Clouds in the scene should best not move discontinuously at all 
(you'd be the first to point this out...). Thermals should not be decorrelated 
from cap clouds over time, rain should not leak out underneath clouds,... that 
limits the way we can distribute tasks. In other words, the weather system 
needs to know where the clouds are even if the clouds are actually moved in the 
scene by the shader. The weather system still needs to correlate thermals and 
do the vertical motion of the clouds in response to terrain obstacles, and for 
that it doesn't just need the information where the cloud is but also what the 
terrain underneath is like.
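
As a rough illustration of the prioritizing idea described above, here is a 
small Python sketch. It is not how Advanced Weather is actually implemented; 
the class, the function names, the 5 km threshold and the interval values are 
all assumptions chosen only to show the principle that nearby and visible 
objects get visited more often than distant or invisible ones.

```python
from dataclasses import dataclass


@dataclass
class SceneObject:
    name: str
    distance_m: float          # distance from the aircraft
    visible: bool              # clouds are visible, thermals are not
    last_update_s: float = 0.0


def update_interval_s(obj):
    """Hypothetical rule: short intervals for nearby, visible objects."""
    interval = 0.1 if obj.distance_m < 5000.0 else 1.0
    if not obj.visible:
        interval *= 5.0        # a jumpy thermal is more acceptable than a jumpy cloud
    return interval


def due_objects(objects, now_s):
    """Return the objects whose update interval has elapsed at time now_s."""
    return [o for o in objects if now_s - o.last_update_s >= update_interval_s(o)]


scene = [
    SceneObject("near cloud", 1000.0, True),
    SceneObject("near thermal", 1000.0, False),
    SceneObject("far thermal", 20000.0, False),
]
due_names = [o.name for o in due_objects(scene, 0.2)]
print(due_names)
```

A real scheduler additionally has to keep thermals and cap clouds correlated, 
which a per-object interval like this does not address.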

Say in doing all that cleverly, we can get the framerate from 1 to 30 fps. It's 
clear that assigning priorities will cause some fluctuations in the framerate, 
because the system has to make guesses what is important at the precise moment, 
and sometimes it guesses wrong and the task it decided to do is easier than 
expected.

At the same time, we may also move through the scene. We have supersonic 
aircraft, the Concorde at cruise altitude for instance goes through 3 weather 
tiles every 50 seconds, that gives you a whopping 16 seconds to get to know the 
terrain, build 5000 clouds dependent on what you fly over, display them, and 
remove them again.

If we have loops with one iteration per graphical frame and run at a stable 30 
fps, we get to load and unload 480 clouds in the available timeslot. Not good. 
Not good at all, that's just 1/10 of what we need with no emergency reserve. So 
we would be able to out-fly the weather, clouds would remain behind and create 
an ever-increasing backlog of unremoved clouds, which would pile up in 
memory,... we don't want that.
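
The budget above can be written out as a short Python sketch. The function 
name is hypothetical; the numbers (30 fps, 3 tiles per 50 seconds rounded down 
to 16 seconds per tile, 5000 clouds per tile) are the ones used in the text.

```python
def clouds_handled(fps, window_s, per_frame=1):
    """Number of clouds that can be built or removed within the time window
    if `per_frame` clouds are handled in every frame (hypothetical helper)."""
    return int(fps * window_s * per_frame)


WINDOW_S = 16             # 50 s / 3 tiles, rounded down as in the text
CLOUDS_PER_TILE = 5000

handled = clouds_handled(30, WINDOW_S)
fraction = handled / CLOUDS_PER_TILE
print(handled, round(fraction, 3))
```

This reproduces the 480 clouds per timeslot, roughly a tenth of one tile's 
worth of clouds.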

So, we want to have a system which is robust against 'just being there and 
watching clouds drift' as in, say, flying a balloon where you expect that the 
local conditions don't change because you move with the weather and a system 
which is robust against racing through everything at Mach 3 (or even Mach 6 
with the X-15). At the same time, it shouldn't create discontinuities as you 
slow down and land, so there can't be a separate supersonic and a slow mode.

This is where the internal for-loops come in. Doing one object per frame is way 
too slow, you need to do several. If you are capable of building and removing 
30 clouds per frame, you're Concorde-safe. Of course, if you are in a slow 
plane, that number is more than you need - but then, to distinguish at which 
rate you currently need to build clouds to fill all frames equally (which 
depends on your speed, if you're flying turns or not, on the current 
visibility, on the average number of clouds per weather tile in the current 
situation) requires hideously complex heuristics - which also may guess wrong 
in the end.
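
A minimal Python sketch of the 'several objects per frame' pattern described 
above, assuming a simple queue of pending cloud operations. The function and 
variable names are made up, the batch size of 30 is taken from the text, and 
the real code is Nasal rather than Python.

```python
from collections import deque


def process_frame(pending, batch_size, handle):
    """Handle up to batch_size queued items in one frame.

    Returns the number of items actually handled, so the caller can see
    when the queue has run dry."""
    done = 0
    while pending and done < batch_size:
        handle(pending.popleft())
        done += 1
    return done


# Usage: 100 pending cloud operations, 30 per frame.
queue = deque(range(100))
built = []
frames = 0
while queue:
    process_frame(queue, 30, built.append)
    frames += 1
print(frames, len(built))
```

With the numbers from the text, 30 operations per frame at 30 fps over 16 
seconds is 14400 operations, enough to both build and remove the 5000 clouds 
of a tile with some reserve. The fixed batch size is the simple alternative to 
the speed-dependent heuristics mentioned above.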

In addition, we struggle with problems such as 'We want to pre-build a cloud 
configuration beyond the visibility range, so that it is already there when we 
need a new tile'. Unfortunately, that's a 'can't do' - there is no terrain 
loaded beyond visual range, and since we want to build a cloud pattern which is 
consistent with the terrain, we need terrain info before building clouds. That 
gives a relatively narrow window of opportunity in which clouds can be 
generated as we approach tile edges, we can't really do it all the time.

I could go on with that for a while, there's a whole host of other 
complications waiting...

But - please just don't give me 'for loops are bad'. I don't get the impression 
that you understand well why the individual subsystems do their tasks the way 
they do. The reality is quite a nightmare of scheduling tasks and prioritizing 
resources.

I've spent months learning how to schedule tasks so that things are robust and 
still reasonably fast. You look in the end at a cloud-filled sky, think that 
this somehow looks like a somewhat prettier version of what Basic Weather does, 
and observe that it's just a bit slower and jerkier. But that's not what it is 
underneath at all. Basic Weather has no real understanding of the terrain, it 
can't auto-generate a plausible configuration of thermals, it can't move them 
in lockstep with cap clouds, it can't age thermals and decay them when they 
reach a water surface,... Advanced Weather can.

> "Optimising" code is bad. You might make it better for your system,
> but make it worse for everyone else. OSG/OpenGL do a pretty good job all  
> by themselves

Here's my version:

Optimizing code is badly needed. OSG/OpenGL don't do a pretty good job all by 
themselves for sufficiently complex tasks and that's a fact - been there, seen 
it - Advanced Weather without any optimization would easily drive you to 
single-digit framerates no matter what system you run. What we're talking about 
here is the tip of the iceberg - can we optimize task scheduling even more such 
that you get a smooth 50 fps out? The actual task length to completely unload a weather 
tile is about 4 seconds on my box, the length to build and load it is often ~8 
seconds, and I'd say that's pretty well hidden these days given that these are 
by nature discontinuous operations. 

Your strategy appears to be to just throw resources at the problem and not try 
any optimization (which makes it smooth all right), and in the tasks you have 
had to deal with, you apparently got away with it. It's all nice that you get 
enough framerate out of the water shader and can afford to compute cloud 
coverage in the shader per frame per pixel, but those of us not running high 
end machines would like to run it as well, and I am glad I figured out a way to 
make it 50% faster for me.

> For loops are bad - in C++ as much as in Nasal.  For loops hold up
> the  execution of the main loop, even when they do very little.

In fact, any computation holds up the execution of the main loop, that's 
trivial. Inside the main loop, certain tasks need to be done. That takes time 
(and reduces framerate). The framerate loss is proportional to the time of the 
task inside the main loop, independent of how the task is done technically. I 
can easily come up with a function that stalls the main loop a lot without 
going into a for loop.

Simple counter-example to your claim -  if you want a smooth motion of 10 
objects in the visual field, you can't move one per frame. That makes for jerky 
motion of everything. You need to move all per frame, so you need the for loop, 
because the task is such that it demands it.  On the other hand, in a fuel 
consumption calculation, you might get away with doing one tank per frame instead 
of looping over all tanks, since the precision of a per-frame fuel consumption 
isn't actually needed.

The nature of the task dictates whether you need for loops or not inside the 
main loop. So what needs to be optimized is the time spent per iteration of the 
main loop, given the constraints posed by the task to be done.
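
The two cases from the counter-example can be sketched in Python as follows. 
This is an illustration only: the names, units and the constant burn rate are 
assumptions, not taken from any FlightGear code.

```python
def move_all(positions, velocity, dt):
    """Smooth motion: every object is moved in every frame, so a loop
    over all objects inside the frame is unavoidable."""
    return [p + velocity * dt for p in positions]


class RoundRobinTanks:
    """Fuel bookkeeping that touches only one tank per frame.

    Each tank remembers when it was last updated, so the fuel burned in
    the skipped frames is accounted for on its next turn."""

    def __init__(self, levels_kg, now_s=0.0):
        self.levels_kg = list(levels_kg)
        self.last_update_s = [now_s] * len(self.levels_kg)
        self.next_tank = 0

    def frame(self, now_s, burn_rate_kg_s):
        i = self.next_tank
        dt = now_s - self.last_update_s[i]
        self.levels_kg[i] = max(0.0, self.levels_kg[i] - burn_rate_kg_s * dt)
        self.last_update_s[i] = now_s
        self.next_tank = (i + 1) % len(self.levels_kg)


moved = move_all([0.0, 10.0], 2.0, 0.5)

tanks = RoundRobinTanks([100.0, 100.0])
tanks.frame(1.0, 1.0)   # updates tank 0
tanks.frame(2.0, 1.0)   # updates tank 1, catching up two seconds
print(moved, tanks.levels_kg)
```

The per-tank levels lag by up to one round, which is acceptable for fuel but 
would be visible as jerkiness if applied to object positions.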

Cheers,

* Thorsten
_______________________________________________
Flightgear-devel mailing list
Flightgear-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/flightgear-devel
