Hi,
> the TMU2 returns an unexpected results
That looks like the expected result to me.
> I know the reason is that TMU2 uses a set of predefined texture vertexes
> and the destination polygon is a RECTANGLE. Is there is any configurable
> options to inverse the texture vertex and the destination mesh vertex?
No. Supporting this would use more FPGA resources and/or be slower, and it's
not needed for MilkDrop.
The older TMU (< 0.3) is able to do what you want, but is missing other
features.
> unfortunately, I can't understand the arithmetic of interpolation in TMU2
> completely, could you give some advice on this? Or there is a description
> of the interpolation in details?
There is indeed some math behind it.
The basic one-dimensional interpolation problem for the TMU is the following:
you have two points A(x0, y0) and B(x1, y1) with integer coordinates and
x1 > x0, and you want to linearly interpolate the parameter y between y0 and
y1 for all integer values x between x0 and x1.
If we are allowed to use rational (non-integer) numbers, this is easy: just
compute the slope s = (y1-y0)/(x1-x0), start from (x, y) = A(x0, y0), and
then, each time you increment x by 1, increment y by s.
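As an illustration of this ideal case, here is a minimal Python sketch that
keeps y as an exact rational. The function name interpolate_exact is made up
for this example and does not come from the TMU sources.

```python
from fractions import Fraction

def interpolate_exact(x0, y0, x1, y1):
    """Return the exact (unrounded) y for every integer x in [x0, x1]."""
    assert x1 > x0
    s = Fraction(y1 - y0, x1 - x0)  # slope, kept as an exact rational
    y = Fraction(y0)                # start from point A
    ys = []
    for _ in range(x0, x1 + 1):
        ys.append(y)
        y += s                      # one step in x adds one slope to y
    return ys
```

For A(0, 0) and B(4, 3) this gives 0, 3/4, 3/2, 9/4 and 3, none of which is
directly usable as a pixel coordinate except the end points.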
This is perfect in the ideal world of mathematics, but in practice there are
two major problems:
- since we are dealing with pixels, the value of y should be quantized, i.e.
rounded to the nearest integer. I really mean *rounded to the nearest
integer*, not floored. Flooring can lead to artifacts in the image.
- how do we represent a rational number with bits?
To solve these two issues, TMU/TMU2 use a tricky representation of rational
numbers: they decompose them as
q = a + b/c with the conditions: 2|b| <= c, c > 0. a and b can be negative.
This has two interesting properties:
- a is the best integer approximation of q. From a hardware point of view,
rounding a rational number represented in this way is free: the rounded value
is immediately available and can be sent to another pipeline stage without any
combinational timing overhead.
- addition is a relatively simple operation if the two denominators c are
equal (which is the case for the interpolation problem):
=================
q = a + b/c
q' = a' + b'/c
A = a+a'
B = b+b'
if(2B > c) {
A = A+1
B = B-c
} else if(2B < -c) {
A = A-1
B = B+c
}
Q = q+q' = A + B/c
=================
=================
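The addition can be sketched in Python as follows. This is only an
illustration, not the hardware description: the name add_rounded is made up,
and the sketch assumes that both fractional parts are kept within half a unit
(2|b| <= c), which is what makes the integer part the nearest integer.

```python
def add_rounded(a, b, a2, b2, c):
    """Add a + b/c and a2 + b2/c; return (A, B) with 2*|B| <= c.

    Assumes c > 0, 2*|b| <= c and 2*|b2| <= c on entry.
    """
    A = a + a2
    B = b + b2
    if 2 * B > c:       # fraction above +1/2: carry one unit into A
        A += 1
        B -= c
    elif 2 * B < -c:    # fraction below -1/2: borrow one unit from A
        A -= 1
        B += c
    return A, B
```

For example, 1 + 1/4 plus 0 + 2/4 comes out as (2, -1), i.e. 2 - 1/4, and the
integer part 2 is indeed the nearest integer to 1.75.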
Virtex-4 FPGAs are able to map this algorithm with few levels of logic and the
resulting hardware computes the sum in less than 10ns.
The whole thing is inspired by Bresenham's algorithm.
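To show how this gives a Bresenham-like loop, here is a small Python sketch of
the 1D interpolation using only integer additions and comparisons inside the
loop. It is written under my own assumptions (the slope is split once before
the loop, fractions are kept within half a unit) and the name
interpolate_rounded is hypothetical.

```python
def interpolate_rounded(x0, y0, x1, y1):
    """Return y rounded to the nearest integer for each integer x in [x0, x1]."""
    assert x1 > x0
    c = x1 - x0                     # common denominator
    dy = y1 - y0
    # Split the slope dy/c as a2 + b2/c with 2*|b2| <= c (done once).
    a2 = (2 * dy + c) // (2 * c)    # nearest integer to dy/c
    b2 = dy - a2 * c
    a, b = y0, 0                    # accumulator starts exactly at y0
    out = []
    for _ in range(c + 1):
        out.append(a)               # a is already the rounded value
        a += a2
        b += b2
        if 2 * b > c:               # fraction above +1/2
            a += 1
            b -= c
        elif 2 * b < -c:            # fraction below -1/2
            a -= 1
            b += c
    return out
```

With A(0, 0) and B(4, 3) the loop produces 0, 1, 2, 2, 3. The exact value at
x = 2 is 1.5, a tie, so either neighbour is a valid rounding there.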
Now how do we move from this 1D interpolation to a 2D one on the rectangles?
Easy: the TMU2 first interpolates the texture coordinates on the vertical
edges of the rectangles:
A B
| |
| |
v v
C D
(view with a fixed-width font)
Then, in a second stage, it interpolates between these two edge values along
each horizontal line, effectively interpolating the texture coordinates for
each pixel of the rectangle:
A B
| |
|---->|
| |
C D
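The two stages can be sketched in Python like this. It is a software model
only: the names lerp_rounded and map_rectangle are made up, the helper uses a
direct rounding formula instead of the incremental adder, and everything runs
sequentially rather than in a pipeline.

```python
def lerp_rounded(p, q, n):
    """n+1 integers from p to q, each rounded to the nearest integer."""
    return [p + (2 * (q - p) * i + n) // (2 * n) for i in range(n + 1)]

def map_rectangle(width, height, ta, tb, tc, td):
    """Texture coordinates for every pixel of the rectangle A B / C D.

    ta, tb, tc, td are the (u, v) texture coordinates at the corners
    A (top left), B (top right), C (bottom left), D (bottom right).
    Returns height+1 rows of width+1 (u, v) pairs.
    """
    # Stage 1: interpolate along the vertical edges A->C and B->D.
    left_u = lerp_rounded(ta[0], tc[0], height)
    left_v = lerp_rounded(ta[1], tc[1], height)
    right_u = lerp_rounded(tb[0], td[0], height)
    right_v = lerp_rounded(tb[1], td[1], height)
    rows = []
    for y in range(height + 1):
        # Stage 2: interpolate between the two edges along this line.
        us = lerp_rounded(left_u[y], right_u[y], width)
        vs = lerp_rounded(left_v[y], right_v[y], width)
        rows.append(list(zip(us, vs)))
    return rows
```

For instance, map_rectangle(2, 2, (0, 0), (4, 0), (0, 4), (4, 4)) gives (2, 2)
for the centre pixel.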
The two stages work in parallel in a pipeline (in a simplified model:
while the second stage interpolates a horizontal line, the first stage is
preparing the interpolated data for the next line). This makes the mapping
unit VERY FAST!
Why did I choose to do the interpolations in this order (vertical then
horizontal) and not the opposite? The answer lies in the way the pixels are
organized in the memory: by putting out pixels in the same order as they are
stored in the memory framebuffer, we can spew them out in SDRAM bursts,
maximizing the off-chip memory performance without having to store a lot of
pixels on-chip.
The TMU2 makes things slightly more complicated, as it does not use integer
texture coordinates but fixed-point ones with a 6-bit fractional part. This
enables it to do good-looking things like bilinear filtering and subpixel
displacements. But the general principle is the same.
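To give an idea of what the 6 fractional bits are used for, here is a Python
sketch of textbook bilinear filtering on such coordinates. It is not the
TMU2's exact arithmetic: the name bilinear_sample, the single-channel texture,
the clamping at the texture edge and the final rounding are all assumptions
made for the example.

```python
FRAC_BITS = 6
ONE = 1 << FRAC_BITS    # 64 represents 1.0 in this fixed-point format

def bilinear_sample(texture, u_fp, v_fp):
    """Sample a 2D list of intensities at fixed-point coordinates (u_fp, v_fp)."""
    x, fx = u_fp >> FRAC_BITS, u_fp & (ONE - 1)   # integer and fractional parts
    y, fy = v_fp >> FRAC_BITS, v_fp & (ONE - 1)
    # Neighbouring texel, clamped to the texture edge (assumption).
    x1 = min(x + 1, len(texture[0]) - 1)
    y1 = min(y + 1, len(texture) - 1)
    # Blend horizontally on the two lines, then vertically between them.
    top = texture[y][x] * (ONE - fx) + texture[y][x1] * fx
    bottom = texture[y1][x] * (ONE - fx) + texture[y1][x1] * fx
    total = top * (ONE - fy) + bottom * fy
    # Divide by 64*64, rounding to the nearest integer.
    return (total + (ONE * ONE) // 2) >> (2 * FRAC_BITS)
```

With the texture [[0, 64], [128, 192]], sampling at u_fp = v_fp = 32 (that is,
0.5 in both directions) returns 96, the average of the four texels.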
Sébastien