Hi Eike,

Eike Rathke wrote:
Please note that before we can integrate code or data contributed we need a signed Joint Copyright Assignment form (JCA) filled-out, see http://contributing.openoffice.org/programming.html#jca

Today I sent the completed JCA to [EMAIL PROTECTED] (I can also attach a copy of the JCA to this e-mail if needed).

Furthermore, this isn't Fortran ;-) and it
would be much more eye-friendly if you fixed your CapsLock key and
refrained from using only capitalized letters in comments. Please use
normal capitalization instead. Thanks.
Btw, code is much more readable if you align the trailing comments (and
use proper capitalization, of course ;-)

Well, I do not know what the development environment of professional programmers looks like. I currently use (and wrote this code in) the free jEdit (http://sourceforge.net/projects/jedit/).

What I noted, however, is that the Calc code lacks comments almost completely. ;-) It is quite difficult to work out what all the code does without guiding comments.

My comments actually look decent in jEdit. I like them best as they are. I definitely do *NOT* like lower-case comments, because they confuse me: is it code, or is it still a comment? Obviously, you cannot write all the code in upper case, so to distinguish comments from code I write the comments in upper case. I feel the code is substantially more readable this way. (You only need to read the comments once to understand what is going on, but you need a full view of the code almost continuously.)

// WE GET EITHER A SINGLE MATRIX WHERE EVERY COLUMN IS A SEPARATE VARIABLE
//    DISADVANTAGE: ONLY ONE COLUMN PER VARIABLE
// OR MULTIPLE MATRICES, EACH MATRIX IS ONE VARIABLE
//    DISADVANTAGE:
//      CALC FUNCTIONS ACCEPT ONLY 30 PARAMS
//      SO THERE ARE AT MOST 30 VARIABLES

Not quite true. The UI in the Formula AutoPilot knows only 30 parameters
at maximum, the compiler and interpreter actually can handle more. ... But 
designing the parameters is more a question of how
other spreadsheet applications do it. We should follow that.

*gnumeric*: has 2 modes (actually 3: columns swapped for rows, too)
 1.) every column is one variable (my case 2: the single-range scenario)
 2.) every area is one variable (my case 1: multiple selection ranges)

So gnumeric has it both ways.


*R*
R is complex. It is NOT graphically oriented. Basically, you have only *ONE data vector* for such a simple ANOVA, BUT it contains *ALL* the values for *ALL* variables (which is counterintuitive for a novice).

A 2nd vector of the same length as the first matches those values to the variable they belong to, e.g.
vector1 = (val1, val2, val3, val4, val5, ..., val100)
vector2 = (1, 1, 1, 1, 2, 2, ..., 10)
where val1-4 are data points for the first variable, val5-... are data points for variable 2, and val...-100 are data points for variable 10.

Vector2 MUST also be a factor (and NOT a numeric vector). When you perform the ANOVA, you do a linear fit of vector1 on vector2: "aov(vector1 ~ vector2)". Quite complex for beginners.
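For comparison, R's layout can be mirrored in C++ as a minimal sketch (the function and vector names below are my own illustration, neither R nor Calc API): one flat vector of all values plus a parallel group vector playing the role of R's factor.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal sketch of R's aov() data layout: one flat vector of all values
// plus a parallel group vector (R's "factor") telling which variable each
// value belongs to. Groups are assumed to be numbered 0..nGroups-1.
std::vector<double> groupMeans(const std::vector<double>& values,
                               const std::vector<std::size_t>& group,
                               std::size_t nGroups)
{
    assert(values.size() == group.size());
    std::vector<double> sum(nGroups, 0.0);
    std::vector<std::size_t> count(nGroups, 0);
    for (std::size_t i = 0; i < values.size(); ++i)
    {
        sum[group[i]] += values[i]; // accumulate per-group totals
        ++count[group[i]];
    }
    for (std::size_t g = 0; g < nGroups; ++g)
        sum[g] /= static_cast<double>(count[g]); // turn totals into means
    return sum;
}
```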

Conclusion
==========
So, I feel that both methods MUST be present:
1. I usually find it simpler to have one range, so this option should definitely exist.
2. However, the variables are often scattered across the sheet, so you do NOT have a single selection but multiple areas (gnumeric solved this nicely), AND therefore we need something that accepts and interprets discontinuous selections.

For passing area references or arrays/matrices you may also want to take
a look ...

I actually looked at code for other functions like 'ScInterpreter::ScPearson()', worked out what those functions do, and implemented a way to get the data based on that template. Unfortunately, I have NOT found a clear description of what every function does. Maybe there were better ways to implement it.

What I really need is either *one range* (for 2nd case) or *an array of ranges* (for 1st case).
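To illustrate how the two shapes could share one code path, here is a sketch (names are my own, not Calc API) that normalizes the single-range case into one value vector per variable; the array-of-ranges case already arrives in that form.

```cpp
#include <cstddef>
#include <vector>

// Sketch: split a single row-major matrix (nRows x nCols) into one value
// vector per column, i.e. per variable. After this step, the one-range case
// and the array-of-ranges case can feed the same ANOVA code.
std::vector<std::vector<double>> splitColumns(const std::vector<double>& mat,
                                              std::size_t nRows,
                                              std::size_t nCols)
{
    std::vector<std::vector<double>> vars(nCols);
    for (std::size_t r = 0; r < nRows; ++r)
        for (std::size_t c = 0; c < nCols; ++c)
            vars[c].push_back(mat[r * nCols + c]); // column c = variable c
    return vars;
}
```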

        SCSIZE nR[iVarNr], nC[iVarNr];

This actually doesn't work. Automatically allocating variable-length
arrays on the stack is a GCC extension and doesn't work with all
compilers. Instead it should be ... = new xxx[];

Whenever I wrote programs, I preferred to use vectors (or list objects): NO need to manually destroy objects, NO ugly pointer arithmetic, NO memory allocation issues. All handled cleanly. And you can grow or shrink them dynamically and do NOT have to keep track of the changes.

So I have virtually never written code using the 'new' operator. I hope that somebody experienced adjusts the code accordingly.
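As a sketch of that preference (assuming SCSIZE is some unsigned integer type; the function name is my own illustration), the variable-length arrays could become:

```cpp
#include <cstddef>
#include <vector>

typedef std::size_t SCSIZE; // assumption: SCSIZE is an unsigned integer type

// Instead of the non-portable VLA "SCSIZE nR[iVarNr], nC[iVarNr];" or the
// manual "new SCSIZE[iVarNr]", a std::vector is sized at run time, lives on
// the heap, and releases its memory automatically when it goes out of scope.
std::vector<SCSIZE> makeCounts(std::size_t iVarNr)
{
    return std::vector<SCSIZE>(iVarNr, 0); // one zeroed entry per variable
}
```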

        double fValX[iVarNr] [nMAX]; // THE VALUES

I also don't see why we need  fValX[iVarNr][nMAX]  elements, where maybe
most of it will be unused if only one matrix has nMAX elements. I think
this can be much improved by *storing just the needed elements*.

Theoretically, YES. With vectors this is no problem: just grow them dynamically. With plain arrays I have NO idea how to do that, because we only know how many values have to be stored after iterating through the matrix elements. I do have an alternative solution, BUT: *how great is the cost of iterating twice through the range* versus storing the data values during the first iteration? I have NO idea; I initially presumed it would be very costly, but I might have been wrong.
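A sketch of the vector-based variant (names are my own): each variable keeps its own inner vector grown via push_back, so the total storage equals the number of values actually present rather than iVarNr * nMAX.

```cpp
#include <cstddef>
#include <vector>

// Jagged storage for "just the needed elements": one inner vector per
// variable. totalStored() shows that the memory used is the sum of the
// per-variable counts, not the worst case iVarNr * nMAX.
std::size_t totalStored(const std::vector<std::vector<double>>& fValX)
{
    std::size_t n = 0;
    for (const std::vector<double>& var : fValX)
        n += var.size();
    return n;
}
```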

This alternative would be: (in pseudocode)
- iterate through all ranges (these represent different variables)
-- iterate through the elements of one range (these all belong to one variable)
      // NEEDED to detect IF it is TRUE element
      // ALSO permits calculating
      --- calculate sum of elements;
      --- determine number of data values;
 -- END INNER LOOP
 -- mean[ith variable] = SUM / No of elements;
 -- No elements[ith variable] = No of elements;
 -- GrandMEAN += SUM;
 -- TotalNoOfElements += No of elements;
- END OUTER LOOP
- GrandMEAN = GrandMEAN / TotalNoOfElements; // UP TO HERE GrandMEAN held only the accumulated sum

// THE 2nd ITERATION FOLLOWS
- iterate through all ranges (these represent different variables)
-- iterate through the elements of one range (these all belong to one variable)
      // NEEDED AGAIN to detect IF it is TRUE element
      --- calculate sum of residuals[i] += (Xi - mean[i]) * (Xi - mean[i]);
 -- END INNER LOOP
 -- fMSB += No elements[i] * (mean[i] - GrandMEAN) * (mean[i] - GrandMEAN)
- END OUTER LOOP

This code is even simpler and consumes far less memory, BUT as I said, I have NO idea how much slower it would be. [I believe it is slower.]

If you have an idea of how this code would fare, please tell me; I am interested, too.

I'll reply to Niklas tomorrow, it is quite late now.

Sincerely,

Leonard Mada

