Hi Eike,
Eike Rathke wrote:
Please note that before we can integrate code or data contributed we
need a signed Joint Copyright Assignment form (JCA) filled-out, see
http://contributing.openoffice.org/programming.html#jca
I sent the completed JCA to [EMAIL PROTECTED] today (I can also
attach a copy of the JCA to an e-mail to this address if it is needed).
Furthermore, this isn't Fortran ;-) and it
would be much more eye-friendly if you fixed your CapsLock key and
refrained from using only capitalized letters in comments. Please use
normal capitalization instead. Thanks.
Btw, code is much more readable if you align the trailing comments (and
use proper capitalization, of course ;-)
Well, I do not know what the development environment of professional
programmers looks like. I am currently using (and wrote this code
inside) the free jEdit (http://sourceforge.net/projects/jedit/).
What I did notice, however, is that Calc lacks comments almost
completely. ;-) It is quite difficult to work out what all the code does
without having guiding comments.
My comments look actually decent in jEdit. I like them best as they are.
I definitely do *NOT* like lower case comments, because it gets me
confused: is it code or is it still comment. Obviously, you can NOT have
all the code written uppercase, so to distinguish comments from code I
write them in uppercase. And I feel that this way the code is
substantially more readable. (You only need to read comments once to
understand what's going on, but you need to have the full view of the
code almost continuously.)
// WE GET EITHER A SINGLE MATRIX WHERE EVERY COLUMN IS A SEPARATE VARIABLE
// DISADVANTAGE: ONLY ONE COLUMN PER VARIABLE
// OR MULTIPLE MATRICES, EACH MATRIX IS ONE VARIABLE
// DISADVANTAGE:
// CALC FUNCTIONS ACCEPT ONLY 30 PARAMS
// SO THERE ARE AT MOST 30 VARIABLES
Not quite true. The UI in the Formula AutoPilot knows only 30 parameters
at maximum, the compiler and interpreter actually can handle more. ... But
designing the parameters is more a question of how
other spreadsheet applications do it. We should follow that.
*gnumeric* : has 2 modes (actually 3: columns swapped for rows, too)
1.) every column is one variable - my case 2: the one range scenario
2.) every area is one variable: my case 1: multiple selection ranges
So gnumeric has it both ways.
*R*
R is complex. It is NOT graphically-oriented. Basically, you have only
*ONE data vector* for such a simple ANOVA, BUT it contains *ALL* the
values for *ALL* variables (which is counterintuitive for a novice).
A 2nd vector, of the same length as the first, matches those values to
the variables they belong to, e.g.
vector1 = (val1, val2, val3, val4, val5, ..., val100)
vector2 = (1, 1, 1, 1, 2, 2, ..., 10)
where val1-4 are data points for the first variable, val5-... are data
points for variable 2, val...-100 are data points for variable 10.
Vector2 MUST also be a factor (and NOT a numeric vector). When you
perform the ANOVA, you do a linear fit of vector1 on vector2:
"aov(vector1 ~ vector2)". Quite complex for beginners.
Conclusion
=======
So, I feel that both methods MUST be present:
1. I usually find it simpler to have one range, so this option
should definitely exist
2. However, often the variables are scattered on the sheet, so you do
NOT have a single selection, but multiple areas (gnumeric solved this
nicely), AND therefore we need something that accepts and interprets
discontinuous selections.
For passing area references or arrays/matrices you may also want to take
a look ...
I actually saw some code for other functions like
'ScInterpreter::ScPearson()' and imagined what all those functions do
and so implemented a way to get the data based on that template.
Unfortunately, I have NOT found a clear description of what every
function does. Maybe there were better ways to implement it.
What I really need is either *one range* (for 2nd case) or *an array of
ranges* (for 1st case).
SCSIZE nR[iVarNr], nC[iVarNr];
This actually doesn't work. Automatically allocating variable-length
arrays on the stack is a GCC extension and doesn't work with all
compilers. Instead it should be ... = new xxx[];
Whenever I wrote some programs, I preferred to use vectors (or list
objects). NO need to manually destroy objects, NO ugly pointer
arithmetic, NO memory allocation issues. All done cleanly. And you can
increase/decrease size dynamically and do NOT have to keep track of the
changes.
So, I virtually never wrote code using the 'new' operator. I hope that
somebody experienced adjusts the code accordingly.
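As a rough sketch only, the two size arrays could be declared with
std::vector instead of a variable-length array. The struct name
RangeDims is hypothetical, and the SCSIZE typedef below is just a
stand-in for Calc's own definition so that the snippet compiles alone:

```cpp
#include <cstddef>
#include <vector>

// Stand-in for Calc's SCSIZE (assumed here to be an unsigned size type).
typedef std::size_t SCSIZE;

// Hypothetical helper: row and column counts for each variable.
// The vectors are sized at run time, so no compiler extension is needed
// and the memory is released automatically when the object goes away.
struct RangeDims
{
    std::vector<SCSIZE> nR;  // number of rows per variable
    std::vector<SCSIZE> nC;  // number of columns per variable

    explicit RangeDims(std::size_t iVarNr)
        : nR(iVarNr, 0), nC(iVarNr, 0) {}
};
```

Element access stays the same as with plain arrays (nR[i], nC[i]), so
the rest of the code would not need to change much.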
double fValX[iVarNr] [nMAX]; // THE VALUES
I also don't see why we need fValX[iVarNr][nMAX] elements, where maybe
most of it will be unused if only one matrix has nMAX elements. I think
this can be much improved by *storing just the needed elements*.
Theoretically, YES. With vectors, no problem, increase size dynamically.
With simple arrays, NO idea how to do that. We will know how many values
have to be stored only after iterating through the matrix elements. I
have a better solution, BUT:
*How great is the cost of iterating two times through the range* vs
storing the data values during the first iteration? I have NO idea, but
I presumed initially, that this will be very costly. I might have been
wrong.
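For comparison, the single-pass variant that stores just the needed
values might look roughly like this with vectors. This is a sketch under
assumptions: collectValues is a hypothetical name, the input vector
stands in for one range, and NaN stands in for cells that hold no value
(real Calc code would ask the matrix whether an element is a value):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: copy only the usable numbers of one range.
// The result grows as needed, so nothing is reserved for unused cells.
std::vector<double> collectValues(const std::vector<double>& rangeCells)
{
    std::vector<double> kept;
    for (std::size_t i = 0; i < rangeCells.size(); ++i)
    {
        if (!std::isnan(rangeCells[i]))   // NaN marks "no value" here
            kept.push_back(rangeCells[i]);
    }
    return kept;
}
```

One such vector per variable, held in a
std::vector< std::vector<double> >, would replace the fixed
fValX[iVarNr][nMAX] block.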
The two-pass alternative would be: (in pseudocode)
- iterate through all ranges (these represent different variables)
-- iterate through elements of one range (these all belong to one
variable)
// NEEDED to detect IF it is TRUE element
// ALSO permits calculating
--- calculate sum of elements;
--- determine number of data values;
-- END INNER LOOP
-- mean[ith variable] = SUM / No of elements;
-- No elements[ith variable] = No of elements;
-- GrandMEAN += SUM;
-- TotalNoOfElements += No of elements;
- END OUTER LOOP
- GrandMEAN = GrandMEAN / TotalNoOfElements;
// FOLLOWS 2nd Iteration
- iterate through all ranges (these represent different variables)
-- iterate through elements of one range (these all belong to one
variable)
// NEEDED AGAIN to detect IF it is TRUE element
--- sum of squared residuals[i] += (Xi - mean[i]) * (Xi - mean[i]);
-- END INNER LOOP
-- fMSB += No elements[i] * (mean[i] - GrandMEAN) * (mean[i] - GrandMEAN)
- END OUTER LOOP
This code is even simpler and consumes far less memory, BUT as I said, I
have NO idea how much slower it would be. [I believe it is slower.]
If you have an idea how this code would fare, please tell me; I'm
interested, too.
I'll reply to Niklas tomorrow, it is quite late now.
Sincerely,
Leonard Mada