Hi Eike,

Eike Rathke wrote:
[...]
As I mentioned in the previous post: raising an error is one option (the best), but it depends on some pre-requisites:
- ability to easily find string-as-number cells
- easy conversion to number
 [I know that Cor's extension does exist,
  BUT Calc should do it natively]

No. Why should it? If an extension can do all work there is no need to
reimplement it in the core again.

My notion of *doing it natively* covers: everything that is bundled with the original application. If it is an extension, fine, but is should be installed *by default* with Calc. This is the requirement that must be met before setting the default to raise an error. So every user that witnesses the error can immediately correct the error.

Now, there is also an issue with the current Novell-implementation - as I read from IssueZila. Empty cells do raise errors, too. This should definitely be avoided. An empty cell is just empty, it is not a string and not a number. It should be skipped by default.

I am going back to R (and the S+ language). R has a special data type called NA (Not Available). Functions in R are also able to interpret differently this, and the user has control over the interpretation of NA. To quote from the R-help:
"na.action: is a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The ‘factory-fresh’ default is na.omit. Another possible value is NULL, no action. Value na.exclude can be useful."

Therefore, one can instruct R to ignore NAs, to remove NAs, or to fail if a NA is encountered. This option can be passed individually to every function.

This would be a good way to handle missing values in Calc, too, but most importantly, this concept can be extended to string-to-number interpretation/ conversion:
- the default should be *string.fail*
- while other sensible options might be
*string.omit* (Calc does it currently) and
*string.convert* (Excels' way, but more consistently than in Excel).
The user can then explicitly specify how any function/cell will behave. This is portable, this is reliable. This is reproducible, as every cell has it individually specified (i.e. if the user needs some non-default option, he is free to pass this option in the formula he uses without affecting any other part of the spreadsheet).

Of course, this is not quite the way spreadsheet formulas get interpreted today, so I still have to think how to pass these options at formula/cell level. But it is the far superior way to handle the issue.

The reason why interpreting strings as numeric values on the fly is
harmful and we do not want to implement it.
BUT you do it. You interpret it *always* as 0. This is the worst possible scenario.

It is at least consistent for =A1+A2 and =SUM(A1:A2).
  Eike

It might be consistent, but not accurate and very wrong what the outcome is. Excel can be said is only half the time wrong, while Calc is always. I hate Excel's handling And I hope Calc will do it better in the future. To take an example from numerical analysis (as we had a related thread recently ;) ), one may compute something with an incredible precision, over 100 bits, but IF all bits are wrong, the accuracy is still a meager zero.

Now some remarks about causes of string-numbers (was mentioned in the other e-mail):

1. it is true that this happens when pasting from the clipboard in the wrong locale (again a problem of the spreadsheet application). As I said, I worked extensively in a multi-locale environment and the only sensible way to deal with this is that everyone uses the en-locale. This is usually not implementable and will remain a pain.

BUT: there are not many decimal separators (most probably only "." and "," will be encountered reliable often in practice), so sensing a strange paste should trigger a notification that pasted value is a string (non-obtrusively like a small icon in the cell). I still do not have an optimal concept.

2. most instances of numbers-to-strings that I encountered (by a very large margin) were user-typed strings. The user knowingly entered the string because of a formatting error (most commonly x.y as date, but sometimes something else). Fixing the date would probably remove a lot of these instances.

3. csv-input: this was often problematic and could result in text instead of numbers. I hope that modern spreadsheets behave better.

3.') exporting a spreadsheet with a proprietary software: this was frequent in the distant past (csv was more frequent), but shouldn't pose any major problem in the age of ODF; csv's might still be used, so fixing any problems with csv should be pursued.

I hope that Calc becomes the best spreadsheet program around. However, the competition is tough. Excel 2007, Mac's iNumbers and various dedicated programs are eager to step in.


Sincerely,

Leonard

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to