[sc-issues] [Issue 5658] Spreadsheet thinks a number is a string (text)

discoleo Sun, 27 Apr 2008 13:49:59 -0700

To comment on the following update, log in, then open the issue:
http://www.openoffice.org/issues/show_bug.cgi?id=5658



User discoleo changed the following:

                What    |Old value                 |New value
================================================================================
                      CC|'cornouws,daxkelson,er,fl,|'cornouws,daxkelson,discol
                        |ft,huw,jeongkyu,jrtapper,k|eo,er,fl,ft,huw,jeongkyu,j
                        |ohei,mloiseleur,mmeeks,pmi|rtapper,kohei,mloiseleur,m
                        |ke,thing'                 |meeks,pmike,thing'
--------------------------------------------------------------------------------




------- Additional comments from [EMAIL PROTECTED] Sun Apr 27 20:49:52 +0000 
2008 -------
Somehow, I do like Excel's handling better. Though it is NOT perfect either.

As a spreadsheet user, you can't always control what data is written in the
spreadsheet, as someone else might have written numbers as text. BUT having NO
easy way to see IF the numbers are text and the calculations are flawed is
surely the worst alternative.

The argument that numbers are right-aligned and text is left aligned is - well -
one of the worst arguments I heard. I have yet to come across a spreadsheet,
where text and number-alignments are not modified.

Handling localised data is a problem, but this is independent of the
"text-handling" (and is also a problem with VALUE() ). This is why I always
prefer to use the English-decimal separator.

While I am the guy with the strong typing in spreadsheets,
[http://www.openoffice.org/issues/show_bug.cgi?id=79924],
OOo does NOT enforce strong typing either, but chooses the worst possible 
outcome.

Added to this, the user does NOT have an easy way to convert text-numbers into
numerical numbers.

Also, this distinction between text and numbers is rather artificial. If it were
specific units or currency, than distinction between a scalar and some other
type makes sense, but it has little relevance otherwise and I believe that this
difference is overplayed in spreadsheets.

There are only 2 ways to somehow correctly handle this issue. Both should be
implemented:

A.) interpret text as numbers
  - this should apply for every formula / function
  - offer option per spreadsheet to store this state
  - also, option per spreadsheet to set the decimal separator

B.) raise error when numerical operation involves strings
  - this option MUST include a method to easily convert
    string numbers into numerical values

The correct handling of this issue will implement both methods. Also, the option
how to handle a particular spreadsheet should be stored within the spreadsheet.

Of course, for compatibility reasons, it might be feasible to have a 3rd option
as well: ignore text. This is of course a little bit more difficult to define. I
would presume that ignoring text will handle text-cells as empty cells and will
skip them and NOT handle them as numerical zero cells. This is probably not the
case in the existing implementation.

Also, in this latter case, the user MUST be notified that cells were skipped. I
am not sure how this notification should look like. Some icon displayed within
the specific cell seems suitable. Propagating this "notification" to downstream
calculations seems a wise idea, too. Righlt-clicking on this icon should display
the number of "text-cells" that were skipped. And there should be a method to
review those cells and convert to numbers.

When thinking more thoroughly, I become more and more convinced, that the whole
data model and error handling mechanism of Calc needs to be redesigned from
bottom up.

A more advanced alternative is based on data-handling by R:
[http://cran.R-project.org]
 - e.g. IF a value is missing, one can pass additional arguments
   to R-functions, how to handle those missing values, e.g.:

   "na.action: is a function which indicates what should happen when
    the data contain NAs. The default is set by the na.action setting
    of options, and is na.fail if that is unset. The ‘factory-fresh’ default
    is na.omit. Another possible value is NULL, no action. Value na.exclude
    can be useful."

I believe this is a far more transparent alternative. And it would cover many
other areas, not just the text-issue. [By the way, spreadsheets have a very poor
handling of missing values, and this would be a good alternative, too.]

Setting options at spreadsheet level how to handle text and missing values seems
sensible:
is.text = FAIL
is.NA = OMIT
[options: FAIL, OMIT, CONVERT for text -> number]
[alternatives to is.xxx functions/options could be: text.action and NA.action]

---------------------------------------------------------------------
Please do not reply to this automatically generated notification from
Issue Tracker. Please log onto the website and enter your comments.
http://qa.openoffice.org/issue_handling/project_issues.html#notification

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[sc-issues] [Issue 5658] Spreadsheet thinks a number is a string (text)

Reply via email to