Niklas Nebel wrote:
It fixes the issues with missing line breaks, but it does create new
issues with unwanted/unhandled breaks. Some problem areas:
- Conversion to unformatted text, especially for clipboard or DDE links
- Other line-based formats (DIF, SYLK)
- Conversion of formulas to text cells (Paste Special, unselect Formulas)
- Text content in the file format (<text:p> versus <text:line-break>)
The only way to be sure is to go through all these GetString calls and
see if the line feed character is handled correctly.
I've had a good look at many of these and have posted a new patch fixing
various multiline problems. These are a mixture of long-standing
problems with non-formula cells and further fixes to multiline formula
cells. They affect the SYLK, DIF, HTML, unformatted text and Quattro Pro
filters; details are in the patch at
http://www.openoffice.org/issues/show_bug.cgi?id=35913. It includes some
subtle changes, which I hope are okay as they bring these non-Calc
formats in line with other spreadsheet programs. I've been looking
closely at Excel, Gnumeric and Quattro Pro. Excel is definitely the most
polished, so I've mostly based compatibility on it. Of particular note
is the change to the quoting convention for unformatted text and SYLK.
The biggest area for change, though, is DDE links, and I need some help
here before implementing them. Firstly, tabs within a cell are broken in
the current versions of Calc, and the problems are closely related to
newline characters within cells. Excel deals with both tabs and newlines
in cells, and as this is a working solution, I'd like to know if anyone
can provide some information on how it works. Somehow it is doing the
impossible; here is why...
If a cell contains either a newline (\n) or a tab (\t), Excel encloses
the entire contents in an opening and a closing quote ("). If a cell is
quoted like this and its contents include a quote character, that quote
is escaped by doubling it, i.e. " is replaced by "". Note that within
cells a newline is represented by \n, not \r\n, even though this is
Windows. The end of a line, however, is designated by \r\n, and cells
are separated by \t. My latest patch replicates this protocol when
copying text. With this in mind, consider two adjacent cells that each
contain three quote characters ("), and a single cell that contains a
tab between two quote characters. Visually, where | indicates the
division between cells, the contents are:
1) """|"""
2) "\t"
When copying or DDE linking using unformatted text, we get the following
for both:
"""\t"""\r\n
So it is impossible to distinguish these two sets of contents. However,
Excel always distinguishes them correctly when DDE linking, though not
when pasting. Initially I wondered whether it uses the 'item'
information that comes alongside the DDE data, e.g. "R1C1:R1C2", to help
determine the number of rows and columns. However, that cannot be the
explanation, as Excel also copes when the number of cells is fixed and
the item information cannot help, as in this 3 cell case:
1) "\t"|"""|"""
2) """|"""|"\t"
both of which result in the following DDE data:
"""\t"""\t"""\t"""\r\n
When simply copying into Excel, it does not always get it right, which I
would expect. DDE linking unformatted text from Word also gives Excel
problems. So the question is: how does Excel solve it for DDE linking,
which carries the same textual data? I have a hunch that it uses the
SYLK format for DDE links instead: when debugging paste linking of
unformatted text from Excel into Calc, a SYLK request arrives in
addition to the unformatted text one. In my patch I've fixed SYLK
quoting. However, Calc's version of SYLK still does not match the
standard approach used by Excel and, I presume, the original Multiplan.
So I *think* Calc's SYLK output is incorrect, which would explain why
Excel doesn't get it right when DDE linking from OOo, but does get it
right Excel to Excel.
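As a rough illustration of the tab collisions described above, here is a
minimal Python sketch of the quoting rule as stated in this mail (quote a
cell only if it contains a tab or newline, double any embedded quotes,
join cells with \t and end the row with \r\n). The names encode_cell and
encode_rows are made up for this example; this is not the code from the
patch or from Calc.

```python
def encode_cell(text):
    """Quote a cell only if it holds a tab or newline; double embedded quotes."""
    if "\t" in text or "\n" in text:
        return '"' + text.replace('"', '""') + '"'
    return text


def encode_rows(rows, eol="\r\n"):
    """Join cells with tabs and terminate every row with the given line ending."""
    return "".join("\t".join(encode_cell(c) for c in row) + eol for row in rows)


Q3 = '"""'     # a cell holding three quote characters
QTQ = '"\t"'   # a cell holding quote, tab, quote

# Two-cell row versus a single cell containing a tab
two_cells = encode_rows([[Q3, Q3]])
one_cell = encode_rows([[QTQ]])

# Fixed width of three cells, tab cell first versus tab cell last
three_a = encode_rows([[QTQ, Q3, Q3]])
three_b = encode_rows([[Q3, Q3, QTQ]])

print(repr(two_cells), two_cells == one_cell)
print(repr(three_a), three_a == three_b)
```

Both comparisons come out equal, so a receiver that sees only this text
cannot recover the original cell boundaries, even if it knows how many
cells to expect.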
I've arrived at a juncture. Firstly, does anyone have a good insight
into all this? Secondly, assuming the dde links are done using SYLK, is
it okay to change this in OOo to match?
Finally, how does this relate to adding newline support? Well, Calc uses
\n as the line terminator on Unix and \r\n on Windows for unformatted
text copy/paste and linking. If \n exists within cells and is escaped
with quotes as on Windows, then the same problem arises as I showed
above with tabs: it is not possible to determine whether \n is the end
of the line or a new line within a cell. That would mean that for DDE
linking, \r\n would need to be used on Unix (\n is used at the moment),
but this may not be such a surprise given that DDE is a Windows
protocol. I'm hoping the solution is to use SYLK for DDE linking, but
then the OOo SYLK format would need tweaking.
A couple more related queries...
- Does anyone know of any other Unix DDE clients, in particular
spreadsheets? If not, the impact of changing the end of line terminator
from \n to \r\n won't be so big.
- Calc implies that it supports DDE links as a server using SYLK, DIF,
HTML, etc. in addition to plain text, but in actual fact they all end up
calling the text format. This can be observed by debugging impex.cpp
when doing a copy from Calc and a paste link in something like Excel. Is
this a known quirk? There seem to be other DDE problems, e.g. some of
the paste link graphic formats into Word give errors.
- It has been getting progressively harder to get a tab character into a
cell, from 2.4 to DEV300_m16 to OOO300_m7. Is this an accident, or are
there some bug fixes related to this behaviour? I can't see any pattern,
e.g. aa\tbb\n will keep the tab in OOO300_m7, but aa\tbb will not, and
aa\tbb will keep the tab in 2.4, but aa\tb will not.
The patch is getting rather big, with numerous knock-on fixes. Is this
the point at which a child workspace should be created for it to ease
development?
Apologies for the long email, but it is all a great big can of worms!!
William