Hi William,

On Saturday, 2008-11-08 23:48:43 +0000, William S Fulton wrote:

> I've had a good look at many of these and have posted a new patch fixing  
> various multiline problems.

This is great! I think that patch does it, other opinions?

> It includes some
> subtle changes which I hope are okay as they bring these non-Calc
> formats in line with other spreadsheet programs, I've been looking
> closely at Excel, Gnumeric and Quattro Pro. Excel is definitely the most
> polished and I've mostly based compatibility on this program. Of
> particular note is the unformatted text and SYLK quoting convention
> change.

Bringing SYLK closer to what Excel reads/writes is always good. I'm not
sure about the unformatted text, is the quoting convention now what
Excel writes to the clipboard if requested?


> The biggest area for change though is DDE links and I need some help  
> here before implementing them. Firstly, tabs within a cell are broken in  
> the current versions of Calc and the problems are closely related to  
> newline characters within cells. Excel deals with both tabs and newlines  
> in cells and as this is a working solution, I'd like to know if there is  
> anyone who can provide some information as to how it works. Somehow it  
> is doing the impossible, here is why...
>
> If a cell contains either a newline (\n) or a tab (\t), it escapes the  
> entire contents with an opening and a closing quote ("). If a cell is  
> quoted like this and it contains a quote character in the contents, then  
> the quote is escaped by double quoting, ie " is replaced by "".

So far the CSV conventions as if tab instead of comma was used as field
separator (TSV).

> Note  
> that within cells, a newline is represented by \n, not \r\n, even though  
> this is Windows. The end of a line, however, is designated by \r\n and  
> cells are separated by \t. My latest patch has replicated this protocol  
> when copying text. With this info in mind, consider two adjacent cells  
> both containing three single quotes and another cell containing a tab  
> within quotes, so visually where | indicates the division between cells,  
> the contents are:
>
> 1) """|"""
> 2) "\t"
>
> When copying or dde linking using unformatted text, we get the following  
> for both:
>
> """\t"""\r\n
>
> So it is impossible to distinguish these two sets of contents.

Following CSV rules, quotes would be escaped by doubling them. This
would give:

1) """"""""\t""""""""
2) """\t"""

But that does not seem to be what Excel delivers with DDE?

> However,  
> Excel always distinguishing them correctly when dde linking, not pasting  
> though. Initially I was wondering if it uses the 'item' information that  
> comes alongside the dde data, eg "R1C1:R1C2" to help determine the  
> number of rows/columns. However, this is not possible as it also deals  
> with this case of having a fixed number of cells, as in this 3 cell case:
>
>
> 1) "\t"|"""|"""
> 2) """|"""|"\t"
>
> both of which result in the following dde data:
>
> """\t"""\t"""\t"""\r\n
>
> When simply copying into Excel, it does not always get it right, which I  
> would expect. Also dde linking unformatted text from Word gives Excel  
> problems, so the question is how does it solve it for dde linking, which  
> contains the same textual data?

I think it simply uses a different protocol when linking between two
Excel documents, as you noticed and mentioned in your other mail.

> I have a hunch it uses dde links using  
> the SYLK format instead as when debugging paste linking unformatted text  
> from Excel into Calc, a SYLK request arrives in addition to unformatted  
> text. In my patch, I've fixed SYLK quoting, however, Calc's version of  
> SYLK still does not match the standard approach used by Excel and I  
> presume the original Multiplan, so I *think* the SYLK format is  
> incorrect, so when dde linking to Excel from OOo, Excel doesn't get it  
> right, but Excel to Excel it does.
>
> I've arrived at a juncture. Firstly, does anyone have a good insight  
> into all this? Secondly, assuming the dde links are done using SYLK, is  
> it okay to change this in OOo to match?

No, not unconditionally. You'd have to somehow distinguish between
a Calc-Calc link and other links that only understand unformatted text,
in sending or receiving.


> Finally, how does this relate to adding in the newline support? Well,  
> Calc uses \n as the line terminator on Unix and \r\n on Windows for  
> unformatted text copy/paste and linking. If \n exists within cells and  
> is escaped with quotes as on Windows, then the same problem arises as I  
> showed above with tabs in not being able to determine if \n is the end  
> of the line or a new line within a cell. That would mean for dde  
> linking, \r\n would need to be used on Unix (\n is used at the moment),  
> but this may not be such a surprise given that dde linking is a Windows  
> protocol.

Indeed. I'm also not aware of any application other than OOo supporting
DDE on other platforms, except OS/2 that has the same line end
convention.


> A couple more related queries...
> - Does anyone know of any other unix DDE clients, in particular  
> spreadsheets? If not the impact of changing the end of line terminator  
> from \n to \r\n won't be so big.
> - Calc implies that it supports DDE links as a server using SYLK, DIF,  
> HTML, etc in addition to plain text

Erm.. what are you referring with "Calc implies that it supports"?

> but in actual fact they all end up  
> calling the text format. This can be observed by debugging impex.cpp  
> when doing a copy from Calc and paste link in something like Excel. Is  
> this a known quirk? There seem to be other DDE problems eg some of the  
> paste link graphic formats into Word give errors.

AFAIK it was not intended to support anything else than cell content
with DDE.

> - It has been getting progressively harder to get a tab character into a  
> cell, from 2.4 to DEV300_m16 to OOO300_m7. Is this an accident or are  
> there some bug fixes related to this behaviour? I can't see any pattern,  
> eg aa\tbb\n will keep the tab in OOO300_m7, but aa\tbb will not and  
> aa\tbb will keep the tab in 2.4, but aa\tb will not.

I don't see any difference in behavior with OOO300_m9, given that a tab
can be entered to a cell only when copied from the clipboard. Or what
was your attempt?


> The patch is getting rather big with numerous knock on fixes. Is this  
> the sort of point in time that a child workspace should be created for  
> it to ease development?

Maybe the best, if this tends to last longer, for rebasing or other
developers to jump in. Do you want me to create one?


> Apologies for long email, but it is all a great big can of worms!!

Indeed..

  Eike

-- 
 OOo/SO Calc core developer. Number formatter stricken i18n transpositionizer.
 SunSign   0x87F8D412 : 2F58 5236 DB02 F335 8304  7D6C 65C9 F9B5 87F8 D412
 OpenOffice.org Engineering at Sun: http://blogs.sun.com/GullFOSS
 Please don't send personal mail to the [EMAIL PROTECTED] account, which I use 
for
 mailing lists only and don't read from outside Sun. Use [EMAIL PROTECTED] 
Thanks.

Attachment: pgpcaCEsQUQwq.pgp
Description: PGP signature

Reply via email to