https://bz.apache.org/ooo/show_bug.cgi?id=91028

--- Comment #10 from [email protected] ---
---snip---
sal_Bool ScImportExport::ImportStream( SvStream& rStrm, const String& rBaseURL,
sal_uLong nFmt )
{
    if( nFmt == FORMAT_STRING )
    {
        if( ExtText2Doc( rStrm ) )      // evaluate pExtOptions
            return sal_True;
    }
...

---snip---

Here nFmt is FORMAT_STRING, so it immediately calls "ExtText2Doc( rStrm )", as
is visible in frame #0 in my previous comment.

Let's set a breakpoint there and load a CSV file with huge lines: 20000 each of
a, b, c, d, e, separated by commas, 100004 characters on one line.

This line:
---line---
rStrm.ReadCsvLine( aLine, !bFixed, rSeps, cStr);
---line---

calls SvStream::ReadCsvLine() in main/tools/source/stream/stream.cxx, which
ultimately ends up in SvStream::ReadLine( ByteString& rStr ). There, it
incrementally reads chunks of at most 256 bytes from the file, looks for the
end of the line, and appends each chunk to the string that was passed in:

---line---
rStr.Append( buf, n );
---line---

Putting a breakpoint there and printing the length of rStr shows that it
increases on each loop iteration, reaches 65535, then remains stuck there:

---snip---
Thread 1 hit Breakpoint 17, SvStream::ReadLine (this=this@entry=0x80b409830,
rStr=...) at source/stream/stream.cxx:736
736                 rStr.Append( buf, n );
$396 = 65024

Thread 1 hit Breakpoint 17, SvStream::ReadLine (this=this@entry=0x80b409830,
rStr=...) at source/stream/stream.cxx:736
736                 rStr.Append( buf, n );
$397 = 65280

Thread 1 hit Breakpoint 17, SvStream::ReadLine (this=this@entry=0x80b409830,
rStr=...) at source/stream/stream.cxx:736
736                 rStr.Append( buf, n );
$398 = 65535

Thread 1 hit Breakpoint 17, SvStream::ReadLine (this=this@entry=0x80b409830,
rStr=...) at source/stream/stream.cxx:736
736                 rStr.Append( buf, n );
$399 = 65535
---snip---

That's because rStr is of type ByteString, which in tools/inc/tools/string.hxx
is defined with a 16-bit maximum length (0xFFFF = 65535) unless STRING32 is
defined:

---snip---
#ifdef STRING32
#define STRING_NOTFOUND    ((xub_StrLen)0x7FFFFFFF)
#define STRING_MATCH       ((xub_StrLen)0x7FFFFFFF)
#define STRING_LEN                 ((xub_StrLen)0x7FFFFFFF)
#define STRING_MAXLEN      ((xub_StrLen)0x7FFFFFFF)
#else
#define STRING_NOTFOUND    ((xub_StrLen)0xFFFF)
#define STRING_MATCH       ((xub_StrLen)0xFFFF)
#define STRING_LEN                 ((xub_StrLen)0xFFFF)
#define STRING_MAXLEN      ((xub_StrLen)0xFFFF)
#endif
---snip---
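
The effect can be reproduced outside the office code with a small sketch.
CappedString and appendInChunks are hypothetical names, not the real
ByteString; the sketch assumes that an append past the maximum silently drops
the excess, which is what the debugger output above suggests.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical miniature of a string type with a 16-bit length limit.
// It is NOT the real ByteString; it only mimics the observed clamping.
class CappedString
{
public:
    using Len = std::uint16_t;
    static constexpr Len kMaxLen = 0xFFFF;  // mirrors STRING_MAXLEN without STRING32

    // Appends at most as many bytes as still fit; the rest is silently dropped
    // (assumed behaviour, inferred from the trace).
    void append(const char* data, std::size_t n)
    {
        const std::size_t room = static_cast<std::size_t>(kMaxLen) - data_.size();
        data_.append(data, n < room ? n : room);
    }

    Len length() const { return static_cast<Len>(data_.size()); }

private:
    std::string data_;
};

// Feeds `total` bytes in chunks of at most 256, as the reader loop does, and
// returns the resulting string length.
std::size_t appendInChunks(std::size_t total)
{
    CappedString s;
    const std::string chunk(256, 'a');
    while (total > 0)
    {
        const std::size_t n = total < chunk.size() ? total : chunk.size();
        s.append(chunk.data(), n);
        total -= n;
    }
    return s.length();
}
```

With this model, appendInChunks(65280) still returns 65280, while
appendInChunks(65536) and appendInChunks(100004) both return 65535, matching
the values printed at the breakpoint.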

There are multiple ways to fix this. One is to globally define STRING32, which
could have many unintended consequences, such as larger spreadsheet cell
strings. Another is to pass a different string buffer type to
SvStream::ReadCsvLine(), so that it is unaffected by that limit. A third is to
drop the stream entirely and switch to push-model CSV parsing, which is the
lightest on memory and could be paused and resumed, but needs more complex
code.
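
As a rough illustration of the push-model option, here is a minimal sketch.
Every name in it (CsvPushParser, feed, finish) is hypothetical and nothing
like it exists in the code base as far as this comment knows. It assumes a
comma separator and double-quote text delimiter, drops '\r' outside quotes,
and ignores the fixed-width case.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Hypothetical push-model CSV tokenizer: the caller pushes chunks in, and the
// parser calls back once per field and once per finished row.
class CsvPushParser
{
public:
    using FieldFn = std::function<void(const std::string&)>;
    using RowFn = std::function<void()>;

    CsvPushParser(FieldFn onField, RowFn onRow)
        : onField_(std::move(onField)), onRow_(std::move(onRow)) {}

    // Consumes one chunk. State survives between calls, so fields and quotes
    // may straddle chunk boundaries and the caller may pause between chunks.
    void feed(const char* data, std::size_t n)
    {
        for (std::size_t i = 0; i < n; ++i)
            step(data[i]);
    }

    // Flushes a last row that has no trailing newline.
    void finish()
    {
        if (state_ != State::RowStart)
            endRow();
    }

private:
    enum class State { RowStart, Unquoted, Quoted, QuoteInQuoted };

    void endField() { onField_(field_); field_.clear(); }
    void endRow() { endField(); onRow_(); state_ = State::RowStart; }

    void step(char c)
    {
        if (state_ == State::Quoted)
        {
            if (c == '"') state_ = State::QuoteInQuoted;
            else field_ += c;  // separators and newlines are literal here
            return;
        }
        if (state_ == State::QuoteInQuoted)
        {
            if (c == '"') { field_ += '"'; state_ = State::Quoted; return; }
            state_ = State::Unquoted;  // that was the closing quote; handle c below
        }
        if (c == ',') { endField(); state_ = State::Unquoted; }
        else if (c == '\n') endRow();
        else if (c == '\r') { /* dropped outside quotes (simplification) */ }
        else if (c == '"' && field_.empty()) state_ = State::Quoted;
        else { field_ += c; state_ = State::Unquoted; }
    }

    FieldFn onField_;
    RowFn onRow_;
    std::string field_;  // only the current field is buffered, never a whole line
    State state_ = State::RowStart;
};
```

A caller would read the file in chunks of any size, pass each one to feed(),
and call finish() at end of file. Since only the current field is held in
memory, the length of a line no longer matters, though a single field would
still have to fit whatever string type the cell uses.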
