Bugs item #2586088, was opened at 2009-02-10 19:47
Message generated for change (Comment added) made by jflokstra
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=482468&aid=2586088&group_id=56967

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: PF/runtime
Group: MonetDB4 "stable"
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Wouter Alink (vzzzbx)
Assigned to: Jan Flokstra (jflokstra)
Summary: XQ: large text nodes

Initial Comment:
(MonetDB Nov2008 SP2 on Linux)

The following occurred:

wal...@ldc:~/tmp> xmlwf tmp.xml # content is well-formed
wal...@ldc:~/tmp> cat tmp.xml | mclient -lxq -I oops5.xml
MAPI  = mone...@localhost:50000
ACTION= mapi_stream_into
ERROR = !ERROR: Detected an entity reference loop
        !ERROR: shredder_parse: XML input not well-formed.
        !ERROR: CMDshred_stream: operation failed.
wal...@ldc:~/tmp>

The cause is a text node in tmp.xml that contains more than 8M characters.

In shred_characters() in shredder.mx, the maximum text content buffer size is 
set to 8M (1<<23). Everything after the 8Mth character is ignored. If the 8Mth 
character falls in the middle of an entity reference (such as "&quot;"), the 
error above is returned.
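
To illustrate the failure mode described above, here is a minimal Python sketch. It does not reproduce MonetDB's shredder or libxml2; it only assumes, as the report suggests, that serialized text is cut at a fixed limit. The helper names (cut_splits_entity, is_well_formed) are hypothetical, and Python's expat bindings stand in for the real parser.

```python
import re
import xml.parsers.expat

# Matches a complete entity or character reference such as &quot; or &#38;.
ENTITY = re.compile(r"&#?\w+;")


def cut_splits_entity(text, limit):
    """Return True if cutting `text` at offset `limit` lands inside a reference."""
    for match in ENTITY.finditer(text):
        if match.start() >= limit:
            break  # all remaining references start at or after the cut
        if limit < match.end():
            return True  # the cut falls strictly inside this reference
    return False


def is_well_formed(document):
    """Return True if expat accepts `document` as well-formed XML."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(document, True)
        return True
    except xml.parsers.expat.ExpatError:
        return False
```

A cut that leaves a dangling "&qu" inside the element is rejected by the parser, while a cut that lands between two references is accepted.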

I was able to generate a document with these properties using the following 
Python script:

wal...@ldc:~/tmp> cat createLargeTextField.py
i=0
print "<aap>"
while i < 10000000:
        print '&quot;'
        i+=1
print "</aap>"
wal...@ldc:~/tmp> python createLargeTextField.py > tmp.xml


P.S. Another issue, not really a bug: a warning is issued for each (small) 
portion of a text node after the 8Mth character. I would expect only one 
warning per oversized text node. Should this be a separate bug report?


----------------------------------------------------------------------

>Comment By: Jan Flokstra (jflokstra)
Date: 2009-02-20 09:31

Message:
Update: I could not check in the fix in Stable because of the sticky bit, so
you will have to wait a little longer for the fix.

----------------------------------------------------------------------

Comment By: Jan Flokstra (jflokstra)
Date: 2009-02-20 09:24

Message:
I fixed the problem by making the character buffer dynamic: whenever a text
node is larger than any previous one, I realloc() the character buffer to
fit the larger text. The initial size is still 1<<23, so for normal cases
the change has no effect (in speed or size). For larger sizes I think we now
support up to MAXINT, or whatever maximum libxml2 can handle.
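
The dynamic buffer described above can be sketched roughly as follows. This is a Python analogue, not the C code in shredder.mx: the TextBuffer class and its methods are hypothetical, and extending a bytearray stands in for realloc(). The initial capacity of 1<<23 is the value mentioned in the comment.

```python
INITIAL_CAPACITY = 1 << 23  # initial size mentioned in the comment


class TextBuffer:
    """Reusable text buffer that grows only when a larger text node arrives."""

    def __init__(self, capacity=INITIAL_CAPACITY):
        self.data = bytearray(capacity)  # preallocated storage
        self.length = 0                  # bytes in use by the current text node

    def store(self, text):
        """Store `text` (bytes), growing the buffer if it does not fit."""
        if len(text) > len(self.data):
            # Stand-in for realloc(): extend the storage to the new size.
            self.data.extend(bytes(len(text) - len(self.data)))
        self.data[:len(text)] = text
        self.length = len(text)

    def value(self):
        """Return the bytes of the text node currently stored."""
        return bytes(self.data[:self.length])
```

Text nodes that fit in the initial capacity never trigger a resize, which matches the claim that normal cases are unaffected; the capacity only ever increases, to the size of the largest text node seen so far.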

----------------------------------------------------------------------

_______________________________________________
Monetdb-bugs mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/monetdb-bugs
