Re: Xerces-C API changes for XQilla

John Snelson Thu, 17 Apr 2008 08:32:50 -0700

Hi Boris,

Boris Kolpackov wrote:

John Snelson <[EMAIL PROTECTED]> writes:

1) Problem:

XPath 2.0 is just different to XPath 1.0. We've therefore got our own
version of DOMXPathResult (XPath2Result) which makes more sense in this
context:

http://xqilla.sourceforge.net/docs/dom3-api/classXPath2Result.html

Solution:

It's probably simple enough to either extend DOMXPathResult to include
the extra functionality in XPath2Result, or to include it as a new class
called DOMXPath2Result.


I did a quick check and it appears that the DOMXPathResult is very
similar to DOMXPath2Result. I would therefore suggest that we try
to add the missing functionality to DOMXPathResult as non-standard
extensions (though we should try to use names that will likely be
used in the next version of DOM3 when it is updated to include
support for XPath 2, for example getIntegerValue instead of asInt).
What is your feeling on this approach? Also did you base your
DOMXPath2Result on any draft spec (e.g., where do the asDouble,
asInt, etc., names come from)?

XPath2Result wasn't based on any draft spec IIRC. Gareth probably
remembers more about its design, since I wasn't heavily involved in
Pathan at the time.

Having given it some thought, I think that merging the two would be the
best idea - it would be a breaking change for XQilla users, but it would
be a move to a more standard API.

2) Problem:

It's necessary to get access to DOMDocumentImpl, which isn't in the
public API, in order to implement the DOM3 XPath API. Needing access to
the Xerces-C source code to compile XQilla is a big problem for our
maintainers. We need DOMDocumentImpl for a number of reasons:

[...]

Solution:

Put DOMDocumentImpl in the public API.


The DOMDocumentImpl.hpp is now installed with the rest of the headers.
I've also changed all private data members and functions to be protected
in all DOM*Impl classes. Is there anything else we need to do?

I think that some headers related to DOMDocumentImpl are also needed.
An exhaustive list of the ones that XQilla includes:


xercesc/dom/impl/DOMAttrImpl.hpp
xercesc/dom/impl/DOMCasts.hpp
xercesc/dom/impl/DOMDocumentImpl.hpp
xercesc/dom/impl/DOMDocumentTypeImpl.hpp
xercesc/dom/impl/DOMElementNSImpl.hpp
xercesc/dom/impl/DOMNodeImpl.hpp
xercesc/dom/impl/DOMRangeImpl.hpp
xercesc/dom/impl/DOMTypeInfoImpl.hpp
xercesc/dom/impl/DOMWriterImpl.hpp

I imagine that you also need to include the headers that these files
include themselves.

5) Problem:

RegularExpression is not thread safe or consistent with its use of
MemoryManager. It's also not quite flexible enough to implement XSLT
2.0's analyze-string, and it has bugs in the replace() methods.

http://www.w3.org/TR/xslt20/#analyze-string

Solution:

I have a patch that fixes all of this in Xerces-C 2.8, and I can update
it to apply to 3.0. I'm in the process of getting permission to sign the
contributor agreement.


Sounds good.

I've got permission now - hurrah! In the next couple of days when I find
some time I'll port the patches over to Xerces-C 3.0.
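
For context on what analyze-string asks of a regex engine: it splits the
input into an ordered run of matching and non-matching substrings. Below
is a minimal sketch of that partitioning, written against std::regex
rather than Xerces-C's RegularExpression, so it says nothing about the
patch itself. The names Segment and analyzeString are hypothetical, and
skipping zero-length matches is a simplification made for this sketch.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical result type: (isMatch, text) pairs in input order.
typedef std::pair<bool, std::string> Segment;

// Split `input` into alternating non-matching and matching substrings.
// Uses std::regex for illustration only; not the Xerces-C API.
std::vector<Segment> analyzeString(const std::string& input, const std::regex& re)
{
    std::vector<Segment> out;
    std::size_t pos = 0;  // end of the previous match

    for (std::sregex_iterator it(input.begin(), input.end(), re), end; it != end; ++it) {
        const std::size_t start = static_cast<std::size_t>(it->position());
        const std::size_t len = static_cast<std::size_t>(it->length());
        if (len == 0)
            continue;  // simplification: ignore zero-length matches

        if (start > pos)
            out.push_back(Segment(false, input.substr(pos, start - pos)));
        out.push_back(Segment(true, it->str()));
        pos = start + len;
    }

    // Trailing text after the last match, if any.
    if (pos < input.size())
        out.push_back(Segment(false, input.substr(pos)));
    return out;
}
```

A caller would walk the returned vector in order and dispatch on the
first member of each pair, much as a stylesheet dispatches to its
matching and non-matching branches.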

6) Problem:

The socket and WinSock HTTP InputStream implementations have fixed
buffers which can result in buffer overflow. They needlessly duplicate a
whole load of code that could be shared. In addition, a lot of
algorithms need access to the HTTP "Content-Type" header, to decide how
to parse a file, or what encoding it is in - for instance see XSLT 2.0's
unparsed-text() function:

http://www.w3.org/TR/xslt20/#unparsed-text

Solution:

I have a patch that implements this functionality for
UnixHTTPURLInputStream and BinHTTPURLInputStream (WinSock) in Xerces-C
2.8. I added BinInputStream::getContentType() to get access to the
"Content-Type" header. I can update this code for Xerces-C 3.0.


Sounds good. There are also Curl, MacOS, and libWWW net accessors.
Hopefully it will be easy to implement getContentType() for them.

For Curl the option to use seems to be CURLOPT_HEADERFUNCTION, although
I haven't investigated more than that.


http://curl.rtin.bz/libcurl/c/curl_easy_setopt.html
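
As a rough sketch of how that could look: libcurl calls the header
callback once per response header line, passing size * nitems bytes that
are not null terminated, and expects the number of bytes handled as the
return value. The function below has that shape and picks out the
"Content-Type" value. ContentTypeCapture and captureContentType are
hypothetical names, and the block leaves out curl.h so that it stands
alone; how this would be wired into the Curl net accessor is an
assumption and has not been tried.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical holder for the captured header value.
struct ContentTypeCapture {
    bool found = false;
    std::string value;
};

// Same shape as a libcurl header callback. It would be registered with
// curl_easy_setopt(handle, CURLOPT_HEADERFUNCTION, captureContentType)
// and given a ContentTypeCapture* through CURLOPT_HEADERDATA.
size_t captureContentType(char* buffer, size_t size, size_t nitems, void* userdata)
{
    const size_t len = size * nitems;
    ContentTypeCapture* out = static_cast<ContentTypeCapture*>(userdata);
    static const char name[] = "content-type:";
    const size_t nameLen = sizeof(name) - 1;

    if (out != nullptr && len >= nameLen) {
        // Header names are case-insensitive, so compare in lower case.
        bool match = true;
        for (size_t i = 0; i < nameLen && match; ++i)
            match = std::tolower(static_cast<unsigned char>(buffer[i])) == name[i];

        if (match) {
            // Trim leading whitespace and the trailing CRLF.
            size_t begin = nameLen;
            size_t end = len;
            while (begin < end && std::isspace(static_cast<unsigned char>(buffer[begin])))
                ++begin;
            while (end > begin && std::isspace(static_cast<unsigned char>(buffer[end - 1])))
                --end;
            out->value.assign(buffer + begin, end - begin);
            out->found = true;
        }
    }
    return len;  // report the whole line as consumed
}
```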

For libWWW I haven't looked further but I imagine it should be possible.
I looked into the MacOS net accessors, and it seemed to be impossible to
support this API with them. However, recent email on the list suggests
that these implementations have other problems (a problem with fork?).

Since you won't get any content type information back for file URLs
either, I'd suggest that a null return should be the standard response
when the content type is unavailable.
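
To make the null-return convention concrete, here is a minimal sketch.
The classes are hypothetical stand-ins, not the Xerces-C ones, and they
use char strings rather than XMLCh to stay self-contained. The base
class answers null by default, so only streams that actually see a
"Content-Type" header need to override it.

```cpp
#include <string>

// Hypothetical stand-in for the stream interface; not the Xerces-C class.
class SketchInputStream {
public:
    virtual ~SketchInputStream() {}

    // Returns the Content-Type value, or null when it is unavailable.
    virtual const char* getContentType() const { return nullptr; }
};

// A file-backed stream has no headers, so it keeps the null default.
class SketchFileStream : public SketchInputStream {
};

// An HTTP-backed stream reports whatever header value it captured.
class SketchHttpStream : public SketchInputStream {
public:
    explicit SketchHttpStream(const std::string& contentType)
        : fContentType(contentType) {}

    const char* getContentType() const override
    {
        return fContentType.empty() ? nullptr : fContentType.c_str();
    }

private:
    std::string fContentType;
};

// Caller side: fall back to a default when the stream returns null.
std::string contentTypeOrDefault(const SketchInputStream& stream,
                                 const std::string& fallback)
{
    const char* ct = stream.getContentType();
    return ct ? std::string(ct) : fallback;
}
```

With this shape, callers need a single null check and never have to know
which net accessor produced the stream.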

7) Problem:

GrammarResolver has a bug where it fails to initialize its XSModel if
the XMLGrammarPool it is created with is locked.

Solution:

We hack this at the moment, but it would be great if this could be fixed.


Would you be willing to work on a patch? Also I hit a bug in this area
once that may be related. This code works:

    auto_ptr<GrammarResolver> gr (new GrammarResolver (0));

    // load some schemas into gr

    XMLGrammarPool* gp = gr->getGrammarPool ();
    gp->lockPool ();
    XSModel* xsm = gp->getXSModel ();

While if I remove lockPool(), the returned XSModel is invalid. Or maybe
this is how it is supposed to work.

I don't know if your problem is related. I'll try to work out a fix for
the problem I'm seeing, at least.


John

--
John Snelson, Oracle Corporation            http://snelson.org.uk/john
Berkeley DB XML:            http://oracle.com/database/berkeley-db/xml
XQilla:                                  http://xqilla.sourceforge.net
