On 4/25/17, 3:17 PM, "Roger Leigh" <rle...@codelibre.net> wrote:

> Switching to git would be wonderful.  We could also enable CI testing 
> with e.g. Travis or some other CI service on github at that time to 
> enable testing of all PRs, if that would be acceptable.  Or does the 
> Apache project provide any equivalent services internally?

There are already mirrors of the code at git.apache.org (and to github from 
there), and of course all CI tools can pull from svn just as easily as git. 
That's never been an impediment. I don't know if there are tests sufficient to 
be worth exercising like that or not.

> Regarding (3), it's a bit outside the scope of this CMake ticket.  My 
> intentions here were to get a build system which would provide a working 
> build on all platforms, including the unit tests.  I didn't want to go 
> down the rabbit hole at the same time.  Ideally, if we merge this to the 
> trunk and branch off a 3.2 and release that, more adventurous changes 
> could be then done on the trunk.  I'd rather have a working release with 
> the CMake support included than to do both and not have an immediately 
> usable and API compatible release!

+1

I wasn't suggesting anything else, and it makes sense to go ahead and branch 
again if there's going to be any real screwing around. I need a stable branch 
myself.

I have made some progress today after a few hours reviewing trunk and I'm only 
about 10 commits back from when I started cherry picking things back to the 3.1 
branch, at which point the trunk essentially froze. So far there is very little 
divergence, just a few small API additions that are unique to the trunk. So I 
don't foresee anything terribly risky about releasing this after some 
additional fixes, some testing, and incorporating your patch.

> That said, I'd not be averse to including support for standard C++; 
> using Xerces is often a bugbear due to its age.  All our code is now 
> C++11, with RAII wrappers to make Xerces play nicely.  Primarily the 
> lack of RAII, non-standard exception types, odd memory management 
> semantics and transcoding all input.

The problem with C++11 is it's just not portable to enough compilers outside of 
Windows. I'm aware gcc probably supports it but gcc on actual Linux distros 
that people still use heavily does not. If I can't build it on RH6 it's not 
usable for me, and since I'm the one doing most of the work right now...

Really, C++11 is beside the point. Simply good old C++ would fix many issues, 
but this code dates back to when using real C++ and the STL was just too 
non-portable, along with the usual Unix anti-C++ bias.
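
For readers unfamiliar with the "RAII wrappers" mentioned in the quoted 
text, here is a minimal sketch of the general idea. It does not use Xerces 
itself: legacy_transcode and legacy_release are hypothetical stand-ins for a 
C-style API where the caller has to release every returned buffer by hand, 
and LegacyString and to_std_string are likewise names made up for this 
illustration. It needs a C++11 compiler, which is exactly the portability 
trade-off under discussion.

```cpp
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical stand-ins for a C-style API whose buffers must be released
// manually. These are NOT Xerces functions; they only mimic the pattern.
static int live_buffers = 0;  // outstanding allocations, tracked for the demo

char* legacy_transcode(const char16_t* src) {
    std::size_t n = 0;
    while (src[n] != 0) ++n;
    char* out = new char[n + 1];
    for (std::size_t i = 0; i < n; ++i) {
        // Demo only: keep ASCII, replace everything else with '?'.
        out[i] = src[i] < 0x80 ? static_cast<char>(src[i]) : '?';
    }
    out[n] = '\0';
    ++live_buffers;
    return out;
}

void legacy_release(char** buf) {
    delete[] *buf;
    *buf = nullptr;
    --live_buffers;
}

// Deleter that forwards to the manual release function, so the buffer is
// freed when the owning unique_ptr goes out of scope, even on exceptions.
struct LegacyDeleter {
    void operator()(char* p) const { legacy_release(&p); }
};

using LegacyString = std::unique_ptr<char, LegacyDeleter>;

// Convert to std::string without leaking the intermediate buffer.
std::string to_std_string(const char16_t* src) {
    LegacyString tmp(legacy_transcode(src));
    return std::string(tmp.get());
}
```

A caller would write something like to_std_string(u"abc") and never touch 
the release function directly; the wrapper only hides the manual cleanup 
and does not change the cost of the conversion itself.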

> Something worth noting is that our 
> (optional) ICU dependency switched to requiring C++11 with ICU 59.1.  It
> switched to using the standard char16_t as its XML string type.  If 
> Xerces were to also switch (or at least use a suitable typedef), we 
> could be using const char16_t* foo = u"UTF-16 strings" and/or u8"UTF-8" 
> strings directly in both the xerces sources and in client programs.  A 
> major usability improvement.

At a huge cost in portability unfortunately. Believe me, I wish that were 
viable for me. So, so much.

> In a recent performance testing exercise at work, we found string 
> transcoding inside xerces-c to be a major time sink--using valgrind 
> callgrind--it was one of the major uses of CPU time during parsing and 
> DOM processing.  It was slower than xerces-j for the same operations, 
> and this was likely to be a major cause.

I'm not sure that you're going to fix that. It's already using UTF-16 
internally. If there are problems with transcoding, I think that's just the 
cost of transcoding. I don't think the need to transcode goes away unless I'm 
missing something.

Anyway, within a week or two I expect to be able to put trunk in a position to 
accept your patch and we can continue on from there.

-- Scott

