That is extremely good advice that I absolutely intend to follow. Here's a bit about what I'm doing:
At Digital Ricoeur (https://digitalricoeur.org/), we have a corpus of hundreds of XML documents and growing, some of them book-length. These must be validated against a custom DTD that is derived from the TEI standard. (We also have a number of additional, project-specific validity requirements that we check with Racket contracts. These end up covering most of validation, but having a standard tool working from the DTD is important as a sanity check.)

Currently, I validate the documents using xmllint (a program included with libxml2) running in an external process. This works well as long as all of the documents are valid (which of course we always hope is the case). However, things don't go so well if even one document is invalid: xmllint doesn't provide structured output, just error messages written to standard error and a non-zero exit code if any of the files were invalid. Currently, when that happens, we fall back to invoking xmllint on each file individually. That takes an extremely long time.

Obviously there are a number of possible approaches. I've considered, for example, partitioning the list of files when some are invalid, so that hopefully we could find some sub-groups that can be validated all at once.

What I'm currently exploring, though, is writing a helper program in Racket using the FFI. It will probably read a list of paths from standard input and write a hash table to standard output mapping each path to its validation result. I still plan to run the validation in a separate process (I don't like segfaults). The difference is that the subprocess will now also be implemented in Racket, will communicate in s-expressions, and will not have to be invoked repeatedly to track down which specific files are invalid. (Of course, I haven't implemented this yet, so we'll see how it turns out in practice.)

All of that said, though, thank you for the links to Oleg Kiselyov's work on validation!
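To make the partitioning idea mentioned above concrete, here is a rough sketch (in Python rather than Racket, purely for brevity) of validating by bisection: check a whole group at once, and split it in half only when the group fails. The names `xmllint_batch_ok`, `validate_by_bisection`, and the `project.dtd` path are made up for illustration and are not part of our actual setup. The sketch also assumes that a non-zero exit code from xmllint means "at least one file in this batch is invalid", which glosses over other failure modes such as unreadable files.

```python
import subprocess
from typing import Callable, Dict, List, Sequence


def xmllint_batch_ok(paths: Sequence[str], dtd: str = "project.dtd") -> bool:
    """Return True if xmllint exits with status 0 for the whole batch.

    "project.dtd" is a placeholder path. --noout suppresses the serialized
    document, and --dtdvalid validates against the given external DTD.
    """
    result = subprocess.run(
        ["xmllint", "--noout", "--dtdvalid", dtd, *paths],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def validate_by_bisection(
    paths: Sequence[str],
    batch_ok: Callable[[Sequence[str]], bool] = xmllint_batch_ok,
) -> Dict[str, bool]:
    """Map each path to True (valid) or False (invalid).

    A group that passes as a whole is marked valid in one call; a failing
    group is split in half until the failing files are isolated.
    """
    results: Dict[str, bool] = {}

    def visit(group: List[str]) -> None:
        if not group:
            return
        if batch_ok(group):
            for path in group:
                results[path] = True
        elif len(group) == 1:
            results[group[0]] = False
        else:
            mid = len(group) // 2
            visit(group[:mid])
            visit(group[mid:])

    visit(list(paths))
    return results
```

With only a few invalid files, this needs far fewer xmllint invocations than one per file. In the worst case, where most files are invalid, it needs roughly twice as many, which is part of why a single helper process that reports a result per file still looks more attractive.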
In the long term I would love to have a real XML validator in pure Racket and leave libxml2 behind altogether. I had been looking at some of the sxml packages (though all of my code now uses x-expressions in the sense of the xml module, and actually a restricted subset of those), but I hadn't seen the first link you sent, in particular.

-Philip

On Mon, Aug 27, 2018 at 9:34 AM Neil Van Dyke <n...@neilvandyke.org> wrote:
> Rather than use FFI, would it work for your purposes to have the libxml2
> code in a separate process from Racket? That would avoid the likely C
> memory bugs corrupting your Racket process.
>
> https://www.cvedetails.com/vulnerability-list/vendor_id-1962/product_id-3311/Xmlsoft-Libxml2.html
>
> I've done this before for XML in Racket, to get DSig support, when I
> couldn't cost-justify implementing it in pure Racket at the time. (W3C
> standards tend to be big and complicated, and your implementation of
> DSig has to be perfectly compliant in many regards, to work at all.)
>
> Another possible option is to do what validation and other XML behavior
> you need in pure Racket. Oleg Kiselyov did some work on validation,
> and, if you have the time, you might implement more.
> http://okmij.org/ftp/Scheme/xml.html#validation
> https://pkgs.racket-lang.org/package/sxml
> https://www.neilvandyke.org/racket/sxml-intro/
>
> XML validation is good for system robustness, but every C library we
> pull into a Racket process makes us less confident about robustness in a
> different way.