Christian,

Indeed, I concur that the wish list would grow; a generalized approach is what we need. I'll let you think about that. :-)

In the meantime, as you suggest, if I'm willing to cache the data first, I have many options. Certainly it's possible in my testing framework, but as we build out, it'll be another issue.

Alternatively, once I'm in BaseX -- I'm already deleting unwanted nodes, including comments and PIs, using a command script. Could I similarly do something like this?

    replace value of node //text()[empty(../*)] with normalize-space(//text()[empty(../*)])

? (I'm pretty new to XQuery update. I suppose I could always just try it. :-)

Thanks as always,
Wendell

On Fri, Feb 22, 2013 at 5:34 AM, Christian Grün <[email protected]> wrote:
> Hi Wendell,
>
> the CHOP option has been introduced at a very early stage of BaseX, and I’m
> not sure if we had added it today. We could add one or more additional
> options to normalize whitespaces or removing PIs/comments from the
> input, but the wish list, and the exception list, would probably
> continue to grow, so I believe that it would be more convenient to
> have a general pre-processing step that takes care of all the
> normalization steps. I’m not sure, however, what’s the best approach
> to do this within BaseX. If it’s possible to cache files on disk
> before adding them to the database, I would recommend XQuery or BaseX
> command scripts, XProc or anything else to prepare the data and delete
> it afterwards.
>
> Comments are welcome,
> Christian
> ___________________________
>
> On Wed, Feb 20, 2013 at 5:35 PM, Wendell Piez <[email protected]> wrote:
>> Hi,
>>
>> I see the 'CHOP' option, turned on by default, for trimming leading
>> and trailing whitespace and eliminating empty text nodes.
>>
>> What about going further? Is there a good way to normalize whitespace
>> entirely, collapsing any runs of tab-LF-space into single spaces in my
>> data?
>>
>> I think I mentioned earlier the idea of specifying an XSLT
>> transformation to filter data on ingest (for a similar requirement,
>> namely removing all comments and PIs). That might be going too far but
>> any hints you can give me (or pointers to docs) about functionality to
>> address this sort of thing in general would be welcome.
>>
>> Thanks!
>> Wendell
>>
>> --
>> Wendell Piez | http://www.wendellpiez.com
>> XML | XSLT | electronic publishing
>> Eat Your Vegetables
>> _____oo_________o_o___ooooo____ooooooo_^
>> _______________________________________________
>> BaseX-Talk mailing list
>> [email protected]
>> https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk

--
Wendell Piez | http://www.wendellpiez.com
XML | XSLT | electronic publishing
Eat Your Vegetables
_____oo_________o_o___ooooo____ooooooo_^

