Christian,

Indeed, I concur that the wish list would grow; a generalized approach
is what we need. I'll let you think about that. :-)

In the meantime, as you suggest, if I'm willing to cache the data
first, I have many options. Certainly that's possible in my testing
framework, but as we build out, it will become a separate issue.

Alternatively, once the data is in BaseX: I'm already deleting unwanted
nodes, including comments and PIs, using a command script. Could I
similarly do something like this?

for $t in //text()[empty(../*)]
return replace value of node $t with normalize-space($t)

?

(I'm pretty new to XQuery Update. I suppose I could always just try it. :-)
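
A minimal sketch of the same idea as a single post-import query, assuming the data sits in a database named `mydb` (a hypothetical name). `replace value of node` accepts exactly one target node, and `normalize-space()` takes a single string, so the query iterates over the text nodes instead of passing `//text()` as a whole; `delete node`, by contrast, may target many nodes at once. The `[empty(../*)]` predicate limits the change to elements without element children, so whitespace next to inline markup in mixed content is left alone.

```xquery
(: Post-import cleanup in one updating query.
   'mydb' is a hypothetical database name. :)
let $root := db:open('mydb')
return (
  (: remove all comments and processing instructions :)
  delete node $root//(comment() | processing-instruction()),
  (: replace value of node takes a single target, so iterate per text node;
     the predicate skips mixed content, where edge whitespace matters :)
  for $t in $root//text()[empty(../*)]
  return replace value of node $t with normalize-space($t)
)
```

With CHOP enabled (the default), whitespace-only text nodes have already been dropped on import, so the loop only touches text with actual content. The query could also be placed in the existing command script via the XQUERY command.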

Thanks as always,
Wendell


On Fri, Feb 22, 2013 at 5:34 AM, Christian Grün wrote:
> Hi Wendell,
>
> the CHOP option was introduced at a very early stage of BaseX, and I’m
> not sure we would add it today. We could add one or more additional
> options to normalize whitespace or remove PIs/comments from the
> input, but the wish list, and the exception list, would probably
> continue to grow, so I believe it would be more convenient to
> have a general pre-processing step that takes care of all the
> normalization steps. I’m not sure, however, what the best approach
> would be within BaseX. If it’s possible to cache files on disk
> before adding them to the database, I would recommend XQuery or BaseX
> command scripts, XProc or anything else to prepare the data, and to
> delete the cached files afterwards.
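
One way to do the preparation in XQuery without writing an intermediate file is sketched below. It assumes a source file `input.xml` and a target database `mydb` (both hypothetical names): a copy/modify/return (transform) expression cleans an in-memory copy of the document, and the result is then stored with `db:add`. This is an illustration under those assumptions, not a built-in BaseX normalization option.

```xquery
(: Hypothetical names: source file 'input.xml', database 'mydb'. :)
let $clean :=
  copy $d := doc('input.xml')
  modify (
    (: drop comments and processing instructions :)
    delete node $d//(comment() | processing-instruction()),
    (: collapse whitespace in text of elements without element children :)
    for $t in $d//text()[empty(../*)]
    return replace value of node $t with normalize-space($t)
  )
  return $d
return db:add('mydb', $clean, 'input.xml')
```

Because only the copy is modified, the source file on disk stays untouched.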
>
> Comments are welcome,
> Christian
> ___________________________
>
> On Wed, Feb 20, 2013 at 5:35 PM, Wendell Piez wrote:
>> Hi,
>>
>> I see the 'CHOP' option, turned on by default, for trimming leading
>> and trailing whitespace and eliminating empty text nodes.
>>
>> What about going further? Is there a good way to normalize whitespace
>> entirely, collapsing any runs of tab-LF-space into single spaces in my
>> data?
>>
>> I think I mentioned earlier the idea of specifying an XSLT
>> transformation to filter data on ingest (for a similar requirement,
>> namely removing all comments and PIs). That might be going too far, but
>> any hints you can give me (or pointers to docs) about functionality to
>> address this sort of thing in general would be welcome.
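
If an XSLT filter is preferred, the BaseX XSLT module can apply it before the data is stored. A minimal sketch, assuming the same hypothetical `input.xml` and `mydb` names: the stylesheet is an identity transform with two overrides, one that drops comments and PIs and one that normalizes text in elements without element children. It is written as XSLT 1.0 so that it does not depend on an XSLT 2.0 processor being available; `priority="1"` breaks the tie between the equal default priorities of `node()` and `comment()`/`processing-instruction()`.

```xquery
(: Hypothetical names: 'input.xml', 'mydb'. XSLT 1.0 stylesheet. :)
let $filter :=
  <xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <!-- identity: copy attributes and nodes by default -->
    <xsl:template match="@* | node()">
      <xsl:copy><xsl:apply-templates select="@* | node()"/></xsl:copy>
    </xsl:template>
    <!-- drop comments and processing instructions -->
    <xsl:template match="comment() | processing-instruction()" priority="1"/>
    <!-- collapse whitespace in text of elements without element children -->
    <xsl:template match="text()[not(../*)]">
      <xsl:value-of select="normalize-space(.)"/>
    </xsl:template>
  </xsl:stylesheet>
return db:add('mydb', xslt:transform(doc('input.xml'), $filter), 'input.xml')
```

Keeping the filter in XSLT has the advantage that the same stylesheet can be reused outside BaseX, for instance in an XProc pipeline.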
>>
>> Thanks!
>> Wendell
>>
>> --
>> Wendell Piez | http://www.wendellpiez.com
>> XML | XSLT | electronic publishing
>> Eat Your Vegetables
>> _____oo_________o_o___ooooo____ooooooo_^
>> _______________________________________________
>> BaseX-Talk mailing list
>> https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
