Congrats on the latest version! Looking forward as usual to exploring the
new features.

However, I'm perplexed by the decision to remove the text parser from the
codebase. I understand the desire to streamline and remove dependencies
related to lower-value features, but I've always found the text parser to
be super useful. After installing BaseX 10.8 beta today, I had to refactor
a process (parsing a set of interview transcripts generated by Zoom) that
involved creating a DB from a directory of text files.

In addition, I noticed some unexpected results in how the text was parsed
using standard methods. In BaseX 10.6, using the text parser in the GUI,
the output looks like this:

<text>WEBVTT

1
00:00:02.910 --&gt; 00:00:27.240
...
</text>

Here, each line end is just a newline character (\n).

Using file:read-text or fn:unparsed-text (in 10.6 and 10.8 beta), the
output looks like this:

<text>WEBVTT&#xD;
&#xD;
1&#xD;
00:00:02.910 --&gt; 00:00:27.240&#xD;
...
</text>

Here, each line end also has a carriage return (\r).

And if, instead, I store it as an XQuery value, I see the newline characters
(&#xA;) that aren't otherwise displayed in the GUI:

"WEBVTT&#xD;&#xA;&#xD;&#xA;1&#xD;&#xA;00:00:02.910 --> 00:00:27.240&#xD;&#xA;..."

So, the text parser seems to have done some normalization, which was also
helpful.
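
For anyone else affected, here is a minimal sketch of how that normalization
might be reproduced without the text parser. It is only an illustration under
some assumptions: the directory path, the database name 'transcripts', and the
.vtt file extension are placeholders, and wrapping the content in a <text>
element simply mirrors the output shown above.

```xquery
(: Hypothetical input directory (note the trailing slash); adjust as needed. :)
declare variable $dir := '/path/to/transcripts/';

(: Assumption: the Zoom transcripts are stored with a .vtt extension. :)
let $names := file:list($dir, false(), '*.vtt')
let $docs :=
  for $name in $names
  (: Read the raw text; CRLF line ends are still present at this point. :)
  let $raw := file:read-text($dir || $name)
  (: Collapse each CRLF (or lone CR) into a single line feed. :)
  let $normalized := replace($raw, '\r\n?', '&#xA;')
  return <text>{ $normalized }</text>
(: 'transcripts' is a placeholder database name. :)
return db:create(
  'transcripts',
  $docs,
  $names ! replace(., '\.vtt$', '.xml')
)
```

The replace() call is the only part that matters for the line-end issue; the
rest just shows one way the directory-to-database step could be wired up.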

Any chance that it could be restored (by popular demand) in version 11? :)

Best regards,
Tim


-- 
Tim A. Thompson (he, him)
Librarian for Applied Metadata Research
Yale University Library

On Fri, Aug 4, 2023 at 06:55, Christian Grün ([email protected])
wrote:

> Dear all,
>
> We’re pleased to announce version 10.7 of BaseX, our XML framework:
> https://basex.org.
>
> The new release is a big step forward towards BaseX 11:
>
> • We have added numerous new operators, functions and features of XQuery
> 4.
> • The GUI editor and result view now provide full support for Unicode
> characters.
> • Font rendering has been improved (you can tweak it in the Font Dialog).
> • More BaseX 11 preview features are available (see docs.basex.org).
> • Various bug fixes (web:forward, job:eval, main-memory documents)
>
> Have fun,
> Your BaseX Team
>
