Hello Christian,

Thanks for your reply. I added the documents again with the default options
and I do get satisfying results. Not sure why I kept on using the settings
recommended in the documentation...
Would it be possible to add a link to the TagSoup documentation on the parser
options to the BaseX docs? That could be helpful.
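
For anyone finding this thread later, here is a minimal sketch of an import
that relies on the default parser behavior. It assumes a fresh session in
which HTMLPARSER has not been set, and the database name "docs" and the
input path are placeholders, not the real ones:

    SET PARSER html
    SET CREATEFILTER *.html
    CREATE DB docs /path/to/html/files
    XQUERY /html/body/article

The only difference from the commands in my first message is that the
SET HTMLPARSER line is left out. The last line is the same path query I
used before, to check that the article element is still there.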

Thanks,
- Tim

On Mon, Jan 30, 2023 at 10:42 PM Christian Grün <christian.gr...@gmail.com>
wrote:

> Hi Tim,
>
> I assume the article element will be preserved if you omit the
> nobogons HTMLPARSER option [1]. Usually, there’s no need to set
> specific options if the default behavior gives satisfying results.
>
> Best,
> Christian
>
> [1] http://vrici.lojban.org/~cowan/tagsoup/
>
>
>
> On Fri, Jan 27, 2023 at 8:05 PM Timothée <timog...@gmail.com> wrote:
> >
> > Hello all,
> >
> > I am trying to store HTML documents in BaseX. I set up a local instance
> of BaseX on my computer using Docker, and I imported this file in it:
> https://pastebin.com/HJdJgLv9
> >
> > On my local BaseX instance, the document is imported and
> "/html/body/article" does return the <article> node as expected.
> >
> > On my remote/production BaseX instance (using the same Dockerfile and
> image), the document is imported but the <article> tag is "stripped" (even
> though its contents / child nodes remain in the imported document).
> "/html/body/article" is empty.
> >
> > If I copy over the .basex files from my local database to my remote
> database, then the documents are complete like on my local instance. I also
> tried to import the documents again on my local instance, and the <article>
> tag gets stripped too (and the child nodes remain).
> >
> > What am I doing wrong when importing my documents? What did I do to
> import them properly in my current local instance? I tried a lot of options
> but I just can't figure out why this happens (I fiddled a lot with it).
> >
> > I used the following options when importing my documents, as per the
> documentation:
> > SET PARSER html
> > SET HTMLPARSER method=xml,nons=true,nocdata=true,nodefaults=true,nobogons=true,nocolons=true,ignorable=true
> > SET CREATEFILTER *.html
> >
> > I also use SET FTINDEX true but I don't think it would have an impact
> anyway.
> >
> > Thank you very much!
> > - Tim
>
