https://bugs.documentfoundation.org/show_bug.cgi?id=169077

--- Comment #11 from Regina Henschel <[email protected]> ---
(In reply to Michael Otto from comment #10)

> But I can't find a way for the HTML test file. There can <table>, 
> <tr> and <td> only be found in the text, not as HTML elements. 
> How should the test file be used?

The entry in the Identifier field has to be
//table

<table> is an HTML element.
The implementation of this import is in
https://opengrok.libreoffice.org/xref/core/sc/source/ui/dataprovider/htmldataprovider.cxx
I have not touched that code. I only fixed that the wrong field was used.
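
To illustrate what an XPath identifier such as //table selects, here is a minimal sketch in Python. It is not the code from htmldataprovider.cxx. The sample markup, the function name table_rows and the cell values are made up for the illustration. Python's xml.etree.ElementTree only supports a subset of XPath and needs well-formed markup, so the path is written as .//table, which corresponds to //table when evaluated from the root element. The second call shows that an element can also be addressed by its "id" attribute through XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for an HTML test file (not the real attachment).
SAMPLE = """<html><body>
<p>Some text before the table.</p>
<table id="content">
<tr><td>Title</td><td>Year</td></tr>
<tr><td>Alpha</td><td>1</td></tr>
</table>
</body></html>"""


def table_rows(markup, path=".//table"):
    """Return the cell texts of the first table matched by `path`, row by row."""
    root = ET.fromstring(markup)
    table = root.find(path)
    if table is None:
        # Nothing matches the identifier, so there is nothing to import.
        return []
    return [
        [(td.text or "").strip() for td in tr.findall("td")]
        for tr in table.findall("tr")
    ]


# ".//table" is the ElementTree spelling of the identifier //table.
rows = table_rows(SAMPLE)

# Addressing the same table through its id attribute, still as XPath.
rows_by_id = table_rows(SAMPLE, ".//table[@id='content']")

print(rows)
print(rows_by_id)
```

Both calls return the same rows here, because the sample contains a single table and it carries id="content".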

> 
> Isn't the implementation of XPath for the Identifier contrary to the
> documentation?
>    "Identifier: The target ID for HTML provided data..."
> I expected that the HTML attribute "id" should be used to address the items
> (see also the id="content" and id="src" attributes in the test file).
> Otherwise we should change the documentation and mention XPath there.

Yes, the documentation needs to be improved. I have already added a comment to
the "WorkInProgress" version of the Calc Guide for version 26.2. A bug report
for the Help might be needed as well.

> 
> 
> In the Wikipedia example the entries where a <link ...> is contained 
> in addition, are skipped (e.g. 1st table line 4 Titanic col. 2 Deutscher 
> Titel) and many other entries are missing where additional HTML elements 
> are contained in the <td> element.

Yes, the current HTML import is very simple. I hesitated about whether anything
should be fixed at all. There was also bug 139409, where it was discussed
whether the entire feature should be removed. But the feature has existed since
LibreOffice version 6, for more than 7 years now. It should therefore work
at least to some extent.
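
Purely as an illustration of why cells with nested elements are harder to handle than plain-text cells, and not as a description of what htmldataprovider.cxx actually does: in a simple tree walk, the text that belongs directly to a <td> is empty when the visible content sits inside a child element. The cell below is a made-up example, and the sketch uses Python's xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical cell whose visible text is wrapped in a nested element.
cell = ET.fromstring('<td><a href="#x">Linked title</a></td>')

# Text that belongs directly to the <td> node: empty here,
# because the content lives inside the <a> child.
direct_text = (cell.text or "").strip()

# Text collected from the <td> and all of its descendants.
full_text = "".join(cell.itertext()).strip()

print(repr(direct_text))
print(repr(full_text))
```

A reader that only looks at the direct text would treat such a cell as empty, while collecting the text of all descendants recovers the content.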
