https://bugs.documentfoundation.org/show_bug.cgi?id=141187

Tex2002ans <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |Tex2002ans+LibreOffice@gmai
                   |                            |l.com

--- Comment #6 from Tex2002ans <[email protected]> ---
Yes, this is still an issue in:

Version: 24.2.1.2 (X86_64) / LibreOffice Community
Build ID: db4def46b0453cc22e2d0305797cf981b68ef5ac
CPU threads: 8; OS: Windows 10.0 Build 22631; UI render: Skia/Raster; VCL: win
Locale: en-US (en_US); UI: en-US
Calc: CL threaded

- - -

I tested using BogdanB's attachment 173984 in comment 3.

0. Open file.
1. Add a random word or two inside the text.
2. File > Export As > Export as EPUB.
3. Press OK.
4. Unzip the EPUB and look inside the HTML.
   - Or use an EPUB editing program like Sigil or Calibre.

You'll see extra <span>s in the EPUB:

> Like lightning he darted off to the left and disappeared between the two 
> warehouses almost falling over the trash can lying în </span><span 
> class="span2">my</span><span class="span2"> the middle of the sidewalk. He 
> tried to nervously tap his way along in </span><span class="span2">my 
> </span><span class="span2">the inky darkness and suddenly stiffened:

with this blank class in the EPUB's CSS:

> .span2 {
> }
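
If you don't want to open an EPUB editor, a few lines of Python can confirm the same thing. This is only a rough sketch using the standard library: the function name is my own, and it assumes the EPUB's content files end in .xhtml / .html / .htm and are UTF-8.

```python
import re
import zipfile
from collections import Counter

# Captures the class attribute of every <span ...> start tag.
SPAN_CLASS = re.compile(r'<span\b[^>]*\bclass="([^"]*)"')

def span_class_counts(epub):
    """Count <span> start tags per class value across an EPUB's (X)HTML files.

    `epub` is a path or a file-like object (hypothetical helper, not part of
    LibreOffice or Calibre).
    """
    counts = Counter()
    with zipfile.ZipFile(epub) as archive:
        for name in archive.namelist():
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                text = archive.read(name).decode("utf-8", errors="replace")
                counts.update(SPAN_CLASS.findall(text))
    return counts
```

Calling span_class_counts() on the exported file and comparing the class names against the blank rules in the stylesheet shows how many <span>s carry no formatting at all.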

= = = = = = = = = = = =

I believe part of the root cause is the spurious:

- officeooo:rsid

attributes inside the ODT file, which get carried over into the HTML/EPUB export.

(I believe these RSIDs are "Random Session IDs", used to tell when a given piece
of text was edited, for Comparison / Tracked Changes reasons.)

- - -

If you take the ODT and:

- File > Save As
- Dropdown for "Save as Type:"
   - Choose "Flat XML ODF Text Document"

You can open the FODT up in a text editor and see code along these lines:

> <text:p text:style-name="P1">He heard quiet steps behind him. [...] almost 
> falling over the trash can lying în <text:span 
> text:style-name="T1">my</text:span> the middle of the sidewalk. He tried to 
> nervously tap his way along in <text:span text:style-name="T2">my 
> </text:span>the inky darkness and suddenly stiffened: it was a dead-end, [...]

where extra <text:span>s appear around everything you insert/edit.

Higher in the FODT document, you can see what "T1" and "T2" were equivalent to:

>  <style:style style:name="T1" style:family="text">
>   <style:text-properties officeooo:rsid="00019890"/>
>  </style:style>
>  <style:style style:name="T2" style:family="text">
>   <style:text-properties officeooo:rsid="0003570a"/>
>  </style:style>

The only thing these <text:span>s were there for was:

- officeooo:rsid

They didn't supply any other info.
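
To check which automatic styles in a FODT are "RSID-only", something along these lines should work. It is a sketch, not anything taken from the LibreOffice code: the function name is made up, and I am assuming the usual namespace URIs for the style: and officeooo: prefixes.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs for the "style:" and "officeooo:" prefixes.
STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
RSID_NS = "http://openoffice.org/2009/office"

NAME = f"{{{STYLE_NS}}}name"
FAMILY = f"{{{STYLE_NS}}}family"

def rsid_only_styles(root):
    """Return the names of text styles whose only property is officeooo:rsid."""
    names = set()
    for style in root.iter(f"{{{STYLE_NS}}}style"):
        if style.get(FAMILY) != "text":
            continue
        # A parent style (or any other attribute) could carry real formatting.
        if not set(style.attrib) <= {NAME, FAMILY}:
            continue
        children = list(style)
        if len(children) != 1:
            continue
        props = children[0]
        if props.tag != f"{{{STYLE_NS}}}text-properties":
            continue
        if set(props.attrib) == {f"{{{RSID_NS}}}rsid"}:
            names.add(style.get(NAME))
    return names
```

Feeding it the root element of the FODT (for example via ET.parse on the saved file) would return names like "T1" and "T2" for the styles shown above, while leaving out any style that also sets a font, weight, color, and so on.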

- - -

There was a similar issue with "single URLs" getting split into "multiple
identical ones" here:

- Bug #112429 : "officeooo:rsid multiplies the links"
- Bug #148198 : "Editing single hyperlink breaks it into smaller ones"
   - Which got fixed in 7.5.0 and 7.4.0.2.

Mike Kaganski then came up with a patch to "merge identical hyperlinks of
adjacent text ranges on ODF export":

- https://bugs.documentfoundation.org/show_bug.cgi?id=148198#c19

= = = = = = = = = = = =

So, on EPUB Export, I would probably do some logic along these lines:

Case 1: Before export

- If ODT's "text:span text:style-name" only has "officeooo:rsid":
   - Do not export this <span> to EPUB at all (just emit its text).
- If 2 "text:spans" are right next to each other and the only difference is
"officeooo:rsid":
   - Merge them together before HTML/EPUB export.
      - Similar to Bug 148198 above!
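
The first bullet of Case 1 could look roughly like this. It is only an illustration on an ElementTree copy of the ODT content, not the real export filter: the function names are hypothetical, `removable` is assumed to be a set of RSID-only style names such as {"T1", "T2"}, and merging of adjacent spans is not covered.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
SPAN = f"{{{TEXT_NS}}}span"
STYLE_NAME = f"{{{TEXT_NS}}}style-name"

def append_text(elem, index, text):
    """Append text right before child position `index` of elem."""
    if not text:
        return
    if index == 0:
        elem.text = (elem.text or "") + text
    else:
        prev = elem[index - 1]
        prev.tail = (prev.tail or "") + text

def unwrap_spans(elem, removable):
    """Replace every <text:span> using a removable style with its own content."""
    for child in list(elem):
        unwrap_spans(child, removable)  # clean nested content first
    index = 0
    while index < len(elem):
        child = elem[index]
        if child.tag == SPAN and child.get(STYLE_NAME) in removable:
            inner = list(child)
            append_text(elem, index, child.text)
            del elem[index]
            for offset, sub in enumerate(inner):
                elem.insert(index + offset, sub)
            index += len(inner)
            # The span's trailing text now follows its last child (if any).
            append_text(elem, index, child.tail)
        else:
            index += 1
```

Spans that carry real formatting are left untouched; only the wrappers named in `removable` disappear, and their text is folded back into the surrounding paragraph.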

Case 2: After export

You could have a pass that says:

- If the CSS class is empty/blank on the other end:
   - Delete that <span> wrapper (keeping the text inside it) and its empty
     CSS rule out of the HTML/CSS/EPUB export completely.
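
As a rough idea of what such an "after" pass might do, here is a small regex-based sketch over the exported CSS and HTML text. Everything in it is my own assumption (function names included): it only treats a class as blank when it has an empty rule and no styled rule anywhere, it only unwraps <span>s whose sole attribute is class, and it ignores CSS comments.

```python
import re

RULE = re.compile(r"([^{}]+)\{([^{}]*)\}")          # selector { body }
SPAN_TAG = re.compile(r"<span\b[^>]*>|</span\s*>")  # any span start/end tag
PLAIN_CLASS_SPAN = re.compile(r'<span\s+class="([^"]*)"\s*>')

def empty_classes(css):
    """Return class names that have an empty rule and no styled rule."""
    empty, styled = set(), set()
    for selector, body in RULE.findall(css):
        names = re.findall(r"\.([\w-]+)", selector)
        if body.strip():
            styled.update(names)
        elif re.fullmatch(r"\.[\w-]+", selector.strip()):
            empty.add(selector.strip()[1:])
    return empty - styled

def strip_empty_spans(html, empty):
    """Unwrap <span class="..."> elements whose classes are all blank."""
    stack = []  # one keep/drop flag per currently open span

    def repl(match):
        tag = match.group(0)
        if tag.startswith("</"):
            keep = stack.pop() if stack else True
            return tag if keep else ""
        if tag.endswith("/>"):
            return tag  # self-closing span, nothing to unwrap
        plain = PLAIN_CLASS_SPAN.fullmatch(tag)
        classes = plain.group(1).split() if plain else []
        keep = not (classes and all(c in empty for c in classes))
        stack.append(keep)
        return tag if keep else ""

    return SPAN_TAG.sub(repl, html)
```

The stack is what keeps nested <span>s straight: each closing tag is dropped only if its matching opening tag was dropped.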

= = = = = = = = = = = =

Note 1: Calibre's EPUB Editor has a fantastic feature called:

- "Remove Unused CSS"
- https://manual.calibre-ebook.com/edit.html#removing-unused-css-rules

which can do this type of thing in one button push:

- Tools > "Remove unused CSS"

It:

- Finds and purges all CSS and related HTML tags that are blank / not in
use

making the leftover HTML *much* easier to work with.

- - -

Note 2: I've also written many topics about this type of HTML+CSS cleanup over
the years. Most recently:

2023: "Nested span, clean"
- https://www.mobileread.com/forums/showthread.php?p=4342160#post4342160

2023: "removing excessive <class> and other formatting horrors on epub"
- https://www.mobileread.com/forums/showthread.php?p=4312194#post4312194

2022: "Convert text formating from CSS to HTML"
- https://www.mobileread.com/forums/showthread.php?p=4188132#post4188132
