https://bugs.documentfoundation.org/show_bug.cgi?id=167871

Tex2002ans <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |Tex2002ans+LibreOffice@gmai
                   |                            |l.com

--- Comment #4 from Tex2002ans <[email protected]> ---
Created attachment 202907
  --> https://bugs.documentfoundation.org/attachment.cgi?id=202907&action=edit
Find.and.Replace.-.Regular.Expression.-.SOFT.HYPHEN.-.U+00AD.odt

I attached a sample document with 3 examples:

- no hyphen
- HYPHEN-MINUS
- SOFT HYPHEN
--- Comment 0's bug definitely happens in the 3rd sentence!

- - -

I confirm this happens in:

Version: 25.8.1.1 (X86_64)
Build ID: 54047653041915e595ad4e45cccea684809c77b5
CPU threads: 8; OS: Windows 11 X86_64 (build 22631); UI render: Skia/Vulkan;
VCL: win
Locale: en-US (en_US); UI: en-US
Calc: threaded

... but it looks like the:

- SOFT HYPHEN (U+00AD)

is handled strangely.

Currently, LO can "find" the text, but does not seem to "capture" it into a
regex group, so the replacement inserts a literal "$1".

But I think the root cause of this bug is...

If "Regular Expression" mode is ON:

- `[:alpha:]` SHOULD NOT match the SOFT HYPHEN character.
- `[:alpha:]` SHOULD ONLY match alphabetic characters.
- SOFT HYPHEN should be treated as a...
--- "punctuation mark", roughly equivalent to HYPHEN-MINUS (U+002D)!
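
To back up the `[:alpha:]` point, here is a minimal Python sketch (not LO code) that looks up how the Unicode database classifies the characters involved. It uses Python's own `unicodedata` tables, not the regex engine inside LO, so treat it as a cross-check only. The helper name `describe` is made up for this example. Note that Unicode files SOFT HYPHEN under "Cf" (format), not under punctuation, but either way it is not a letter:

```python
import unicodedata

# Characters relevant to the three examples in the attached document.
chars = {
    "HYPHEN-MINUS": "\u002d",
    "SOFT HYPHEN": "\u00ad",
    "LATIN SMALL LETTER A WITH DIAERESIS": "\u00e4",
}


def describe(ch):
    """Return (General_Category, str.isalpha()) for a single character."""
    return unicodedata.category(ch), ch.isalpha()


for name, ch in chars.items():
    category, is_alpha = describe(ch)
    print(f"U+{ord(ch):04X} {name}: category={category}, isalpha={is_alpha}")
```

Only the "ä" comes back as alphabetic here, which is what I would expect `[:alpha:]` to agree with.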

- - -

STEPS TO REPRODUCE

0. Open attached document.

1. Edit > Find and Replace (Ctrl+H).

2. Expand "Other Options", then make sure these 2 checkboxes are ON:

- Regular Expressions
- Diacritic-sensitive

3. In the 2 boxes, type:

- Find: \b"([:alpha:]+)"\b
- Replace: „$1“

4. Press the "Replace All" button.

ACTUAL

After pressing "Find All" and/or "Replace All":

- 2 hits
--- 1st line turned into „antäuschen“
--- 3rd line turned into „$1“
----- = BUG

EXPECTED

After pressing "Find All" and/or "Replace All":

- 1 hit
--- 1st line turned into „antäuschen“
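
The EXPECTED result can be sketched with Python's `re` module as a reference point. Assumptions: the three sample strings below are my guess at the quoted words in the attachment (the real sentences may differ), `[^\W\d_]` stands in for `[:alpha:]` because Python's `re` has no POSIX classes, and the `\b` anchors are left out because word boundaries next to quote marks behave differently between engines. The name `flip_quotes` is made up for this example:

```python
import re

# Assumed sample strings; the sentences in the real attachment may differ.
samples = [
    '"ant\u00e4uschen"',        # no hyphen
    '"an-t\u00e4uschen"',       # HYPHEN-MINUS (U+002D)
    '"an\u00adt\u00e4uschen"',  # SOFT HYPHEN (U+00AD)
]

# [^\W\d_] = word characters minus digits and underscore, i.e. letters only.
LETTERS_IN_QUOTES = re.compile(r'"([^\W\d_]+)"')


def flip_quotes(text):
    """Replace "word" with „word“ when the quoted text is letters only."""
    return LETTERS_IN_QUOTES.sub("\u201e\\1\u201c", text)


results = [flip_quotes(s) for s in samples]
for before, after in zip(samples, results):
    print(repr(before), "->", repr(after))
```

In this sketch only the first sample is rewritten. The HYPHEN-MINUS and SOFT HYPHEN samples are left alone, which matches the "1 hit" I expect from LO.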

- - -

NOTES on Comment 3:

Hmmm... very strange.

I can get the SOFT HYPHEN to match with a period (`.`).

For example, do Step 3:

- Find: \b"(.+?)"\b
- Replace: „$1“

and this will retain the SOFT HYPHEN + any inner text, while flipping the
quotes.

But if you do this:

- Find: \b"an.
- Replace: „ZZZ

LO will act like the SOFT HYPHEN isn't even there and match/replace both:

- "ant
   - 4 characters
- "an-t
   - 5 characters
   - '-' = invisible SOFT HYPHEN position

Hmmmm... so something weird is definitely going on with the SOFT HYPHEN and
regex.
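
For comparison, here is a minimal Python sketch of how a conventional regex engine treats the `"an.` test. It says nothing about LO's internals. The two input strings are assumed stand-ins for the start of the quoted words in the attachment:

```python
import re

PATTERN = re.compile(r'"an.')

plain = '"ant'        # no hyphen
soft = '"an\u00adt'   # SOFT HYPHEN (U+00AD) between "n" and "t"

m_plain = PATTERN.match(plain)
m_soft = PATTERN.match(soft)

# Show the length and the exact code points each match consumed.
print(len(m_plain.group()), repr(m_plain.group()))
print(len(m_soft.group()), repr(m_soft.group()))
```

Here the `.` consumes the SOFT HYPHEN itself in the second string, so the match is 4 code points long and stops before the "t". That is different from what LO does above, where the "t" after the SOFT HYPHEN gets matched/replaced as well.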

It could be because SOFT HYPHEN is a weirdly unique character, acting as
"punctuation" AND "a format code" AND is "invisible" at the same time.
