https://bugs.documentfoundation.org/show_bug.cgi?id=165931
László Németh <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |WORKSFORME

--- Comment #5 from László Németh <[email protected]> ---
Regex library search operates on the plain text conversion of the document, where a single text line contains the full text of a paragraph (i.e. one paragraph per line). We always need a plain text conversion (back and forth) of the document for regex search, and we have only a single \n for line end. In plain text editors, too, you cannot search for a paragraph end without adding some extra syntax or heuristic. Similarly, Writer's plain text import uses a heuristic that recognizes shorter lines as paragraph boundaries.

Fortunately there are possible solutions or workarounds:

1) Easy command line
2) Macro + Find & Replace
3) Macro only (first step for an add-on development)

== 1) Easy command line ==

1) Export your document to PDF.
2) Grep the plain text content of the PDF, showing the matching lines in the Linux/macOS/Cygwin command line:

    $ less document.pdf | grep '” *$'

Note: When I did some research for hyphenation development (https://numbertext.org/typography/automatikus_magyar_elv%C3%A1laszt%C3%A1s_a_LibreOffice-ban.pdf), I used this, generating hundreds of documents with pyUNO, and the basic Linux tool "less" converted the PDFs to plain text documents with the requested line breaks immediately.

== 2) Macro + Find & Replace ==

1. Mark line ends with neutral Unicode characters using UNO, e.g. with the zero-width joiner (the right choice depends on your text).
2. Apply Find & Replace with regex pattern matching, e.g. "\w+\W?\u200d", to select the last word of each line (with an optional punctuation mark) using Find All.
3. Format the selected words, e.g. underline them (but other formatting, e.g. applying bold, would change the following line ends, so sometimes it's better to use only a macro).
4. Remove the neutral Unicode characters using Find & Replace.

For example, the Basic code for inserting ZWJ (U+200D):

'''''''''''''
Sub RunArg(command, args)
    dim document as object
    dim dispatcher as object
    document = ThisComponent.CurrentController.Frame
    dispatcher = createUnoService("com.sun.star.frame.DispatchHelper")
    dispatcher.executeDispatch(document, command, "", 0, args)
End Sub

Sub Run(command)
    RunArg(command, Array())
End Sub

Sub HardBreak()
    dim args1(1) as new com.sun.star.beans.PropertyValue
    cursor = ThisComponent.CurrentController.getViewCursor()
    Run(".uno:Escape")
    Run(".uno:GoToEndOfDoc")
    Do
        ' insert ZWJ (zero-width joiner, U+200D) character at the end of the line
        Run(".uno:GoToEndOfLine")
        args1(0).Name = "Text"
        ' ZWJ given by its character code, so the invisible character cannot get lost in the source
        args1(0).Value = Chr(&H200D)
        RunArg(".uno:InsertText", args1)
        ' go to the previous line
        Run(".uno:GoLeft")
        Run(".uno:GoToStartOfLine")
        origStart = cursor.Start
        Run(".uno:GoUp")
    ' loop until the cursor position doesn't change any more
    Loop Until cursor.Text.compareRegionStarts(origStart, cursor.Start) = 0
End Sub
''''''''''''''''''''

Note: it seems ZWJ can modify hyphenation (maybe a bug), see the attached screenshot.

== 3) Macro-only ==

When the regex replace modifies line breaking (i.e. the line ends), it's better to use a macro-only solution, e.g. extending the previous macro to do everything automatically.
For example, selecting the document line by line using UNO dispatcher calls:

    Run(".uno:GoToEndOfLine")
    Run(".uno:StartOfLineSel")

and calling Find & Replace with Search In Selection:

'''''''''''''
Sub SearchInSelection(regex)
    dim args1(22) as new com.sun.star.beans.PropertyValue
    args1(0).Name = "SearchItem.StyleFamily"
    args1(0).Value = 2
    args1(1).Name = "SearchItem.CellType"
    args1(1).Value = 0
    args1(2).Name = "SearchItem.RowDirection"
    args1(2).Value = true
    args1(3).Name = "SearchItem.AllTables"
    args1(3).Value = false
    args1(4).Name = "SearchItem.SearchFiltered"
    args1(4).Value = false
    args1(5).Name = "SearchItem.Backward"
    args1(5).Value = false
    args1(6).Name = "SearchItem.Pattern"
    args1(6).Value = false
    args1(7).Name = "SearchItem.Content"
    args1(7).Value = false
    args1(8).Name = "SearchItem.AsianOptions"
    args1(8).Value = false
    args1(9).Name = "SearchItem.AlgorithmType"
    args1(9).Value = 1
    args1(10).Name = "SearchItem.SearchFlags"
    args1(10).Value = 71680 ' code for search in selection
    args1(11).Name = "SearchItem.SearchString"
    args1(11).Value = regex
    args1(12).Name = "SearchItem.ReplaceString"
    args1(12).Value = ""
    args1(13).Name = "SearchItem.Locale"
    args1(13).Value = 255
    args1(14).Name = "SearchItem.ChangedChars"
    args1(14).Value = 2
    args1(15).Name = "SearchItem.DeletedChars"
    args1(15).Value = 2
    args1(16).Name = "SearchItem.InsertedChars"
    args1(16).Value = 2
    args1(17).Name = "SearchItem.TransliterateFlags"
    args1(17).Value = 1073743104
    args1(18).Name = "SearchItem.Command"
    args1(18).Value = 1
    args1(19).Name = "SearchItem.SearchFormatted"
    args1(19).Value = false
    args1(20).Name = "SearchItem.AlgorithmType2"
    args1(20).Value = 2
    args1(21).Name = "Quiet"
    args1(21).Value = true
    ' own slot for SynchronMode, so it no longer overwrites "Quiet"
    args1(22).Name = "SynchronMode"
    args1(22).Value = true
    RunArg(".uno:ExecuteSearch", args1())
End Sub
''''''''''''''''''''

(Note the SynchronMode argument: it makes the document update its text lines, so that the next line is selected correctly.)

Note: adding the ZWJ or other mark to the line is still needed.
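To make the effect of the line-end marks easier to see outside Writer, here is a minimal Python sketch of the mark / search / unmark cycle on plain strings. It is only an illustration under assumptions: the sample lines and the helper names (`mark_line_ends`, `last_words`, `strip_marks`) are hypothetical, the line breaks are given by hand instead of coming from Writer's layout, the mark is assumed to sit directly after the last visible character of each line, and Python's `re` module stands in for the regex engine used by Find & Replace, which may behave differently in details.

```python
import re

ZWJ = "\u200d"  # zero-width joiner, the neutral line-end mark

# Hypothetical sample: the laid-out lines of one paragraph, given by hand.
layout_lines = [
    "The quick brown fox",
    "jumps over the lazy dog,",
    "then rests.",
]


def mark_line_ends(lines):
    """Build the paragraph text with a ZWJ after the last character of each line."""
    return " ".join(line + ZWJ for line in lines)


def last_words(marked):
    """Find the last word of each line, with at most one trailing non-word character."""
    pattern = re.compile(r"\w+\W?" + ZWJ)
    return [m.group(0).rstrip(ZWJ) for m in pattern.finditer(marked)]


def strip_marks(marked):
    """Remove the marks again, restoring the original paragraph text."""
    return marked.replace(ZWJ, "")


marked = mark_line_ends(layout_lines)
print(last_words(marked))
print(strip_marks(marked))
```

A line that ends in two or more non-word characters (for example a full stop followed by a closing quotation mark) is not matched by this pattern, since `\W?` allows only one of them; the pattern would have to be widened for such text.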
So it works for me (especially because regex is already a feature for advanced users), but if you think otherwise, please file an enhancement request or reopen this issue with that request. Maybe it's worth adding a complete macro-only solution.

--
You are receiving this mail because:
You are the assignee for the bug.
