Hi all, During a recent job I have had to develop some functions for string sorting. Some of these contains a very simple implementation of the "natural" sort algorithm. This implementation uses the service css.util.TextSearch to perform special string comparisions. Although the code works well (but it is not very tested!!) and fits my specs, the performance remains quite slow and I don't see any (easy) way to improve it. (but what appears difficult to my eyes maybe easy for others :-)
Finally, I thought to submit the thing as a code-snippet, in the hope it will be useful to someone. best regards Paolo M.
<?xml version="1.0"?> <!-- $RCSfile: $ last change: $Revision: $ $Author: $ $Date: $ (c)2003 by the copyright holders listed with the author-tags. If no explicit copyright holder is mentioned with a certain author, the author him-/herself is the copyright holder. All rights reserved. Public Documentation License Notice: The contents of this Documentation are subject to the Public Documentation License Version 1.0 (the "License"); you may only use this Documentation if you comply with the terms of this License. A copy of the License is available at http://www.openoffice.org/licenses/PDL.html The Original Documentation can be found in the CVS archives of openoffice.org at the place specified by RCSfile: in this header. The Initial Writer(s) of the Original Documentation are listed with the author-tags below. The Contributor(s) are listed with the author-tags below without the marker for being an initial author. All Rights Reserved. --> <snippet language="OOBasic" application="Office"> <keywords> <keyword>com.sun.star.util.TextSearch</keyword> <keyword>com.sun.star.util.SearchOptions</keyword> <keyword>com.sun.star.util.SearchResult</keyword> <keyword>Quicksort</keyword> <keyword>Regular Expression</keyword> <keyword>RegExp</keyword> </keywords> <authors> <author id="paolomantovani" initial="true" email="[EMAIL PROTECTED]">Paolo Mantovani</author> </authors> <question heading="Simple natural sort algorithm">Sorting strings which contains numbers <p>I need to sort a list of file names that are the output of a mail merge.</p> <p>Each filename has the same prefix and a serial ending number.</p> <p>I tried to sort filenames with a quicksort algorithm, but the output appear like this:</p> <ul> <li>myfileprefix0.sxw</li> <li>myfileprefix1.sxw</li> <li>myfileprefix10.sxw</li> <li>myfileprefix11.sxw</li> </ul> <p> ...</p> <ul> <li>myfileprefix2.sxw</li> <li>myfileprefix20.sxw</li> <li>myfileprefix21.sxw</li> </ul> <p> ...</p> <p> ...</p> <p></p> <p>and so on...</p> <p>The result is formally correct, because files are sorted alphabetically, but an human wouldn't never sorted the list in that way.</p> </question> <answer> <p>This is a common problem and was already solved in many languages and contexts.</p> <p>What you need is called "natural" sort algorithm.</p> <p>see for example</p> <p>http://www.naturalordersort.org/</p> <p>or search with google for "natural sort algorithm"</p> <p>The following code contains a very simple implementation of this algorithm.</p> <p>The main work is performed by the function NaturalStrComp() </p> <p>This function calls RegExpStrReplace() to make its work.</p> <p>This one is based on the service css.util.TextSearch</p> <p>Performances are slow.</p> <p>StarBasic language is not fast as compiled ones and further it does not offer advanced functions for string crunching natively.</p> <p>Anyway, maybe some little optimizations are stll possible.</p> <p></p> <p>To use this code:</p> <p>copy it into an empty module and run the Sub Test() on top of the module to see an example.</p> <listing>REM ***** BASIC ***** '_______________________________________________________________________________________________ ' ' String sorting functions '_______________________________________________________________________________________________ ' ' ' Declarations: Option Explicit '_______________________________________________________________________________________________ Sub Test Dim List(20) Dim I As Integer Dim iCaseSens As Integer Dim iNatural As Integer Dim iReversed As Integer 'generate an unsorted list of file names For I = 0 To 20 List(I) = "mytestfile" & Int(199*rnd) & ".sxw" Next I msgbox "not sorted" msgbox Join( list(), chr(10)) msgbox "alphabetically sorted" ExtendedSortStringList(list()) msgbox Join( list(), chr(10)) msgbox """natural"" sorted (slow)" 'set up some flags iCaseSens = 0 iNatural = 1 iReversed = 0 ExtendedSortStringList(list(), iCaseSens, iNatural, iReversed) msgbox Join( list(), chr(10)) End Sub '_______________________________________________________________________________________________ Sub ExtendedSortStringList(Data(), _ Optional iCaseSensitive As Integer, _ Optional iNaturalSort As Integer, _ Optional iReversed As Integer) Dim iCaseSens As Integer Dim iNatural As Integer Dim iRev As Integer Dim iMin As Long Dim iMax As Long If Not IsMissing(iCaseSensitive) Then iCaseSens = iCaseSensitive End If If Not IsMissing(iNaturalSort) Then iNatural = iNaturalSort End If If Not IsMissing(iReversed) Then iRev = iReversed End If iMin = LBound(Data()) iMax = UBound(Data()) If iNatural = 0 Then If iRev = 0 Then SimpleStringSort(Data(), iCaseSens, iMin, iMax) Else SimpleStringSortReversed(Data(), iCaseSens, iMin, iMax) End If Else If iRev = 0 Then NaturalStringSort(Data(), iCaseSens, iMin, iMax) Else NaturalStringSortReversed(Data(), iCaseSens, iMin, iMax) End If End if End Sub '_______________________________________________________________________________________________ Sub SimpleStringSort(Data(), iCaseSensitive As Integer, ByVal Lower As Long, ByVal Upper As Long) Dim Right As Long Dim Left As Long If Lower < Upper Then Left = Lower + 1 Right = Upper + 1 Do While Left < Right If StrComp(Data(Left), Data(Lower), iCaseSensitive) <= 0 Then Left = Left + 1 Else Right = Right - 1 SwapElements(Data(), Left, Right) End If Loop Left = Left - 1 SwapElements(Data(), Lower, Left) SimpleStringSort(Data(), iCaseSensitive, Lower, Left - 1) SimpleStringSort(Data(), iCaseSensitive, Right, Upper) End If End Sub '_______________________________________________________________________________________________ Sub SimpleStringSortReversed(Data(), iCaseSensitive As Integer, ByVal Lower As Long, ByVal Upper As Long) Dim Right As Long Dim Left As Long If Lower < Upper Then Left = Lower + 1 Right = Upper + 1 Do While Left < Right If StrComp(Data(Left), Data(Lower), iCaseSensitive) = 1 Then Left = Left + 1 Else Right = Right - 1 SwapElements(Data(), Left, Right) End If Loop Left = Left - 1 SwapElements(Data(), Lower, Left) SimpleStringSortReversed(Data(), iCaseSensitive, Lower, Left - 1) SimpleStringSortReversed(Data(), iCaseSensitive, Right, Upper) End If End Sub '_______________________________________________________________________________________________ Sub NaturalStringSort(Data(), iCaseSensitive As Integer, ByVal Lower As Long, ByVal Upper As Long) Dim Right As Long Dim Left As Long If Lower < Upper Then Left = Lower + 1 Right = Upper + 1 Do While Left < Right If NaturalStrComp(Data(Left), Data(Lower), iCaseSensitive) <= 0 Then Left = Left + 1 Else Right = Right - 1 SwapElements(Data(), Left, Right) End If Loop Left = Left - 1 SwapElements(Data(), Lower, Left) NaturalStringSort(Data(), iCaseSensitive, Lower, Left - 1) NaturalStringSort(Data(), iCaseSensitive, Right, Upper) End If End Sub '_______________________________________________________________________________________________ Sub NaturalStringSortReversed(Data(), iCaseSensitive As Integer, ByVal Lower As Long, ByVal Upper As Long) Dim Right As Long Dim Left As Long If Lower < Upper Then Left = Lower + 1 Right = Upper + 1 Do While Left < Right If NaturalStrComp(Data(Left), Data(Lower), iCaseSensitive) = 1 Then Left = Left + 1 Else Right = Right - 1 SwapElements(Data(), Left, Right) End If Loop Left = Left - 1 SwapElements(Data(), Lower, Left) NaturalStringSortReversed(Data(), iCaseSensitive, Lower, Left - 1) NaturalStringSortReversed(Data(), iCaseSensitive, Right, Upper) End If End Sub '_______________________________________________________________________________________________ Function NaturalStrComp(sText1 As String, sText2 As String, iCaseSensitive As Integer) As Integer Dim sLocText1 As String Dim sLocText2 As String Dim mNumList1() Dim mNumList2() Dim iNum1 As Integer Dim iNum2 As Integer Dim I As Integer Dim iResult As Integer 'replace numbers with a placeholder sLocText1 = RegExpStrReplace(sText1, "[0-9]+", "#") sLocText2 = RegExpStrReplace(sText2, "[0-9]+", "#") 'if the two strings are equal we will evaluate the numbers If StrComp(sLocText1, sLocText2, iCaseSensitive) = 0 Then 'estract numbers from strings sLocText1 = RegExpStrReplace(sText1, "[^0-9]+",Chr(0)) sLocText2 = RegExpStrReplace(sText2, "[^0-9]+",Chr(0)) mNumList1() = Split(sLocText1,Chr(0)) mNumList2() = Split(sLocText2,Chr(0)) 'note that the two lists have the same number of elements For I = LBound(mNumList1()) To UBound(mNumList1()) iNum1 = mNumList1(I) iNum2 = mNumList2(I) Select Case iNum1 Case Is = iNum2 iResult = 0 Case Is > iNum2 iResult = 1 Exit For Case Is < iNum2 iResult = -1 Exit For End Select Next I Else 'evaluate strings iResult = StrComp(sText1, sText2, iCaseSensitive) End if NaturalStrComp = iResult End Function '_______________________________________________________________________________________________ Function RegExpStrReplace(ByVal sText As String, sSearchRegExp As String, sReplace As String) As String 'Notice: 'in general, this function should allow to set the flag "CaseSensitive". 'Since in this implementation this flag would never be used, in order to preserve 'performances the flag and relative code has been removed Static oTextSearch As Object Static oSearchOpts As Object Dim oResult As Object Dim iStartPos As Integer Dim iTextLen As Integer Dim iMatchStartPos As Integer Dim iMatchEndPos As Integer Dim iMatchLen As Integer Dim sMatchString As String Dim sLocText As String 'initialize the service only if needed If IsNull(oTextSearch) Then oTextSearch = CreateUnoService("[EMAIL PROTECTED] com.sun.star.util.TextSearch}") oSearchOpts = CreateUnoStruct("[EMAIL PROTECTED] com.sun.star.util.SearchOptions}") With oSearchOpts .searchFlag = [EMAIL PROTECTED] com.sun.star.util.SearchFlags:REG}_EXTENDED .algorithmType = [EMAIL PROTECTED] com.sun.star.util.SearchAlgorithms:REGEXP} End With End if oSearchOpts.searchString = sSearchRegExp oTextSearch.setOptions(oSearchOpts) 'do a first search oResult = oTextSearch.searchForward(sText, iStartPos, Len(sText) ) 'iterate the S&R for all occourrences found Do While oResult.subRegExpressions > 0 iMatchStartPos = oResult.startOffset(0) iMatchEndPos = oResult.endOffset(0) iMatchLen = iMatchEndPos - iMatchStartPos 'replace the match in the copy of the original text sLocText = Left(sText, iMatchStartPos) sLocText = sLocText & sReplace sLocText = sLocText & Right(sText, Len(sText) - iMatchEndPos) sText = sLocText 'next search will start after the current replaced match iStartPos = iMatchStartPos + Len(sReplace) 'do the next search oResult = oTextSearch.searchForward(sText, iStartPos, Len(sText) ) Loop RegExpStrReplace = sText End Function '_______________________________________________________________________________________________ Sub SwapElements(Data(), I As Long, J As Long) Dim vTemp As Variant vTemp = Data(I) Data(I) = Data(J) Data(J) = vTemp End Sub</listing> </answer> <versions> <version number="1.1.x" status="tested"/> <version number="2.0.x" status="tested"/> </versions> <operating-systems> <operating-system name="All"/> </operating-systems> <changelog> <change author-id="paolomantovani" date="2005-06-16">Initial version</change> </changelog> </snippet>
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
