Hi all,

During a recent job I have had to develop some functions for string sorting. 
Some of these contains a very simple implementation of the "natural" sort 
algorithm.
This implementation uses the service css.util.TextSearch to perform special 
string comparisions.
Although the code works well (but it is not very tested!!)  and fits my specs, 
the performance remains quite slow and I don't see any (easy) way to improve 
it.
(but what appears difficult to my eyes maybe easy for others :-)

Finally, I thought to submit the thing as a code-snippet, in the hope it will 
be useful to someone.


best regards
Paolo M.
<?xml version="1.0"?>
<!--
$RCSfile: $
last change: $Revision: $ $Author: $ $Date: $

(c)2003 by the copyright holders listed with the author-tags.
If no explicit copyright holder is mentioned with a certain author,
the author him-/herself is the copyright holder. All rights reserved.

Public Documentation License Notice:

The contents of this Documentation are subject to the
Public Documentation License Version 1.0 (the "License");
you may only use this Documentation if you comply with
the terms of this License. A copy of the License is
available at http://www.openoffice.org/licenses/PDL.html

The Original Documentation can be found in the CVS archives
of openoffice.org at the place specified by RCSfile: in this header.

The Initial Writer(s) of the Original Documentation are listed
with the author-tags below.

The Contributor(s) are listed with the author-tags below
without the marker for being an initial author.

All Rights Reserved.
-->

<snippet language="OOBasic" application="Office">

<keywords>
        <keyword>com.sun.star.util.TextSearch</keyword>
        <keyword>com.sun.star.util.SearchOptions</keyword>
        <keyword>com.sun.star.util.SearchResult</keyword>
        <keyword>Quicksort</keyword>
        <keyword>Regular Expression</keyword>
        <keyword>RegExp</keyword>
</keywords>

<authors>
        <author id="paolomantovani" initial="true" email="[EMAIL 
PROTECTED]">Paolo Mantovani</author>
</authors>

<question heading="Simple natural sort algorithm">Sorting strings which 
contains numbers
<p>I need to sort a list of file names that are the output of a mail merge.</p>
<p>Each filename has the same prefix and a serial ending number.</p>
<p>I tried to sort filenames with a quicksort algorithm, but the output appear 
like this:</p>
<ul>
        <li>myfileprefix0.sxw</li>
        <li>myfileprefix1.sxw</li>
        <li>myfileprefix10.sxw</li>
        <li>myfileprefix11.sxw</li>
</ul>
<p>  ...</p>
<ul>
        <li>myfileprefix2.sxw</li>
        <li>myfileprefix20.sxw</li>
        <li>myfileprefix21.sxw</li>
</ul>
<p>  ...</p>
<p>  ...</p>
<p></p>
<p>and so on...</p>
<p>The result is formally correct, because files are sorted alphabetically, but 
an human wouldn&apos;t never sorted the list in that way.</p>
</question>

<answer>
<p>This is a common problem and was already solved in many languages and 
contexts.</p>
<p>What you need is called &quot;natural&quot; sort algorithm.</p>
<p>see for example</p>
<p>http://www.naturalordersort.org/</p>
<p>or search with google for &quot;natural sort algorithm&quot;</p>
<p>The following code contains a very simple implementation of this 
algorithm.</p>
<p>The main work  is performed by the function NaturalStrComp() </p>
<p>This function calls RegExpStrReplace() to make its work.</p>
<p>This one is based on the service css.util.TextSearch</p>
<p>Performances are slow.</p>
<p>StarBasic language is not fast as compiled ones and further it does not 
offer advanced functions for string crunching natively.</p>
<p>Anyway, maybe some little optimizations are stll possible.</p>
<p></p>
<p>To use this code:</p>
<p>copy it into an empty module and run the Sub Test() on top of the module to 
see an example.</p>
<listing>REM  *****  BASIC  *****
&apos;_______________________________________________________________________________________________
&apos;  
&apos;  String sorting functions
&apos;_______________________________________________________________________________________________
&apos;
&apos;
&apos; Declarations:
Option Explicit


&apos;_______________________________________________________________________________________________
Sub Test
Dim List(20)
Dim I As Integer
Dim iCaseSens As Integer
Dim iNatural As Integer
Dim iReversed As Integer
        
        &apos;generate an unsorted list of file names
        For I = 0 To 20
                List(I) = &quot;mytestfile&quot; &amp; Int(199*rnd) &amp; 
&quot;.sxw&quot;
        Next I
        

        msgbox &quot;not sorted&quot;
        msgbox Join( list(), chr(10))
        
        msgbox &quot;alphabetically sorted&quot;
        ExtendedSortStringList(list())
        msgbox Join( list(), chr(10))
        
        msgbox &quot;&quot;&quot;natural&quot;&quot; sorted (slow)&quot;
        &apos;set up some flags
        iCaseSens = 0
        iNatural = 1
        iReversed = 0
        ExtendedSortStringList(list(), iCaseSens, iNatural, iReversed)
        msgbox Join( list(), chr(10))

End Sub


&apos;_______________________________________________________________________________________________
Sub ExtendedSortStringList(Data(), _
                                                        Optional iCaseSensitive 
As Integer, _
                                                        Optional iNaturalSort 
As Integer, _
                                                        Optional iReversed As 
Integer)

Dim iCaseSens As Integer
Dim iNatural As Integer
Dim iRev As Integer
Dim iMin As Long
Dim iMax As Long

        If Not IsMissing(iCaseSensitive) Then
                iCaseSens = iCaseSensitive
        End If

        If Not IsMissing(iNaturalSort) Then
                iNatural = iNaturalSort
        End If
        
        If Not IsMissing(iReversed) Then
                iRev = iReversed
        End If
        
        iMin = LBound(Data())
        iMax = UBound(Data())
        
        If iNatural = 0 Then
                If iRev = 0 Then
                        SimpleStringSort(Data(), iCaseSens, iMin, iMax)
                Else
                        SimpleStringSortReversed(Data(), iCaseSens, iMin, iMax) 
                End If
        Else
                If iRev = 0 Then
                        NaturalStringSort(Data(), iCaseSens, iMin, iMax)
                Else
                        NaturalStringSortReversed(Data(), iCaseSens, iMin, iMax)
                End If
        End if
        
End Sub


&apos;_______________________________________________________________________________________________
Sub SimpleStringSort(Data(), iCaseSensitive As Integer, ByVal Lower As Long, 
ByVal Upper As Long)
Dim Right As Long
Dim Left  As Long

        If Lower &lt; Upper Then
                Left = Lower + 1
                Right = Upper + 1
                
                Do While Left &lt; Right
                        If StrComp(Data(Left), Data(Lower), iCaseSensitive) 
&lt;= 0 Then
                                Left = Left + 1
                        Else
                                Right = Right - 1
                                SwapElements(Data(), Left, Right)
                        End If
                Loop
                
                Left = Left - 1
                SwapElements(Data(), Lower, Left)
                SimpleStringSort(Data(), iCaseSensitive, Lower, Left - 1)
                SimpleStringSort(Data(), iCaseSensitive, Right, Upper)
        End If
End Sub


&apos;_______________________________________________________________________________________________
Sub SimpleStringSortReversed(Data(), iCaseSensitive As Integer, ByVal Lower As 
Long, ByVal Upper As Long)
Dim Right As Long
Dim Left  As Long

        If Lower &lt; Upper Then
                Left = Lower + 1
                Right = Upper + 1
                
                Do While Left &lt; Right
                        If StrComp(Data(Left), Data(Lower), iCaseSensitive) = 1 
Then
                                Left = Left + 1
                        Else
                                Right = Right - 1
                                SwapElements(Data(), Left, Right)
                        End If
                Loop
                
                Left = Left - 1
                SwapElements(Data(), Lower, Left)
                SimpleStringSortReversed(Data(), iCaseSensitive, Lower, Left - 
1)
                SimpleStringSortReversed(Data(), iCaseSensitive, Right, Upper)
        End If
End Sub


&apos;_______________________________________________________________________________________________
Sub NaturalStringSort(Data(), iCaseSensitive As Integer, ByVal Lower As Long, 
ByVal Upper As Long)
Dim Right As Long
Dim Left  As Long

        If Lower &lt; Upper Then
                Left = Lower + 1
                Right = Upper + 1
                
                Do While Left &lt; Right
                        If NaturalStrComp(Data(Left), Data(Lower), 
iCaseSensitive) &lt;= 0 Then
                                Left = Left + 1
                        Else
                                Right = Right - 1
                                SwapElements(Data(), Left, Right)
                        End If
                Loop
                
                Left = Left - 1
                SwapElements(Data(), Lower, Left)
                NaturalStringSort(Data(), iCaseSensitive, Lower, Left - 1)
                NaturalStringSort(Data(), iCaseSensitive, Right, Upper)
        End If
End Sub


&apos;_______________________________________________________________________________________________
Sub NaturalStringSortReversed(Data(), iCaseSensitive As Integer, ByVal Lower As 
Long, ByVal Upper As Long)
Dim Right As Long
Dim Left  As Long

        If Lower &lt; Upper Then
                Left = Lower + 1
                Right = Upper + 1
                
                Do While Left &lt; Right
                        If NaturalStrComp(Data(Left), Data(Lower), 
iCaseSensitive) = 1 Then
                                Left = Left + 1
                        Else
                                Right = Right - 1
                                SwapElements(Data(), Left, Right)
                        End If
                Loop
                
                Left = Left - 1
                SwapElements(Data(), Lower, Left)
                NaturalStringSortReversed(Data(), iCaseSensitive, Lower, Left - 
1)
                NaturalStringSortReversed(Data(), iCaseSensitive, Right, Upper)
        End If
End Sub


&apos;_______________________________________________________________________________________________
Function NaturalStrComp(sText1 As String, sText2 As String, iCaseSensitive As 
Integer) As Integer
                                                
Dim sLocText1 As String
Dim sLocText2 As String
Dim mNumList1()
Dim mNumList2()
Dim iNum1 As Integer
Dim iNum2 As Integer
Dim I As Integer
Dim iResult As Integer
        
        &apos;replace numbers with a placeholder
        sLocText1 = RegExpStrReplace(sText1, &quot;[0-9]+&quot;, &quot;#&quot;)
        sLocText2 = RegExpStrReplace(sText2, &quot;[0-9]+&quot;, &quot;#&quot;) 
        
        &apos;if the two strings are equal we will evaluate the numbers
        If StrComp(sLocText1, sLocText2, iCaseSensitive) = 0 Then
                &apos;estract numbers from strings
                sLocText1 = RegExpStrReplace(sText1, &quot;[^0-9]+&quot;,Chr(0))
                sLocText2 = RegExpStrReplace(sText2, &quot;[^0-9]+&quot;,Chr(0))
                
                mNumList1() = Split(sLocText1,Chr(0))
                mNumList2() = Split(sLocText2,Chr(0))
                
                &apos;note that the two lists have the same number of elements
                For I = LBound(mNumList1()) To UBound(mNumList1())
                        iNum1 = mNumList1(I)
                        iNum2 = mNumList2(I)
                        Select Case iNum1
                                Case Is = iNum2
                                        iResult = 0
                                        
                                Case Is &gt; iNum2
                                        iResult = 1
                                        Exit For
                                        
                                Case Is &lt; iNum2
                                        iResult = -1
                                        Exit For
                                        
                        End Select
                Next I
                
        Else
                &apos;evaluate strings
                iResult = StrComp(sText1, sText2, iCaseSensitive)
        End if
        
        NaturalStrComp = iResult
        
End Function


&apos;_______________________________________________________________________________________________
Function RegExpStrReplace(ByVal sText As String, sSearchRegExp As String, 
sReplace As String) As String
&apos;Notice:
&apos;in general, this function should allow to set the flag 
&quot;CaseSensitive&quot;. 
&apos;Since in this implementation this flag would never be  used, in order to 
preserve
&apos;performances the flag and relative code has been removed

Static oTextSearch As Object
Static oSearchOpts As Object

Dim oResult As Object

Dim iStartPos As Integer
Dim iTextLen As Integer
Dim iMatchStartPos As Integer
Dim iMatchEndPos As Integer
Dim iMatchLen As Integer

Dim sMatchString As String
Dim sLocText As String
        
        &apos;initialize the service only if needed
        If IsNull(oTextSearch) Then
                oTextSearch = CreateUnoService(&quot;[EMAIL PROTECTED] 
com.sun.star.util.TextSearch}&quot;)
                oSearchOpts = CreateUnoStruct(&quot;[EMAIL PROTECTED] 
com.sun.star.util.SearchOptions}&quot;)
                With oSearchOpts
                        .searchFlag = [EMAIL PROTECTED] 
com.sun.star.util.SearchFlags:REG}_EXTENDED
                        .algorithmType = [EMAIL PROTECTED] 
com.sun.star.util.SearchAlgorithms:REGEXP}
                End With
        End if
        
        oSearchOpts.searchString = sSearchRegExp
        oTextSearch.setOptions(oSearchOpts)
        
        &apos;do a first search
        oResult = oTextSearch.searchForward(sText, iStartPos, Len(sText) )

        &apos;iterate the S&amp;R for all occourrences found    
        Do While oResult.subRegExpressions &gt; 0
                
                iMatchStartPos = oResult.startOffset(0)
                iMatchEndPos = oResult.endOffset(0)
                iMatchLen = iMatchEndPos - iMatchStartPos
                
                &apos;replace the match in the copy of the original text
                sLocText = Left(sText, iMatchStartPos)
                sLocText = sLocText &amp; sReplace
                sLocText = sLocText &amp; Right(sText, Len(sText) - 
iMatchEndPos)
                sText = sLocText
                
                &apos;next search will start after the current replaced match
                iStartPos = iMatchStartPos + Len(sReplace)
                
                &apos;do the next search
                oResult = oTextSearch.searchForward(sText, iStartPos, 
Len(sText) )
        Loop
                
        RegExpStrReplace = sText

End Function


&apos;_______________________________________________________________________________________________
Sub SwapElements(Data(), I As Long, J As Long)
Dim vTemp As Variant

        vTemp = Data(I)
        Data(I) = Data(J)
        Data(J) = vTemp
        
End Sub</listing>
</answer>

<versions>
        <version number="1.1.x" status="tested"/>
        <version number="2.0.x" status="tested"/>
</versions>

<operating-systems>
<operating-system name="All"/>
</operating-systems>

<changelog>
        <change author-id="paolomantovani" date="2005-06-16">Initial 
version</change>
</changelog>

</snippet>

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to