On 30/06/17 08:20, Fernando Cabral wrote:
> 2017-06-30 7:44 GMT-03:00 Fabien Bodard <gambas...@gmail.com>:
>> The best way is the nando one ... at least for Gambas.
>> As you don't have to care about the index value or the order,
>> the walk-ahead option is the better one.
>> Then Fernando ... for big, big things... I think you need to use a DB.
>> Or a native language.... maybe an SQLite in-memory structure can be good.
>
> Fabien, since this is a one-time-only thing, I don't think I'd be
> better off with a database.
> Basically, I read a text file and then break it down into words,
> sentences and paragraphs.
> Next I count the items in each array (words, sentences, paragraphs).
> Array.Count works wonderfully.
> After that, I have to eliminate the duplicate words (Array.Words). But
> in doing so, I also have to count how many times each word appeared.
> Finally I sort Array.Sentences and Array.Paragraphs by size
> (String.Len()). Array.Words is sorted by count + length. This is
> all working well.
>
> So, my quest is for the fastest way to eliminate the duplicate words
> while I count them.
> For the time being, here is a working solution based on the system's
> sort | uniq. This is one of the versions I have been using:
>
>   Exec ["/usr/bin/uniq", "Unsorted.txt", "Sorted.srt2"] Wait
>   Exec ["/usr/bin/uniq", "-ci", "SortedWords.srt2", "SortedWords.srt3"] Wait
>   Exec ["/usr/bin/sort", "-bnr", "SortedWords.srt3"] To UniqWords
>   WordArray = Split(UniqWords, "\n")
>
> So, I end up with the result I want. It's effective. Now, it would be
> more elegant if I could do the same with Gambas. Of course, the
> sorting would be easy with the built-in WordArray.Sort().
> But how about the '"/usr/bin/uniq", "-ci" ...' part?
>
> Regards
> - fernando
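
For comparison, the sort | uniq idea quoted above can be written as a single shell pipeline. This is only a sketch under my own assumptions: words.txt is a hypothetical file with one word per line (not one of the file names used above), and the sort -f step is my addition so that words differing only in case end up next to each other before uniq -ci sees them.

```shell
#!/bin/sh
# Hypothetical sample input: one word per line.
printf '%s\n' the cat The dog cat the > words.txt

# sort -f   : sort ignoring case, so duplicates become adjacent
# uniq -ci  : collapse adjacent duplicates ignoring case, prefixing each with its count
# sort -bnr : order by that count, highest first
counts=$(sort -f words.txt | uniq -ci | sort -bnr)

printf '%s\n' "$counts"
```

With this sample, "the" comes out first with a count of 3, then "cat" with 2, then "dog" with 1. Which spelling of "the" is printed depends on the sort order of the locale, since uniq keeps the first line of each group.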

Not tried, but for the duplicate count, what about iterating over the word array and copying each word into a keyed collection? The key would be UCase$(word) and the value (item) would be the count: 1 (integer) the first time a word is seen, incremented each time the same key turns up again. As far as I know, Collection.Add does not raise an error when the key already exists (it just replaces the value), so I would test with Exist() instead of relying on Try / If Error.

So (please note my syntax may be slightly off):

  Public Function CountWordsInArray(sortedWordArray As String[]) As Collection

    Dim wordCount As New Collection
    Dim currentWord As String
    Dim wordKey As String

    For Each currentWord In sortedWordArray
      wordKey = UCase$(currentWord)   ' "The", "the" and "THE" share one key
      If wordCount.Exist(wordKey) Then
        wordCount[wordKey] = wordCount[wordKey] + 1   ' seen before: bump the count
      Else
        wordCount[wordKey] = 1   ' first occurrence
      Endif
    Next

    Return wordCount

  End

The returned collection should be sorted if the array was, and for each item you will have the numeric count as the item and the word as the key.

Hope it helps,
zxMarce.

_______________________________________________
Gambas-user mailing list
Gambas-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/gambas-user