Using LZW or similar compression is likely to give you substantially better
file compression, if that's what you're after. Of course you'd have to
re-expand it to use it.

The killer here, I would guess, is the use of -[NSArray indexOfObject:] - it has
to perform a string-by-string linear search until it finds a match, so the loop
as a whole is quadratic in the number of words. Instead, if you keep each word
in an NSMutableSet (which uses hashing internally) you can test for membership
in roughly constant time.
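
As a rough sketch of the difference (not tested against your data), the duplicate check could look something like this. The function name UniqueWordsPreservingOrder is made up for illustration, and I'm assuming you want to keep the words in first-seen order, which is the only reason the array is still there:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the strings in `words` with duplicates removed,
// keeping the order in which each word was first seen. The set records what
// has already been added, so each membership test is a hash lookup instead of
// a linear scan of the result array.
static NSArray *UniqueWordsPreservingOrder(NSArray *words)
{
    NSMutableSet *seen = [NSMutableSet set];
    NSMutableArray *result = [NSMutableArray array];

    for (NSString *word in words) {
        if (![seen containsObject:word]) {
            [seen addObject:word];
            [result addObject:word];
        }
    }
    return result;
}
```

If the order of the words doesn't matter, you can drop the array entirely and just add every word to the set.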

Also, using -componentsSeparatedByString: to get an array of words is simple,
but going to be a killer on time and space. Instead you could parse and index
the text as you go using NSScanner so that an array of words is never made - as
you scan each word, add it to the set (a set only ever holds a single instance
of each object). At the end of the scan, the set contains all the unique words.
I would suggest returning that set rather than putting it back together as a
string - as a set it will be more useful for membership testing and you can
easily convert that to a string or array as you need.
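
Something along these lines is what I have in mind. Treat it as a sketch under a couple of assumptions: the function name UniqueWordsInString is invented, and I'm treating a "word" as a run of alphanumeric characters with everything else acting as a separator, which is my guess at what the trimming in your code was meant to achieve:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: scans `text` once and returns the set of unique,
// lowercased words. Assumes a word is a run of alphanumeric characters.
static NSSet *UniqueWordsInString(NSString *text)
{
    NSMutableSet *words = [NSMutableSet set];
    NSCharacterSet *wordChars = [NSCharacterSet alphanumericCharacterSet];

    NSScanner *scanner = [NSScanner scannerWithString:text];
    // Skip anything that is not a word character (whitespace, punctuation)
    // rather than the scanner's default of whitespace and newlines only.
    [scanner setCharactersToBeSkipped:[wordChars invertedSet]];

    NSString *word = nil;
    // Each pass skips the separators and then collects one run of word
    // characters; the call returns NO once only separators (or nothing) remain.
    while ([scanner scanCharactersFromSet:wordChars intoString:&word]) {
        // Adding a word that is already present leaves the set unchanged.
        [words addObject:[word lowercaseString]];
    }
    return words;
}
```

If you do need a string for the index file, [[words allObjects] componentsJoinedByString:@" "] will give you one, though the order of the words is then undefined.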

--Graham

On 10/02/2011, at 2:04 PM, Brad Stone wrote:

> I made this code to remove any duplicate words from a large group of text.
> The result is stored in an index file so the text doesn't need to make sense.
> I'm removing the duplicates to save space in the index file. I was
> wondering if anyone had a suggestion for a more efficient way to
> accomplish this. I'm guessing the separations and joins are taking up
> memory and slowing things down (even though I'm not positive about that).
> Using this code reduced the index file size from 4.7MB to 2.7MB.
>
> Thanks
>
> - (NSString *)abstractText:(NSString *)srcString {
>     NSMutableArray *resultArray = [[NSMutableArray alloc] init];
>     NSArray *textArray = [srcString componentsSeparatedByString:@" "];
>     for (NSString *s in textArray) {
>
>         s = [s stringByTrimmingCharactersInSet:[NSCharacterSet alphanumericCharacterSet]];
>         s = [s lowercaseString];
>
>         if ([resultArray indexOfObject:s] == NSNotFound) {
>             [resultArray addObject:s];
>         }
>     }
>
>     NSString *resultString = nil;
>     if ([resultArray count] > 0) {
>         resultString = [resultArray componentsJoinedByString:@" "];
>     } else {
>         resultString = srcString;
>     }
>     return resultString;
> }
> _______________________________________________
>
> Cocoa-dev mailing list ([email protected])
>
> Please do not post admin requests or moderator comments to the list.
> Contact the moderators at cocoa-dev-admins(at)lists.apple.com
>
> Help/Unsubscribe/Update your Subscription:
> http://lists.apple.com/mailman/options/cocoa-dev/graham.cox%40bigpond.com
>
> This email sent to [email protected]