On Dec 17, Kredler Stefan said:

>I want to transform a sorted list into a compact list (if this is the
>correct term).
>e.g. my list is
>
>9,11,12,13,14,23,25,26,27,50 and want to have something like
>9,11-14,23,25-27,50 (to pass on to a unix-command).

I love this. :)

There are two approaches. One of them is the computer science way, and
the other is the diehard-regex-hacker way. I'll show you both (the regex
way requires Perl 5.6 or later, by the way). Both methods use the same
logic, but the regex way is far more compact.

The computer science way is to split the string into a list of numbers,
and then go through the list one number at a time. Keep track of the
first number of the potential "range", as well as the last number seen.
If the current number is one more than the last number, then keep going.
Once you get a number that is NOT one more than the previous (or you run
out of numbers), you generate a range. Here's the code:

  # input "N1,N2,N3,N4..."
  # assumes the numbers are sorted!!!
  sub list2range {
    my $list = shift;
    my ($first, $last);
    my @output;

    # remove the first number in the list...
    # and set $first and $last to it
    $list =~ s/^(\d+),?// and $first = $last = $1;

    # nothing to do for an empty list
    return "" unless defined $first;

    for (split /,/ => $list) {
      # next number in the range
      if ($_ == $last + 1) {
        $last = $_;
        next;  # get the next number
      }

      # otherwise, we're done with the current range
      # don't use "3-3", just use "3"
      if ($first == $last) { push @output, $first; }
      else { push @output, "$first-$last"; }

      # now $_ is the first number in a new range
      $first = $last = $_;
    }

    # the loop never emits the range still in progress, so do it here
    push @output, ($first == $last ? $first : "$first-$last");

    return join "," => @output;
  }

There we go. Now for the regex version.

  sub list2range {
    my $list = shift;
    $list =~ s<
      \b                   # boundary (see below for why)
      (\d+)                # capture to $1 a number
      (?:                  # this chunk...
        ,                  # a comma
        (                  # capture to $2 (and indirectly to $+)
          (??{ 1 + $+ })   # one more than the last number we matched
        )
        \b                 # followed by a boundary
      )+                   # ... one or more times
    ><$1-$+>gx;            # replace with "first-last"
    return $list;
  }

Whew. Much shorter, no? ;)

Just a bit of explanation... the last captured part of a regex is stored
in $+. The first time we execute the (??{ 1 + $+ }) part of the regex,
$+ refers to $1, which is the first number we've matched. However,
you'll notice that (??{ ... }) is inside parentheses itself!
That means that once it has matched, $2 is set to whatever it matched,
and that means that $+ is set to that value as well, so from then on, $+
refers to what ((??{ 1 + $+ })) matched the last time.

I've got \b (word boundaries) in here to make sure we're matching the
WHOLE number. We don't want a false positive from something like

  10,11,12,135

Since 135 *starts* with 13, we don't want Perl to think the range is
"10-135". So we ensure that the first number is preceded by a boundary,
and the last number is followed by a boundary, so that we're not
skipping digits.

Which method should you use? Eh, it's up to you. I like the regex
approach, because it shows how useful the dynamic regex construct (the
(??{ ... }) thing) can be. It reduced a sizeable algorithm down to one
simple regex:

  s/\b(\d+)(?:,((??{1+$+})\b))+/$1-$+/g;

But if you'd feel better using the more "general" approach, then by all
means, do.

-- 
Jeff "japhy" Pinyan
http://www.pobox.com/~japhy/
RPI Acacia brother #734
http://www.perlmonks.org/   http://www.cpan.org/
** Look for "Regular Expressions in Perl" published by Manning, in 2002 **
<stu> what does y/// stand for?  <tenderpuss> why, yansliterate of course.
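
A quick way to sanity-check either version is to run it on the list from
the question and on the 10,11,12,135 boundary case. The sketch below is
a minimal, self-contained variant of the loop approach, not the code
from the post: the name compact_list and the [first, last] pair
bookkeeping are assumptions made for illustration. Like the original, it
assumes the input is already sorted.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): collapse a sorted,
# comma-separated list of non-negative integers into ranges.
sub compact_list {
    my ($list) = @_;
    my @ranges;    # each element is a [first, last] pair

    for my $n (split /,/, $list) {
        if (@ranges && $n == $ranges[-1][1] + 1) {
            $ranges[-1][1] = $n;       # extend the current range
        }
        else {
            push @ranges, [$n, $n];    # start a new range
        }
    }

    # Emit "N" for a single number and "first-last" for a real range.
    my @out;
    for my $r (@ranges) {
        my ($first, $last) = @$r;
        push @out, $first == $last ? $first : "$first-$last";
    }
    return join ',', @out;
}

print compact_list("9,11,12,13,14,23,25,26,27,50"), "\n";  # 9,11-14,23,25-27,50
print compact_list("10,11,12,135"), "\n";                  # 10-12,135
```

Because every range lives in @ranges until the end, there is no
separate "flush the last range" step to forget; the cost is one extra
pass over the pairs when building the output.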