On 3/28/07, Terrence Brannon <[EMAIL PROTECTED]> wrote:
> my original question was about choosing the split delimiter... how
> would you split
> "terrence234brannon" by the regular expression \d+ for instance?
Why would I want to do that? Well, anyways, here's one
approach:
(a: -.~ e.&'0123456789' <;._1 ])@('0'&,)'terrence234brannon'
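One way to read that one-liner is step by step, as in the sketch below. The names txt and mask are made up for illustration and do not appear in the original expression.

```j
   txt =: '0' , 'terrence234brannon'   NB. prepend a digit so the string starts with a delimiter
   mask =: txt e. '0123456789'         NB. 1 wherever txt holds a digit, 0 elsewhere
   # mask <;._1 txt                    NB. one piece per digit; two of the four are empty
4
   a: -.~ mask <;._1 txt               NB. discard the empty boxes
+--------+-------+
|terrence|brannon|
+--------+-------+
```

The empty pieces come from the adjacent digits in 234: each digit starts a new piece, so removing empty boxes is what makes the run of digits behave like a single \d+ delimiter.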
If I wanted a general operation which splits using regular
expressions, I would probably define one. (Since Dan
Bron has provided one of those, I'll skip this part.)
More likely, though, I would use ;:
splitOnNums=: (0;(0 10#:10*".;._2(0 :0));a.e.'0123456789')&;:
1.1 0 NB. state 0: waiting for a non digit
1 0.3 NB. state 1: waiting for a digit
)
splitOnNums 'terrence234brannon'
+--------+-------+
|terrence|brannon|
+--------+-------+
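For anyone puzzling over the state table, the sketch below decodes it. It assumes the usual layout for the left argument of ;: (rows are states, columns are character classes, and each entry is a pair: next state, then operation code), with class 0 for non-digits and class 1 for digits as given by a.e.'0123456789'. The integers 11 0 and 10 3 are simply the two table lines multiplied by 10.

```j
   0 10 #: 11 0 ,: 10 3
1 1
0 0

1 0
0 3
NB. state 0, non-digit: 1 1  go to state 1 and mark the start of a word
NB. state 0, digit:     0 0  stay in state 0, do nothing
NB. state 1, non-digit: 1 0  stay in state 1, the word continues
NB. state 1, digit:     0 3  emit the word, forget its start, go to state 0
```

Writing each pair as a decimal such as 1.1 or 0.3 is just a compact way to type the table; the 0 10#: step splits each number back into its two digits.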
For defining word splitting engines, I have found ;: to be far
more powerful than regular expressions, though both
require some serious study before you can get much
done with them.
> not only that, but I get a truncation when I try this unless the
> string starts with a space. ...
Right, cut (;.) expects a leading (or trailing) delimiter. If
you can't safely assume that your data already includes
one, you should make that assumption safe before
you use cut.
Nouns defined by 0 :0 are one case where you can safely
assume a trailing delimiter: they end with a linefeed.
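As a small illustration of making that assumption safe, the sketch below adds the delimiter explicitly before cutting, and then cuts a 0 :0 noun on its trailing linefeed. The name twoLines is hypothetical and used only for this example.

```j
   <;._1 ' ' , 'this is a test'    NB. ;._1 takes the first item as the delimiter
+----+--+-+----+
|this|is|a|test|
+----+--+-+----+
   <;._2 'this is a test' , ' '    NB. ;._2 takes the last item as the delimiter
+----+--+-+----+
|this|is|a|test|
+----+--+-+----+
   twoLines =: 0 : 0
first line
second line
)
   LF = {: twoLines                NB. the last character is a linefeed
1
   <;._2 twoLines                  NB. so ;._2 can cut on it directly
+----------+-----------+
|first line|second line|
+----------+-----------+
```

Without the added space, ;._1 would treat the first character of the data (here the letter t) as the delimiter, which is where the truncation comes from.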
> Finally most people would trim the leading and trailing whitespace
> from the string before splitting it...
I'll agree with you that ;: is more natural for the "split on
whitespace" operation. By the way, here's a general-purpose
'split on any of an arbitrary set of characters' verb, which
defaults to splitting only on space characters:
splitCode=:0 10#:0 10#:110 103
split=:' '&$: :((0;splitCode;a.e.[);:])
' ' split 'this is a test'
+----+--+-+----+
|this|is|a|test|
+----+--+-+----+
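A couple more uses of the same verb, as a sketch: the monadic form falls back to splitting on spaces, and the left argument can list several delimiter characters at once. The two definitions are repeated from above so the example stands alone; the sample strings are made up.

```j
   splitCode=:0 10#:0 10#:110 103
   split=:' '&$: :((0;splitCode;a.e.[);:])
   split 'this is a test'            NB. monadic use: split on spaces
+----+--+-+----+
|this|is|a|test|
+----+--+-+----+
   ', ' split 'one, two,three'       NB. dyadic use: split on commas and spaces
+---+---+-----+
|one|two|three|
+---+---+-----+
```

Because the machine stays in its "between words" state while it keeps seeing delimiters, a run such as a comma followed by a space produces no empty box.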
That said, J is all about optimizing for the simple case, and
for the most part, you will get a lot further focussing on simple
cases (regular data) than trying to figure out how to squeeze
the last little bit of complexity into your system.
Most code, nowadays, is not written to solve problems, it's
written to deal with complexities which slip into requirements
when people aren't paying attention. One of J's strong points
is that it makes those implicit complexities glaringly obvious.
You can deal with them, but J really encourages you to at
the very least factor complexity handling and problem solving
into separate parts of your program.
--
Raul