On Thursday, September 11, 2003, at 01:03 AM, R. Joseph Newton wrote:


James Edward Gray II wrote:


...

Address:<tab>2933<sp>Hummingbird<tab>St.<tab>City:<tab>Groton<tab>State:<tab>CT
Address:<sp>4321<tab>Sparrow<sp>Ave.<tab>City:<tab>Groton<tab>State:<tab>CT
. . .


What I want to do is get all of the data between Address: and City: and
strip the <tab> and replace with spaces. The only problem is that the
data between Address: and City: changes. What I want in the end is:

...


Address:<tab>2933<sp>Hummingbird<sp>St.<tab>City:<tab>Groton<tab>State:<tab>CT
Address:<tab>4321<sp>Sparrow<sp>Ave.<tab>City:<tab>Groton<tab>State:<tab>CT


(Notice that any <tab> in the address itself is now a <sp>, with a <tab>
both before and after the address.)

I know it involves using the s/// operator to both strip the tabs and
replace them with <sp>, but the problem is that it requires using an array
for each address...and that is what is creating problems for me.

Thanks...

Hmm, let me think out loud a little.


I think I see a pattern, so let's first change all of them to spaces.
That's easy enough:

s/\t/ /g;

Now, if we switch all the spaces that are supposed to be tabs to tabs,
we're done, right?  I bet we can handle that.  What about:

s/ ([A-Za-z]:) /\t$1\t/g;
# and...
s/^([A-Za-z]:) /$1\t/;  # The first one is a special case

I don't think going on a character basis will work well here. For one thing,
specifiers such as Lane, Ave., or St. are generally capitalized. Also, some
street and city names are multiword, such as Cherry Tree Lane, Des Moines,
etc.

My corrections, posted yesterday morning, explain how I'm not searching through the addresses at all. If you missed it, my improved code was:


tr/\t/ /;
s/(^| )([A-Za-z]+:) / $1 ? "\t$2\t" : "$2\t" /eg;
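
To see what those two lines do, here is a minimal sketch that runs them over the two sample lines from the original question. The sub name retab and the hard-coded sample strings are only for illustration; a real script would presumably read the records from a file.

```perl
use strict;
use warnings;

# Hypothetical helper (the name is an assumption): normalize one record so
# that tabs surround each "Label:" and single spaces separate everything else.
sub retab {
    my ($line) = @_;
    $line =~ tr/\t/ /;    # step 1: turn every tab into a space
    # step 2: a "Label:" preceded by a space gets a tab on both sides;
    # a "Label:" at the very start of the line only gets a tab after it
    $line =~ s/(^| )([A-Za-z]+:) / $1 ? "\t$2\t" : "$2\t" /eg;
    return $line;
}

# The two sample records from the question, with real tabs and spaces.
my @samples = (
    "Address:\t2933 Hummingbird\tSt.\tCity:\tGroton\tState:\tCT",
    "Address: 4321\tSparrow Ave.\tCity:\tGroton\tState:\tCT",
);

for my $record (@samples) {
    my $shown = retab($record);
    $shown =~ s/\t/<tab>/g;    # make the whitespace visible again
    $shown =~ s/ /<sp>/g;
    print "$shown\n";
}
# prints:
# Address:<tab>2933<sp>Hummingbird<sp>St.<tab>City:<tab>Groton<tab>State:<tab>CT
# Address:<tab>4321<sp>Sparrow<sp>Ave.<tab>City:<tab>Groton<tab>State:<tab>CT
```

Note that the pattern only ever looks for a word followed by a colon, so the street and city names themselves are never inspected.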

There is a much more subject-specific solution contained in the
specification: tabs after and before the field identifiers.  In this
context, the field identifiers are known, so the job should be very
straightforward:

s/ (City)| (State)/ \t$1/g;
s/(Address) |(City) /$1\t/g;

Unfortunately, your code doesn't seem to work. I believe you meant:


s/ (City:|State:)/\t$1/g;
s/(Address:|City:|State:) /$1\t/g;

Never have to even touch any obscure character-based, and error-prone,
differentiations here.

Well, as this is Perl, TMTOWTDI. However, to be honest, I much prefer my own solution. Let me tell you why.


Essentially what we're talking about here is a colon delimited file. By treating it as such, we get a lot more flexibility, I think.

For example, I guessed there was likely a Zip: field we weren't shown in the sample. My solution handles that as well. Yours could be made to by changing two lines of code.

There may be 10 more fields, or even 100, left out of the sample to keep it simple and short. That's adding up to a lot of changes and those are going to be some pretty long Regular Expressions you'll have.

What if they want to add a field someday, to track Name: say? Should they have to go to the programmer every time? It's well known that we can't see the future, but I think we can code with some reasonable growth in mind.
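
To make that trade-off concrete, here is a rough sketch that feeds a record with an extra Zip: field to both styles. Everything named here is an assumption for illustration: the subs retab_generic and retab_fixed, the @known list, and the made-up Zip value. retab_fixed is only one reading of the fixed-label idea, not anyone's posted code.

```perl
use strict;
use warnings;

# Generic style: any run of letters followed by a colon counts as a label.
sub retab_generic {
    my ($line) = @_;
    $line =~ tr/\t/ /;
    $line =~ s/(^| )([A-Za-z]+:) / $1 ? "\t$2\t" : "$2\t" /eg;
    return $line;
}

# Fixed-list style: only the labels named in @known are recognized.
my @known = qw(Address City State);    # assumed label list

sub retab_fixed {
    my ($line) = @_;
    my $labels = join '|', map { quotemeta($_) } @known;
    $line =~ tr/\t/ /;
    $line =~ s/ ((?:$labels):)/\t$1/g;    # tab before each known label
    $line =~ s/((?:$labels):) /$1\t/g;    # tab after each known label
    return $line;
}

# Hypothetical record with a field that was not in the original sample.
my $record = "Address:\t2933 Hummingbird\tSt.\tCity:\tGroton\tState:\tCT\tZip:\t00000";

for my $result (retab_generic($record), retab_fixed($record)) {
    (my $shown = $result) =~ s/\t/<tab>/g;
    print "$shown\n";
}
# prints:
# Address:<tab>2933 Hummingbird St.<tab>City:<tab>Groton<tab>State:<tab>CT<tab>Zip:<tab>00000
# Address:<tab>2933 Hummingbird St.<tab>City:<tab>Groton<tab>State:<tab>CT Zip: 00000
```

The generic version picks up Zip: with no changes, while the fixed-list version leaves it space-separated until someone adds Zip to @known.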

Again though, TMTOWTDI, and you have definitely shown another way to attack the problem.

James

