From: Bruce Van Allen <[EMAIL PROTECTED]>
Date: March 2, 2006 10:39:52 AM EST
Subject: Re: Help with conditional grep pattern


On 3/1/06 Joseph Hourcle wrote:
On Mar 1, 2006, at 8:29 PM, Bruce Van Allen wrote:
On 3/1/06 Joseph Hourcle wrote:
Yes.  Yes it is.  And there's the question of what to do with
syntactically incorrect items:

$_SESSION['translate"] ...

I'd probably use the following, because I'm paranoid:

Find: \$_SESSION\[(['"])translate\1\]->it\((['"])(.*?)\2
                                  ^^
Wait, Joe, did you mean that? This _requires_ that the quote mark
be the same before and after translate, whereas Ron's didn't.

Yep.
Hey, that's cool!
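To make the difference concrete, here is a small check in Python. It stands in for BBEdit's Find: escaped metacharacters, character classes, lazy `.*?` and backreferences are written the same way in Python's `re`. `STRICT` is Joe's pattern as quoted above. `LENIENT` is the pattern that comes up later in the thread. The sample lines are made up for illustration, and the third one has the mismatched `'translate"` key.

```python
import re

# Joe's pattern: \1 and \2 are backreferences, so each closing quote must be
# the same character as the opening quote it pairs with.
STRICT = re.compile(r'''\$_SESSION\[(['"])translate\1\]->it\((['"])(.*?)\2''')

# The lenient pattern from later in the thread: any quote may close any quote.
LENIENT = re.compile(r'''\$_SESSION\[['"]translate['"]\]->it\(['"].*?['"]\)''')

# Hypothetical sample lines; the last one mixes ' and " around translate.
samples = [
    """$_SESSION['translate']->it('Hello');""",
    """$_SESSION["translate"]->it("Goodbye");""",
    """$_SESSION['translate"]->it('Oops');""",
]

for line in samples:
    strict = STRICT.search(line) is not None
    lenient = LENIENT.search(line) is not None
    print(f"{line:45} strict={strict} lenient={lenient}")
```

The first two lines match both patterns. The third matches only `LENIENT`, which is the behavior the carets above point at.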


I'd prefer that it not get syntactically incorrect items. (maybe
I'm just strange like that)
99.9% of the time I'd agree with you 100% on this statement ;)


Not strange at all. Just wasn't clear from your comment "And there's the
question..." whether you meant the question of _how_ to match or
_whether_ to match.

In my own biz, data integrity requires that what doesn't match gets
shunted aside for further testing, and if that doesn't clarify, direct
examination by my visual-bio-informatic sensors.

With, say, a million records, it's also a speed issue: if 99.5% of the
records match a very restrictive pattern, then the program only has to
run the more time-consuming tests on the remaining 5,000.
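Here is a minimal sketch of that triage, again in Python. It assumes each record is one line that is supposed to contain a translate call. The records and the `triage` name are hypothetical. The cheap restrictive pattern runs over everything, and only the rejects are kept for slower tests or a human look.

```python
from __future__ import annotations

import re
from typing import Iterable

# Deliberately restrictive: quotes must pair up (Joe's pattern from above).
STRICT = re.compile(r'''\$_SESSION\[(['"])translate\1\]->it\((['"])(.*?)\2''')


def triage(records: Iterable[str]) -> tuple[list[str], list[str]]:
    """Split records into (extracted strings, rejected records)."""
    accepted: list[str] = []
    rejected: list[str] = []
    for record in records:
        m = STRICT.search(record)
        if m:
            # Fast path: the record is well formed; keep just the string.
            accepted.append(m.group(3))
        else:
            # Anything odd is shunted aside for further testing, not "repaired".
            rejected.append(record)
    return accepted, rejected


# Hypothetical records; only the malformed one should end up rejected.
records = [
    "$_SESSION['translate']->it('Hello');",
    '$_SESSION["translate"]->it("Goodbye");',
    "$_SESSION['translate\"]->it('Oops');",
]
accepted, rejected = triage(records)
print(accepted)  # ['Hello', 'Goodbye']
print(rejected)  # ['$_SESSION[\'translate"]->it(\'Oops\');']
```

On a million records, `rejected` is the short list that gets the expensive treatment, for example the 5,000 in the scenario above.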

The OP seemed to want to accept records with errors, by making
assumptions about what they should have been, and using the
matching/extracting process to also clean the data. Not inherently
wrong, but risky...
In this particular instance, I'm only dealing with result lists of around 300-600 lines, so a visual inspection afterward is not only OK but has shown me a few inconsistencies I wanted to tidy up in the original files, such as legacy double quotes. It's a good point, though. I don't think I actually have any items with mismatched opening and closing quote marks, but that confidence comes mostly from having spent many hours writing and visually inspecting the files in the first place.

I think if I were adapting this to process larger volumes, I might first run a few searches to find single vs. double quotes, mismatched quotes, etc. That could be useful for cleaning up files cobbled together from bits and pieces by various coders with various coding styles.
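Below is a rough sketch of that kind of survey, assuming Python. It looks only at the quotes around `translate`, though the `it()` argument could be surveyed the same way. Unlike Joe's pattern, it captures the two quote characters separately, so a mismatch is counted instead of silently skipped. The function names and sample lines are illustrative.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Capture each quote separately; no backreference, so mismatches still match.
KEY = re.compile(r'''\$_SESSION\[(['"])translate(['"])\]''')


def quote_style(line: str) -> Optional[str]:
    """Return 'single', 'double', 'mismatched', or None if no key is found."""
    m = KEY.search(line)
    if m is None:
        return None
    opening, closing = m.group(1), m.group(2)
    if opening != closing:
        return "mismatched"
    return "single" if opening == "'" else "double"


def survey(lines: Iterable[str]) -> Counter:
    """Tally quote styles across lines, ignoring lines without the key."""
    styles = (quote_style(line) for line in lines)
    return Counter(s for s in styles if s is not None)


lines = [
    "$_SESSION['translate']->it('Hello');",
    '$_SESSION["translate"]->it("Goodbye");',
    "$_SESSION['translate\"]->it('Oops');",
    "echo 'no translate call here';",
]
print(survey(lines))
```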

For now, I am using ...
\$_SESSION\[['"]translate['"]\]->it\(['"].*?['"]\)
... and collecting the results in a file using Matthias Steffens' 'List Search Results' AppleScript. Then I use a text factory to sort lines, remove duplicates, and finally tidy up by stripping the $_SESSION['translate']->it(' from the beginning of each line and any trailing '); from the end.
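If the AppleScript or a text factory isn't handy, the same steps can be approximated in Python. This is a hypothetical equivalent, not what was actually run here. `CALL` is the pattern quoted above. `PREFIX` and `SUFFIX` are assumed regexes standing in for the tidy-up step.

```python
import re
from typing import Iterable, List

# The search pattern quoted above (lenient about which quote closes which).
CALL = re.compile(r'''\$_SESSION\[['"]translate['"]\]->it\(['"].*?['"]\)''')

# Assumed clean-up regexes: drop the call prefix at the start, and the closing
# quote, paren and optional semicolon at the end.
PREFIX = re.compile(r'''^\$_SESSION\[['"]translate['"]\]->it\(['"]''')
SUFFIX = re.compile(r'''['"]\);?$''')


def extract_strings(lines: Iterable[str]) -> List[str]:
    # 1. Search: collect every match on every line (the "search results" list).
    hits = [m.group(0) for line in lines for m in CALL.finditer(line)]
    # 2. Sort lines and remove duplicates.
    unique = sorted(set(hits))
    # 3. Tidy up: strip the leading call and the trailing quote/paren.
    return [SUFFIX.sub("", PREFIX.sub("", hit)) for hit in unique]


lines = [
    "echo $_SESSION['translate']->it('Hello');",
    "$_SESSION['translate']->it('Hello');",
    '<?= $_SESSION["translate"]->it("Bye") ?>',
]
print(extract_strings(lines))  # ['Bye', 'Hello']
```

As in the text factory, sorting and de-duplication happen before the call is stripped. Results are therefore ordered by the full matched text, so double-quoted calls sort ahead of single-quoted ones.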

regards,
verdon



--
------------------------------------------------------------------
Have a feature request? Not sure the software's working correctly?
If so, please send mail to <[EMAIL PROTECTED]>, not to the list.
List FAQ: <http://www.barebones.com/support/lists/bbedit_talk.shtml>
List archives: <http://www.listsearch.com/BBEditTalk.lasso>
To unsubscribe, send mail to:  <[EMAIL PROTECTED]>
