I have an XML file that's perhaps 1000 lines long.

Within that document are perhaps 20 or 30 lines of text that show paths to 
files. Every file path appears twice. The first appearance looks like this:

<key>affectedItem</key>
<string>/Volumes/Macintosh HD 1/Applications/Microsoft Office 
2011/Office/Media/Templates/Print Layout View/Labels/Label 
Wizard.plugin/Contents/_CodeSignature</string>
<key>auxiliaryActionSelector</key>

The second appearance looks like this:

<key>affectedItem</key>
<string>/Volumes/Macintosh HD 1/Applications/Microsoft Office 
2011/Office/Media/Templates/Print Layout View/Labels/Label 
Wizard.plugin/Contents/_CodeSignature</string>
<key>cccErrorCode</key>

I want to delete everything in that document except the paths themselves, 
and I only need one instance of each path, starting with the name of the hard 
drive (that is, without the leading /Volumes/). The output should look like this:

Macintosh HD/Library/Preferences/Parallels/Problem 
Reports/PrlProblemReport-2012.05.23-23.35.08.860/InstalledSoftware.txt

I've worked on this for about 45 minutes, and can't figure it out.

I'm thinking that I need to match this pattern (written in pseudocode):

Any sequence of characters that ends with [/Volumes/Macintosh HD 1/ *+* .* 
*+ *\r*]* *+* <key>auxiliaryActionSelector</key>

Then replace it with Macintosh HD/ *+* .* *+ *\r*]*
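
Expressed as a short Python script instead of a grep pattern, the extraction I'm after would look roughly like the sketch below. It rests on a few assumptions: each path sits in the <string> element immediately after a <key>affectedItem</key> line, each path is on a single line in the real file (the wrapping above is only from the email), and dropping the /Volumes/ prefix is all that's needed to start at the drive name. The names unique_paths and AFFECTED_ITEM are made up for illustration.

```python
import re
from xml.sax.saxutils import unescape

# Assumption: the path is the text of the <string> element that directly
# follows <key>affectedItem</key>, with only whitespace in between.
AFFECTED_ITEM = re.compile(r"<key>affectedItem</key>\s*<string>([^<]*)</string>")


def unique_paths(xml_text, prefix="/Volumes/"):
    """Return each affectedItem path once, in first-seen order, minus the prefix."""
    seen = set()
    paths = []
    for match in AFFECTED_ITEM.finditer(xml_text):
        # Undo XML escaping such as &amp; so the result is a plain path.
        path = unescape(match.group(1)).strip()
        if path.startswith(prefix):
            path = path[len(prefix):]
        if path not in seen:
            seen.add(path)
            paths.append(path)
    return paths


# Small inline sample built from the two appearances shown above.
sample = (
    "<key>affectedItem</key>\n"
    "<string>/Volumes/Macintosh HD 1/Applications/Microsoft Office 2011/Office/"
    "Media/Templates/Print Layout View/Labels/Label Wizard.plugin/Contents/"
    "_CodeSignature</string>\n"
    "<key>auxiliaryActionSelector</key>\n"
    "<key>affectedItem</key>\n"
    "<string>/Volumes/Macintosh HD 1/Applications/Microsoft Office 2011/Office/"
    "Media/Templates/Print Layout View/Labels/Label Wizard.plugin/Contents/"
    "_CodeSignature</string>\n"
    "<key>cccErrorCode</key>\n"
)

for path in unique_paths(sample):
    print(path)
```

The sketch keeps the volume name exactly as it appears in the file, so renaming "Macintosh HD 1" to "Macintosh HD" would be a separate step.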

Any suggestions?
