Miguel,

If you are using Automator and AppleScript, "System Events" has some useful XML parsing functionality.
See: https://developer.apple.com/library/archive/documentation/LanguagesUtilities/Conceptual/MacAutomationScriptingGuide/WorkwithXML.html

Your input XML could be parsed with something like this:

```applescript
-- Walk ejemplar > seccion > ficha > campo and log each clave/valor pair.
tell application "System Events"
    tell XML file "~/Downloads/MA_NO_2021_05_011.xml"
        tell XML element "ejemplar"
            set vSecciones to every XML element whose name = "seccion"
            repeat with vSeccion in vSecciones
                tell vSeccion
                    set vFichas to (every XML element whose name = "ficha")
                    repeat with vFicha in vFichas
                        tell vFicha
                            set vCampos to (every XML element whose name = "campo")
                            repeat with vCampo in vCampos
                                tell vCampo
                                    -- Each campo holds one clave (key) and one valor (value).
                                    set vClave to value of XML element "clave"
                                    set vValor to value of XML element "valor"
                                    log {clave:vClave, valor:vValor}
                                end tell
                            end repeat
                        end tell
                    end repeat
                end tell
            end repeat
        end tell
    end tell
end tell
```

HTH,

Jean Jourdain

On Wednesday, May 19, 2021 at 3:30:03 PM UTC+2 Miguel Perez wrote:

> Hi!
>
> I have a question regarding Automator and BBEdit.
>
> *Context:*
>
> On a daily basis I get an XML file. This file contains information about some dossiers. I need to extract two elements from each dossier: (1) a URL to download associated images, and (2) the dossier's name.
>
> Here's an example of such XML files:
> https://www.icloud.com/iclouddrive/0uq0GozmzGusqe09WNAmUJuow#MA_NO_2021_05_011
>
> Information in the file is in Spanish.
>
> *What I currently do:*
>
> I open the XML file in BBEdit and use Grep search to extract the information. My Grep patterns are:
>
> To extract the URLs:
> <clave><!\[CDATA\[Imagen\]\]></clave>\n\s+<valor><!\[CDATA\[(.+?)\]
>
> To extract the dossier's name:
> <clave><!\[CDATA\[Denominación\]\]></clave>\n\s+<valor><!\[CDATA\[(.+?)\]
>
> I "replace" these Grep patterns with \1 to extract everything and it works like a charm.
>
> Both pieces of information get saved in their own plain text files.
>
> Then I download the images using some wget magic:
> wget -E -H -k -K -p -e robots=off -P /users/USERNAME/TARGETFOLDER -i /users/USERNAME/URLSLIST.txt
>
> As a final touch to my workflow, I run a batch rename on all files to add the filetype *.GIF on all images and I'm ready to work.
>
> *What I want to do:*
>
> I want to further automate the process.
>
> Using Automator I created a Service (Quick action) that uses files as input in Finder.
>
> What I have in mind is:
> ➤ Run the service on the XML file
> ➤ Read the contents of the file
> ➤ Use BBEdit's Automator action called "Extract lines containing" in Grep mode to extract the URLs
> ➤ Use a shell script to download all images
> ➤ Use a batch rename action to add the *.GIF filetype
>
> For the life of me I can't get "Extract lines containing" to work. I'm using BBEdit 13.5.6 and Big Sur 11.3.1.
>
> Any ideas?
>
> Does anybody know if BBEdit's Automator actions still work?
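
For the download and rename steps in the quoted workflow, a rough shell sketch that could go into an Automator "Run Shell Script" action might look like this. It reuses the wget options from the original command unchanged. The function names `download_images` and `add_gif_extension` are made up for illustration, and the rename rule (append `.gif` to every file that does not already end in `.gif`) is an assumption about what the batch rename does, so adjust it to match the files wget actually produces.

```shell
#!/bin/sh
# Sketch only: function names and the rename rule are illustrative
# assumptions, not something taken from the thread.

# Download every URL listed (one per line) in file $1 into directory $2,
# using the same wget options as the original command.
download_images() {
    url_list=$1
    target_dir=$2
    wget -E -H -k -K -p -e robots=off -P "$target_dir" -i "$url_list"
}

# Append ".gif" to every regular file under directory $1 (recursively)
# whose name does not already end in ".gif".
add_gif_extension() {
    find "$1" -type f ! -name '*.gif' -exec sh -c '
        for f do
            mv -- "$f" "$f.gif"
        done
    ' sh {} +
}

# Run both steps only when a URL list and a target folder are given.
if [ "$#" -eq 2 ]; then
    download_images "$1" "$2"
    add_gif_extension "$2"
fi
```

Saved under a name of your choosing, it would be called with the URL list first and the target folder second, e.g. `sh fetch_images.sh /users/USERNAME/URLSLIST.txt /users/USERNAME/TARGETFOLDER` (the script name is hypothetical). Note that wget with `-H -p` creates per-host subfolders, which is why the rename step searches recursively.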
