Re: [CODE4LIB] Regex Question

Jason R Peak Tue, 07 Jul 2015 09:58:02 -0700

In the case of xml, I think xpath is the simpler tool.

---- Brian Zelip wrote ----

Hi Matt.

Re: finding words in all caps, yes it's possible. See this SO answer for
help: http://stackoverflow.com/a/4255225/2145103
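For the all-caps case, here is a minimal sketch in Python, not necessarily the same pattern as in the linked answer. It assumes the names use plain ASCII capitals and are at least two letters long; `ALL_CAPS`, `find_all_caps`, and the sample string are made up for illustration.

```python
import re

# Assumption: an "all caps word" is two or more ASCII capital letters
# standing alone between word boundaries. Single letters such as "A"
# are skipped on purpose so that ordinary initials do not match.
ALL_CAPS = re.compile(r"\b[A-Z]{2,}\b")

def find_all_caps(text):
    """Return every all-caps word found in text, in order of appearance."""
    return ALL_CAPS.findall(text)

# Hypothetical bibliography line, invented for this example.
sample = "SMITH, John. A Study of Cataloging. NASA report by DOE."
print(find_all_caps(sample))  # ['SMITH', 'NASA', 'DOE']
```

Note that acronyms match as well as names, and names with apostrophes, hyphens, or accented capitals would need a wider character class, so the results will probably want a second look.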

Re: italics, my hunch is that you could do so if you got hold of the XML
behind the Word doc, which I'd assume would have something like
`<italic>` tags or attribute values of `italic` in the markup.
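To make the italics idea concrete: a .docx file is a zip archive, and the body text lives in `word/document.xml`. In WordprocessingML, italics applied directly to a run show up as a `<w:i/>` element inside the run's `<w:rPr>` properties, rather than an `<italic>` tag. The sketch below uses only the Python standard library and ElementTree's XPath-style paths. The function names and the sample XML are hypothetical, and it only catches direct formatting, not italics inherited from a paragraph or character style.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML elements such as w:r, w:rPr, w:i, w:t.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}

def italic_runs(document_xml):
    """Return the text of each run that carries direct italic formatting."""
    root = ET.fromstring(document_xml)
    found = []
    for run in root.iterfind(".//w:r", NS):
        italic = run.find("w:rPr/w:i", NS)
        if italic is None:
            continue
        # A w:val of false/0/off explicitly switches italics off for the run.
        if italic.get(f"{{{W_NS}}}val") in ("false", "0", "off"):
            continue
        text = "".join(t.text or "" for t in run.iterfind("w:t", NS))
        if text:
            found.append(text)
    return found

def italic_runs_in_docx(path):
    """Open a .docx (a zip archive) and scan its main document part."""
    with zipfile.ZipFile(path) as docx:
        return italic_runs(docx.read("word/document.xml"))

# Hypothetical fragment, trimmed down from what Word would really write.
SAMPLE = (
    '<w:document xmlns:w="' + W_NS + '"><w:body><w:p>'
    '<w:r><w:t>SMITH, John. </w:t></w:r>'
    '<w:r><w:rPr><w:i/></w:rPr><w:t>A Study of Cataloging</w:t></w:r>'
    '</w:p></w:body></w:document>'
)
print(italic_runs(SAMPLE))  # ['A Study of Cataloging']
```

Word often splits one title across several runs, so adjacent italic runs may need to be joined before loading them into a table.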

good luck!

Brian Zelip

---

Emerging Technologies Librarian

Health Sciences & Human Services Library

University of Maryland, Baltimore

[email protected]

410-706-8865

On Tue, Jul 7, 2015 at 11:56 AM, Matt Sherman <[email protected]>
wrote:

> Hi all,
>
> I am working my way through teaching myself regex to parse an annotated
> bibliography docx file and had a question as I can't seem to get a succinct
> answer from Google.  Is it possible to have regex find words, or in this
> case names, displayed in all caps?  Also, similarly, is it possible to
> have regex find words, or in this case titles, that are italicized?  Given
> how the document is formatted, doing both would be nice so that I could
> parse them into a table or database, but I cannot find a clear answer on
> that, though I am very new to regex so it is probably jumping into the deep
> end on this.  Any answers are appreciated.
>
> Matt Sherman
>
