I don't know how common it is, but have you looked at the `tree-seq` function in Clojure? This seems like a good use case for it.
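For example, here is a minimal sketch. It assumes Hickory's usual node shape: element nodes are maps whose children live under `:content`, and text nodes are plain strings. The names `cve-pattern`, `text-nodes`, `find-cves` and the `sample` data are made up for illustration.

```clojure
;; Hypothetical names throughout. Assumes Hickory's node shape:
;; elements are maps with a :content vector, text nodes are plain strings.

;; CVE IDs: "CVE-", a four-digit year, then a sequence number of 4+ digits.
(def cve-pattern #"CVE-\d{4}-\d{4,}")

(defn text-nodes
  "Walks a Hickory tree (a node map, or a vector of nodes such as a
  :content value) with tree-seq and returns every text (string) node.
  Attribute values such as :href are not visited."
  [tree]
  (->> tree
       (tree-seq (fn [node] (or (map? node) (sequential? node)))
                 (fn [node] (if (map? node) (:content node) node)))
       (filter string?)))

(defn find-cves
  "Runs cve-pattern over each text node and returns the distinct IDs,
  in the order they first appear."
  [tree]
  (->> (text-nodes tree)
       (mapcat #(re-seq cve-pattern %))
       distinct))

;; A trimmed-down fragment in the same shape as the data you posted.
(def sample
  {:type :element, :attrs nil, :tag :div,
   :content
   [{:type :element, :attrs nil, :tag :p,
     :content ["... exploitation of a newly identified vulnerability (CVE-2021-40539) in ManageEngine ADSelfService Plus ..."]}
    "\n\n"
    {:type :element, :attrs nil, :tag :p,
     :content ["CVE-2021-40539, rated critical by the Common Vulnerability Scoring System (CVSS), ..."]}]})

(find-cves sample)
;; => ("CVE-2021-40539")
```

Because the children function only follows `:content`, a CVE that appears only inside an attribute (say, a link's `:href`) would be missed. If that matters, you could extend the children function to include attribute values. A zipper would also work, but for a read-only pass that just collects matches, `tree-seq` is probably simpler.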
Mark

On Wed, Feb 2, 2022 at 3:22 PM lawrence...@gmail.com <lawrence.krub...@gmail.com> wrote:

> Assume I've been cursed to scrape HTML. If I convert the pages to Hickory
> I end up with a big mass of data which, sadly, lacks many "class" or "id"s
> that would let me easily pick out the data I need. However, for the most
> part, the only thing I really need off this page is the CVEs, which look
> like this:
>
> CVE-2021-40539
>
> I'm thinking I might write regex against the plain text of the page, but
> I'm also curious, is it common to take something like Hiccup or Hickory or
> a zipper and run regex through it? If yes, how is that done?
>
> A small part of the data looks like this:
>
> :content
> [{:type :element,
>   :attrs {:class "tip-intro", :style "font-size: 15px;"},
>   :tag :p,
>   :content
>   [{:type :element,
>     :attrs nil,
>     :tag :em,
>     :content
>     ["This Joint Cybersecurity Advisory uses the MITRE Adversarial Tactics, Techniques, and Common Knowledge (ATT&CK®) framework, Version 8. See the "
>      {:type :element,
>       :attrs {:href "https://attack.mitre.org/versions/v9/techniques/enterprise/"},
>       :tag :a,
>       :content ["ATT&CK for Enterprise"]}
>      " for referenced threat actor tactics and for techniques."]}]}
>  "\n\n"
>  {:type :element,
>   :attrs nil,
>   :tag :p,
>   :content
>   ["This joint advisory is the result of analytic efforts between the Federal Bureau of Investigation (FBI), United States Coast Guard Cyber Command (CGCYBER), and the Cybersecurity and Infrastructure Security Agency (CISA) to highlight the cyber threat associated with active exploitation of a newly identified vulnerability (CVE-2021-40539) in ManageEngine ADSelfService Plus—a self-service password management and single sign-on solution."]}
>  "\n\n"
>  {:type :element,
>   :attrs nil,
>   :tag :p,
>   :content
>   ["CVE-2021-40539, rated critical by the Common Vulnerability Scoring System (CVSS), is an authentication bypass vulnerability affecting representational state transfer (REST) application programming interface (API) URLs that could enable remote code execution. The FBI, CISA, and CGCYBER assess that advanced persistent threat (APT) cyber actors are likely among those exploiting the vulnerability. The exploitation of ManageEngine ADSelfService Plus poses a serious risk to critical infrastructure companies, U.S.-cleared defense contractors, academic institutions, and other entities that use the software. Successful exploitation of the vulnerability allows an attacker to place webshells, which enable the adversary to conduct post-exploitation activities, such as compromising administrator credentials, conducting lateral movement, and exfiltrating registry hives and Active Directory files."]}
>  "\n\n"
>
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clojure@googlegroups.com
> Note that posts from new members are moderated - please be patient with
> your first post.
> To unsubscribe from this group, send email to
> clojure+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
> ---
> You received this message because you are subscribed to the Google Groups
> "Clojure" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to clojure+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/clojure/5f2bd2a4-5c35-463b-9cb4-eecb9148fc89n%40googlegroups.com.

--
To view this discussion on the web visit https://groups.google.com/d/msgid/clojure/CACMqiXAG3xtxa0XzHemyi-nf-HOQa1epoN%2BJrKN5AGJo7%3DVR%3Dw%40mail.gmail.com.