I don't know how common it is, but have you looked at the `tree-seq`
function in Clojure? This seems like a good use case for it.
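
Here is a minimal sketch of what I mean, assuming the page is already parsed into Hickory-style maps like the ones quoted below. The names `sample-page`, `text-nodes`, and `find-cves` are made up for illustration, and the sample data is a trimmed-down stand-in, not your real page. `tree-seq` walks every node depth-first, the string leaves are kept, and `re-seq` pulls the CVE IDs out of each one:

```clojure
;; Hypothetical sample shaped like the Hickory data quoted in this thread.
(def sample-page
  {:type :element, :attrs nil, :tag :div,
   :content
   [{:type :element, :attrs nil, :tag :p,
     :content ["exploitation of a newly identified vulnerability (CVE-2021-40539) in ManageEngine"]}
    "\n\n"
    {:type :element, :attrs nil, :tag :p,
     :content ["CVE-2021-40539, rated critical by the "
               {:type :element, :attrs nil, :tag :em, :content ["CVSS"]}
               ", is an authentication bypass vulnerability."]}]})

(defn text-nodes
  "Walks the tree depth-first and returns every string leaf.
  Maps are treated as branches; their children live under :content."
  [node]
  (->> (tree-seq map? :content node)
       (filter string?)))

(defn find-cves
  "Returns the distinct CVE IDs found in any text node.
  The pattern (CVE, 4-digit year, 4 or more digits) is an assumption
  about the ID format; adjust it if your pages differ."
  [node]
  (->> (text-nodes node)
       (mapcat #(re-seq #"CVE-\d{4}-\d{4,}" %))
       distinct))

(find-cves sample-page)
```

Because each string is matched on its own, an ID split across two sibling nodes would be missed; joining the text nodes first would be one way around that if it ever comes up.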

Mark

On Wed, Feb 2, 2022 at 3:22 PM lawrence...@gmail.com <
lawrence.krub...@gmail.com> wrote:

> Assume I've been cursed to scrape HTML. If I convert the pages to Hickory
> I end up with a big mass of data which, sadly, lacks many of the "class"
> or "id" attributes that would let me easily pick out the data I need.
> However, for the most part, the only thing I really need off this page is
> the CVEs, which look like this:
>
> CVE-2021-40539
>
> I'm thinking I might write regex against the plain text of the page, but
> I'm also curious, is it common to take something like Hiccup or Hickory or
> a zipper and run regex through it? If yes, how is that done?
>
> A small part of the data looks like this:
>
>                 :content
>                 [{:type :element,
>                   :attrs
>                   {:class "tip-intro", :style "font-size: 15px;"},
>                   :tag :p,
>                   :content
>                   [{:type :element,
>                     :attrs nil,
>                     :tag :em,
>                     :content
>                     ["This Joint Cybersecurity Advisory uses the MITRE
> Adversarial Tactics, Techniques, and Common Knowledge (ATT&CK®) framework,
> Version 8. See the "
>                      {:type :element,
>                       :attrs
>                       {:href
>                        "https://attack.mitre.org/versions/v9/techniques/enterprise/"},
>                       :tag :a,
>                       :content ["ATT&CK for Enterprise"]}
>                      " for  referenced threat actor tactics and for
> techniques."]}]}
>                  "\n\n"
>                  {:type :element,
>                   :attrs nil,
>                   :tag :p,
>                   :content
>                   ["This joint advisory is the result of analytic efforts
> between the Federal Bureau of Investigation (FBI), United States Coast
> Guard Cyber Command (CGCYBER), and the Cybersecurity and Infrastructure
> Security Agency (CISA) to highlight the cyber threat associated with active
> exploitation of a newly identified vulnerability (CVE-2021-40539) in
> ManageEngine ADSelfService Plus—a self-service password management and
> single sign-on solution."]}
>                  "\n\n"
>                  {:type :element,
>                   :attrs nil,
>                   :tag :p,
>                   :content
>                   ["CVE-2021-40539, rated critical by the Common
> Vulnerability Scoring System (CVSS), is an authentication bypass
> vulnerability affecting representational state transfer (REST) application
> programming interface (API) URLs that could enable remote code execution.
> The FBI, CISA, and CGCYBER assess that advanced persistent threat (APT)
> cyber actors are likely among those exploiting the vulnerability. The
> exploitation of ManageEngine ADSelfService Plus poses a serious risk to
> critical infrastructure companies, U.S.-cleared defense contractors,
> academic institutions, and other entities that use the software. Successful
> exploitation of the vulnerability allows an attacker to place webshells,
> which enable the adversary to conduct post-exploitation activities, such as
> compromising administrator credentials, conducting lateral movement, and
> exfiltrating registry hives and Active Directory files."]}
>                  "\n\n"
>
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clojure@googlegroups.com
> Note that posts from new members are moderated - please be patient with
> your first post.
> To unsubscribe from this group, send email to
> clojure+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
> ---
> You received this message because you are subscribed to the Google Groups
> "Clojure" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to clojure+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/clojure/5f2bd2a4-5c35-463b-9cb4-eecb9148fc89n%40googlegroups.com
> <https://groups.google.com/d/msgid/clojure/5f2bd2a4-5c35-463b-9cb4-eecb9148fc89n%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
