Marcus (et al) -
Has this effort progressed? I don't really understand the scope of
intentions/efforts but it tweaks me:
1. current events erupting (news and oversight and investigations)
2. NM/SciTech community ties
3. network/graph theoretic analysis
* entity-relationship graphs generated and conditioned by
event-reports with place-time registration and credibility
intervals.
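
A minimal sketch of what item 3 could look like as a data structure, assuming each event report links two entities and carries a place, a date, and a credibility interval. Every name here (EventReport, build_graph, edge_credibility) is hypothetical, the sample data is placeholder, and combining intervals by treating reports as independent is a simplifying assumption, not a recommendation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EventReport:
    """One sourced claim that two entities are linked (all fields hypothetical)."""
    source: str       # e.g. a document ID or article URL
    entity_a: str
    entity_b: str
    place: str        # place registration
    date: str         # time registration, ISO 8601 (YYYY-MM-DD)
    cred_low: float   # lower bound of the credibility interval, in [0, 1]
    cred_high: float  # upper bound of the credibility interval, in [0, 1]


def build_graph(reports):
    """Group reports by undirected entity pair; each edge keeps its evidence."""
    edges = defaultdict(list)
    for r in reports:
        if not 0.0 <= r.cred_low <= r.cred_high <= 1.0:
            raise ValueError(f"bad credibility interval in {r.source}")
        key = tuple(sorted((r.entity_a, r.entity_b)))
        edges[key].append(r)
    return dict(edges)


def edge_credibility(reports):
    """Combine intervals, treating reports as independent (a strong assumption).

    The edge holds if at least one report is true, so each bound is
    1 - product(1 - bound_i).
    """
    miss_low = miss_high = 1.0
    for r in reports:
        miss_low *= 1.0 - r.cred_low
        miss_high *= 1.0 - r.cred_high
    return 1.0 - miss_low, 1.0 - miss_high


# Placeholder data, not drawn from any real document.
reports = [
    EventReport("doc-001", "Person A", "Org X", "Santa Fe, NM", "2001-05-01", 0.2, 0.6),
    EventReport("doc-002", "Org X", "Person A", "Santa Fe, NM", "2001-06-15", 0.5, 0.5),
    EventReport("doc-003", "Person B", "Org X", "Albuquerque, NM", "2003-01-10", 0.1, 0.3),
]
for pair, evidence in sorted(build_graph(reports).items()):
    low, high = edge_credibility(evidence)
    print(pair, len(evidence), round(low, 2), round(high, 2))
```

Keeping the raw reports on each edge means the place and date registration stays available for later filtering, e.g. restricting the graph to NM locations or to a date window.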
I was digging into some NM related bits myself, from other sources
(which may be redundant?).
A few resources of relevance to NM:
https://www.justice.gov/epstein/files/DataSet%2010/EFTA01688746.pdf
https://www.justice.gov/epstein/files/DataSet%209/EFTA01249623.pdf
https://www.documentcloud.org/projects/222714-epstein-new-mexico-docs/
and (very) recent NM-relevant articles:
https://www.abqjournal.com/news/investigation-called-for-epsteins-zorro-ranch-after-email-alleges-two-girls-bodies-were-buried-nearby/2978007
https://www.theguardian.com/us-news/2026/feb/08/epstein-files-new-mexico-ranch?utm_source=chatgpt.com
https://prospect.org/2025/10/01/2025-10-01-we-obtained-thousands-of-new-epstein-documents/
On Fri, Feb 6, 2026 at 7:19 AM Tom Johnson <[email protected]> wrote:
Thanks, Marcus.
So you're saying there seems to be no consistency or
standardization of methodology in the so-called review and release
process. That is a story in itself.
Also, having to convert PDF to Text is another time suck but
necessary at some point.
Another approach would be to FOIA the DOJ for directives from
whoTK to those assigned to do the reviews/redactions. Of course,
given that Trump has shut down many of the offices that responded
to FOIAs, it's unlikely we would see those documents in our lifetime.
Onward,
Tom
(TK means "to come" in journalese)
=======================
Tom Johnson
Inst. for Analytic Journalism
Santa Fe, New Mexico
505-577-6482
=======================
On Fri, Feb 6, 2026, 1:36 AM Marcus Daniels <[email protected]>
wrote:
So.. The early tranches were the FBI searches of the
properties. Then there were a bunch of personal photographs
of Epstein and Maxwell on their travels with various famous
people. Amusingly, faces some folks on this list would
recognize. (Read 2and3.md if so inclined and look up
Maxwell’s recent proffer to Blanche.)
The volume in the early sets was modest enough that I
could push a lot through Claude, even images. Summaries
of that are attached.
The new documents vary a lot in size. There are examples of
subpoenaed e-mail accounts that go on and on for hundreds of
pages, but also single, isolated e-mails. There’s an
unusually large volume on investigating Epstein’s demise in
prison. Overall, it is mostly PDF format, and it is often the
case that text can be extracted, e.g., using pdftotext. It’s
just the DOJ convention to use PDF. It doesn’t mean they are
composed documents.
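
A rough sketch of the extraction step, assuming the Poppler pdftotext command is installed and on the PATH. The function names and the 50-characters-per-page threshold are hypothetical; the threshold is only a crude guess at telling a PDF with a text layer from one that is scanned images.

```python
import shutil
import subprocess


def pdftotext_command(pdf_path, layout=True):
    """Build the pdftotext argument list; '-' sends the text to stdout."""
    cmd = ["pdftotext"]
    if layout:
        cmd.append("-layout")  # keep the physical page layout
    cmd += [pdf_path, "-"]
    return cmd


def extract_text(pdf_path):
    """Run pdftotext and return its output, or None if the tool is missing."""
    if shutil.which("pdftotext") is None:
        return None
    result = subprocess.run(pdftotext_command(pdf_path),
                            capture_output=True, text=True, check=True)
    return result.stdout


def has_text_layer(text, min_chars_per_page=50):
    """Guess whether the PDF carried real text rather than only page images.

    pdftotext ends each page with a form feed, so form feeds approximate
    the page count. The threshold is arbitrary.
    """
    pages = max(text.count("\f"), 1)
    visible = sum(1 for ch in text if not ch.isspace())
    return visible / pages >= min_chars_per_page
```

Files that fail the has_text_layer check would need OCR instead, which is a separate and slower pass.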
I’ve been focused on “Dataset 9” as that one is large, and the
DOJ failed (or refused?) to make a zip file that would be easy
to download. This dataset gives more insight into Epstein’s
contemptible personality. There are many emotionally
manipulative e-mails to some of his more independent young
female associates. I haven’t worked with the new data
systematically yet, just spot-checking the download from time
to time. I feel guilty wasting GPU cycles and energy on
traumatizing a perfectly good AI on this stuff.
The file numbering has become sparse in the later datasets.
In the early batches, that occurred when Donald Trump was in
a picture. Just sayin.
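
The sparseness could be measured rather than eyeballed. A minimal sketch, assuming file names follow the EFTA plus digits pattern seen in the DOJ links earlier in this thread (e.g. EFTA01688746.pdf); the function names and sample names are hypothetical. One caveat, stated as an assumption to check: if the numbers are per-page stamps, a multi-page file also leaves a gap, so page counts would have to be subtracted before reading anything into the holes.

```python
import re

EFTA_NAME = re.compile(r"EFTA(\d+)\.pdf$", re.IGNORECASE)


def file_numbers(names):
    """Extract the integer part of names such as 'EFTA01688746.pdf'."""
    numbers = set()
    for name in names:
        match = EFTA_NAME.search(name)
        if match:
            numbers.add(int(match.group(1)))
    return sorted(numbers)


def gaps(numbers):
    """Return (first_missing, last_missing) ranges between sorted numbers."""
    return [(a + 1, b - 1) for a, b in zip(numbers, numbers[1:]) if b - a > 1]


def density(numbers):
    """Fraction of the spanned number range that is actually present."""
    if not numbers:
        return 0.0
    return len(numbers) / (numbers[-1] - numbers[0] + 1)


# Placeholder names, not a real directory listing.
names = ["EFTA00000001.pdf", "EFTA00000002.pdf", "EFTA00000005.pdf",
         "notes.txt", "EFTA00000009.pdf"]
nums = file_numbers(names)
print(gaps(nums), round(density(nums), 2))
```

Running density per dataset would put a number on "sparse in the later datasets".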
Marcus
From: Friam <[email protected]> on behalf of Tom
Johnson <[email protected]>
Date: Thursday, February 5, 2026 at 9:38 PM
To: The Friday Morning Applied Complexity Coffee Group
<[email protected]>
Subject: Re: [FRIAM] Gauging interest..
Marcus--
Congrats and many thanks for harvesting this whole crop and
keeping it in various grain bins.
Quick questions:
The DOJ, on multiple occasions, has talked about various
numbers of pages. How many "pages" do you think you have? Are
they all standard 8.5x11 pages? All PDF? If so, searchable PDF?
Do the various batches released come with any kind of title
page, index? Glossary?
Are the pages/documents in any chronological order or any
categorical order?
Do you think we could do a word count vs. lines (each
containing a words-per-line estimate) redacted? (i.e., a story
reporting X percent of the documents still hidden or useless).
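
The arithmetic behind that question can be sketched as follows, assuming the number of redacted lines is already known (counted by hand or by some image analysis that is not shown here). The function names and sample text are hypothetical, and words-per-line is estimated from the visible text, which presumes redacted lines are about as dense as unredacted ones.

```python
def words_per_line_estimate(text):
    """Mean words per non-blank line of the extracted text."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return 0.0
    return sum(len(ln.split()) for ln in lines) / len(lines)


def redaction_share(visible_words, redacted_lines, words_per_line):
    """Estimate the fraction of all words that are hidden by redaction."""
    hidden = redacted_lines * words_per_line
    total = visible_words + hidden
    return hidden / total if total else 0.0


# Placeholder text standing in for one extracted page.
page = "one two three four\nfive six seven eight nine ten\n\neleven twelve thirteen fourteen fifteen\n"
wpl = words_per_line_estimate(page)
visible = len(page.split())
print(wpl, visible, redaction_share(visible, redacted_lines=5, words_per_line=wpl))
```

Summing visible words and estimated hidden words across a whole dataset, rather than averaging per-page shares, keeps long documents from being under-weighted.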
I'm sure I can bug you for more.
Tom
On Thu, Feb 5, 2026, 10:37 PM Marcus Daniels
<[email protected]> wrote:
I’m closing in on a full download of Dataset 9 of the
Epstein Transparency Act. (I have the rest.) I’m
thinking of building a vector database (e.g. pgvector for
Postgres). I was thinking of wrapping an MCP server
around it so LLMs can get a directory of articles and then
summarize, or cross-reference sets of them. RAG is what
Perplexity does, but apparently, they don’t have the
content yet.
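
A small sketch of the retrieval half of that idea, under stated assumptions: the table and column names are hypothetical, the 3-dimensional vectors are toys (a real embedding model fixes the dimension), and the SQL strings are shown but not executed. The pure-Python functions mimic what the pgvector cosine-distance operator (<=>) would return, so the ranking logic can be tried without a database. The MCP wrapper is left out.

```python
import math

# pgvector DDL and query, shown as strings only (hypothetical table layout).
SCHEMA_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS documents (
    doc_id    text PRIMARY KEY,
    summary   text,
    embedding vector(3)
);
"""

NEAREST_SQL = """
SELECT doc_id, summary
FROM documents
ORDER BY embedding <=> %s::vector
LIMIT %s;
"""


def to_pgvector(vec):
    """Format a list of numbers as a pgvector text literal, e.g. '[1.0,2.0]'."""
    return "[" + ",".join(str(float(x)) for x in vec) + "]"


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def nearest(query, docs, limit=2):
    """In-memory stand-in for NEAREST_SQL; docs maps doc_id -> embedding."""
    return sorted(docs, key=lambda d: cosine_distance(query, docs[d]))[:limit]


# Placeholder embeddings, not derived from any real document.
docs = {"doc-a": [1.0, 0.0, 0.0], "doc-b": [0.9, 0.1, 0.0], "doc-c": [0.0, 0.0, 1.0]}
print(nearest([1.0, 0.0, 0.0], docs))
print(to_pgvector([1, 2, 3]))
```

With a Postgres driver, NEAREST_SQL would be run with the to_pgvector string and a limit as its two parameters.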
I imagine a SETI-at-home type project to reduce the data.
Another analogy that comes to mind is annotations of the
genome: Line all the documents up and then slowly fill in
the summaries. The vector database could help inform how
to combine documents for consumption within context window
limits (PCA vicinity).
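
One possible reading of "PCA vicinity", sketched under assumptions: each document already has coordinates (PCA-reduced or not) and a token count, and documents are packed greedily by distance from a seed until a token budget is used up. The names, the coordinates, and the budget are all hypothetical, and plain Euclidean distance stands in for whatever neighbourhood measure is actually wanted.

```python
import math


def pack_context(seed_id, coords, token_counts, budget):
    """Greedily collect the documents nearest to a seed within a token budget.

    coords maps doc_id -> coordinate list; token_counts maps doc_id -> int.
    Returns the chosen doc_ids (nearest first) and the tokens they use.
    """
    seed = coords[seed_id]
    by_distance = sorted(coords, key=lambda d: math.dist(seed, coords[d]))
    chosen, used = [], 0
    for doc_id in by_distance:
        cost = token_counts[doc_id]
        if used + cost > budget:
            continue  # too big to fit; a smaller, farther document still may
        chosen.append(doc_id)
        used += cost
    return chosen, used


# Placeholder coordinates and token counts.
coords = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 2.0], "d": [5.0, 5.0]}
tokens = {"a": 400, "b": 700, "c": 300, "d": 100}
print(pack_context("a", coords, tokens, budget=1000))
```

Because the loop skips rather than stops, a distant document such as "d" can still be pulled in; adding a maximum-distance cutoff would keep each batch topically tighter.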
I could keep my Max subscription on it and make some
progress, but really such a project needs tens or hundreds
of workers.
Marcus
.- .-.. .-.. / ..-. --- --- - . .-. ... / .- .-. . / .--
.-. --- -. --. / ... --- -- . / .- .-. . / ..- ... . ..-.
..- .-..
FRIAM Applied Complexity Group listserv
Fridays 9a-12p Friday St. Johns Cafe / Thursdays 9a-12p
Zoom https://bit.ly/virtualfriam
to (un)subscribe
http://redfish.com/mailman/listinfo/friam_redfish.com
FRIAM-COMIC http://friam-comic.blogspot.com/
archives: 5/2017 thru present
https://redfish.com/pipermail/friam_redfish.com/
1/2003 thru 6/2021 http://friam.383.s1.nabble.com/