I’m still pulling data.  For some reason, the DOJ removed a couple of zip 
files.   Assuming that wasn’t for nefarious reasons, I decided to pull Dataset 
10 file-by-file to make sure I get whatever is published.  Haven’t created a 
vector index yet for a RAG-like interface.    I did try a network-type analysis 
earlier, when the House files were provided.
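
A minimal sketch of the file-by-file bookkeeping, for anyone who wants to try the same thing. The URL pattern is inferred only from the two justice.gov links quoted further down this thread; the eight-digit zero padding and the helper names efta_url and missing_numbers are assumptions for illustration, not anything the DOJ documents. It builds candidate URLs and lists the gaps in a numbering range, and does no downloading itself.

```python
from urllib.parse import quote

# Assumed base path, taken from the links quoted in this thread.
BASE = "https://www.justice.gov/epstein/files"


def efta_url(dataset: int, number: int) -> str:
    """Build a candidate URL for one file (pattern is an assumption)."""
    folder = quote(f"DataSet {dataset}")  # space becomes %20
    return f"{BASE}/{folder}/EFTA{number:08d}.pdf"


def missing_numbers(present, first: int, last: int) -> list:
    """Return the file numbers in [first, last] that were not retrieved."""
    have = set(present)
    return [n for n in range(first, last + 1) if n not in have]
```

Keeping the list of missing numbers per dataset would also give a record of where the numbering is sparse.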



From: Friam <[email protected]> On Behalf Of Steve Smith
Sent: Wednesday, February 11, 2026 9:56 AM
To: [email protected]
Subject: Re: [FRIAM] Gauging interest..



Marcus (et al) -

Has this effort progressed?  I don't really understand the scope of 
intentions/efforts but it tweaks me:

1.      current events erupting (news and oversight and investigations)
2.      NM/SciTech community ties
3.      network/graph theoretic analysis

   *    entity-relationship graphs generated and conditioned by event-reports 
with place-time registration and credibility intervals.

   I was digging into some NM related bits myself, from other sources (which 
may be redundant?).

   A few resources of relevance to NM:

   https://www.justice.gov/epstein/files/DataSet%2010/EFTA01688746.pdf

   https://www.justice.gov/epstein/files/DataSet%209/EFTA01249623.pdf

   https://www.documentcloud.org/projects/222714-epstein-new-mexico-docs/

   and (very) recent NM relevant articles:

   
https://www.abqjournal.com/news/investigation-called-for-epsteins-zorro-ranch-after-email-alleges-two-girls-bodies-were-buried-nearby/2978007

   
https://www.theguardian.com/us-news/2026/feb/08/epstein-files-new-mexico-ranch?utm_source=chatgpt.com

   
https://prospect.org/2025/10/01/2025-10-01-we-obtained-thousands-of-new-epstein-documents/

   On Fri, Feb 6, 2026 at 7:19 AM Tom Johnson 
<[email protected]<mailto:[email protected]>> wrote:

      Thanks, Marcus.



      So you're saying there seems to be no consistency or standardization of 
methodology in the so-called review and release process. That is a story in 
itself.



      Also, having to convert PDF to Text is another time suck but necessary at 
some point.



      Another approach would be to FOIA the DOJ for directives from whoTK to 
those assigned to do the reviews/redactions. Of course, given that Trump has 
shut down many of the offices that responded to FOIAs, it's unlikely we would 
see those documents in our lifetime.



      Onward,

      Tom

      (TK means "to come" in journalese)

      =======================
      Tom Johnson
      Inst. for Analytic Journalism
      Santa Fe, New Mexico
      505-577-6482
      =======================



      On Fri, Feb 6, 2026, 1:36 AM Marcus Daniels 
<[email protected]<mailto:[email protected]>> wrote:

         So.. The early tranches were the FBI searches of the properties.   
Then there were a bunch of personal photographs of Epstein and Maxwell on their 
travels with various famous people. Amusingly, faces some folks on this list 
would recognize.   (Read 2and3.md if so inclined and look-up Maxwell’s recent 
proffer to Blanche.)

         The early volume was modest enough in the early sets that I could push 
a lot through Claude, even images.  Summaries attached of that.

         The new documents vary a lot in size.  There are examples of 
subpoenaed e-mail accounts that go on and on for hundreds of pages, but also 
single isolated e-mails.   There’s an unusually large volume on investigating 
Epstein’s demise in prison.   Overall, it is mostly PDF format, and it is often 
the case that text can be extracted, e.g., using pdftotext.   It’s just the DOJ 
convention to use PDF. It doesn’t mean they are composed documents.
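
A minimal sketch of the pdftotext step, assuming the Poppler pdftotext command-line tool is installed and on the PATH. The function names and the 20-character threshold for flagging a file as probably image-only are hypothetical choices for illustration, not part of anyone's actual pipeline here.

```python
import shutil
import subprocess


def pdftotext_command(pdf_path, layout=True):
    """Build the argument list; '-' asks pdftotext to write to stdout."""
    cmd = ["pdftotext"]
    if layout:
        cmd.append("-layout")  # try to preserve the physical page layout
    cmd += [str(pdf_path), "-"]
    return cmd


def extract_text(pdf_path):
    """Run pdftotext on one file and return the extracted text."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    result = subprocess.run(
        pdftotext_command(pdf_path),
        capture_output=True, text=True,
        encoding="utf-8", errors="replace", check=True,
    )
    return result.stdout


def looks_scanned(text, min_chars=20):
    """Heuristic (assumed threshold): almost no text suggests a page image."""
    return len(text.strip()) < min_chars
```

Files that come back nearly empty are candidates for OCR rather than plain extraction.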



         I’ve been focused on “Dataset 9” as that one is large, and the DOJ 
failed (or refused?) to make a zip file that would be easy to download. This 
dataset gives more insight into Epstein’s contemptible personality.  There are 
many emotionally manipulative e-mails to some of his more independent young 
female associates.   I haven’t worked with the new data systematically yet, 
just spot checking the download from time to time.   I feel guilty wasting GPU 
cycles and energy on traumatizing a perfectly good AI on this stuff.



         The file numbering has become sparse in the later datasets.   In the 
early batches, that occurred when Donald Trump was in a picture.  Just sayin.



         Marcus

         From: Friam 
<[email protected]<mailto:[email protected]>> on behalf of Tom 
Johnson <[email protected]<mailto:[email protected]>>
         Date: Thursday, February 5, 2026 at 9:38 PM
         To: The Friday Morning Applied Complexity Coffee Group 
<[email protected]<mailto:[email protected]>>
         Subject: Re: [FRIAM] Gauging interest..

         Marcus--

         Congrats and many thanks for harvesting this whole crop and keeping it 
in various grain bins.



         Quick questions:



         The DOJ, on multiple occasions, has talked about various numbers of 
pages. How many "pages" do you think you have? Are they all standard 8.5x11 
pages? All PDF? If so, searchable PDF?

         Do the various batches released come with any kind of title page, 
index? Glossary?



         Are the pages/documents in any chronological order or any categorical 
order?



         Do you think we could do a word count vs. lines (each containing a 
words-per-line estimate) redacted? (i.e., a story reporting X percent of the 
documents still hidden or useless).
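
The arithmetic behind that question can be sketched as below, under stated assumptions: the count of redacted lines has to come from somewhere else (it is simply an input here), and the default of 12 words per line is a placeholder guess, not a measured value. The function name is hypothetical.

```python
def redacted_fraction(visible_words, redacted_lines, words_per_line=12.0):
    """Estimate the share of words hidden by redaction.

    hidden  = redacted_lines * words_per_line   (estimated)
    share   = hidden / (visible_words + hidden)
    """
    hidden = redacted_lines * words_per_line
    total = visible_words + hidden
    return hidden / total if total else 0.0
```

For example, 880 visible words plus 10 redacted lines at 12 words per line gives 120 / 1000, or 12 percent hidden. The result is only as good as the words-per-line estimate, so a range of estimates is probably more honest than a single number.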



         I'm sure I can bug you for more.

         Tom



         =======================
         Tom Johnson
         Inst. for Analytic Journalism
         Santa Fe, New Mexico
         505-577-6482
         =======================



         On Thu, Feb 5, 2026, 10:37 PM Marcus Daniels 
<[email protected]<mailto:[email protected]>> wrote:

            I’m closing in on a full download of Dataset 9 of the Epstein 
Files Transparency Act release.  (I have the rest.)   I’m thinking of building a vector 
database (e.g. pgvector for Postgres).   I was thinking of wrapping an MCP 
server around it so LLMs can get a directory of articles and then summarize, or 
cross-reference sets of them.   RAG is what Perplexity does, but apparently, 
they don’t have the content yet.



            I imagine a SETI-at-home type project to reduce the data.  Another 
analogy that comes to mind is annotations of the genome: Line all the documents 
up and then slowly fill in the summaries.   The vector database could help 
inform how to combine documents for consumption within context window limits 
(PCA vicinity).
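
One way to picture combining documents within a context window limit is sketched below. It swaps in plain cosine similarity over embedding vectors rather than anything PCA-based, and it is a greedy illustration, not a description of an existing system. The embeddings, token counts, and the function name pack_neighbors are all hypothetical; in practice the vectors would come from the vector database.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def pack_neighbors(seed_id, embeddings, token_counts, budget):
    """Greedily collect the documents nearest to a seed document
    until the token budget (the context window limit) is used up."""
    seed = embeddings[seed_id]
    ranked = sorted(
        embeddings,
        key=lambda doc_id: cosine(seed, embeddings[doc_id]),
        reverse=True,
    )
    batch, used = [], 0
    for doc_id in ranked:
        cost = token_counts[doc_id]
        if used + cost <= budget:  # skip anything that would overflow
            batch.append(doc_id)
            used += cost
    return batch
```

Each batch could then be handed to one worker (or one model call) for summarizing, which fits the idea of splitting the job across many contributors.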



            I could keep my Max subscription on it and make some progress, but 
really such a project needs tens or hundreds of workers.



            Marcus









            .- .-.. .-.. / ..-. --- --- - . .-. ... / .- .-. . / .-- .-. --- -. 
--. / ... --- -- . / .- .-. . / ..- ... . ..-. ..- .-..
            FRIAM Applied Complexity Group listserv
            Fridays 9a-12p Friday St. Johns Cafe   /   Thursdays 9a-12p Zoom 
https://bit.ly/virtualfriam
            to (un)subscribe 
http://redfish.com/mailman/listinfo/friam_redfish.com
            FRIAM-COMIC http://friam-comic.blogspot.com/
            archives:  5/2017 thru present 
https://redfish.com/pipermail/friam_redfish.com/
              1/2003 thru 6/2021  http://friam.383.s1.nabble.com/








