I’m closing in on a full download of Dataset 9 of the Epstein Transparency 
Act. (I have the rest.) I’m thinking of building a vector database (e.g. 
pgvector for Postgres) and wrapping an MCP server around it so that LLMs can 
get a directory of articles and then summarize or cross-reference sets of 
them. RAG is what Perplexity does, but apparently they don’t have the content 
yet. 
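
As a minimal sketch of the directory-plus-lookup idea, assuming each document 
already has an embedding vector: the class name DocumentIndex, its methods, 
the document IDs, and the toy three-dimensional vectors below are all 
hypothetical, and a plain in-memory dict stands in for the database. In 
pgvector the ranking step would be done in SQL by ordering on the cosine 
distance operator (<=>); an MCP server would expose directory() and nearest() 
as tools.

```python
import math


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


class DocumentIndex:
    """Hypothetical in-memory stand-in for a vector database table."""

    def __init__(self):
        self._docs = {}  # doc_id -> (title, embedding)

    def add(self, doc_id, title, embedding):
        self._docs[doc_id] = (title, list(embedding))

    def directory(self):
        """Return (doc_id, title) pairs, sorted by ID, for an LLM to browse."""
        return [(doc_id, title) for doc_id, (title, _) in sorted(self._docs.items())]

    def nearest(self, query, k=3):
        """Return the IDs of the k documents closest to the query vector."""
        ranked = sorted(
            self._docs.items(),
            key=lambda item: cosine_distance(query, item[1][1]),
        )
        return [doc_id for doc_id, _ in ranked[:k]]


# Toy data: placeholder titles and made-up 3-d embeddings.
index = DocumentIndex()
index.add("doc-a", "placeholder A", [1.0, 0.0, 0.0])
index.add("doc-b", "placeholder B", [0.0, 1.0, 0.0])
index.add("doc-c", "placeholder C", [0.8, 0.2, 0.0])

print(index.directory())
print(index.nearest([1.0, 0.0, 0.0], k=2))
```

Real embeddings would have hundreds or thousands of dimensions and would come 
from an embedding model, which this sketch leaves out.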

I imagine a SETI@home-type project to reduce the data. Another analogy that 
comes to mind is genome annotation: line all the documents up and then slowly 
fill in the summaries. The vector database could help inform how to combine 
documents for consumption within context window limits (e.g. by grouping 
documents that sit near each other in a PCA projection of the embeddings). 
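
One way the grouping could work, as a rough sketch rather than a worked-out 
design: greedily pack documents into batches so that each batch stays under a 
token budget and holds documents that are near one another. The function name 
pack_batches, the token counts, the budget, and the 2-d coordinates (standing 
in for PCA-reduced embeddings) are all assumptions made for illustration.

```python
import math


def distance(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pack_batches(docs, token_budget):
    """Greedily group nearby documents into batches under a token budget.

    docs is a list of (doc_id, token_count, coords) tuples, where coords is
    assumed to be a low-dimensional projection of the document embedding.
    A document larger than the budget still gets a batch of its own.
    """
    remaining = {doc_id: (tokens, coords) for doc_id, tokens, coords in docs}
    batches = []
    while remaining:
        # Seed each batch with the lowest remaining ID for determinism.
        seed_id = min(remaining)
        seed_tokens, seed_coords = remaining.pop(seed_id)
        batch, used = [seed_id], seed_tokens
        # Visit the other documents from nearest to farthest from the seed.
        by_distance = sorted(
            remaining, key=lambda d: distance(seed_coords, remaining[d][1])
        )
        for doc_id in by_distance:
            tokens = remaining[doc_id][0]
            if used + tokens <= token_budget:
                batch.append(doc_id)
                used += tokens
                del remaining[doc_id]
        batches.append(batch)
    return batches


# Toy data: two tight clusters, made-up token counts.
toy_docs = [
    ("a", 400, (0.0, 0.0)),
    ("b", 500, (0.1, 0.0)),
    ("c", 300, (5.0, 5.0)),
    ("d", 300, (5.1, 5.0)),
]
batches = pack_batches(toy_docs, token_budget=1000)
print(batches)
```

Each batch could then be handed to one worker for summarization, which is 
where the volunteer-computing analogy comes in.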

I could keep my Max subscription on it and make some progress, but really such 
a project needs tens or hundreds of workers. 

Marcus 






.- .-.. .-.. / ..-. --- --- - . .-. ... / .- .-. . / .-- .-. --- -. --. / ... 
--- -- . / .- .-. . / ..- ... . ..-. ..- .-..
FRIAM Applied Complexity Group listserv
Fridays 9a-12p Friday St. Johns Cafe   /   Thursdays 9a-12p Zoom 
https://bit.ly/virtualfriam
to (un)subscribe http://redfish.com/mailman/listinfo/friam_redfish.com
FRIAM-COMIC http://friam-comic.blogspot.com/
archives:  5/2017 thru present https://redfish.com/pipermail/friam_redfish.com/
  1/2003 thru 6/2021  http://friam.383.s1.nabble.com/
