I’m closing in on a full download of Dataset 9 of the Epstein Transparency Act. (I have the rest.) I’m thinking of building a vector database (e.g. pgvector for Postgres) and wrapping an MCP server around it, so LLMs can get a directory of articles and then summarize or cross-reference sets of them. RAG is what Perplexity does, but apparently they don’t have the content yet.
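For concreteness, here is a toy sketch of the kind of lookup the vector database would serve. It is plain in-memory Python, not pgvector itself: it ranks documents by cosine distance, which is what pgvector's `<=>` operator computes. The document IDs, the three-dimensional vectors, and the function names are all made up for illustration; real embeddings would come from an embedding model and have hundreds of dimensions or more.

```python
import math


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest_documents(query_vec, index, k=3):
    """Return the k (doc_id, distance) pairs closest to query_vec, nearest first."""
    scored = [(doc_id, cosine_distance(query_vec, vec)) for doc_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1])[:k]


# Hypothetical toy index: doc_id -> embedding vector.
index = {
    "doc-001": [1.0, 0.0, 0.0],
    "doc-002": [0.9, 0.1, 0.0],
    "doc-003": [0.0, 1.0, 0.0],
    "doc-004": [0.0, 0.0, 1.0],
}

hits = nearest_documents([1.0, 0.05, 0.0], index, k=2)
for doc_id, dist in hits:
    print(doc_id, round(dist, 5))
```

An MCP tool wrapped around this would presumably expose something like `nearest_documents` to the LLM, with the brute-force scan replaced by an indexed query in Postgres.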
I imagine a SETI@home-type project to reduce the data. Another analogy that comes to mind is annotation of the genome: line all the documents up and then slowly fill in the summaries. The vector database could help inform how to combine documents for consumption within context window limits (e.g. by grouping documents that sit near each other in embedding space, or in a PCA projection of it). I could keep my Max subscription on it and make some progress, but really such a project needs tens or hundreds of workers.

Marcus
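P.S. A rough sketch of the "combine documents within context window limits" idea, under stated assumptions: start from a seed document, walk its neighbors in nearest-first order (as a vector query would return them), and greedily add each one that still fits a token budget. The token counts, the budget, and the name `pack_batch` are hypothetical, and greedy packing is just one simple choice, not a claim about the best strategy.

```python
def pack_batch(seed_id, neighbors, token_counts, budget):
    """Greedily build a batch of related documents that fits a token budget.

    seed_id      -- document the batch is built around
    neighbors    -- other doc ids, ordered nearest-first to the seed
    token_counts -- doc_id -> token count (hypothetical, precomputed)
    budget       -- maximum total tokens allowed in one batch
    """
    batch = [seed_id]
    used = token_counts[seed_id]
    for doc_id in neighbors:
        cost = token_counts[doc_id]
        if used + cost > budget:
            continue  # too large for the remaining space; try the next neighbor
        batch.append(doc_id)
        used += cost
    return batch, used


# Hypothetical token counts and neighbor ordering.
token_counts = {"doc-001": 4000, "doc-002": 3000, "doc-003": 6000, "doc-004": 2000}
neighbors = ["doc-002", "doc-003", "doc-004"]

batch, used = pack_batch("doc-001", neighbors, token_counts, budget=10000)
print(batch, used)  # ['doc-001', 'doc-002', 'doc-004'] 9000
```

Each batch would then be one unit of work that a volunteer worker could summarize independently, which is what makes the SETI@home analogy plausible.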
.- .-.. .-.. / ..-. --- --- - . .-. ... / .- .-. . / .-- .-. --- -. --. / ... --- -- . / .- .-. . / ..- ... . ..-. ..- .-..
FRIAM Applied Complexity Group listserv
Fridays 9a-12p Friday St. Johns Cafe / Thursdays 9a-12p Zoom https://bit.ly/virtualfriam
to (un)subscribe http://redfish.com/mailman/listinfo/friam_redfish.com
FRIAM-COMIC http://friam-comic.blogspot.com/
archives: 5/2017 thru present https://redfish.com/pipermail/friam_redfish.com/
  1/2003 thru 6/2021 http://friam.383.s1.nabble.com/
