As usual, Brother Steve is seeing this at a much higher conceptual and process level than I am. From the perspective of Analytic Journalism, if we're dealing with a large dataset -- say 10K to 1 million records -- we would first draw a sample of a small TK percent to develop and test our assumptions, methods, and process. Once the process is stable, run it against a larger sample. If it is still stable, then throw it against the total dataset.

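Concretely, here is a minimal sketch of that sample-then-scale loop in Python; run_pipeline() is a made-up stand-in for whatever cleaning or analysis you are developing, and the fractions are illustrative, not the actual TK percent.

    # Sketch only: develop on a small sample, re-test on a larger one,
    # then run the full set. All names and fractions here are hypothetical.
    import random

    def run_pipeline(records):
        # Toy stand-in: count the non-empty records.
        return sum(1 for r in records if r is not None)

    def progressive_run(records, fractions=(0.01, 0.10, 1.0), seed=42):
        rng = random.Random(seed)
        for frac in fractions:
            n = max(1, int(len(records) * frac))
            sample = records if frac >= 1.0 else rng.sample(records, n)
            print(f"{frac:>4.0%} of {len(records)} records -> {run_pipeline(sample)}")
            # In practice: stop and revise here if the result looks unstable.

    progressive_run([i if i % 7 else None for i in range(100_000)])
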
Your original post triggered a cascade of memories as well as a jam-session with my bar-friend-cum-technical-interlocutor GPT, who led me on a merry chase through some latent techniques I once ideated on (some of which I blurted out here earlier).

A phrase that came out of that tête-à-tête, and which fits what I think you are describing from your own POV (highly relevant in these modern times of 3M-record data-dumps from the DOJ meant to baffle-with-BS), is "pre-image".

GPT offered me the more explicit denotation:

   The pre-image is not “the” original data point, but:

   *the equivalence class of upstream possibilities consistent with
   the downstream observation.*

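To make that denotation concrete (a toy of my own, not anything GPT actually handed me): if a lossy map f is all that connects upstream states to a downstream observation y, the pre-image is the whole set of x with f(x) == y, not a single recovered point.

    # Toy illustration: if all we observe downstream is the sum of a
    # (debit, credit) pair, the pre-image of that observation is every
    # pair that sums to it -- an equivalence class, not one data point.
    def preimage(f, candidates, observation):
        return [x for x in candidates if f(x) == observation]

    upstream = [(a, b) for a in range(5) for b in range(5)]
    print(preimage(sum, upstream, 4))
    # -> [(0, 4), (1, 3), (2, 2), (3, 1), (4, 0)]
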
I have a lot of respect for those of you who swim well in high-dimensional, poorly defined, poorly conditioned datasets such as "the news stream".

