guess I'll fix glaring typos, but the editor isn't letting me break the quotes at the moment
> I spent some time today relaxing by chatting with chatgpt about an offline,
> low-energy, cheap approach to speedy language model inference with
> reasonable potential scaling to llama 405b. May have mentioned this before.
> I drilled into more details of construction.
>
> The rough idea is:
> - the data can be converted to a special microfilm form and ordered as
> optical media online, or otherwise projected to film e.g. via photography.
> - this can then be used for streaming optical matmul to make most matrix
> ops O(1)
> - there are existing projects online for film scanning, such as
> kinograph.cc . chatgpt recommends a cheap system using t-slot parts and
> manual linear stages, a precise system using microscopy post rods, or
> maybe 3d printed flexures
> - tape transport is done similarly between audio tape and film, and
> generally involves a "capstan", a cylindrical drive motor immediately
> after the read head or optical window, such that the film position
> is controlled by it with minimal play
> - for llama 405b it's helpful if film can reach high speeds. Each token
> means processing the entire reel of weights, so if it is 12m that means
> moving the film

I got 12m by calculating 405b weights / 16k logit dimension * 0.5um microfilm resolution (completely ignoring parallel ops -- however many attention heads llama 405b has could divide this, for example), but at least 24m might be more defensible given sampling error and sync data. chatgpt originally estimated 63m. mostly we figured 2m/s was sufficient for now -- it said 5m/s would need industrial-grade parts, with 12m/s needed for 1 token/s.
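The reel arithmetic above can be checked in a few lines. A sketch, taking the post's figures as assumptions; note that a ~12m reel at a 16k-wide readout implies a feature pitch near 0.5um:

```python
# Back-of-envelope reel math for streaming weights past an optical read
# window. All figures are assumptions from the discussion, not measurements.

weights = 405e9          # llama 405b parameter count
logit_dim = 16_000       # weights read in parallel across the film width
feature_pitch = 0.5e-6   # metres of film per weight along the reel

reel_length = weights / logit_dim * feature_pitch  # metres of film per token
for tokens_per_s in (0.1, 0.5, 1.0):
    speed = reel_length * tokens_per_s             # required transport speed
    print(f"{tokens_per_s:>4} tok/s -> {speed:5.1f} m/s, reel {reel_length:.1f} m")
```

Dividing `reel_length` across attention heads or multiple rigs scales the required speed down proportionally, which is where the 1-5m/s goal comes from.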
> however, many ops are independent and can be
> processed in parallel, and multiple rigs can be used to divide the
> speed, so a goal of 1-5m/s seems workable
> - llama 405b has only 16k logits, so a tilted 20MP raspberry pi camera can
> more than handle the data, with accumulation handled by motion blur over
> the other matrix dimension. to convey this tilted processing effectively
> to chatgpt I used the phrase "prebaked linear unmixing". by this i roughly
> mean measuring the "moire" effect of the light source projecting through
> the data onto the sensor as a matrix, inverting the matrix, and then
> producing the data on the film multiplied by this inverse so that the
> sensor reads the desired values if nothing shifts too many nanometers from
> the measurement (chatgpt says recalibration may be needed during runs)
> - batching multiple completions together can be done many ways: storing
> weights adjacent in parallel, sending light at different angles through
> the film, using different colors, and other approaches. this listing was
> my biggest reason to want to save the chat, but I had not logged in.
>
> as mentioned earlier there are existing photonic systems too, but they
> are very expensive; this approach might provide small llms for under $100
> and large llms for under $1000, maybe as little as the small ones, but it
> likely needs engineering to succeed
>
> it's one way to pursue chat model urges while focusing on much safer,
> harder-to-misuse offline solutions, and it could be quite cool
>
> my t60 is broken atm and I'm focusing on making steps on my extreme tax
> debt. I have another logic board in the mail and also printed out some
> info on repair if needed (see earlier for part) that I have not engaged
> much yet. The tax debt is good to make steps on. Challenging.
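The "prebaked linear unmixing" idea can be sketched numerically: measure the optical path's mixing as a matrix, invert it, and write pre-multiplied data to film so the raw sensor read equals the desired values. A toy sketch, where the dimension and the crosstalk model are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8  # toy sensor/logit dimension

# Hypothetical crosstalk of the optical path: each sensor pixel sees mostly
# its own film column plus a little bleed from neighbours (the measured
# "moire" of the light source projecting through the data onto the sensor).
M = np.eye(n) + 0.05 * rng.random((n, n))

desired = rng.random(n)                 # values the sensor should report
prebaked = np.linalg.solve(M, desired)  # data actually written to the film

sensed = M @ prebaked                   # what the camera integrates in a pass
assert np.allclose(sensed, desired)     # unmixing recovers the targets
```

If the rig drifts so that the true mixing no longer matches the measured `M`, `sensed` diverges from `desired`, which is why recalibration during runs may be needed.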
