Yes, the initial searches can take a few seconds because the engine has to load parts of the index from disk. This was the result when I typed in the same thing just now: https://botcompany.de/images/1102950 Much better, as you can see...
I am using memory-mapped files, but the complete index doesn't fit in RAM on the 32 GB server, and there is no easy way to know which parts to preload, so we start with nothing preloaded and stream it in as needed. In the end, the server would need some 128 GB so that everything can be held in memory, which should bring any query below one second (longer queries are faster anyway, and I can probably do some more optimization on the short ones).

It is searching all of Wikipedia's data in a precise (case-insensitive) full-text search, which is kind of the idea here. Results are capped at 100 per query, but that's just for display. I'm using the English Wikipedia dump from April 2020.

The algorithm falls in between suffix trees, which search very quickly but require an insane amount of memory, and most other methods of searching. The method is based on compression. I'd venture that with a bit more effort, I can make the index no bigger than the original text, while still containing the original text with random access. In fact, all the results are printed directly from the index; the original text file is not accessed by the engine. The engine does lots of few-byte random accesses, so it is completely IOPS-bound if the index is not in RAM.

Not sure what your human brain comparison means... a human doesn't have 70 GB of raw text in their head :) As to how to use the system, we want bots to use it to answer questions. I am not 100% sure how that will work, but I had the strong impulse to make such a search engine and I'm pleased that it does work, even though it is not exactly hitting the 1 ms goal.

128 GB servers just fell in price. Maybe I should get one to get rid of the embarrassing search delays, but I'm not 100% sure about my financial future yet.
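On the memory-mapping point above, here is a rough Python sketch of the access pattern, just to illustrate why cold queries are slow. It is not the engine's actual code: the file layout and the names `open_index`, `read_at` and `make_demo_file` are made up for the example. The file is mapped with nothing preloaded, and each few-byte read pulls in only the page it touches, so a cold read costs a disk access while a repeat read comes from RAM.

```python
import mmap
import os
import tempfile


def open_index(path):
    """Map a file read-only. Nothing is preloaded; the OS pages data in on first access."""
    f = open(path, "rb")
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    return f, mm


def read_at(mm, offset, length):
    """Few-byte random access: only the page(s) covering this range get touched."""
    return mm[offset:offset + length]


def make_demo_file(size=1 << 20):
    """Write a stand-in 'index' file (hypothetical layout: repeating bytes 0..255)."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(fd, "wb") as out:
        out.write(bytes(range(256)) * (size // 256))
    return path


if __name__ == "__main__":
    path = make_demo_file()
    f, mm = open_index(path)
    try:
        # One small read at an arbitrary offset, as a query would do many times.
        print(read_at(mm, 70_000, 4))
    finally:
        mm.close()
        f.close()
        os.remove(path)
```

With a real index bigger than RAM, many such reads land on pages that are not resident yet, which is where the IOPS limit comes from.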
------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/T6322565b7d29a2a0-M63275370a1b52de9a9df553a
Delivery options: https://agi.topicbox.com/groups/agi/subscription
