Stefan - 128 GB of memory to hold everything? That's not so much; I think we may be able to work around the limitations of just dumping it all in memory, perhaps.
Have you calculated I/O speeds based on memory specs? If so, please let me know which memory specs you're calculating with. In my testing, high-end SSDs were not too shabby for lots of random read/write tasks, but I don't know what the data you're getting from Wikipedia looks like, and I'd have to confirm those results anyway since they were from testing I did about 5 years ago. I think we had an engineering sample of an NVMe PCIe x16 card from Micron, we were stocking Intel enterprise SSDs on SATA 3.0, and we might have had an Intel NVMe drive, but they didn't take my opinion when it came time to order hardware.

So I'm looking at clustering with RDMA or something, since high-end NICs can also approach memory bus speeds, or maybe surpass them. Although this might be a huge, fundamental misunderstanding on my part: memory bandwidth on, say, video cards is measured in GB/s, but NICs are rated in Gb/s. So 6.4 GB/s for a GPU's memory bus compared to a 45 Gb/s RoCE NIC (which may be prohibitively expensive, I'm not sure) is the same-ish, since 6.4 GB/s converted from bytes to bits is 51.2 Gb/s. It's close, but anyway, that's GPU land. Plus, there would be a huge performance hit when switching to a network stack, due to memory addressing vs. network addressing, unless maybe you could work directly with hardware addressing somehow on the network stack? I'm not sure what that would look like, and it really just brings us back to IP land, maybe. I don't know; I was never a network engineer, so my OSI model is very faded at this point.

I try to work with commodity hardware, though, since 1 Gbps NICs are still far more common than 10 Gbps NICs in consumer land in the USA, since pretty much no one's home internet is faster than 1 Gbps anyway. I think even the local web hosting company hosts 20k-30k servers, with god knows how many web apps, on something like 4-5 redundant 10 Gbps links.

OR you could avoid most of that and find a software solution, maybe.
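The bytes-to-bits comparison above can be sanity-checked in a couple of lines. To be clear, the 6.4 GB/s and 45 Gb/s figures here are just the ones I used above, not verified specs for any particular hardware:

```python
# Quick sanity check of the GB/s-vs-Gb/s comparison.
# Figures are the ones from this thread, not current hardware specs.

def gbytes_to_gbits(gb_per_s: float) -> float:
    """Convert a bandwidth in GB/s to Gb/s (1 byte = 8 bits)."""
    return gb_per_s * 8.0

gpu_mem_gbps = gbytes_to_gbits(6.4)  # 6.4 GB/s memory bus -> 51.2 Gb/s
nic_gbps = 45.0                      # 45 Gb/s RoCE NIC

print(f"GPU memory bus: {gpu_mem_gbps:.1f} Gb/s")
print(f"RoCE NIC:       {nic_gbps:.1f} Gb/s")
```

So on paper the NIC is within roughly 90% of that (dated) memory-bus figure, which is why the comparison isn't totally crazy, even if the addressing-overhead caveat still applies.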
Here is an idea I had: maybe consider a caching mechanism on the server where a disk image containing a local copy of Wikipedia is mounted read-only, gets updated by some other, separate process, and is presented as an immutable, read-only volume for your application's (wikify's) threads to consume according to some set of rules (rules will be necessary beyond an ACL, in order to ensure data consistency on reads). Then you can let the OS handle memory caching if you want. You could also look at sharding the dataset across multiple disks for better threading, but that's more expensive and an unnecessary optimization right now. Anyway, let me know if you have any thoughts!

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/T6322565b7d29a2a0-M84c7bfd6bd548751277ca98e
Delivery options: https://agi.topicbox.com/groups/agi/subscription
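P.S. A rough sketch of what I mean by the immutable-snapshot idea, in Python for illustration (all file names here are made up, and I'm assuming POSIX rename semantics): the updater process writes a whole new snapshot file and atomically swaps a symlink, while reader threads open the current snapshot read-only and mmap it, so the OS page cache does the memory caching.

```python
# Hypothetical sketch: readers mmap an immutable snapshot file while
# a separate updater publishes new snapshots via an atomic symlink swap.
# File names ("snapshot-N.img", "current") are invented for this example.
import mmap
import os
import tempfile

workdir = tempfile.mkdtemp()
current = os.path.join(workdir, "current")  # symlink the readers follow

def publish(version: int, payload: bytes) -> None:
    """Updater side: write a new snapshot, then swap the symlink atomically."""
    path = os.path.join(workdir, f"snapshot-{version}.img")
    with open(path, "wb") as f:
        f.write(payload)
    tmp_link = path + ".link"
    os.symlink(path, tmp_link)
    # os.replace is atomic on POSIX: a reader sees the old snapshot
    # or the new one, never a half-written state.
    os.replace(tmp_link, current)

def read_snapshot() -> bytes:
    """Reader side: open read-only and mmap; the OS page cache handles caching."""
    with open(current, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return bytes(m)

publish(1, b"wikipedia dump v1")
print(read_snapshot())  # b'wikipedia dump v1'
publish(2, b"wikipedia dump v2")
print(read_snapshot())  # b'wikipedia dump v2'
```

In the real setup the "payload" would be the mounted disk image rather than a small file, but the consistency rule is the same: readers never touch a snapshot that's being written, which is the kind of rule I meant beyond an ACL.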
