I’ve had some curiosity about this question too. For a while, I watched for a seastar-like library for the JVM, but https://github.com/bestwpw/windmill was the only one I came across, and it doesn’t seem to be going anywhere. Since one of the points of the JVM is to abstract away the platform, I certainly wonder whether the JVM will ever get the kind of machine affinity these other projects exploit. Your one-shard-per-core idea could probably be faked with multiple JVMs and numactl - could be an interesting experiment.
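A rough sketch of what I mean, assuming a 4-core box and a hypothetical shard.jar entry point (the jar name and --port flag are made up for illustration; the script only echoes the commands it would run):

```shell
#!/bin/sh
# Fake shard-per-core: launch one JVM per core, each pinned with numactl.
# --physcpubind pins the process to one CPU, --localalloc keeps its
# allocations on the local NUMA node. Echoed rather than executed here.
for core in 0 1 2 3; do
  cmd="numactl --physcpubind=$core --localalloc java -jar shard.jar --port $((8080 + core))"
  echo "$cmd"
done
```

In a real experiment you'd drop the echo, background each JVM, and put a router in front of the four ports.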
That said, I’m aware that a phenomenal amount of optimization effort has gone into Lucene, and I’d also be interested in hearing about things that worked well.

From: Dorian Hoxha <dorian.ho...@gmail.com>
Reply-To: "dev@lucene.apache.org" <dev@lucene.apache.org>
Date: Friday, January 20, 2017 at 8:12 AM
To: "dev@lucene.apache.org" <dev@lucene.apache.org>
Subject: How would you architect solr/lucene if you were starting from scratch for them to be 10X+ faster/efficient ?

Hi friends,

I was thinking about how the scylladb architecture <http://www.scylladb.com/technology/architecture/> works compared to cassandra, which gives them 10x+ performance and lower latency. If you were starting lucene and solr from scratch, what would you do to achieve something similar?

A different language (rust/c++?) for better SIMD <http://blog-archive.griddynamics.com/2015/06/lucene-simd-codec-benchmark-and-future.html> ?
Use a GPU with an SSD for posting-list intersection? (not out yet)
Make it in-memory and use better data structures?
Shard on cores like scylladb (so 1 shard for each core on the machine)?
An external cache (like keeping n redis servers with big ram/network & slow cpu/disk just for cache)?
Better data structures (like the algolia autocomplete radix <https://blog.algolia.com/inside-the-algolia-engine-part-2-the-indexing-challenge-of-instant-search/> )?
Distributing documents by term instead of id <http://research.microsoft.com/en-us/um/people/trishulc/papers/Maguro.pdf> ?
Using an ASIC / FPGA?

Regards,
Dorian