@Alex, I don't know if you've seen it, but there's also the RediSearch module, which they claim is faster (ofc with fewer features):
https://redislabs.com/blog/adding-search-engine-redis-adventures-module-land/
http://www.slideshare.net/RedisLabs/redis-for-search
https://github.com/RedisLabsModules/RediSearch

On Fri, Feb 10, 2017 at 1:36 PM, Dorian Hoxha <dorian.ho...@gmail.com> wrote:

> On Wed, Feb 8, 2017 at 3:58 PM, Alexandre Rafalovitch <arafa...@gmail.com>
> wrote:
>
>> Once you filter out the JIRA messages, the forum is very strong and
>> alive. It is just very focused on its purpose - building Solr and
>> Lucene and ElasticSearch.
>>
> Will do just that. Thanks.
>
>> As to "perfection" - nothing is perfect, you can just look at the list
>> of the open JIRAs to confirm that for Lucene and/or Solr. But there is
>> constant improvement and ever-deepening of the features and
>> performance improvement.
>>
>> You can also look at Elasticsearch for inspiration, as they build on
>> Lucene (and are contributing to it) and had a chance to rebuild the
>> layers above it.
>>
> They have more fancy features, but less advanced ones (ex: shard
> splitting!)
>
>> On your question specifically, I think it is hard to answer it well.
>> Partially because I am not sure your assumptions are all that thought
>> out. For example:
>> 1) Different language than Java - Solr relies on Zookeeper, Tika and
>> other libraries. All of those are in Java. Language change implies
>> full change of the dependencies and ecosystem and - without looking -
>> I doubt there is an open-source comprehensive MSWord parser in
>> C++/Rust.
>>
> Usually indexing-speed is not the bottleneck (beside logging and some
> other scenarios) so you could probably use a java service (for tika).
> Zookeeper is again not a bottleneck when serving requests, and you can
> still use it with a non-java db.
>
>> 2) Algolia radix? Lucene uses pre-compiled DFA (deterministic finite
>> automata). Are you sure the open graph chosen because Algolia wants to
>> run on the phone is an improvement on the DFA?
>>
> The `suggesters` which are backed by DFA can't be used with normal
> filters/queries, which is critical (and algolia-radix can do)
>
>> 3) Document distribution is already customizable with _route_ key,
>> though obviously Maguro algorithm is beyond single key's reach. On the
>> other hand, I am not sure Maguro is designed for good faceting,
>> streaming, enumerations, or other features Lucene/Solr has in its
>> core.
>>
> Yes, seems very special use case.
>
>> As to the rest (GPU!, FPGA), we accept contributions. Including large,
>> complex, interesting contributions (streams, learning to rank,
>> docvalues, etc).
>
> I mean just in the "ideas case", not do it for me.
>
>> And, long term, it is probably more effective to be
>> able to innovate within the well-established framework rather than
>> reinventing things from scratch. After all, even Twitter and LinkedIn
>> built their internal implementations on top of Lucene rather than
>> reinventing absolutely everything.
>>
> Depends how core it is to your comp and how good at low-level your team
> is. Most of the time yes but sometimes you gotta (like the scylladb case,
> they've built A LOT from scratch, like custom scheduler etc)
>
>> Still, Elasticsearch had a - very successful - go at the "Innovator's
>> Dilemma" situation. If you want to create a team trying to
>> rebuild/improve the approaches completely from scratch, I am sure you
>> will find a lot of us looking at your efforts with interest. I, for
>> one, would be happy to point out a new radically-different approach to
>> search engine implementation on my Solr Start mailing list.
>>
> That's why I'm asking for ideas. This is what I got from another dev on
> the same question: https://news.ycombinator.com/item?id=13249724
> Quote:"Multicores parallel shared nothing architecture like the on in the
> TurboPFor inverted index app and a ram resident inverted index."
>
>> Regards and good luck,
>>    Alex.
>> ----
>> http://www.solr-start.com/ - Resources for Solr users, new and
>> experienced
>>
>>
>> On 8 February 2017 at 03:39, Dorian Hoxha <dorian.ho...@gmail.com> wrote:
>> > So, am I asking too much (maybe), is this forum dead (then where to
>> > ask? there is extreme noise here), is lucene perfect (of course not)?
>> >
>> >
>> > On Wed, Jan 25, 2017 at 5:01 PM, Dorian Hoxha <dorian.ho...@gmail.com>
>> > wrote:
>> >>
>> >> Was thinking also how bing doesn't use posting lists and also
>> >> compiling queries!
>> >> About the queries, I would've thought it wouldn't be as high overhead
>> >> as queries in an rdbms, since those apply on each row while on search
>> >> they apply on each bitset.
>> >>
>> >>
>> >> On Mon, Jan 23, 2017 at 6:04 PM, Jeff Wartes <jwar...@whitepages.com>
>> >> wrote:
>> >>>
>> >>> I’ve had some curiosity about this question too.
>> >>>
>> >>> For a while, I watched for a seastar-like library for the JVM, but
>> >>> https://github.com/bestwpw/windmill was the only one I came across,
>> >>> and it doesn’t seem to be going anywhere. Since one of the points of
>> >>> the JVM is to abstract away the platform, I certainly wonder if the
>> >>> JVM will ever get the kinds of machine affinity these other projects
>> >>> see. Your one-shard-per-core could probably be faked with multiple
>> >>> JVMs and numactl - could be an interesting experiment.
>> >>>
>> >>> That said, I’m aware that a phenomenal amount of optimization effort
>> >>> has gone into Lucene, and I’d also be interested in hearing about
>> >>> things that worked well.
>> >>>
>> >>>
>> >>> From: Dorian Hoxha <dorian.ho...@gmail.com>
>> >>> Reply-To: "dev@lucene.apache.org" <dev@lucene.apache.org>
>> >>> Date: Friday, January 20, 2017 at 8:12 AM
>> >>> To: "dev@lucene.apache.org" <dev@lucene.apache.org>
>> >>> Subject: How would you architect solr/lucene if you were starting from
>> >>> scratch for them to be 10X+ faster/efficient ?
>> >>>
>> >>> Hi friends,
>> >>>
>> >>> I was thinking how scylladb architecture works compared to cassandra
>> >>> which gives them 10x+ performance and lower latency. If you were
>> >>> starting lucene and solr from scratch what would you do to achieve
>> >>> something similar?
>> >>>
>> >>> Different language (rust/c++?) for better SIMD ?
>> >>>
>> >>> Use a GPU with a SSD for posting-list intersection ?(not out yet)
>> >>>
>> >>> Make it in-memory and use better data structures?
>> >>>
>> >>> Shard on cores like scylladb (so 1 shard for each core on the
>> >>> machine) ?
>> >>>
>> >>> External cache (like keeping n redis-servers with big ram/network &
>> >>> slow cpu/disk just for cache) ??
>> >>>
>> >>> Use better data structures (like algolia autocomplete radix )
>> >>>
>> >>> Distributing documents by term instead of id ?
>> >>>
>> >>> Using ASIC / FPGA ?
>> >>>
>> >>> Regards,
>> >>>
>> >>> Dorian
>> >>
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>