RediSearch seems to be fully in-memory, with no analysis or query chain, and
no real multilingual support. It is a pears-and-apples comparison, and their
"big" feature is what Lucene started from (a term list). I don't even see
phrase-search support, as they don't seem to implement posting lists, just the
terms.
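To make the posting-list point concrete: phrase search needs per-document term positions, not just a set of terms. A toy sketch of a positional inverted index (purely illustrative; neither RediSearch's nor Lucene's actual implementation):

```python
from collections import defaultdict

def index_docs(docs):
    """Build a positional inverted index: term -> {doc_id: [positions]}."""
    postings = defaultdict(lambda: defaultdict(list))
    for doc_id, text in enumerate(docs):
        for pos, term in enumerate(text.lower().split()):
            postings[term][doc_id].append(pos)
    return postings

def phrase_search(postings, phrase):
    """Return doc ids where the phrase terms occur consecutively."""
    terms = phrase.lower().split()
    if not terms or any(t not in postings for t in terms):
        return set()
    # Candidate docs contain every term; only positions decide the phrase.
    candidates = set(postings[terms[0]])
    for t in terms[1:]:
        candidates &= set(postings[t])
    hits = set()
    for doc_id in candidates:
        for start in postings[terms[0]][doc_id]:
            if all(start + i in postings[t][doc_id]
                   for i, t in enumerate(terms)):
                hits.add(doc_id)
                break
    return hits

docs = ["the quick brown fox", "brown the quick fox"]
idx = index_docs(docs)
print(phrase_search(idx, "quick brown"))  # {0}: both docs have the terms, only doc 0 has the phrase
```

A term-only index could answer the boolean query "quick AND brown" but could not distinguish the two documents above.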
Also, I don't see them publishing their Elasticsearch or Solr configuration,
which from past experience is often left untuned. But yes, good for them. And
good for Postgres for adding full-text search some months ago. Even good for
Oracle for having a commercial (however hardcoded and terrible) full-text
search.

I think the summary - in my mind - is that if software is swallowing the
world, then search is swallowing the software. Maybe it will become that last
"kitchen sink" proof, replacing email. And the more interesting ideas go
around, the better. Some of them, I am sure, will end up in
Lucene/Solr/Elasticsearch, as - after all - they are the most popular
platforms, and people will bring those extra things to the core platform they
use if they really want it.

Regards,
   Alex.
----
http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 10 February 2017 at 11:38, Dorian Hoxha <dorian.ho...@gmail.com> wrote:
> @Alex,
> I don't know if you've seen it, but there's also the RediSearch module,
> which they claim to be faster (of course, with fewer features):
> https://redislabs.com/blog/adding-search-engine-redis-adventures-module-land/
> http://www.slideshare.net/RedisLabs/redis-for-search
> https://github.com/RedisLabsModules/RediSearch
>
> On Fri, Feb 10, 2017 at 1:36 PM, Dorian Hoxha <dorian.ho...@gmail.com>
> wrote:
>>
>> On Wed, Feb 8, 2017 at 3:58 PM, Alexandre Rafalovitch <arafa...@gmail.com>
>> wrote:
>>>
>>> Once you filter out the JIRA messages, the forum is very strong and
>>> alive. It is just very focused on its purpose - building Solr and
>>> Lucene and Elasticsearch.
>>
>> Will do just that. Thanks.
>>>
>>> As to "perfection" - nothing is perfect; you can just look at the list
>>> of open JIRAs to confirm that for Lucene and/or Solr. But there is
>>> constant improvement, an ever-deepening of the features, and steady
>>> performance work.
>>>
>>> You can also look at Elasticsearch for inspiration, as they build on
>>> Lucene (and contribute back to it) and had a chance to rebuild the
>>> layers above it.
>>
>> They have more fancy features, but fewer advanced ones (e.g. shard
>> splitting!).
>>>
>>> On your question specifically, I think it is hard to answer well.
>>> Partially because I am not sure your assumptions are all that thought
>>> out. For example:
>>> 1) A different language than Java - Solr relies on ZooKeeper, Tika and
>>> other libraries. All of those are in Java. A language change implies a
>>> full change of the dependencies and ecosystem, and - without looking -
>>> I doubt there is a comprehensive open-source MS Word parser in
>>> C++/Rust.
>>
>> Usually indexing speed is not the bottleneck (besides logging and some
>> other scenarios), so you could probably use a Java service (for Tika).
>> ZooKeeper is again not a bottleneck when serving requests, and you can
>> still use it with a non-Java database.
>>>
>>> 2) The Algolia radix? Lucene uses pre-compiled DFAs (deterministic
>>> finite automata). Are you sure the structure Algolia chose because it
>>> wants to run on the phone is an improvement on the DFA?
>>
>> The suggesters that are backed by a DFA can't be combined with normal
>> filters/queries, which is critical (and which the Algolia radix can do).
>>>
>>> 3) Document distribution is already customizable with the _route_ key,
>>> though obviously the Maguro algorithm is beyond a single key's reach.
>>> On the other hand, I am not sure Maguro is designed for good faceting,
>>> streaming, enumerations, or the other features Lucene/Solr has in its
>>> core.
>>
>> Yes, it seems like a very special use case.
>>>
>>> As to the rest (GPU!, FPGA), we accept contributions. Including large,
>>> complex, interesting contributions (streams, learning to rank,
>>> docvalues, etc.).
>>
>> I meant just in the "ideas" sense, not "do it for me".
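On the suggester point above: a prefix structure that keeps document ids at its nodes can apply arbitrary document-level filters at lookup time, which is the flexibility being asked for. A toy sketch (purely illustrative; neither Algolia's radix tree nor Lucene's suggesters work exactly like this):

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.doc_ids = set() # docs whose term passes through this node

class FilterableSuggester:
    """Toy prefix suggester: because each node knows which docs reach it,
    any predicate over doc attributes can filter results at query time."""
    def __init__(self):
        self.root = TrieNode()
        self.docs = {}  # doc_id -> attributes

    def add(self, doc_id, term, attrs):
        self.docs[doc_id] = attrs
        node = self.root
        for ch in term:
            node = node.children.setdefault(ch, TrieNode())
            node.doc_ids.add(doc_id)

    def suggest(self, prefix, pred=lambda attrs: True):
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return set()
            node = node.children[ch]
        return {d for d in node.doc_ids if pred(self.docs[d])}

s = FilterableSuggester()
s.add(1, "solr", {"lang": "java"})
s.add(2, "solaris", {"lang": "c"})
print(s.suggest("sol"))                                 # {1, 2}
print(s.suggest("sol", lambda a: a["lang"] == "java"))  # {1}
```

A pre-compiled DFA bakes the accepted term set in at build time, which is exactly why combining it with per-query filters is awkward; the node-level doc sets above trade memory for that flexibility.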
>>>
>>> And, long term, it is probably more effective to be able to innovate
>>> within a well-established framework rather than reinventing things
>>> from scratch. After all, even Twitter and LinkedIn built their
>>> internal implementations on top of Lucene rather than reinventing
>>> absolutely everything.
>>
>> It depends on how core it is to your company and how good at low-level
>> work your team is. Most of the time, yes - but sometimes you've got to
>> (like the ScyllaDB case: they built A LOT from scratch, like a custom
>> scheduler, etc.).
>>>
>>> Still, Elasticsearch had a - very successful - go at the "Innovator's
>>> Dilemma" situation. If you want to create a team trying to
>>> rebuild/improve the approaches completely from scratch, I am sure you
>>> will find a lot of us looking at your efforts with interest. I, for
>>> one, would be happy to point out a radically different new approach to
>>> search-engine implementation on my Solr Start mailing list.
>>
>> That's why I'm asking for ideas. This is what I got from another dev on
>> the same question: https://news.ycombinator.com/item?id=13249724
>> Quote: "Multicores parallel shared nothing architecture like the one in
>> the TurboPFor inverted index app and a ram resident inverted index."
>>>
>>> Regards and good luck,
>>> Alex.
>>> ----
>>> http://www.solr-start.com/ - Resources for Solr users, new and
>>> experienced
>>>
>>> On 8 February 2017 at 03:39, Dorian Hoxha <dorian.ho...@gmail.com> wrote:
>>> > So, am I asking too much (maybe)? Is this forum dead (then where to
>>> > ask? there is extreme noise here)? Is Lucene perfect (of course not)?
>>> >
>>> > On Wed, Jan 25, 2017 at 5:01 PM, Dorian Hoxha <dorian.ho...@gmail.com>
>>> > wrote:
>>> >>
>>> >> Was also thinking about how Bing doesn't use posting lists and also
>>> >> compiles queries!
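For reference, the posting-list machinery being discussed boils down to intersecting sorted doc-id lists; real engines layer compression (e.g. TurboPFor), skip lists, galloping search and SIMD on top. A naive illustrative sketch:

```python
def intersect(a, b):
    """Intersect two sorted posting lists (ascending doc-id arrays).

    This linear merge is the baseline; production engines skip ahead
    (galloping/skip lists) and decode compressed blocks instead.
    """
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

print(intersect([1, 3, 5, 9], [2, 3, 9, 10]))  # [3, 9]
```

This loop is branchy and memory-bound, which is why it is a favorite target for SIMD, GPU and even FPGA experiments like the ones floated in this thread.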
>>> >> About the queries, I would've thought the overhead wouldn't be as
>>> >> high as for queries in an RDBMS, since those apply to each row,
>>> >> while in search they apply to each bitset.
>>> >>
>>> >> On Mon, Jan 23, 2017 at 6:04 PM, Jeff Wartes <jwar...@whitepages.com>
>>> >> wrote:
>>> >>>
>>> >>> I've had some curiosity about this question too.
>>> >>>
>>> >>> For a while, I watched for a seastar-like library for the JVM, but
>>> >>> https://github.com/bestwpw/windmill was the only one I came across,
>>> >>> and it doesn't seem to be going anywhere. Since one of the points
>>> >>> of the JVM is to abstract away the platform, I certainly wonder if
>>> >>> the JVM will ever get the kinds of machine affinity these other
>>> >>> projects see. Your one-shard-per-core idea could probably be faked
>>> >>> with multiple JVMs and numactl - could be an interesting experiment.
>>> >>>
>>> >>> That said, I'm aware that a phenomenal amount of optimization
>>> >>> effort has gone into Lucene, and I'd also be interested in hearing
>>> >>> about things that worked well.
>>> >>>
>>> >>> From: Dorian Hoxha <dorian.ho...@gmail.com>
>>> >>> Reply-To: "dev@lucene.apache.org" <dev@lucene.apache.org>
>>> >>> Date: Friday, January 20, 2017 at 8:12 AM
>>> >>> To: "dev@lucene.apache.org" <dev@lucene.apache.org>
>>> >>> Subject: How would you architect solr/lucene if you were starting
>>> >>> from scratch for them to be 10X+ faster/more efficient?
>>> >>>
>>> >>> Hi friends,
>>> >>>
>>> >>> I was thinking about how the ScyllaDB architecture works compared
>>> >>> to Cassandra, which gives them 10x+ performance and lower latency.
>>> >>> If you were starting Lucene and Solr from scratch, what would you
>>> >>> do to achieve something similar?
>>> >>>
>>> >>> A different language (Rust/C++?) for better SIMD?
>>> >>>
>>> >>> Use a GPU with an SSD for posting-list intersection? (not out yet)
>>> >>>
>>> >>> Make it in-memory and use better data structures?
>>> >>>
>>> >>> Shard on cores like ScyllaDB (so one shard for each core on the
>>> >>> machine)?
>>> >>>
>>> >>> An external cache (like keeping n Redis servers with big
>>> >>> RAM/network and slow CPU/disk, just for caching)?
>>> >>>
>>> >>> Use better data structures (like the Algolia autocomplete radix)?
>>> >>>
>>> >>> Distribute documents by term instead of by id?
>>> >>>
>>> >>> Use an ASIC / FPGA?
>>> >>>
>>> >>> Regards,
>>> >>> Dorian
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: dev-h...@lucene.apache.org
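The one-shard-per-core / numactl experiment mentioned in the thread could be faked along these lines. This is a rough command sketch, not a tested recipe: it assumes numactl is installed and a stock Solr install with `bin/solr` on the path, and the ports and data paths are made up for illustration.

```shell
#!/bin/sh
# Sketch: pin one Solr JVM per core, each hosting one shard.
# Assumes numactl and bin/solr exist; ports/paths below are illustrative.
CORES=$(nproc)
i=0
while [ "$i" -lt "$CORES" ]; do
  # --physcpubind pins the JVM to one core; --localalloc keeps its
  # memory allocations on that core's NUMA node.
  numactl --physcpubind="$i" --localalloc \
    bin/solr start -p $((8983 + i)) -s "/var/solr/shard$i"
  i=$((i + 1))
done
```

Routing requests so each JVM only ever serves its own shard is the part this sketch leaves out, and likely the hard part of the experiment.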