On Wed, Feb 8, 2017 at 3:58 PM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
> Once you filter out the JIRA messages, the forum is very strong and
> alive. It is just very focused on its purpose - building Solr,
> Lucene, and Elasticsearch.

Will do just that. Thanks.

> As to "perfection" - nothing is perfect; you can just look at the list
> of open JIRAs to confirm that for Lucene and/or Solr. But there is
> constant improvement and an ever-deepening of features and performance.
>
> You can also look at Elasticsearch for inspiration, as they build on
> Lucene (and are contributing to it) and had a chance to rebuild the
> layers above it.

They have more fancy features, but fewer advanced ones (ex: shard splitting!).

> On your question specifically, I think it is hard to answer it well,
> partially because I am not sure your assumptions are all that thought
> out. For example:
>
> 1) Different language than Java - Solr relies on ZooKeeper, Tika, and
> other libraries. All of those are in Java. A language change implies a
> full change of the dependencies and ecosystem, and - without looking -
> I doubt there is an open-source comprehensive MS Word parser in
> C++/Rust.

Usually indexing speed is not the bottleneck (besides logging and some
other scenarios), so you could probably keep a Java service for Tika.
ZooKeeper is likewise not a bottleneck when serving requests, and you can
still use it with a non-Java database.

> 2) Algolia radix? Lucene uses pre-compiled DFAs (deterministic finite
> automata). Are you sure the structure Algolia chose (because they want
> to run on the phone) is an improvement on the DFA?

The suggesters that are backed by a DFA can't be combined with normal
filters/queries, which is critical (and which the Algolia radix tree can do).

> 3) Document distribution is already customizable with the _route_ key,
> though obviously the Maguro algorithm is beyond a single key's reach.
> On the other hand, I am not sure Maguro is designed for good faceting,
> streaming, enumerations, or other features Lucene/Solr has in its
> core.
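A minimal sketch of the filter-aware lookup from point (2) above, in plain Java (hypothetical code, not Lucene's or Algolia's actual API): a tree-walk over sorted terms can test an arbitrary filter against every candidate at lookup time, which a pre-compiled suggestion automaton cannot easily do.

```java
import java.util.*;
import java.util.function.Predicate;

// Hypothetical sketch only. A TreeMap stands in for a radix trie here:
// both give sorted, prefix-contiguous iteration, which is what lets a
// per-candidate filter be applied during the lookup itself.
class FilteredSuggester {
    private final TreeMap<String, Integer> entries = new TreeMap<>(); // term -> weight

    void add(String term, int weight) { entries.put(term, weight); }

    // Return up to `limit` terms starting with `prefix` that pass `filter`.
    List<String> suggest(String prefix, Predicate<String> filter, int limit) {
        List<String> out = new ArrayList<>();
        for (String term : entries.tailMap(prefix, true).keySet()) {
            if (!term.startsWith(prefix)) break;   // left the prefix range
            if (filter.test(term)) out.add(term);  // filter evaluated per hit
            if (out.size() == limit) break;
        }
        return out;
    }
}
```

The point is the `Predicate` argument: with a pre-compiled DFA/FST suggester the candidate set is baked in at build time, while a tree-walk can drop candidates against any query-time condition.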
Yes, that seems like a very special use case.

> As to the rest (GPU!, FPGA), we accept contributions. Including large,
> complex, interesting contributions (streams, learning to rank,
> docvalues, etc).

I meant it just as an "ideas" question, not "do it for me".

> And, long term, it is probably more effective to be able to innovate
> within the well-established framework rather than reinventing things
> from scratch. After all, even Twitter and LinkedIn built their
> internal implementations on top of Lucene rather than reinventing
> absolutely everything.

It depends on how core search is to your company and how good at
low-level work your team is. Most of the time yes, but sometimes you
have to (like the ScyllaDB case - they've built A LOT from scratch,
like a custom scheduler, etc.).

> Still, Elasticsearch had a - very successful - go at the "Innovator's
> Dilemma" situation. If you want to create a team trying to
> rebuild/improve the approaches completely from scratch, I am sure you
> will find a lot of us looking at your efforts with interest. I, for
> one, would be happy to point out a new radically-different approach to
> search engine implementation on my Solr Start mailing list.

That's why I'm asking for ideas. This is what I got from another dev on
the same question: https://news.ycombinator.com/item?id=13249724
Quote: "Multicores parallel shared nothing architecture like the one in
the TurboPFor inverted index app and a ram resident inverted index."

> Regards and good luck,
>    Alex.
> ----
> http://www.solr-start.com/ - Resources for Solr users, new and experienced
>
> On 8 February 2017 at 03:39, Dorian Hoxha <dorian.ho...@gmail.com> wrote:
> > So, am I asking too much (maybe), is this forum dead (then where to
> > ask? there is extreme noise here), is Lucene perfect (of course not)?
> >
> > On Wed, Jan 25, 2017 at 5:01 PM, Dorian Hoxha <dorian.ho...@gmail.com>
> > wrote:
> >> Was thinking also about how Bing doesn't use posting lists, and
> >> also about compiling queries!
> >> About the queries: I would have thought the overhead wouldn't be as
> >> high as for queries in an RDBMS, since those apply to each row,
> >> while in search they apply to each bitset.
> >>
> >> On Mon, Jan 23, 2017 at 6:04 PM, Jeff Wartes <jwar...@whitepages.com>
> >> wrote:
> >>>
> >>> I've had some curiosity about this question too.
> >>>
> >>> For a while, I watched for a seastar-like library for the JVM, but
> >>> https://github.com/bestwpw/windmill was the only one I came across,
> >>> and it doesn't seem to be going anywhere. Since one of the points
> >>> of the JVM is to abstract away the platform, I certainly wonder if
> >>> the JVM will ever get the kinds of machine affinity these other
> >>> projects see. Your one-shard-per-core could probably be faked with
> >>> multiple JVMs and numactl - could be an interesting experiment.
> >>>
> >>> That said, I'm aware that a phenomenal amount of optimization
> >>> effort has gone into Lucene, and I'd also be interested in hearing
> >>> about things that worked well.
> >>>
> >>> From: Dorian Hoxha <dorian.ho...@gmail.com>
> >>> Reply-To: "dev@lucene.apache.org" <dev@lucene.apache.org>
> >>> Date: Friday, January 20, 2017 at 8:12 AM
> >>> To: "dev@lucene.apache.org" <dev@lucene.apache.org>
> >>> Subject: How would you architect solr/lucene if you were starting
> >>> from scratch for them to be 10X+ faster/efficient?
> >>>
> >>> Hi friends,
> >>>
> >>> I was thinking about how the ScyllaDB architecture works compared
> >>> to Cassandra's, which gives them 10x+ performance and lower
> >>> latency. If you were starting Lucene and Solr from scratch, what
> >>> would you do to achieve something similar?
> >>>
> >>> Different language (Rust/C++?) for better SIMD?
> >>>
> >>> Use a GPU with an SSD for posting-list intersection? (not out yet)
> >>>
> >>> Make it in-memory and use better data structures?
> >>> Shard on cores like ScyllaDB (so 1 shard for each core on the
> >>> machine)?
> >>>
> >>> External cache (like keeping n Redis servers with big RAM/network
> >>> and slow CPU/disk just for caching)?
> >>>
> >>> Use better data structures (like the Algolia autocomplete radix
> >>> tree)?
> >>>
> >>> Distribute documents by term instead of by id?
> >>>
> >>> Use an ASIC / FPGA?
> >>>
> >>> Regards,
> >>> Dorian

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
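The "RAM-resident inverted index" idea from the HN comment quoted earlier in the thread boils down to sorted in-memory postings lists plus a merge-based intersection. A minimal, hypothetical Java sketch of just that core (real engines layer compression such as TurboPFor-style PForDelta, skip lists, and per-core sharding on top of it):

```java
import java.util.*;

// Hypothetical sketch: an in-memory inverted index with sorted postings
// lists and a linear-merge AND intersection. Illustrative only; not
// Lucene's actual postings implementation.
class RamIndex {
    private final Map<String, List<Integer>> postings = new HashMap<>();

    // Docs must be added in increasing docId order so postings stay sorted.
    void add(int docId, String... terms) {
        for (String t : terms)
            postings.computeIfAbsent(t, k -> new ArrayList<>()).add(docId);
    }

    // AND query: intersect the sorted postings lists of all terms.
    List<Integer> and(String... terms) {
        List<Integer> result = postings.getOrDefault(terms[0], List.of());
        for (int i = 1; i < terms.length; i++)
            result = intersect(result, postings.getOrDefault(terms[i], List.of()));
        return result;
    }

    // Classic two-pointer merge of two sorted lists.
    private static List<Integer> intersect(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            int cmp = Integer.compare(a.get(i), b.get(j));
            if (cmp == 0) { out.add(a.get(i)); i++; j++; }
            else if (cmp < 0) i++;
            else j++;
        }
        return out;
    }
}
```

The "shard per core" idea in the list above would then mean running one such index per core, each owning a disjoint docId range, with no locks shared between them.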