Hi,

there are basically two different kinds of searches in Kallithea:

1. Filtering revisions. Mads mentioned 2 years ago that he plans to add some support for this: https://bitbucket.org/conservancy/kallithea/issues/18/search-needs-to-be-improved
2. Searching in multiple repositories (incl. full-text searching in the files).

I think the first point is pretty straightforward. Git and Mercurial both support filtering revisions; it basically "only" needs to be implemented. :-)

But the second one is more complicated. There are multiple problems with the current implementation:

1. For starters, since 9c5f794df7cd the make-index command is broken. But that can easily be fixed.
2. What is not so easy to fix is the fact that indexing is currently incredibly slow.
3. Indexing is done periodically. It only indexes the tip revision at indexing time, while the search results refer to the tip at search time. Therefore:
   a) you may get hits that are no longer valid,
   b) you may get no hits even though the string is present now,
   c) you can't search for things that have been removed.

I believe all of this is solvable. I looked into the code and found a few places where the indexing can definitely be improved. But I don't have much experience with Whoosh, so I'm not sure whether it is even worth fixing the current implementation, or whether I should start over with Solr or Elasticsearch.

My questions to you guys are:

1. Do you have experience with Whoosh? Does it scale to gigabytes of data?
2. Would you even pull an implementation that requires installing Solr? Note: I believe the installation and setup of Solr can be automated.
3. Or maybe you think the full-text search should be dropped altogether?

BTW: I'd use the Linux kernel as a benchmark. I think if we could handle more than half a million revisions with more than a gigabyte of files, we would be fine.
_______________________________________________ kallithea-general mailing list [email protected] http://lists.sfconservancy.org/mailman/listinfo/kallithea-general
