Adding to the aforementioned limitations and desired improvements, another issue with Zebra is that its index builds scale very poorly. Indexes must be built serially, which is a serious limitation for large catalogs. A parallelized index creation process would be a huge benefit. Does SOLR have that capability?
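For what it's worth, an HTTP-based indexer like Solr makes client-side parallelism straightforward: several workers can POST batches of documents to the same update endpoint concurrently. Here is a minimal sketch of that idea — the core name and URL are hypothetical, and the actual POST is left as a comment so the sketch runs without a server:

```python
import json
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

# Hypothetical Solr core; adjust host/core for a real install.
SOLR_UPDATE_URL = "http://localhost:8983/solr/koha/update"

def batches(records, size):
    """Split the record stream into fixed-size chunks."""
    it = iter(records)
    while chunk := list(islice(it, size)):
        yield chunk

def index_batch(batch):
    """Build the JSON update payload for one batch.

    A real worker would POST the payload to SOLR_UPDATE_URL; here we
    only build it, so the sketch is runnable standalone."""
    payload = json.dumps(batch)
    # urllib.request.urlopen(urllib.request.Request(
    #     SOLR_UPDATE_URL, payload.encode(),
    #     {"Content-Type": "application/json"}))
    return len(batch)

records = [{"id": str(i), "title": f"record {i}"} for i in range(1000)]

# Four workers index 250-record batches concurrently.
with ThreadPoolExecutor(max_workers=4) as pool:
    indexed = sum(pool.map(index_batch, batches(records, 250)))
print(indexed)  # 1000
```

The point is only the shape: because each batch is an independent HTTP request, nothing forces the build to be serial the way a single zebraidx run is.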
Also, what do you have in mind for continuing Z39.50 support? This is a
must-have feature for many libraries. I've been investigating the
possibility of using MongoDB or a similar dynamic indexer to replace
Zebra, but the need to write a Z39.50 front end adds a great deal of
work to the project.

Clay

On Mon, Oct 4, 2010 at 1:10 AM, LAURENT Henri-Damien
<[email protected]> wrote:
> Hi,
> As you may have read in Paul's earlier message about "BibLibre strategy
> for 3.4 and next version", we are growing and want to stay involved in
> the community as before. Paul promised some proofs of concept; here is
> one. We have also worked on Plack and on support, and we created a set
> of scripts to search for memory leaks. We'll demonstrate that later.
>
> Zebra is fast and embeds a native Z39.50 server. But it also has some
> major drawbacks that we have to cope with every day, which make it
> quite difficult to maintain:
>
> 1. Zebra config files are a nightmare. You can't manage the
> configuration easily: indexes cannot be edited via HTTP or through a
> configuration screen; everything is hardcoded in files on disk. You
> can't list, change, or edit indexes, or say "I want this index in the
> OPAC and that one in the intranet". (It could be done by scraping
> ccl.properties, and then record.abs and bib1.att... but what a HELL.)
> So you cannot easily customize which indexes you want. And people don't
> get translations of the indexes, since they are all hardcoded in
> ccl.properties and we have no translation process that would let CCL
> attributes be translated into different languages.
>
> 2.
No real-time indexing: relying on a crontab is poor. When you add an
> authority while creating a biblio, you have to wait several minutes to
> finish your biblio. (This might be solvable, since Zebra has a way to
> index biblios via Z39.50 extended services, but it is hard and would
> need testing; when the community first tested it, a performance problem
> was found in indexing.)
>
> 3. No way to access, process, or delete data easily. If you have
> problems with your indexes or your data, you have to reindex
> everything, and indexing errors are quite difficult to detect.
>
> 4. While indexing a file, if there is a problem in your data, zebraidx
> just fails silently. That is not safe, and you have no way to know
> WHICH biblio made the process crash. We had a LOT of trouble with the
> Aix-Marseille universities, whose Arabic transliterated biblios made
> Zebra/ICU crash completely! We had to write a recursive script to find
> the 14 biblios out of 730,000 that made Zebra crash (even though they
> were properly stored and displayed).
>
> 5. Facets do not work properly: they are computed only on the displayed
> results, because there are problems with diacritics and facets that
> can't be solved as of today. And no one can provide a solution (we
> discussed this with Index Data and no clear solution was offered).
>
> 6. Zebra does not evolve anymore. There is no real community around it;
> it's just an open-source Index Data product. We sent many questions to
> the list and never got answers. We could pay for better support, but
> the fee is quite a deterrent and the benefit is still questionable.
>
> 7. ICU and Zebra are colleagues, not really friends: right truncation
> does not work, fuzzy search does not work, and neither do facets.
>
> 8. We use a deprecated way to define indexes for biblios (GRS-1), and
> the tool developed by Index Data to migrate to DOM has many flaws. We
> could manage and make do with it, but is it worth the effort?
>
> I think everyone agrees that we have to refactor C4::Search. Indeed,
> the query parser is not able to manage all the configuration options
> independently. And the use of USMARC as the internal format for biblios
> comes with a serious size limit of 9,999 bytes, which is not enough for
> big biblios with many items.
>
> BibLibre has invested in a catalogue based on Solr. A university in
> France contracted us for that development. This university is in
> contact with the whole community here in France, and Solr will
> certainly be adopted by libraries France-wide. We are planning to
> release the code on our git early next spring and rebase on whatever
> Koha version is released at that time, 3.4 or 3.6.
>
> Why?
>
> Solr indexes data over HTTP. It can provide fuzzy search, search on
> synonyms, suggestions, facet search, and stemming. UTF-8 support is
> built in. The community is impressively reactive, numerous, and
> efficient, and the documentation is very good and exhaustive.
>
> You can see the results on solr.biblibre.com and
> catalogue.solr.biblibre.com:
>
> http://catalogue.solr.biblibre.com/cgi-bin/koha/opac-search.pl?q=jean
> http://solr.biblibre.com/cgi-bin/koha/admin/admin-home.pl
> (you can log in there with the demo/demo login/password)
>
> http://solr.biblibre.com/cgi-bin/koha/solr/indexes.pl
> is the page where people can manage their indexes and links.
>
> a) Librarians can define their own indexes, and there is a plugin that
> fetches data from rejected authorities and from authorised_values
> (which could/should have been achievable with Zebra, but only with
> major work on XSLT).
>
> b) The line count of C4/Search.pm could be shrunk tenfold. You can test
> from the poc_solr branch on git://git.biblibre.com/koha_biblibre.git,
> but you have to install Solr.
>
> Any feedback/ideas welcome.
> --
> Henri-Damien LAURENT
> BibLibre
>
> _______________________________________________
> Koha-devel mailing list
> [email protected]
> http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
