http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=12478
--- Comment #80 from Robin Sheat <ro...@catalyst.net.nz> ---
(In reply to Jonathan Druart from comment #79)
> The first problem I got was to find a MARC21 DB (since the UNIMARC mappings
> are not defined, I cannot test with an UNIMARC DB).

The UNIMARC mappings should be defined, though not tested.

> I have used the one created for the sandboxes
> (http://git.koha-community.org/gitweb/?p=contrib/global.git;a=blob;f=sandbox/
> sql/sandbox1.sql.gz;h=19268bccb43b2a33d5644b7d86cbb1abb323016b;hb=HEAD). But
> there are only 436 biblios, it's not enough to test some stuff (facets for
> instance).
> Or maybe you can share your DB?

I could, but I think we'll get more useful results from different databases.

> Here are some notes:
>
> 1/ Add deps to C4/Installer/PerlDependencies.pm

Yeah, I'm mostly waiting for things to settle (which they have now.)

> 2/ The number of tests provided is very low.

Yes, I've been meaning to go back and add a pile more.

> 3/ catalyst/elastic_search is 1004 commits behind origin/master, please
> rebase

It's just a tedious process, so I keep putting it off :) I should do that soon though.

> 4/ The message "No 'elasticsearch' block is defined in koha-conf.xml" should
> be raised before starting the indexation process, and not on committing the
> first batch.

Added to my TODO.

> 5/ You really need to tune the default value for the commit :)
> commit 100: perl misc/search_tools/rebuild_elastic_search.pl -b 77.57s
> user 0.86s system 91% cpu 1:25.62 total
> commit 1000: perl misc/search_tools/rebuild_elastic_search.pl -b 24.68s
> user 0.52s system 79% cpu 31.595 total
> For Solr, we used 5000.
> Yes I know, it's configurable.

I just picked a number and haven't gone back to it. I'm also thinking about maybe dropping the committing entirely and just feeding straight into Catmandu, letting it do its own batching rather than doubling up on it. More experimentation needed really, but definitely increasing the default is a sensible thing to do.
FWIW, committing at 5,000:
real    2m14.627s
user    1m13.272s
sys     0m2.228s

100:
real    6m6.280s
user    4m45.268s
sys     0m2.828s

That's a fair difference :)

> 6/ Verbose does not work as expected, it could be fixed with

Oops. TODOed.

>
> 7/ perl -e "use
> Pod::Checker;podchecker('misc/search_tools/rebuild_elastic_search.pl')";
> *** WARNING: empty section in previous paragraph at line 36 in file
> misc/search_tools/rebuild_elastic_search.pl
> *** ERROR: =over on line 38 without closing =back at line EOF in file
> misc/search_tools/rebuild_elastic_search.pl

TODOed.

> 8/ 2 occurrences of "Solr" reintroduced in installer/data/mysql/sysprefs.sql
> and koha-tmpl/intranet-tmpl/prog/en/modules/admin/preferences/admin.pref

Must have come about when merging. TODOed.

> 9/ Test!
> I have launched some searches, with the same DB (the one from the sandbox).
> On a local using your remote branch and another one using master (sandbox7
> provided by BibLibre).
>
> a. Search for 'd' (screenshot opac_search_for_d_sort_by_relevance.png ES on
> the left, Zebra on the right).
> Main differences:
> - 183 vs 182 results (?)

I wouldn't necessarily expect them to be the same, especially for a fairly meaningless search.

> - the order is not the same (makes sense)
> - Locations and Places facets are missing

Yeah, they're not faceted yet. Added that to my TODO list before I forget again.

> - 6 entries are displayed in the facets for ES (current behavior is 5).
>
> b. Search for 'd', sort by title AZ (screenshot
> opac_search_for_d_sort_by_title.png)
> - Zebra displays only 1 facet

That's probably zebra being wrong then :)

> - The order is still completely different

I'm not sure which is right in this case, though I'm doing some work on the sorting at the moment that would allow you to pick which of the fields that end up in title you want to sort by. For example, it might be that ES is putting the ones with a lower series title near the start, even though it displays a different title.
That'll be tuneable when I'm done with the current stuff.

> c. Search for 'harry', sort by title AZ (screenshot
> opac_search_for_harry_sort_by_title.png)
> - 'Show more' link is displayed even if only 2 entries for a facet are
> available

Thought I'd fixed that, I'll have to have a look again.

> - The order is still different ("The discovery of heaven" should be sorted
> either before Dollhouse (if "the" is a stopword) or after "Hareios*")

Dollhouse probably has another title field that's actually being used, as noted above.

> - The availability is wrong for ES (The item for Dollhouse is not for loan)

Why is it not for loan? Is it by policy, because there are no items, or because all items are issued?

> d. Search for Books (limit by item type in the adv search), sort by pubdate
> (screenshot limit_by_book_sort_by_pubdate.png)
> - "Return to the last advanced search" link is not displayed

I wonder how it knows to show that... I can't actually find that string in my checkout at all.

> - The item types facet contains several entries, which does not make sense

Curious. Are there situations where you have a biblio-level itemtype that differs from the item-level item type, or where one biblio might have multiple items with different item types? At the moment, I think they're all being thrown into one facet pot.

> - The number of results differs greatly (395 vs 364)

Probably due to biblio-vs-item itemtype selection not being supported yet. If you can find it giving you a record that plain shouldn't match though, that'd be interesting.

> - The order is still completely different. I had a look in the index and
> found:
> "Pictura murală*" has "pubdate":"||||" (/_search?q=_id:39&pretty)
> The Korean Go Association's learn to play go "pubdate":"uuuu"
> (/_search?q=_id:155&pretty)
> Where do these values come from? Shouldn't it be a date, or at least an integer?

Could be the mapping is funny/broken for that.
My test system has things like:
"pubdate":"1998"
though, which implies that it's correct. The actual mapping comes from:

INSERT INTO `elasticsearch_mapping`
  (`indexname`, `mapping`, `facet`, `suggestible`, `type`, `marc21`, `unimarc`, `normarc`)
VALUES
  ('biblios','pubdate',FALSE,FALSE,'','008_/7-10','100a_/9-12','008_/7-10');

On the other hand, it does have:
"date-entered-on-file":"61006"
which doesn't look right no matter how you carve it.

> It's not easy to know what is indexed where. Did you have a look at the
> indexes configuration page the Solr stuff had?
> It provided an interface to configure the different mappings, it was very
> useful.

I haven't yet got to the point where I have the time to make an interface. At the moment it's all configured in elasticsearch_mapping.sql, which is somewhat human readable/editable. After loading the data into a table, it rewrites all those tables into a form that'll be more conducive for having a GUI on top of, but is less human readable.

BTW, if you add <trace_to>Stderr</trace_to> to the <elasticsearch> block, it'll dump all the chatter with ES out to stderr, which is useful for seeing what exactly is going on. I warn you, there is a lot there though.

Thanks for testing, even if I have a pile more things to fix now :)
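
A note on the pubdate values discussed above: the mapping row takes the MARC21 pubdate from 008_/7-10. Reading that as characters 7 to 10 of the 008 control field (zero-based, inclusive) is an assumption about the mapping syntax, not something confirmed in this thread. In MARC21 those positions hold Date 1, where "uuuu" marks an unknown date and "||||" means no attempt was made to code it, so non-numeric values can come straight from the records rather than from a broken mapping. A minimal Python sketch of that extraction follows. The helper names and the sample 008 strings are made up for illustration and are not Koha code:

```python
import re


def slice_fixed_field(value: str, start: int, end: int) -> str:
    """Return characters start..end (zero-based, inclusive) of a fixed-length field."""
    return value[start:end + 1]


def parse_year(raw: str):
    """Return the year as an int, or None unless raw is exactly four ASCII digits."""
    return int(raw) if re.fullmatch(r"[0-9]{4}", raw) else None


# Hypothetical 008 values, invented for this example.
samples = {
    "coded year": "061006s1998    xxu           000 0 eng d",
    "unknown date": "061006nuuuu    xxu           000 0 eng d",
    "not coded": "061006|||||    xxu           000 0 eng d",
}

for label, field_008 in samples.items():
    raw = slice_fixed_field(field_008, 7, 10)
    print(f"{label}: raw={raw!r} year={parse_year(raw)}")
```

If the index should only ever hold numeric years, a check like parse_year could be applied at indexing time; whether to skip such values or store them as-is is a design choice left open here.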