On Wednesday, 11 June 2014 20:11:39 UTC+12, Jörg Prante wrote:
>
>
> The browsing UI you mean is a traditional instrument librarians are used 
> to when they want machines to sort and not to search. Here, it is called 
> "register search" but in fact it is a sorted list. Beside being a 
> traditional approach which is not the most modern form of search, I made 
> the experience that librarians do not understand the difference between 
> search and sort - they learned in school they must page manually through 
> all results, and all relevance ranking comes from the devil himself.
>

That sounds like it. I know it's something very few people would want, 
but the people who are giving us money to do this would like it, so 
that makes it an important thing :) I have the search aspects mostly 
built already, but I still need this browse support.
 

> Nevertheless, you can build sorted lists in ES but with a small trick - an 
> extra index.
>
> The author list can be implemented with ES "sort" in the search action, 
> using "from" and "size" to page through results. You have to index the author 
> names in an author name index (or better, an authoritative name index). 
> There, you should index two forms of the author field, one for search 
> (tokenized) and another unique form to sort on with the keyword analyzer so 
> it's unchanged. You have to take care to use the "preferred author name" of 
> the "main entry" for sort, which is determined by library catalog rules and 
> not necessarily the 
>

That's close, but a bit different from what I want. If I have an 'author' 
index, and I search for things starting with 'Smith', sorting A->Z, I want 
to be able to page back, and get the results that are closer to the start 
of the alphabet. That is to say, it should tell me that "Smith" is the 524th 
(e.g.) author entry across the whole index when sorted, then I can set up 
my results page so the user can page backwards. Or, if I could do a 
startswith search, and have a negative "from" so it looks backwards in the 
results...

But that still won't work, as startswith doesn't give me a place in the 
index, it gives me a subset of the index restricted by the query.

Essentially, I think that this would give me an authority searcher, when I 
want an authority browser. I _could_ browse through it starting at 'A', but 
I really need to be able to jump to a point in the middle and go 
backwards/forwards from there.
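
One possible way to get this jump-in-and-page behaviour is with range 
queries on the sort field rather than a prefix query. This is only a 
sketch under assumptions: it presumes a hypothetical field named 
"author_sort" that holds a unique, unanalyzed (keyword) sort key per 
authority entry, and it only builds the request bodies; how they are sent 
to ES is left out. The function names are made up for illustration.

```python
# Hypothetical field: a unique, unanalyzed sort key for each authority entry.
SORT_FIELD = "author_sort"


def rank_body(term):
    """Body for a count request: how many entries sort strictly before `term`.

    The returned count is the zero-based position of `term` in the full
    sorted list (e.g. 523 would make "Smith" the 524th entry).
    """
    return {"query": {"range": {SORT_FIELD: {"lt": term}}}}


def forward_page_body(term, size=20):
    """Body for a search request: `size` entries at or after `term`, A->Z."""
    return {
        "query": {"range": {SORT_FIELD: {"gte": term}}},
        "sort": [{SORT_FIELD: {"order": "asc"}}],
        "size": size,
    }


def backward_page_body(term, size=20):
    """Body for a search request: the `size` entries just before `term`.

    Hits come back nearest-first (Z->A), so the client has to reverse them
    before display.
    """
    return {
        "query": {"range": {SORT_FIELD: {"lt": term}}},
        "sort": [{SORT_FIELD: {"order": "desc"}}],
        "size": size,
    }
```

To keep paging, the first or last sort key of the page on screen would 
become the next `term`. Because the range comparison is on the raw key, 
this assumes the sort keys are normalized (case, diacritics) at index time.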
 

> form how the author name is written. Variant names of an author should be 
> kept aside, just for search. So you should index all author names into one 
> authority index for this purpose, where each author is represented by a 
> single document. This should be easy since librarians are used to mark all 
> author names by a unique key (at least they carry the biographical dates 
> with them)
>  
>
> Beside author names, you'd have to deal with corporate names, conference 
> names, and subject names.
>

Those are implementation details that I'll work out when I have something 
working. Believe me, I've spent a lot of time working with MARC data, I 
understand all the weird ways that it does things :)
 

> I do not fully understand how you intend to process facet results. This 
> seems a bit overengineered. ES can not page through facet (aggregation) 
> results efficiently, as you already noted, this would have to be a task for 
> the client application to work on, and for millions of entries, it would be 
> very sluggish experience, and the required heap resources would be far from 
> reasonable.
>

Well, it's the only way I could think of that would function at all. But 
it's far from an ideal solution, in that it's barely going to work at all. 
 

> The "field starts with" is also easy to implement, look at the 
> "match_phrase_prefix" query, here is a blog post by Zachary Tong 
>
> http://www.elasticsearch.org/blog/starts-with-phrase-matching/
>

Yeah, I have that working already, but it's not quite what I want. 
