Re: [Archivesspace_Users_Group] FW: normalization in ArchivesSpace

Andrew Morrison Tue, 21 Apr 2020 02:03:40 -0700

You can see what the default Solr config in ArchivesSpace does with these queries in this screenshot of Solr's analysis tool on a development system:


https://user-images.githubusercontent.com/33721187/79843129-b9549680-83b1-11ea-8d3a-670f4e84a6de.png

On the left is how it indexes Governors and on the right is how it handles a query for Governor's. The first step, marked "ST" in light grey, is the Standard Tokenizer. As you can see, it does nothing in this case, and passes both unchanged to the next step ("SF", the stop word filter, which also does nothing).
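
As a rough illustration of why the two never meet, here is a toy Python model of that chain. It is a sketch under stated assumptions, not Solr's actual code: the regular expression, the short stop list, and the lowercasing step are all stand-ins chosen for the example.

```python
import re

# Toy stand-in for the analysis chain described above; NOT Solr's real code.
# Assumption: letters/digits joined by an internal apostrophe (straight or
# curly) stay together as one token, as the Standard Tokenizer does here.
TOKEN = re.compile(r"[^\W_]+(?:['\u2019][^\W_]+)*")
STOP_WORDS = {"a", "an", "and", "of", "the"}  # hypothetical short stop list


def analyze(text):
    """Tokenize, lowercase (assumed step), and drop stop words."""
    tokens = [t.lower() for t in TOKEN.findall(text)]
    return [t for t in tokens if t not in STOP_WORDS]


indexed = analyze("Governors")   # what would be stored in the index
queried = analyze("Governor's")  # what the query would be turned into
print(indexed, queried, indexed == queried)
```

Because the apostrophe survives inside the query token, the indexed token and the query token differ, so the search finds nothing.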

Changing to a different tokenizer could change how apostrophes are handled. Adding a stemmer might do the same, and would also ensure that the same results are returned for singular and plural forms of most words. But these sorts of customizations are language-specific. What works for English probably wouldn't work, and might have negative effects, for finding materials in Spanish, French or German. This is one of the advantages of using an external Solr server <https://archivesspace.github.io/archivesspace/user/running-archivesspace-with-external-solr/> setup: you can tailor it for your collections and your users. It also means you can run a more up-to-date version of Solr, with more and better options <https://lucene.apache.org/solr/guide/7_7/filter-descriptions.html> (we use Word Delimiter Graph Filter and KStem).
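
To make the trade-off concrete, here is a minimal Python sketch of an English-only normalization. The `normalize` function and its rules are hypothetical and far cruder than KStem or the Word Delimiter Graph Filter; they only show the general idea of stripping possessives and plurals, and how such a rule misfires on another language.

```python
import re

# Hypothetical English-only rule: a trailing 's or bare apostrophe
# (straight or curly) is treated as a possessive marker and removed.
POSSESSIVE = re.compile(r"['\u2019]s?$")


def normalize(word):
    """Lowercase, strip an English possessive, then crudely strip a plural s."""
    token = POSSESSIVE.sub("", word.lower())
    # Naive plural stemming: drop a final "s" unless the word ends in "ss".
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        token = token[:-1]
    return token


for word in ["Governors", "Governor's", "Governors'", "Governor\u2019s"]:
    print(word, "->", normalize(word))

# The same English-only rule mangles a Spanish singular noun:
print("país", "->", normalize("país"))
```

All four English spellings collapse to the same token, which is the behaviour the cataloger expected, while the Spanish word loses a letter that was never a plural ending.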


Andrew.


On 20/04/2020 23:44, Trevor Thornton wrote:

From what I can tell, the Solr Standard Tokenizer <https://lucene.apache.org/solr/guide/6_6/tokenizers.html#Tokenizers-StandardTokenizer> (which I think is the one used for most text fields) doesn't exclude the apostrophe or use it as a delimiter to split the word (as it does with other punctuation marks), so a query for "Governor’s" won't match "Governors" and vice versa. I don't know of a convenient workaround (without modifying the Solr schema).

On Mon, Apr 20, 2020 at 4:38 PM Hoffner, Bailey E. <[email protected]<mailto:[email protected]>> wrote:


    Hello All,

    One of our catalogers noticed an issue with search functionality
    and normalization (see below). Has anyone dealt with this issue
    before, or know of a workaround?

    Thanks!

    -Bailey

    Bailey Hoffner, MLIS

    Metadata and Collections Management Archivist

    University of Oklahoma Libraries

    405-325-1566

    *From: *"Steele, Thomas D." <[email protected]
    <mailto:[email protected]>>
    *Date: *Monday, April 20, 2020 at 3:26 PM
    *To: *"Hoffner, Bailey E." <[email protected] <mailto:[email protected]>>
    *Subject: *normalization in ArchiveSpace

    Searching for a term such as “Governors’” yields no hits if you
    spell it as “Governor’s”. Both terms should normalize to
    “Governors”, but it’s possible the latter is normalizing to
    “Governor s”.

    Tom Steele

    Science and Technology Cataloger

    University of Oklahoma Libraries

    Norman, OK   73019

    (405) 325-4082

    [email protected] <mailto:[email protected]>

    /"Books constitute capital. A library book lasts as long as a
    house, for hundreds of years. It is not, then, an article of mere
    consumption but fairly of capital, and often in the case of
    professional men, setting out in life, it is their only
    capital."/ -- Thomas Jefferson

    _______________________________________________
    Archivesspace_Users_Group mailing list
    [email protected]
    <mailto:[email protected]>
    http://lyralists.lyrasis.org/mailman/listinfo/archivesspace_users_group



--
Trevor Thornton
Applications Developer, Digital Library Initiatives
North Carolina State University Libraries


