Issue 52 of the Code4Lib Journal has been published.  Many thanks to the 
authors and the editorial committee!

The new issue is available at: 
https://journal.code4lib.org/issues/issue52

Here are the abstracts from this issue:

Issue 52, 2021-09-22

Editorial: The Cost of Knowing Our 
Users<https://journal.code4lib.org/articles/16208>

Mark Swenson

Some musings on the difficulty of wanting to know our users’ secrets and 
simultaneously wanting to not know them.

Building and Maintaining Metadata Aggregation Workflows Using Apache 
Airflow<https://journal.code4lib.org/articles/16171>

Leanne Finnigan and Emily Toner

PA Digital is a Pennsylvania network that serves as the state’s service hub for 
the Digital Public Library of America (DPLA). The group developed a homegrown 
aggregation system in 2014, used to harvest digital collection records from 
contributing institutions, validate and transform their metadata, and deliver 
aggregated records to the DPLA. Since our initial launch, PA Digital has 
expanded significantly, harvesting from an increasing number of contributors 
with a variety of repository systems. With each new system, our highly 
customized aggregator software became more complex and difficult to maintain. 
By 2018, PA Digital staff had determined that a new solution was needed. From 
2019 to 2021, a cross-functional team implemented a more flexible and scalable 
approach to metadata aggregation for PA Digital, using Apache Airflow for 
workflow management and Solr/Blacklight for internal metadata review. In this 
article, we will outline how we use this group of applications and the new 
workflows adopted, which afford our metadata specialists more autonomy to 
contribute directly to the ongoing development of the aggregator. We will 
discuss how this work fits into our broader sustainability planning as a 
network and how the team leveraged shared expertise to build a more stable 
approach to maintenance.
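
For readers who have not used Airflow, the sketch below shows roughly how a 
harvest-validate-transform-index pipeline can be expressed as an Airflow DAG. 
The DAG name, schedule, and placeholder callables are illustrative assumptions, 
not PA Digital's actual workflow.

# Minimal sketch of a harvest -> validate -> transform -> index DAG in Apache
# Airflow. Task names and callables are illustrative, not PA Digital's code.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def harvest_records(**context):
    """Harvest digital collection records from a contributor (placeholder)."""


def validate_metadata(**context):
    """Check required fields and controlled vocabularies (placeholder)."""


def transform_to_dpla(**context):
    """Map local metadata to the DPLA application profile (placeholder)."""


def index_for_review(**context):
    """Index transformed records into Solr for Blacklight review (placeholder)."""


with DAG(
    dag_id="contributor_harvest",        # hypothetical DAG name
    start_date=datetime(2021, 1, 1),
    schedule_interval="@monthly",
    catchup=False,
) as dag:
    harvest = PythonOperator(task_id="harvest", python_callable=harvest_records)
    validate = PythonOperator(task_id="validate", python_callable=validate_metadata)
    transform = PythonOperator(task_id="transform", python_callable=transform_to_dpla)
    index = PythonOperator(task_id="index", python_callable=index_for_review)

    harvest >> validate >> transform >> index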

Closing the Gap between FAIR Data Repositories and Hierarchical Data 
Formats<https://journal.code4lib.org/articles/16223>

Connor B. Bailey, Fedor F. Balakirev, and Lyudmila L. Balakireva

Many in the scientific community, particularly in publicly funded research, are 
pushing to adhere to more accessible data standards to maximize the 
findability, accessibility, interoperability, and reusability (FAIR) of 
scientific data, especially with the growing prevalence of machine learning 
augmented research. Online FAIR data repositories, such as the Open Science 
Framework (OSF), help facilitate the adoption of these standards by providing 
frameworks for storage, access, search, APIs, and other features that create 
organized hubs of scientific data. However, the wider acceptance of such 
repositories is hindered by the lack of support for hierarchical data formats, 
such as Technical Data Management Streaming (TDMS) and Hierarchical Data Format 
5 (HDF5), that many researchers rely on to organize their datasets. Various 
tools and strategies should be used to allow hierarchical data formats, FAIR 
data repositories, and scientific organizations to work more seamlessly 
together. A pilot project at Los Alamos National Laboratory (LANL) addresses 
the disconnect between them by integrating the OSF FAIR data repository with 
hierarchical data renderers, extending support for additional file types in 
their framework. The multifaceted interactive renderer displays a tree of 
metadata alongside a table and plot of the data channels in the file. This 
allows users to quickly and efficiently load large and complex data files 
directly in the OSF webapp. Users who are browsing files can quickly and 
intuitively see the files in the hierarchical form that they or their 
colleagues structured, and immediately grasp their contents. This solution helps
bridge the gap between hierarchical data storage techniques and FAIR data 
repositories, making both of them more viable options for scientific 
institutions like LANL which have been put off by the lack of integration 
between them.
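
As a rough illustration of what such a renderer has to do, the sketch below 
walks an HDF5 file with h5py and prints its group/dataset tree along with the 
attached attributes. The file name is a placeholder, and this is not the 
LANL/OSF renderer's code.

# Sketch: walk an HDF5 file and print its group/dataset tree with attributes,
# roughly the structure a hierarchical renderer would display.
import h5py


def print_tree(name, obj):
    indent = "  " * name.count("/")
    label = "dataset" if isinstance(obj, h5py.Dataset) else "group"
    shape = getattr(obj, "shape", "")           # datasets have a shape, groups do not
    print(f"{indent}{name.split('/')[-1]} [{label}] {shape}")
    for key, value in obj.attrs.items():        # attached metadata
        print(f"{indent}  @{key} = {value}")


with h5py.File("example.h5", "r") as f:         # file name is a placeholder
    f.visititems(print_tree)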

Conspectus: A Syllabi Analysis Platform for Leganto Data 
Sources<https://journal.code4lib.org/articles/15995>

David Massey and Thomas Sødring

In recent years, higher education institutions have implemented electronic 
solutions for the management of syllabi, resulting in new and exciting 
opportunities within the area of large-scale syllabi analysis. This article 
details an information pipeline that can be used to harvest, enrich and use 
such information.

Core Concepts and Techniques for Library Metadata 
Analysis<https://journal.code4lib.org/articles/16078>

Stacie Traill and Martin Patrick

Metadata analysis is a growing need in libraries of all types and sizes, as 
demonstrated in many recent job postings. Data migration, transformation, 
enhancement, and remediation all require strong metadata analysis skills. But 
there is no well-defined body of knowledge or competencies list for library 
metadata analysis, leaving library staff with analysis-related responsibilities 
largely on their own to learn how to do the work effectively. In this paper, 
two experienced metadata analysts will share what they see as core knowledge 
areas and problem solving techniques for successful library metadata analysis. 
The paper will also discuss suggested tools, though the emphasis is 
intentionally not to prescribe specific tools, software, or programming 
languages, but rather to help readers recognize tools that will meet their 
analysis needs. The goal of the paper is to help library staff and their 
managers develop a shared understanding of the skill sets required to meet 
their library’s metadata analysis needs. It will also be useful to individuals 
interested in pursuing a career in library metadata analysis and wondering how 
to enhance their existing knowledge and skills for success in analysis work.

Digitization Decisions: Comparing OCR Software for Librarian and Archivist 
Use<https://journal.code4lib.org/articles/16132>

Leanne Olson and Veronica Berry

This paper is intended to help librarians and archivists who are involved in 
digitization work choose optical character recognition (OCR) software. The 
paper provides an introduction to OCR software for digitization projects, and 
shares the method we developed for easily evaluating the effectiveness of OCR 
software on resources we are digitizing.

We tested three major OCR programs (Adobe Acrobat, ABBYY FineReader, Tesseract) 
for accuracy on three different digitized texts from our archives and special 
collections at the University of Western Ontario. Our test was divided into two 
parts: a word accuracy test (to determine how searchable the final documents 
were), and a test with a screen reader (to determine how accessible the final 
documents were). We share our findings from the tests and make recommendations 
for OCR work on digitized documents from archives and special collections.
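
A word-accuracy test of this kind can be scored with a few lines of Python: 
count how many ground-truth words also appear in the OCR output. The sketch 
below is a generic illustration, not the authors' exact scoring method.

# Sketch: word-level accuracy of OCR output against a ground-truth transcript.
import re


def words(text):
    return re.findall(r"[A-Za-z0-9']+", text.lower())


def word_accuracy(ocr_text, truth_text):
    # Count each OCR word, then see how many ground-truth words are covered.
    ocr_counts = {}
    for w in words(ocr_text):
        ocr_counts[w] = ocr_counts.get(w, 0) + 1
    truth_words = words(truth_text)
    matched = 0
    for w in truth_words:
        if ocr_counts.get(w, 0) > 0:
            ocr_counts[w] -= 1
            matched += 1
    return matched / len(truth_words) if truth_words else 0.0


print(word_accuracy("Tne quick brown fox", "The quick brown fox"))  # 0.75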

Introducing SAGE: An Open-Source Solution for Customizable Discovery Across 
Collections<https://journal.code4lib.org/articles/15740>

David B. Lowe, James Creel, Elizabeth German, Douglas Hahn, and Jeremy Huff

Digital libraries at research universities make use of a wide range of unique 
tools to enable the sharing of eclectic sets of texts, images, audio, video, 
and other digital objects. Presenting these assorted local treasures to the 
world can be a challenge, since text is often siloed with text, images with 
images, and so on, such that per type, there may be separate user experiences 
in a variety of unique discovery interfaces. One common tool that has been 
developed in recent years to potentially unite them all is the Apache Solr 
index. Texas A&M University (TAMU) Libraries has harnessed Solr for internal 
indexing for repositories like DSpace, Fedora, and Avalon. Impressed by 
frameworks like Blacklight at peer institutions, TAMU Libraries wrote an 
analogous set of tools in Java, and thus was born SAGE, the Solr AGgregation 
Engine, with two primary functions: 1) aggregating Solr indices, or "cores," 
from various local sources, and 2) presenting a search facility to the user in 
a discovery interface.
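
Solr cores share a standard HTTP /select API, which is what makes this kind of 
aggregation possible. The Python sketch below, with placeholder core names and 
base URL, shows the basic idea of querying several cores and merging the 
results; SAGE itself is a Java application with a full discovery layer.

# Sketch: query several Solr cores over the standard /select API and merge
# the results into one list. Core names and the base URL are placeholders.
import requests

SOLR_BASE = "http://localhost:8983/solr"           # placeholder
CORES = ["dspace", "fedora", "avalon"]              # placeholder core names


def search_all(query, rows=10):
    merged = []
    for core in CORES:
        resp = requests.get(
            f"{SOLR_BASE}/{core}/select",
            params={"q": query, "rows": rows, "wt": "json"},
        )
        resp.raise_for_status()
        for doc in resp.json()["response"]["docs"]:
            doc["_source_core"] = core              # remember where it came from
            merged.append(doc)
    return merged


for doc in search_all("title:aggregation"):
    print(doc.get("_source_core"), doc.get("id"))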

Leveraging a Custom Python Script to Scrape Subject Headings for 
Journals<https://journal.code4lib.org/articles/16080>

Shelly R. McDavid, Eric McDavid, and Neil E. Das

In our current library fiscal climate with yearly inflationary cost increases 
of 2-6+% for many journals and journal package subscriptions, it is imperative 
that libraries strive to make our budgets go further to expand our suite of 
resources. As a result, most academic libraries annually undertake some form of 
electronic journal review, employing factors such as cost per use to inform 
budgetary decisions. In this paper we detail some tech-savvy processes we 
created to leverage a Python script to automate journal subject heading 
generation within OCLC's WorldCat catalog, the MOBIUS (a Missouri library 
consortium) catalog, and the VuFind Library Catalog, a now-retired catalog for 
CARLI (the Consortium of Academic and Research Libraries in Illinois). We also 
describe the rationale for the inception of this project, the methodology we 
utilized, the current limitations, and details of our future work in automating 
our annual analysis of journal subject headings by use of an OCLC API.
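
In outline, such a script fetches a catalog record page and extracts the 
subject heading fields from its markup. The sketch below uses requests and 
BeautifulSoup with a hypothetical URL and CSS selector; each real catalog 
(WorldCat, MOBIUS, VuFind) needs its own URL pattern and parsing rules, and the 
authors' actual script may differ.

# Sketch: fetch a catalog record page and pull out its subject headings.
# The URL pattern and CSS selector are hypothetical placeholders.
import requests
from bs4 import BeautifulSoup


def subject_headings(record_url):
    resp = requests.get(record_url, timeout=30)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    # Hypothetical markup: subject links carrying a "subject-heading" class.
    return [a.get_text(strip=True) for a in soup.select("a.subject-heading")]


for heading in subject_headings("https://catalog.example.org/record/12345"):
    print(heading)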

On Two Proposed Metrics of Electronic Resource 
Use<https://journal.code4lib.org/articles/16087>

William Denton

There are many ways to look at electronic resource use, individually or 
aggregated. I propose two new metrics to help give a better understanding of 
comparative use across an online collection. Users per mille is a relative 
annual measure of how many users a platform had for every thousand potential 
users: this tells us how many people used a given platform. Interest factor is 
the average number of uses of a platform by people who used it more than once: 
this tells us how much people used a given platform. These two metrics are 
enough to give us good insight into collection use. Dividing each into 
quartiles allows a quadrant comparison of lows and highs on each metric, giving 
a quick view of platforms many people use a lot (the big expensive ones), many 
people use very little (a curious subset), a few people use a lot (very 
specific to a narrow subject) and a few people use very little (deserves 
attention). This helps understand collection use and informs collection 
management.
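
Both metrics reduce to simple arithmetic over per-user use counts. The sketch 
below computes them for one hypothetical platform; the variable names and 
sample numbers are illustrative, not the article's data.

# Sketch: compute the two proposed metrics from per-user use counts for a
# single platform, following the definitions in the abstract.
def users_per_mille(use_counts, potential_users):
    """Users the platform had for every thousand potential users."""
    return 1000 * len(use_counts) / potential_users


def interest_factor(use_counts):
    """Average number of uses among people who used the platform more than once."""
    repeat = [n for n in use_counts if n > 1]
    return sum(repeat) / len(repeat) if repeat else 0.0


uses = [1, 1, 3, 7, 2, 12]            # uses per user, hypothetical platform
print(users_per_mille(uses, 50000))   # 0.12 users per mille
print(interest_factor(uses))          # (3 + 7 + 2 + 12) / 4 = 6.0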

Using Low Code to Automate Public Service Workflows: Three 
Cases<https://journal.code4lib.org/articles/16096>

Dianna Morganti and Jess Williams

Public service librarians without coding experience or technical education may 
not always be aware of or consider automation to be an option to streamline 
their regular work tasks, but the new prevalence of enterprise-level low code 
solutions allows novices to take advantage of technology to make their work 
more efficient and effective. Low code applications apply a graphic user 
interface on top of a coding platform to make it easy for novices to leverage 
automation at work. This paper presents three cases of using low code solutions 
for automating public service problems using the prevalent Microsoft Power 
Automate application, available in many library workplaces that use the 
Microsoft Office ecosystem. From simplifying the communication and scheduling 
process for instruction classes to connecting our student workers’ hourly floor 
counts to our administrators’ dashboard of building occupancy, we’ve leveraged 
simple low code automation in a scalable and replicable manner. Pseudo-code 
examples are provided.

An XML-Based Migration from Digital Commons to Open Journal 
Systems<https://journal.code4lib.org/articles/15988>

Cara M. Key

The Oregon Library Association has produced its peer-reviewed journal, the OLA 
Quarterly (OLAQ), since 1995, and OLAQ was published in Digital Commons 
beginning in 2014. When the host institution undertook to move away from 
Bepress, their new repository solution was no longer a good match for OLAQ. 
Oregon State University and University of Oregon agreed to move the journal 
into their joint instance of Open Journal Systems (OJS), and a small team from 
OSU Libraries carried out the migration project. The OSU project team declined 
to use PKP’s existing migration plugin for a number of reasons, instead 
pursuing a metadata-centered migration pipeline from Digital Commons to OJS. We 
used custom XSLT to convert tabular data exported from Bepress into PKP’s 
Native XML schema, which we imported using the OJS Native XML Plugin. This 
approach provided a high degree of control over the journal’s metadata and a 
robust ability to test and make adjustments along the way. The article 
discusses the development of the transformation stylesheet, the metadata 
mapping and cleanup work involved, as well as advantages and limitations of 
using this migration strategy.
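
As a rough outline of that kind of pipeline, the sketch below turns a tabular 
export into simple XML with lxml and runs it through an XSLT stylesheet. The 
file names, element names, and stylesheet are placeholders; the OSU team's 
actual mapping into PKP's Native XML schema is more involved.

# Sketch: convert a tabular export to simple XML, then apply an XSLT
# stylesheet with lxml. File and stylesheet names are placeholders.
import csv

from lxml import etree


def rows_to_xml(csv_path):
    # Assumes the CSV column headers are valid XML element names.
    root = etree.Element("records")
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            record = etree.SubElement(root, "record")
            for field, value in row.items():
                etree.SubElement(record, field).text = value
    return root


source = rows_to_xml("olaq_export.csv")                  # placeholder export file
transform = etree.XSLT(etree.parse("to_native.xsl"))     # placeholder stylesheet
result = transform(etree.ElementTree(source))
print(str(result))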
