Hi Pascal,

Thank you very much for sharing your work. I have been maintaining nearly 12 DSpace instances at different universities, and I am trying to standardize my automation scripts for ingesting items into DSpace through the REST API. Your project has definitely caught my attention, and I will give it a try very soon.

One question: have you considered supporting the Entity Model in your project?

Best regards,
Fatih

On Friday, March 27, 2026 at 10:48:26 PM UTC+3 Pascal Calarco wrote:

> Hi folks,
>
> I am releasing a set of Python scripts I have been working on since late last November called FedHarv (short for federated harvesting). It is now publicly available under an AGPL v3 license for all to use, modify, and build upon, provided it stays free and open source software.
>
> https://github.com/pvcalarco/FedHarv
>
> FedHarv is a sophisticated, production-ready federated harvester for open access academic content, designed to automatically discover, enrich, and harvest scholarly articles with PDF availability from multiple sources.
>
> The problem we are trying to solve is to identify, to the extent possible, Creative Commons-licensed scholarly works (journal articles, letters to the editor, retractions, errata, book chapters, conference proceedings, and open access books) authored by researchers, faculty, and students of an institution of higher education or research, and to harvest the metadata and associated PDFs from a variety of API services. Where we can't find a non-paywalled version, we use Unpaywall to identify author manuscripts and preprints that can be deposited.
>
> The script then provides these metadata and PDFs in a series of folders for the repository manager to quickly check (for departmental and institutional affiliation and CC license correctness) and package into Simple Archive Format (SAF), ready for batch ingest into DSpace institutional repositories.
>
> The harvester isn't perfect, and you should still check that closed or bronze OA items were not harvested in error, but the author has made every effort to prevent this and has encountered few such errors after much iteration.
>
> With this tool, you'll be able to gather as many as possible of the Open Access scholarly works that your community has formally written and legally deposit them into your organization's institutional repository. If you find this software useful, please drop me an email!
>
> ## 🤖 AI Assistance & Authorship Disclosure
>
> **FedHarv** was designed, architected, and verified by **Pascal Calarco**.
>
> During the development process, AI-augmented coding tools (Google Gemini and GitHub Copilot) were utilized to:
> * Generate boilerplate code and initial function structures.
> * Refactor logic for performance (e.g., implementing multi-threading).
> * Assist with documentation, licensing (AGPL-v3), and testing suites.
>
> All AI-generated suggestions have been manually reviewed, tested, and integrated by the author to ensure technical accuracy, scholarly metadata standards, and adherence to best practices in library and information science.
>
> All best wishes,
>
> Pascal
>
> *Pascal Calarco* ¦ Scholarly Communications Librarian and Systems Librarian
> Lead, Discovery Team
> Research & Publishing Services Unit
> Librarian IV
> University of Windsor ¦ J. Francis Leddy Library
> 401 Sunset Avenue ¦ Windsor, Ontario N9B 3P4
> (519)-253-3000 ¦ leddy.uwindsor.ca
>
> *The University of Windsor is situated on the traditional territory of the Three Fires Confederacy of First Nations: the Ojibwa, the Odawa, and the Potawatomi.*
>
> *Join the fight for post-secondary education at Education2025.ca.*
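
For readers who have not worked with Simple Archive Format, the packaging step mentioned in the announcement can be sketched roughly as follows. This is a minimal illustration of the general SAF layout (one folder per item holding `dublin_core.xml`, a `contents` file, and the bitstreams), not FedHarv's actual code. The function name `write_saf_item`, the `item_NNN` folder naming, and the `(element, qualifier, value)` metadata shape are all assumptions made for the example.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def write_saf_item(archive_dir, index, metadata, pdf_path):
    """Write one SAF item folder: dublin_core.xml, contents, and the PDF.

    `metadata` is a list of (element, qualifier, value) tuples. That shape
    is a hypothetical choice for this sketch, not FedHarv's internal format.
    """
    item_dir = Path(archive_dir) / f"item_{index:03d}"
    item_dir.mkdir(parents=True, exist_ok=True)

    # dublin_core.xml: one <dcvalue> per metadata field; a missing
    # qualifier is written as "none", as SAF expects.
    root = ET.Element("dublin_core")
    for element, qualifier, value in metadata:
        dcvalue = ET.SubElement(
            root, "dcvalue", element=element, qualifier=qualifier or "none"
        )
        dcvalue.text = value
    ET.ElementTree(root).write(
        str(item_dir / "dublin_core.xml"), encoding="utf-8", xml_declaration=True
    )

    # Copy the bitstream next to the metadata and list it in `contents`,
    # one filename per line.
    pdf_path = Path(pdf_path)
    shutil.copy(pdf_path, item_dir / pdf_path.name)
    (item_dir / "contents").write_text(pdf_path.name + "\n", encoding="utf-8")
    return item_dir
```

Calling `write_saf_item("saf_batch", 1, [("title", None, "An Example Article")], "article.pdf")` would produce a `saf_batch/item_001/` folder. A directory of such folders is the kind of input DSpace's SAF batch import works with, so the repository manager's check can happen on plain files before anything is ingested.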
