For turning a bibliography into RIS format, I wrote a tool based on a whole
pile of regex commands bundled into sed files wrapped in an AppleScript app:
Webpage: http://deborahfitchett.com/toys/ref2ris/
Code4Lib article: http://journal.code4lib.org/articles/6286
Let me know if you've got
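
For anyone who wants to see the general regex-to-RIS idea in miniature, here is a rough sketch. It is in Python rather than sed/AppleScript, and it is not the code from the tool linked above. The pattern assumes one invented, very regular style ("Author. (Year). Title. Journal."), and the names CITATION and citation_to_ris are made up for illustration. Real bibliographies will need many more patterns and a lot of hand-checking.

```python
import re

# Hypothetical pattern for ONE simple style: "Author. (Year). Title. Journal."
# This is an assumption for illustration, not a general citation grammar.
CITATION = re.compile(
    r"^(?P<author>[^(]+?)\s*\((?P<year>\d{4})\)\.\s*"
    r"(?P<title>[^.]+)\.\s*(?P<journal>[^.]+)\.?\s*$"
)


def citation_to_ris(line):
    """Return an RIS record for one citation line, or None if it does not match."""
    m = CITATION.match(line.strip())
    if m is None:
        return None  # leave unmatched citations for manual review
    fields = [
        ("TY", "JOUR"),  # assumes every entry is a journal article
        ("AU", m.group("author").strip()),
        ("PY", m.group("year")),
        ("TI", m.group("title").strip()),
        ("JO", m.group("journal").strip()),
        ("ER", ""),  # end-of-record tag
    ]
    # RIS lines are: two-character tag, two spaces, hyphen, space, value.
    return "\n".join(f"{tag}  - {value}" for tag, value in fields)


if __name__ == "__main__":
    print(citation_to_ris("Smith, J. (2001). A study of things. Journal of Stuff."))
```

Returning None for anything that does not match makes it easy to collect the leftovers in a separate file and fix them by hand.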
See also http://wiki.tei-c.org/index.php/Heuristics , which discusses
this problem more broadly conceived. I've just added a link to the
archives of this very discussion. --Kevin
On 6/18/15 12:52 PM, Matt Sherman wrote:
The hope is to take these bibliographies and put them into more of a web
Hi all,
As Matt's problem is related to parsing citations, I would definitely have a
look at the tools cited by Cindy because going with regexp will quickly become
a nightmare. Even if the citations were created following a common reference
style, there will necessarily be inconsistencies,
That is a pretty good summation of it yes. I appreciate the suggestions,
this is a bit of a new realm for me and while I know what I want it to do
and the structure I want to put it in, the conversion process has been
eluding me so thanks for giving me some tools to look into.
On Thu, Jun 18,
On Jun 18, 2015, at 12:02 PM, Matt Sherman matt.r.sher...@gmail.com wrote:
I am working with a colleague on a side project which involves some scanned
bibliographies and making them more web searchable/sortable/browse-able.
While I am quite familiar with the metadata and organization aspects we
How you want to preprocess and structure the data depends on what you hope
to achieve. Can you say more about what you want the end product to look
like?
kyle
On Thu, Jun 18, 2015 at 10:08 AM, Matt Sherman matt.r.sher...@gmail.com
wrote:
That is a pretty good summation of it yes. I appreciate
Hi Code4Libbers,
I am working with a colleague on a side project which involves some scanned
bibliographies and making them more web searchable/sortable/browse-able.
While I am quite familiar with the metadata and organization aspects we
need, I am at a bit of a loss on how to automate the
The hope is to take these bibliographies and put them into more of a web
searchable/sortable format for researchers to make use of them. My
colleague was taking some inspiration from the Marlowe Bibliography
(https://marlowebibliography.org/), though we are hoping to possibly get a
bit more robust
We're actually also working on getting a bibliography from a Word Doc to a
more structured format. We're using regular expressions in LibreOffice
Writer to mark up the citations, then insert tabs between the elements,
and then copy into a spreadsheet (similar to what's described in
It may depend on the format of the PDF, but I’ve used the ScraperWiki Python
module’s ‘pdftoxml’ function to extract text data from PDFs in the past. There is
a write-up (not by me) at
http://schoolofdata.org/2013/08/16/scraping-pdfs-with-python-and-the-scraperwiki-module/
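
Once the PDF has been converted to XML, the text still has to be pulled back out into lines. Here is a minimal sketch of that step using only the Python standard library. It assumes the XML looks like pdftohtml-style output, i.e. <page> elements containing <text> elements with integer "top" and "left" attributes. That layout is an assumption, so check it against your own files. The function name lines_from_pdf_xml is made up for this example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def lines_from_pdf_xml(xml_string):
    """Rebuild text lines from pdftohtml-style XML (assumed layout, see above)."""
    root = ET.fromstring(xml_string)
    lines = []
    for page in root.iter("page"):
        rows = defaultdict(list)  # vertical position -> [(left, text), ...]
        for node in page.iter("text"):
            # itertext() also picks up text nested inside <b> or <i> children.
            content = "".join(node.itertext()).strip()
            if content:
                rows[int(node.get("top"))].append((int(node.get("left")), content))
        # Read each page top to bottom, and each row left to right.
        for top in sorted(rows):
            lines.append(" ".join(text for _, text in sorted(rows[top])))
    return lines
```

Grouping on the exact "top" value is crude. With messy OCR the fragments of one printed line can sit a pixel or two apart, so you may need to round or bucket the coordinates.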
Thanks, that is interesting since we can export from the PDFs, and while
the OCR text is a little messy it is in decent shape. I'll certainly look
into that.
On Thu, Jun 18, 2015 at 3:13 PM, Gordon, Bonnie bgor...@rockarch.org
wrote:
We're actually also working on getting a bibliography from a