On Fri, Apr 3, 2009 at 12:18 PM, Mark Voorhies <[email protected]> wrote:
> On Saturday 14 March 2009 7:41 pm Bryan Bishop wrote:
>> Hi all,
>>
>> This email comes about because of the recent thread about bibliography
>> management. In particular, I've always had my eye out for what sort of
>> software should (or should not) exist for scientific papers.
>
> cb2bib can extract BibTeX references from a set of pdf files with or without
> user assistance (e.g., in the supervised mode, cb2bib guesses journal,
> volume, title, etc. and provides a window where the user can select
> (pdftotext generated) text and assign it to appropriate fields).

I looked over that a few weeks ago, but I'm not entirely sure about it. Can
you please explain whether or not it performs the following? The website is
not clear.

(1) Given a PDF that essentially consists of a collection of images (scanned
    data), will it segment the page, extract text, and figure out the title
    of the paper and the citation information (etc.), or will it extract the
    references?

(2) In unsupervised mode, does it automatically extract references and guess
    which fields the information belongs to?

(3) Does it extract BibTeX encoded in PDF files, or does it extract the
    PDF-encoded content (which may or may not preserve BibTeX markup)?

Thank you! :-)

- Bryan
http://heybryan.org/
1 512 203 0507

