On 10/1/07, Christiaan Hofman <[EMAIL PROTECTED]> wrote:
> I haven't had time to test out or look at the new web group changes.
> So I don't know what kind of scrapers you have.

citeulike.org, hubmed.org, ACM Digital Library, and Google Scholar.

> But would it be
> possible to add some scrapers perhaps for the arXiv? I haven't looked
> much at what they added there to make things easier.

I don't see an easy way to get an official BibTeX entry, but it looks
like we could get the title, authors, and abstract from their HTML,
which is nicely formed and uses meaningful HTML class names. If a paper
has been accepted to a journal, it gets a journal-id which would give us
the journal name/number, pages, etc. That isn't marked up, but it might
be in a common format that's regex-parseable.

However, maybe the better way to get arXiv support would be to look at
the Open Archives Initiative support they have:

http://arxiv.org/help/oa

Other sites use that as well, so if we could just scrape an identifier
from the page and then get useful metadata from their OAI-PMH server,
that might be a good way to support more sites.

I'll make a note to look into that more, but I can't do it right now.

-mike

> On 1 Oct 2007, at 7:53 PM, Michael McCracken wrote:
>
> > Has anyone had a chance to try out the web group?
> >
> > Let me know if there is a site you use to find papers that I could add
> > support for, if it would help you test things.
> >
> > Or does everyone else already enjoy good searching support through the
> > z39.50 stuff, and it's only those of us in CS with backwards
> > publishers?
> >
> > -mike
> >
> > --
> > Michael McCracken
> > UCSD CSE PhD Candidate
> > research: http://www.cse.ucsd.edu/~mmccrack/
> > misc: http://michael-mccracken.net/wp/

_______________________________________________
Bibdesk-develop mailing list
https://lists.sourceforge.net/lists/listinfo/bibdesk-develop
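
The OAI-PMH idea from the message above (scrape an identifier from the page, then ask the repository's OAI-PMH server for metadata) could be sketched roughly as follows in Python. This is only an illustration under assumptions, not how BibDesk does or would do it: the endpoint in `OAI_BASE_URL`, the `oai:arXiv.org:` identifier prefix, and the function names `scrape_arxiv_id`, `get_record_url`, and `parse_dublin_core` are all hypothetical choices made for the example. It relies on the standard OAI-PMH `GetRecord` verb and the `oai_dc` (Dublin Core) metadata format, and it stops short of the network request so that it can be tried offline.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumption: the repository's OAI-PMH endpoint. Check the site's own
# documentation for the real base URL before relying on this value.
OAI_BASE_URL = "http://export.arxiv.org/oai2"

# Namespace of Dublin Core elements inside an oai_dc record.
DC_NS = "{http://purl.org/dc/elements/1.1/}"

# Matches both new-style (0710.0123) and old-style (hep-th/9901001)
# identifiers in an abstract-page URL; a trailing version such as "v2"
# is left outside the captured group.
ARXIV_ID_RE = re.compile(
    r"arxiv\.org/abs/(\d{4}\.\d{4,5}|[a-z-]+(?:\.[a-z]{2})?/\d{7})",
    re.IGNORECASE,
)


def scrape_arxiv_id(text):
    """Return the first arXiv identifier found in text, or None."""
    match = ARXIV_ID_RE.search(text)
    return match.group(1) if match else None


def get_record_url(arxiv_id, base_url=OAI_BASE_URL):
    """Build an OAI-PMH GetRecord request URL asking for Dublin Core."""
    query = urlencode({
        "verb": "GetRecord",
        # Assumption: the repository prefixes its OAI identifiers like this.
        "identifier": "oai:arXiv.org:" + arxiv_id,
        "metadataPrefix": "oai_dc",
    })
    return base_url + "?" + query


def parse_dublin_core(xml_text):
    """Pull title, authors, and abstract out of an oai_dc response."""
    root = ET.fromstring(xml_text)

    def values(tag):
        # Collect non-empty element texts, collapsing runs of whitespace.
        return [
            " ".join(el.text.split())
            for el in root.iter(DC_NS + tag)
            if el.text and el.text.strip()
        ]

    titles = values("title")
    descriptions = values("description")
    return {
        "title": titles[0] if titles else None,
        "authors": values("creator"),
        "abstract": descriptions[0] if descriptions else None,
    }
```

To use it, one would pass a page URL or its HTML to `scrape_arxiv_id`, fetch the address returned by `get_record_url` (for example with `urllib.request.urlopen`), and hand the response body to `parse_dublin_core`. Because the last step only depends on the Dublin Core element names, the same parser should in principle work for other OAI-PMH sites, with only the identifier scraping and base URL changing per site.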
