On 10/1/07, Christiaan Hofman <[EMAIL PROTECTED]> wrote:
> I haven't had time to test out or look at the new web group changes.
> So I don't know what kind of scrapers you have.

So far: citeulike.org, hubmed.org, the ACM Digital Library, and Google Scholar.

> But would it be
> possible to add some scrapers perhaps for the arXiv? I haven't looked
> much at what they added there to make things easier.

I don't see an easy way to get an official BibTeX entry, but it looks
like we could get the title, authors, and abstract from their HTML,
which is well formed and uses meaningful HTML class names. If a
paper has been accepted to a journal, it gets a journal-ref which would
give us the journal name/number, pages, etc. That isn't marked up, but
it might be in a common format that's regex-parseable.
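
As a rough sketch of the regex idea, here is what parsing one plausible
journal reference layout could look like in Python. The layout
"Journal Volume, Pages (Year)" is only an assumption on my part, not a
format arXiv guarantees, and parse_journal_ref is a made-up helper name.
Anything that doesn't fit the pattern is returned as None so the caller
can fall back to keeping the raw string.

```python
import re

# Assumed layout (hypothetical, not an arXiv guarantee):
#   "<journal name> <volume>, <pages> (<year>)"
JOURNAL_REF = re.compile(
    r"^(?P<journal>.+?)\s+"      # journal name, as short as possible
    r"(?P<volume>\d+)\s*,\s*"    # volume number followed by a comma
    r"(?P<pages>[\w-]+)\s*"      # page range or article number
    r"\((?P<year>\d{4})\)\s*$"   # four-digit year in parentheses
)


def parse_journal_ref(text):
    """Return a dict of journal fields, or None if the layout differs."""
    match = JOURNAL_REF.match(text.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse_journal_ref("Phys. Rev. D 76, 044012 (2007)"))
    print(parse_journal_ref("to appear"))
```

A real scraper would probably need several such patterns, since authors
type this field by hand.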

However, maybe the better way to get arXiv support would be to look at
their Open Archives Initiative support:
http://arxiv.org/help/oa

Other sites use that as well, so if we could just scrape an identifier
from the page then get useful metadata from their OAI-PMH server, that
might be a good way to support more sites. I'll make a note to look
into that more, but I can't do it right now.
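
To make the idea concrete, here is a minimal Python sketch of the two
steps: pull an identifier out of a page URL, then build an OAI-PMH
GetRecord request and read Dublin Core fields from the reply. The verb,
identifier, and metadataPrefix parameters and the oai_dc format come from
the OAI-PMH spec. The base URL, the "oai:arXiv.org:" identifier prefix,
the URL pattern, and all function names are my assumptions and would need
checking against the arXiv help page. The sketch only builds the URL and
parses a response string; fetching is left out.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed endpoint; check the arXiv OAI help page before relying on it.
OAI_BASE = "http://export.arxiv.org/oai2"
# Dublin Core element namespace used by the oai_dc metadata format.
DC_NS = "{http://purl.org/dc/elements/1.1/}"

# Assumed abstract-page URL shape: .../abs/0710.0123 or .../abs/hep-th/9901001
ARXIV_ABS = re.compile(r"arxiv\.org/abs/([\w.\-]+(?:/\d+)?)", re.IGNORECASE)


def arxiv_id_from_url(url):
    """Scrape the paper identifier from an abstract-page URL, or None."""
    match = ARXIV_ABS.search(url)
    if not match:
        return None
    # Drop a trailing version suffix such as "v2".
    return re.sub(r"v\d+$", "", match.group(1))


def oai_get_record_url(arxiv_id, base=OAI_BASE):
    """Build a GetRecord request URL asking for Dublin Core metadata."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": "oai:arXiv.org:" + arxiv_id,  # assumed prefix
        "metadataPrefix": "oai_dc",
    })
    return base + "?" + query


def parse_oai_dc(xml_text):
    """Extract title, authors, and abstract from an oai_dc response."""
    root = ET.fromstring(xml_text)

    def texts(tag):
        # Collapse internal whitespace, since records are often line-wrapped.
        return [" ".join(el.text.split())
                for el in root.iter(DC_NS + tag) if el.text]

    titles = texts("title")
    descriptions = texts("description")
    return {
        "title": titles[0] if titles else None,
        "authors": texts("creator"),
        "abstract": descriptions[0] if descriptions else None,
    }
```

Since the parsing step only depends on Dublin Core, the same code should
work for any other site with an OAI-PMH server. Only the identifier
scraping would be site-specific.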

-mike


> On 1 Oct 2007, at 7:53 PM, Michael McCracken wrote:
>
> > Has anyone had a chance to try out the web group?
> >
> > Let me know if there is a site you use to find papers that I could add
> > support for, if it would help you test things.
> >
> > Or does everyone else already enjoy good searching support through the
> > z39.50 stuff, and it's only those of us in CS with backwards
> > publishers?
> >
> > -mike
> >
> > --
> > Michael McCracken
> > UCSD CSE PhD Candidate
> > research: http://www.cse.ucsd.edu/~mmccrack/
> > misc: http://michael-mccracken.net/wp/
> >
>
> _______________________________________________
> Bibdesk-develop mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/bibdesk-develop
>


-- 
Michael McCracken
UCSD CSE PhD Candidate
research: http://www.cse.ucsd.edu/~mmccrack/
misc: http://michael-mccracken.net/wp/
