In article <[EMAIL PROTECTED]>, Christiaan Hofman <[EMAIL PROTECTED]> wrote:
> On 19 Mar 2008, at 2:23 PM, A wrote:
>
> > On 19 Mar 2008, at 11:14, Christiaan Hofman wrote:
> >> Search groups are mostly custom stuff. Each type of search group is
> >> based on a server object that gets a search string (from the search
> >> field) and should return a list of publication items. It should get a
> >> string representation of the items that can be parsed by one of our
> >> string parsers (like bibtex, JSTOR, MARC, etc). This is not the case
> >> with what gets out of this: this is a page that must be parsed with
> >> much more complicated methods, including downloading links. It's not
> >> just parsing the string you get back. Moreover, it does not accept a
> >> general query string, but a very specific request. (Unless there is
> >> another form that does support a query string?)
> >
> > What are the exact specifications of a search string? Do you mean a
> > complex query with boolean connectors (as specified in
> > http://bibdesk.sourceforge.net/manual/BibDesk%20Help_10.html#SEC34)
>
> It depends on the server. What I mean is: a query string that the user
> can type in the search field.
>
> > ? What do you mean by saying DBLP is limited to "a very specific
> > request"?
>
> I did not see any URL that accepts something like a
> "?query=searchterm" query component or something. It is a URL with a
> particular syntax for passing the name. You cannot expect a user to
> exactly type the query in that specific form.
>
> So basically, you need to be able to translate a query string the user
> types to a request (URL) you can send to the server. I did not see
> anything like that in what you've told us.

It also needs to return parseable content, since the response is fed
directly through one of the parsers. This makes search groups low
maintenance and reliable, whereas all of the screen scrapers are liable
to break without warning. IIRC there are three protocols supported at
present: PubMed, Z39.50, and ISI (via SOAP web services).
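As a rough illustration of the "query string in, request URL out" step
for a documented service, here is a minimal Python sketch for PubMed. It
is not BibDesk's code: build_search_url is a hypothetical helper, and the
endpoint and the db/term/retmax parameters are assumptions taken from
NCBI's E-utilities esearch interface, so check the current documentation
before relying on them.

```python
from urllib.parse import urlencode

# Assumed endpoint: NCBI E-utilities "esearch". Verify against current docs.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_search_url(query: str, max_results: int = 20) -> str:
    """Turn free text typed in a search field into a request URL."""
    query = " ".join(query.split())  # collapse stray whitespace
    if not query:
        raise ValueError("empty search string")
    # urlencode percent-escapes the term, so the user can type anything.
    params = {"db": "pubmed", "term": query, "retmax": max_results}
    return ESEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_search_url("hofman bibdesk"))
```

The point is only that the user's text goes into one generic parameter;
nothing about the URL's shape depends on what was typed.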
These protocols are all formally documented, so they shouldn't break
without warning. For instance, when I wrote the PubMed searching stuff,
I used the API specified at
http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html to
figure out query options and syntax. If DBLP supplied web service
access, that would be ideal, but it looks like they have nothing that's
robust enough for a search group.

> > Also it could be worth mentioning that there is a raw XML file of all
> > the DBLP database (http://dblp.uni-trier.de/xml/dblp.xml). Again, I
> > have no real-life experience of developing under Cocoa, but there is
> > surely an API to easily build and query databases. The question is
> > whether or not that is possible without having to download the actual
> > xml file (which takes more than 400MB IIRC). At the very least, there
> > could be a local index file? Again, I don't know what I am talking
> > about, I am just mentioning this in case it's useful.
>
> There's generic API to do all kinds of things. I'm very much against
> building a local index file.

I think you'd have to download that file to do queries on it; even at
76 MB compressed, it's huge (and would grow stale quickly). You'd have
to memory map the file and index it, since loading it into
NSXMLDocument would kill the program.

On a side note, keeping this in a single flat file seems crazy
(hopefully that's not what they actually use!). Likewise, this quote

    "The encoding used for the XML file is plain ASCII. To represent
    characters outside of the 7-bit range we use symbolic or numeric
    entities. All symbolic entities are defined in the DTD. At the
    moment most parts of DBLP are restricted to ISO-8859-1 (Latin-1)
    characters, i.e. the first 255 Unicode characters. Only inside the
    <note>-element you may find characters outside of this range, for
    example some Chinese names in their original spelling."

...makes me a bit nervous. Have they not heard of UTF-8?
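For a sense of what indexing such a file without loading it whole might
look like, here is a minimal Python sketch. It uses a streaming parse
rather than the memory mapping mentioned above, and it is not a Cocoa
implementation. SAMPLE is a made-up stand-in for dblp.xml, index_records
is a hypothetical helper, and the element and attribute names are
assumptions about the file's layout. The sample uses a numeric character
reference only; the symbolic entities defined in the DTD would need extra
handling that is not shown.

```python
import io
import xml.etree.ElementTree as ET

# Made-up stand-in for dblp.xml (assumed layout: one child per record).
SAMPLE = b"""<dblp>
<article key="journals/x/A08"><author>J. M&#252;ller</author><title>First</title></article>
<inproceedings key="conf/y/B07"><author>A. Smith</author><title>Second</title></inproceedings>
</dblp>"""


def index_records(stream):
    """Map each record key to (title, authors) without keeping the tree."""
    index = {}
    context = ET.iterparse(stream, events=("start", "end"))
    _, root = next(context)  # first event is the start of the root element
    for event, elem in context:
        # Only records carry a "key" attribute in this assumed layout.
        if event == "end" and elem.get("key") is not None:
            authors = [a.text for a in elem.findall("author")]
            index[elem.get("key")] = (elem.findtext("title"), authors)
            root.clear()  # discard finished records to keep memory flat
    return index


if __name__ == "__main__":
    for key, (title, authors) in index_records(io.BytesIO(SAMPLE)).items():
        print(key, title, authors)
```

Even with flat memory use, this still requires downloading the whole
file first, which is the main objection raised above.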
-- adam

_______________________________________________
Bibdesk-develop mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bibdesk-develop
