Re: [Bibdesk-develop] Web of Science

Fischlin Andreas Sun, 23 Mar 2014 09:07:12 -0700

BTW, I am already beta testing Reto's latest builds by using nothing else and 
can sey, they work perfectly so far. Not a glitch! Very well done Reto.


Regards,
Andreas



On 23/03/2014, at 16:52 , Christiaan Hofman wrote:


On Mar 23, 2014, at 15:32, Adam R. Maxwell wrote:



On Mar 23, 2014, at 6:35, Christiaan Hofman 
<cmhof...@gmail.com<mailto:cmhof...@gmail.com>> wrote:

Let me add a few remarks on this. And, btw, also from me thanks for doing this. 
I never much looked at these classes because I I never really understood much 
of what was going on, and the code is not very clean, to use an understatement.

On Mar 23, 2014, at 12:19, Reto Stöckli wrote:

Adam

A few answers from my side, well knowing that I might be wrong:

1) I am frankly pissed that CitedReferences/CitingArticles/etc is not 
documented in the code or the online help, if the comment is still correct. 
This accounts for a huge amount of boilerplate-looking code, which in my 
experience is a disaster for testing after rewriting/refactoring. I suggest 
removing them in your rewrite.

Since these parts do not seem to be user-accessible anyway, I agree. I however 
leave the final decision to you guys.

That seems reasonable for me too, also cleans up the code a lot.


2) BDSKISISourceXMLTagPriorityKey is apparently an array of fields. No idea 
what the hell it does, as it's not documented in the code or on the wiki. I 
suggest removing it in your rewrite.

Same as above.

idem


3) Style: is testing isKindOfClass: really the only way to distinguish the body 
parts? It's generally frowned on in Obj-C.

Can you make an example to me as a non Obj-C programmer so that I can test the 
correct way?

I don't think this is possible. This is a consequence of the rather bad 
generated code, there should be a lot of subclassing going on which there 
isn't. If I look at where this stuff is added (I think that should be 
WokSearchService.m around line 6577) these objects are just added to an array, 
and nothing is added to identify what was added, and the objects themselves 
have no common property to get to figure out what type they are, so checking 
the class is the only way you can know.

Okay, that's what I suspected as well. It is what it is, in that case.




4) Style: as far as I know, the remainder of BibDesk does not use the Obj-C dot 
syntax for accessors, but it looks like that was used in the previous iteration 
of this class. Personally, I detest it and would prefer square brackets, but 
this isn't your fault.

That I can change very easily, use brackets instead of dot-syntax. I have, as 
you say, followed the syntax of the original code since I learned it this way.

Not a big deal, especially if Christiaan doesn't mind. I initially thought you 
were dealing with C structs :).


Not a big deal, but I also don't like it


5) Style: downloadWithSearchTerm:database:options: is >300 lines, and 
createPublicationInfoWithRecord() is >460 lines. This is really unfortunate, as 
they are both unmaintainably long.

I agree.

With 1 this should help a lot.


6) There's a hardcoded date of "2020-01-01" in there. This needs to be +[NSDate 
distantFuture] or +[NSDate date]. 2020 is only 6 years away, and I've been 
contributing code to BibDesk for the past 12 years :-).

I have not checked that part of the code, but I can change that. Thanks for the 
hint.

It seems to me the dates aren't really needed. In the search option it is not 
used at all, so if we drop the rest this becomes mute.



The chief points are really the last two. Can Xpath queries be used to avoid 
walking the tree(s) so many times? The multiple nested while() loops make me 
really nervous from a readability standpoint.

Ok, here's the answer from a novice Obj-C programmer:
I've tried the Xpath syntax since it was also used in the original code. It 
failed for WOKSearch V3 since sub-tree identifiers are not unique wihtin the 
whole tree (e.g. the identifier "names" appears several times in the tree, 
naming several different sub-tree key's, such as author names, publication 
names, title names etc.). Searching for XPath:name resulted in a wild match of 
totally different things. I've avoided this by walking through individual 
sub-trees, one by one.

Now my question: can the XPath query be used to match these subtrees uniquely 
("XPath:names" matches both)?
a) summary->names
b) summary->publishers->publisher->names

If it is possible, I'll replace the complex walk through sub-trees with single 
XPath statements.

After a quick look to the loops, it seems to me that should be possible, and 
not so difficult. It should not matter that "names" occur at different places, 
you should just look at them in the correct place by building a more specific 
path. So don' look for all "names" nodes with something like .//names (double 
slash) or "XPath:names", but more specific paths with single slashes and 
parents included.

For instance, use ./static_data/summary from the record to get the summary 
node, and then use things like ./pub_info, ./titles, etc applied to that node 
to get its children, and work through its children analogously.

For instance, this could be part of the code to get the Author field (typed in 
email, not tested):

NSXMLNode *summaryNode = [[record nodesForXPath:@"./static_data/summary" 
error:NULL] lastObject];
...
NSArray *names = [summaryNode nodesForXPath:@"./names/name/full_name" 
error:NULL];
NSMutableString *nameString = nil;
for (child in names) {
if (name == nil)
nameString = [NSMutableString string];
else
[nameString appendString:@" and "];
[nameString appendString:[child stringValue]];
}

Also, for the rest the following Xpaths should immediately give you arrays of 
nodes you can use to build the abstract, keywords, and the IDs:

./static_data/fullrecord_metadata/abstracts/abstract/abstract_text/p
./static_data/item/keywords_plus/keyword
./dynamic_data/cluster_related/identifiers/identifier

Perhaps you have used the wrong XPaths. It should not matter whether some 
identifier occurs at different places in the tree, unless you search a whole 
tree like with .//names (with double slash), which you probably should never do.

Yes, what he said! Also, I'll add that Xpath is more tricky than it seems, 
especially if you don't have deep knowledge of it. If you have a sample of the 
XML returned for v3, please post that, and we can all look at it. I have WoS at 
work, but no time to try your code this week, unfortunately.

Adam

Though I don't think you need the tricky parts of Xpath, because it seems all 
to be simple descending according to a set path without the need for 
conditionals and such. AFAICS these are all the Xpaths that are needed (at 
least for the uncommented part) (the indentation is associated with the 
hierarchy of sub nodes):

./UID
./dynamic_data/cluster_related/identifiers/identifier
./static_data
./fullrecord_metadata/abstracts/abstract/abstract_text/p
./item/keywords_plus/keyword
./summary
./doctypes/doctype
./pub_info
./page
./titles/title
./names/name/full_name
./publishers/publisher
./address_spec
./names/name/full_name

Christiaan

------------------------------------------------------------------------------
Learn Graph Databases - Download FREE O'Reilly Book
"Graph Databases" is the definitive new guide to graph databases and their
applications. Written by three acclaimed leaders in the field,
this first edition is now available. Download your free book today!
http://p.sf.net/sfu/13534_NeoTech_______________________________________________
Bibdesk-develop mailing list
Bibdesk-develop@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bibdesk-develop


------------------------------------------------------------------------------
Learn Graph Databases - Download FREE O'Reilly Book
"Graph Databases" is the definitive new guide to graph databases and their
applications. Written by three acclaimed leaders in the field,
this first edition is now available. Download your free book today!
http://p.sf.net/sfu/13534_NeoTech
_______________________________________________
Bibdesk-develop mailing list
Bibdesk-develop@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bibdesk-develop

Re: [Bibdesk-develop] Web of Science

Reply via email to