Re: [Bibdesk-develop] Web of Science

Christiaan Hofman Sun, 23 Mar 2014 06:36:10 -0700

Let me add a few remarks on this. And, btw, also from me thanks for doing this. 
I never much looked at these classes because I I never really understood much 
of what was going on, and the code is not very clean, to use an understatement.

On Mar 23, 2014, at 12:19, Reto Stöckli wrote:

> Adam
> 
> A few answers from my side, well knowing that I might be wrong:
>> 
>> 1) I am frankly pissed that CitedReferences/CitingArticles/etc is not 
>> documented in the code or the online help, if the comment is still correct. 
>> This accounts for a huge amount of boilerplate-looking code, which in my 
>> experience is a disaster for testing after rewriting/refactoring. I suggest 
>> removing them in your rewrite.
> 
> Since these parts do not seem to be user-accessible anyway, I agree. I 
> however leave the final decision to you guys.

That seems reasonable for me too, also cleans up the code a lot.

>> 
>> 2) BDSKISISourceXMLTagPriorityKey is apparently an array of fields. No idea 
>> what the hell it does, as it's not documented in the code or on the wiki. I 
>> suggest removing it in your rewrite.
> 
> Same as above.

idem

>> 
>> 3) Style: is testing isKindOfClass: really the only way to distinguish the 
>> body parts? It's generally frowned on in Obj-C.
> 
> Can you make an example to me as a non Obj-C programmer so that I can test 
> the correct way?
> 

I don't think this is possible. This is a consequence of the rather bad 
generated code, there should be a lot of subclassing going on which there 
isn't. If I look at where this stuff is added (I think that should be 
WokSearchService.m around line 6577) these objects are just added to an array, 
and nothing is added to identify what was added, and the objects themselves 
have no common property to get to figure out what type they are, so checking 
the class is the only way you can know.

>> 
>> 4) Style: as far as I know, the remainder of BibDesk does not use the Obj-C 
>> dot syntax for accessors, but it looks like that was used in the previous 
>> iteration of this class. Personally, I detest it and would prefer square 
>> brackets, but this isn't your fault.
> 
> That I can change very easily, use brackets instead of dot-syntax. I have, as 
> you say, followed the syntax of the original code since I learned it this way.
>> 
>> 5) Style: downloadWithSearchTerm:database:options: is >300 lines, and 
>> createPublicationInfoWithRecord() is >460 lines. This is really unfortunate, 
>> as they are both unmaintainably long.
> 
> I agree.

With 1 this should help a lot.

>> 
>> 6) There's a hardcoded date of "2020-01-01" in there. This needs to be 
>> +[NSDate distantFuture] or +[NSDate date]. 2020 is only 6 years away, and 
>> I've been contributing code to BibDesk for the past 12 years :-).
> 
> I have not checked that part of the code, but I can change that. Thanks for 
> the hint.
> 

It seems to me the dates aren't really needed. In the search option it is not 
used at all, so if we drop the rest this becomes mute.

>> 
>> 
>> The chief points are really the last two. Can Xpath queries be used to avoid 
>> walking the tree(s) so many times? The multiple nested while() loops make me 
>> really nervous from a readability standpoint.
> 
> Ok, here's the answer from a novice Obj-C programmer: 
> I've tried the Xpath syntax since it was also used in the original code. It 
> failed for WOKSearch V3 since sub-tree identifiers are not unique wihtin the 
> whole tree (e.g. the identifier "names" appears several times in the tree, 
> naming several different sub-tree key's, such as author names, publication 
> names, title names etc.). Searching for XPath:name resulted in a wild match 
> of totally different things. I've avoided this by walking through individual 
> sub-trees, one by one.
> 
> Now my question: can the XPath query be used to match these subtrees uniquely 
> ("XPath:names" matches both)?
> a) summary->names
> b) summary->publishers->publisher->names
> 
> If it is possible, I'll replace the complex walk through sub-trees with 
> single XPath statements.
> 

After a quick look to the loops, it seems to me that should be possible, and 
not so difficult. It should not matter that "names" occur at different places, 
you should just look at them in the correct place by building a more specific 
path. So don' look for all "names" nodes with something like .//names (double 
slash) or "XPath:names", but more specific paths with single slashes and 
parents included.

For instance, use ./static_data/summary from the record to get the summary 
node, and then use things like ./pub_info, ./titles, etc applied to that node 
to get its children, and work through its children analogously.

For instance, this could be part of the code to get the Author field (typed in 
email, not tested):

NSXMLNode *summaryNode = [[record nodesForXPath:@"./static_data/summary" 
error:NULL] lastObject];
...
NSArray *names = [summaryNode nodesForXPath:@"./names/name/full_name" 
error:NULL];
NSMutableString *nameString = nil;
for (child in names) {
        if (name == nil)
                nameString = [NSMutableString string];
        else
                [nameString appendString:@" and "];
        [nameString appendString:[child stringValue]];
} 

Also, for the rest the following Xpaths should immediately give you arrays of 
nodes you can use to build the abstract, keywords, and the IDs:

./static_data/fullrecord_metadata/abstracts/abstract/abstract_text/p
./static_data/item/keywords_plus/keyword
./dynamic_data/cluster_related/identifiers/identifier

Perhaps you have used the wrong XPaths. It should not matter whether some 
identifier occurs at different places in the tree, unless you search a whole 
tree like with .//names (with double slash), which you probably should never do.

> Reto

thanks,
Christiaan

------------------------------------------------------------------------------
Learn Graph Databases - Download FREE O'Reilly Book
"Graph Databases" is the definitive new guide to graph databases and their
applications. Written by three acclaimed leaders in the field,
this first edition is now available. Download your free book today!
http://p.sf.net/sfu/13534_NeoTech

_______________________________________________
Bibdesk-develop mailing list
Bibdesk-develop@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bibdesk-develop

Re: [Bibdesk-develop] Web of Science

Reply via email to