Hi Andrew,

On Wed, Jun 1, 2011 at 8:45 PM, Andrew Dalke <da...@dalkescientific.com> wrote:
> Also, what does CDK do if there's an error in one of the records? For 
> example, some of the other toolkits' readers skip bad records and report to 
> stderr. If CDK ignores bad records, is there a way to get the count of 
> records skipped? If CDK raises an exception, is there a way to ignore the 
> error and keep on processing?

This is what the STRICT and RELAXED modes are about. The former will
throw an exception; the latter will only report the error, using a listener.
The 'Groovy Cheminformatics' book has a section on this.

Have a look at this IChemObjectReader method (which the readers inherit):

public void setErrorHandler(IChemObjectReaderErrorHandler handler)

Note that this only covers errors in the file format, not chemical errors.
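
To make the idea concrete, here is a minimal, self-contained sketch of the
pattern: a handler that counts format errors while a reader in relaxed mode
keeps going. RecordErrorHandler, CountingErrorHandler and ToyRecordReader are
hypothetical stand-ins written for this example, not CDK classes; the real
IChemObjectReaderErrorHandler interface declares more handleError variants,
and the "blank record is bad" rule is only an assumption for the demo.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the CDK error handler interface.
interface RecordErrorHandler {
    void handleError(String message);
}

// Collects error messages so the caller can ask how many records were skipped.
class CountingErrorHandler implements RecordErrorHandler {
    private final List<String> messages = new ArrayList<>();

    @Override
    public void handleError(String message) {
        messages.add(message);
    }

    int getErrorCount() {
        return messages.size();
    }

    List<String> getMessages() {
        return messages;
    }
}

// Toy reader (not CDK code): a record counts as "bad" when it is blank.
class ToyRecordReader {
    enum Mode { STRICT, RELAXED }

    private Mode mode = Mode.RELAXED;
    private RecordErrorHandler handler = message -> { };

    void setReaderMode(Mode mode) {
        this.mode = mode;
    }

    void setErrorHandler(RecordErrorHandler handler) {
        this.handler = handler;
    }

    // Returns the good records; bad ones are reported or cause an exception.
    List<String> read(List<String> records) {
        List<String> good = new ArrayList<>();
        for (int index = 0; index < records.size(); index++) {
            String record = records.get(index);
            if (record.trim().isEmpty()) {
                String message = "Empty record at index " + index;
                if (mode == Mode.STRICT) {
                    throw new IllegalArgumentException(message);
                }
                handler.handleError(message); // RELAXED: report and skip
                continue;
            }
            good.add(record);
        }
        return good;
    }
}
```

With the actual CDK readers, the corresponding step would be to implement
IChemObjectReaderErrorHandler in the same counting style and pass it to
setErrorHandler() on a reader running in RELAXED mode.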

>> the CDK-style guidelines prefer longer variable names (though this is really 
>> more me being obsessive), so the "IAtom a" would look better as, for 
>> example, "IAtom atom" or even "IAtom currentAtom"
>
> Being the type of person I am, I checked the actual practice ;)
>
>  95  matches to "IAtom a[^0-9a-zA-Z]" (56 in *Test.java files)
> 2158 matches to "IAtom atom[^a-zA-Z_]" (870 in *Test.java files) (eg, "atom1")
>  235 matches to "IAtom atom[a-zA-Z]" (61 in tests) (eg, "atomToUpdate")

This is to encourage the developers to think about decent names.
atomToUpdate and atomToTakeInfoFrom are much more informative (though
on the lengthy side) than atom1 and atom2. If the code is written
clearly enough (which is not always the case, e.g. because the algorithm
is just long and hard to split up into methods) then atom1 and atom2 can
do.

'a' is really quite bad in some situations, such as when you have an
atom type, an atom, and an atom container all in scope... a short
variable name puts extra stress on the reader.

Now, sometimes 'a' is perhaps acceptable, such as in really simple
iterations... like this:

for (IAtom a : mol.atoms()) {
  System.out.println(a.getSymbol());
}

but with a mere three chars more, I think it becomes all the more readable:

for (IAtom atom : mol.atoms()) {
  System.out.println(atom.getSymbol());
}

Code must be readable. The risk of introducing misunderstanding is
just too big. Just browse a random CDK class, try to figure out
what the algorithm is doing, and explain that to the first person
sitting next to you. If that actually works, the source code is OK.

The minimum three-character variable name length enforced by the CDK
code quality checker is just a very crude thing we can do automatically...
at the least, it makes people think about code quality.
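
As a rough illustration of what such an automatic check amounts to, here is a
text-level sketch that flags IAtom variables whose names are shorter than a
given minimum. This is not how the CDK code quality checker is implemented;
ShortNameCheck, findShortNames and the regular expression are assumptions made
for this example, and the sketch only looks at "IAtom <name>" declarations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, written only to illustrate a minimum-length name check.
class ShortNameCheck {
    // Matches "IAtom <name>"; the required whitespace after "IAtom" keeps
    // longer type names such as IAtomContainer from matching.
    private static final Pattern DECLARATION =
        Pattern.compile("\\bIAtom\\s+([A-Za-z_][A-Za-z0-9_]*)");

    // Returns every declared IAtom variable name shorter than minimumLength.
    static List<String> findShortNames(String source, int minimumLength) {
        List<String> tooShort = new ArrayList<>();
        Matcher matcher = DECLARATION.matcher(source);
        while (matcher.find()) {
            String name = matcher.group(1);
            if (name.length() < minimumLength) {
                tooShort.add(name);
            }
        }
        return tooShort;
    }
}
```

Calling findShortNames(source, 3) on a file's text would report 'a' but accept
'atom' and 'atomToUpdate'; a real checker would work on parsed code rather
than raw text.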

Egon

-- 
Dr E.L. Willighagen
Postdoctoral Researcher
Institutet för miljömedicin
Karolinska Institutet (http://ki.se/imm)
Homepage: http://egonw.github.com/
LinkedIn: http://se.linkedin.com/in/egonw
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers
