Hi Charles,

On Thu, Jan 12, 2012 at 09:48:04AM +0900, Charles Plessy wrote:
> This is likely to be usable in the blend's sentinel pages. However, I have not
> implemented this. I do not have a particularly good excuse, except that I am
> not a python programmer, so I would need to block a long slot of time to really
> get into it, and recently such slots I have given them to DEP 5 or to my
> attempts to use Debian Installer to prepare Amazon Machine Images (see
> http://charles.plessy.org/Debian/debiâneries/nuage/ and #637784).
>
> The data is in the UDD, so anybody can give it a try.
I have tried to use the information from upstream-metadata.yaml in the tasks pages. This has worked so far, but I noticed that the data in UDD are not yet of the quality I would want them to have.

First, I noticed that, for instance, package perlprimer has two entries for PMID (one of them empty). This could be avoided if we used PRIMARY KEY (package,key) (I'm sure I suggested this before). I admit that this could lead to more import errors. However, this kind of import error would force a check of the input data immediately, which is a wanted effect IMHO.

I did some more investigation into the UDD bibref table and noticed that at least Reference-Journal is missing. This is used on the tasks pages and should be injected as well. Volume, number and pages might also be interesting to propagate from upstream-metadata.yaml to UDD. We also use URL and eprint, which are missing as well.

I tried to find out the reason for these missing fields and checked the intermediate format you are using for the import, which is obtained via

    wget -q http://upstream-metadata.debian.net/for_UDD/biblio.yaml

IMHO this format is really not the best choice (it's not even yaml, right?). As far as I understood, you are generating these data from another database at upstream-metadata.debian.net, and thus you could choose the most convenient format for UDD yourself. If I were you I would make things pretty simple and use a format which fits the postgresql COPY format[1]. The only drawback I could see for this format is that it might be harder to debug if you are violating the suggested primary key. Currently the problematic part in your data input is:

    - perlprimer
    - PMID
    - 15073005
    ---
    - perlprimer
    - PMID
    - ''

Besides the fact that you should not export empty values anyway, there should be some mechanism to avoid duplicates.

Another option would be a pretty simple CSV file, because Python has a cool CSV reader which uses the first line as keys for a dictionary.
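To make the CSV idea concrete, here is a minimal sketch of what such a reader could look like. It is not the actual importer: the column names package, key and value and the function name read_bibref are my assumptions, and the sample rows only mirror the perlprimer entries quoted above. It uses csv.DictReader from the Python standard library, rejects empty values and reports a second row for the same (package, key) pair, which is what the suggested primary key would enforce.

```python
import csv
import io


def read_bibref(lines):
    """Read CSV rows into {(package, key): value} and collect problems.

    The column names 'package', 'key' and 'value' are assumptions for
    this sketch, not an existing export format.
    """
    entries, errors = {}, []
    # DictReader takes the dictionary keys from the first line of the input.
    for row in csv.DictReader(lines):
        ident = (row["package"], row["key"])
        value = row["value"]
        if not value:
            # Empty values should not have been exported in the first place.
            errors.append(f"empty value for {ident}")
        elif ident in entries:
            # Same check that PRIMARY KEY (package,key) would do in UDD.
            errors.append(f"duplicate entry for {ident}")
        else:
            entries[ident] = value
    return entries, errors


# Hypothetical CSV rendering of the two perlprimer records shown above.
SAMPLE = "package,key,value\nperlprimer,PMID,15073005\nperlprimer,PMID,\n"

if __name__ == "__main__":
    entries, errors = read_bibref(io.StringIO(SAMPLE))
    print(entries)
    print(errors)
```

Collecting the problems in a list instead of stopping at the first one is just one possible choice; it would let the importer log every bad row of a run so the input can be fixed in one go.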
I'd volunteer to write the importer from this (or any other format you might choose - yaml, xml - whatever you prefer), including checking the primary key constraint. But the precondition would be a data file that contains all the values.

I can confirm that yesterday I wrote some code which reads the available data from UDD and uses them on the tasks pages. In the next couple of days I could make this available on some separate pages to verify that everything looks as expected.

Kind regards

      Andreas.

[1] http://www.postgresql.org/docs/9.1/static/sql-copy.html

-- 
http://fam-tille.de

