Hi Pierre-Elliott, On Wed, 22 Nov 2017, Pierre-Elliott Bécue wrote: > Please, feel free to review and comment the patch. > > It lacks tests for the task, I'll work on that by the end of the week in a > fourth patch.
First of all, I would highly suggest to try to follow the principles of test-driven development: https://www.obeythetestinggoat.com/ IOW, you write a failing test first, then you code what's necessary to make the test pass and so on until you have all the features that you are looking for. Adding tests afterwards is hard and you are likely to miss some important tests. And it's just not fun to code tests separately when you already have something working... it will be hard to stick to it in the long run. Then on the compression feature, I have to agree with pabs, we should not have to look into the data-flow to find the compression method. I would suggest: - either the user is explicit (compression="gzip" attribute, or compression=None/False for no compression) - or the user relies on compression="auto" (default value) in which case it should just rely on the filename extension that we should pass along in some way. In the case of something retrieved through the web, it might be interesting to check the Content-Type to infer the compression too: $ HEAD https://qa.debian.org/data/vcswatch/vcswatch.json.gz |grep Content-Type Content-Type: application/x-gzip X-Content-Type-Options: nosniff (But it might be overkill since I guess we have a usable extension in 99% of the cases, feel free to skip this) Also it seems to me that the correct API for generic decompression should rely on getting a file object as input and providing a file object as output... as that's what you want to be able to process really big files without reading them all at once in memory (shall the need arise). Now onto the vcswatch code: > --- a/distro_tracker/core/panels.py > +++ b/distro_tracker/core/panels.py > @@ -235,6 +235,12 @@ class GeneralInformationPanel(BasePanel): > # There is no general info for the package > return > > + try: > + vcswatch = PackageExtractedInfo.objects.get( > + package=self.package, key='vcswatch').value > + except PackageExtractedInfo.DoesNotExist: > + vcswatch = {} I don't like that we modify a general panel for a feature that is currently vendor specific. I would like to see an intermediary abstraction under the form of "vcs_extra_links". It can still be in PackageExtractedInfo but you would not be alone in having the right to create/modify those keys. It would be something like this: { 'QA': 'https://qa.debian.org/cgi-bin/vcswatch?package=foo', 'Browse': '...', } And the code would display the links in alphabetical order. (Yes I'm suggesting that we might want to move the code that stores the Vcs-Browser link to use this new structure) Also if we have to extract two keys out of PackageExtractedInfo we want to do it in a single query to avoid round-trips with the database. The package page is already very heavy in numbers of queries. > --- a/distro_tracker/core/templates/core/panels/general.html > +++ b/distro_tracker/core/templates/core/panels/general.html > @@ -95,7 +95,7 @@ > <a href="{{ ctx.vcs.url }}">{{ vcs }}</a> > {% endif %} > {% if ctx.vcs.browser %} > - (<a href="{{ ctx.vcs.browser }}">Browse</a>) > + (<a href="{{ ctx.vcs.browser }}">Browse</a>{% if ctx.vcs.watch %}, <a > href="{{ ctx.vcs.watch }}"> > QA</a>{% endif %}) > {% endif %} > {% endwith %} You can have vcswatch data even if you don't have any browse link. vcswatch works as soon as you have a vcs url. But here the code would be very different (and cleaner) if you implement the abstraction suggested above. > --- /dev/null > +++ b/distro_tracker/vendor/debian/templates/debian/vcswatch-action-item.html > @@ -0,0 +1,9 @@ > +{% with description=item.extra_data.description %} > +{% with error=item.extra_data.error %} > + > +<a href="{{item.extra_data.vcswatch_url}}">VCSwatch</a> reports > +that {{description}}<br/><br/> > +{% if error %} > +<span>{{error}}</span> > +{% endif %} > +{% endwith %}{% endwith %} This looks wrong. The long descriptions should really be embedded here (with multiple if to check for the vcswatch status) and use whatever data they need out of extra_data. > + __out = {} Please don't use variable names with underscores. A single underscore might be useful for private method or function names (or for variables defined at the module level possibly). But it's not really useful for private variables within methods. > + @staticmethod > + def get_data_checksum(data): > + json_dump = json.dumps(data, sort_keys=True) > + if json_dump is not six.binary_type: > + json_dump = json_dump.encode('UTF-8') > + return hashlib.md5(json_dump).hexdigest() This checksum mechanism to see whether we have updated data should likely be factored out in a function where it can be used by other objects. To make it possible to store the checksum alongside the data, you might want to checksum a copy of the data where you drop the pre-defined key "data_checksum". We use Python 3 only now, so it's no longer required to use "six.binary_type". But in fact the whole check is not required as I believe that json.dumps will always return text (not bytes). > + def get_vcswatch_data(self): > + url = 'https://qa.debian.org/data/vcswatch/vcswatch.json.gz' > + data = json.loads(get_resource_content(url, > compression=True).decode('utf-8')) get_resource_content() might return None in case of network issue (which will trigger an excteption here). Also the need to decode the output is really annoying, we might want to introduce "get_resource_text()" which does this for us automatically. This is a pretty-good job so far, thank you! Cheers, -- Raphaël Hertzog ◈ Debian Developer Support Debian LTS: https://www.freexian.com/services/debian-lts.html Learn to master Debian: https://debian-handbook.info/get/