Re: udd/blends_metadata_gathener.py hints

Emmanouil Kiagias Fri, 18 Oct 2013 14:19:37 -0700

Hello Andreas,

I updated blends_metadata_gathener.py


> From first intuition I would think it might make sense to add single
> paragraphs to the configfile, like
>
>   blends-all
>   blend-med
>   blend-edu
>   blend-gis
>   blend-...

I added the above paragraphs inside config-ullman.yaml.
With blends-all the gatherer runs for every available Blend; otherwise it
runs only for the selected Blend.

I created the single blend paragraphs using <<: *blends-conf in case we
need to override any of the blends-all attributes.

Each Blend now has its own log file, named:
blends_metadata_gatherer-BLEND.log

If the gatherer fails before it updates any Blend, it logs to a
blends_metadata_gatherer-default.log file.
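
A minimal sketch of how the per-Blend log file could be selected. The
function names log_file_name and make_logger are only illustrative and are
not taken from the gatherer itself; only the two file name patterns come
from the description above.

```python
import logging

# File name pattern described in this mail; "default" is used when no
# Blend has been selected yet.
LOG_TEMPLATE = "blends_metadata_gatherer-{name}.log"


def log_file_name(blend=None):
    """Return the log file name for a Blend, or the default one."""
    return LOG_TEMPLATE.format(name=blend if blend else "default")


def make_logger(blend=None):
    """Build a logger that writes to the Blend-specific log file.

    Hypothetical helper: the real gatherer may set up logging differently.
    """
    logger = logging.getLogger("blends_metadata_gatherer." + (blend or "default"))
    if not logger.handlers:
        # delay=True postpones opening the file until the first message.
        handler = logging.FileHandler(log_file_name(blend), delay=True)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

For example, make_logger("med") would write to
blends_metadata_gatherer-med.log, while make_logger() falls back to the
default file.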

To check whether a task file has changed I added a "hashkey" column to
blends_tasks. When a task is imported I save an MD5 hash in blends_tasks.
Before I delete a task file and add it again from scratch I check whether
its hashkey has changed. So once you have run the new gatherer one time to
store the initial hashkeys, it will only delete and re-add the changed
tasks.
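
The hashkey check can be sketched roughly as follows. This assumes the
hash is computed over the raw bytes of the task file, which the
description above does not spell out; task_hashkey and task_changed are
hypothetical names.

```python
import hashlib


def task_hashkey(content):
    """Return the MD5 hex digest of a task file's bytes.

    Assumption: the whole file content is hashed.
    """
    return hashlib.md5(content).hexdigest()


def task_changed(stored_hashkey, content):
    """Return True if the task has to be deleted and imported again.

    stored_hashkey is the value previously saved in the "hashkey" column,
    or None if the task has never been imported.
    """
    return stored_hashkey != task_hashkey(content)
```

On the very first run every stored hashkey is missing, so every task
counts as changed; later runs only touch the tasks whose content differs.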

With this approach I could not delete and re-add the Blend entry in the
blends_metadata table (because of the references from blends_tasks etc.),
so I check whether the Blend exists. If it exists I update the entry to
save any changes; otherwise I use blends_metadata_insert to create a new
entry.
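
A rough sketch of the update-or-insert step. To keep it self-contained it
uses an in-memory SQLite database as a stand-in for UDD, and the "blend"
and "title" columns are assumptions made for illustration; the real
blends_metadata table and the blends_metadata_insert statement may look
different.

```python
import sqlite3


def make_demo_db():
    """Create a throwaway in-memory table (stand-in for the UDD table)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE blends_metadata (blend TEXT PRIMARY KEY, title TEXT)")
    return conn


def save_blend(conn, blend, title):
    """Update the Blend entry if it exists, otherwise insert a new one.

    The existing row is never deleted, so rows in other tables that
    reference it stay valid.
    """
    cur = conn.execute(
        "SELECT 1 FROM blends_metadata WHERE blend = ?", (blend,))
    if cur.fetchone():
        conn.execute(
            "UPDATE blends_metadata SET title = ? WHERE blend = ?",
            (title, blend))
        return "updated"
    conn.execute(
        "INSERT INTO blends_metadata (blend, title) VALUES (?, ?)",
        (blend, title))
    return "inserted"
```

Calling save_blend twice for the same Blend inserts on the first call and
updates on the second, leaving a single row.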

You can test the gatherer. Any feedback or comments are more than welcome :-).

I will now check on the following (quoting from a previous mail of yours):

c) try to make the insertion procedure itself more efficient by for
     instance:
      - check, whether we could speed up the check for a package that
        just exists in UDD
      - inject all packages in one rush


Kind regards

Emmanouil
