Nate Coraor wrote:
After looking at it a bit more, I see what you mean. Are there plans to
implement any additional cleanup scripts for non-Postgres 9.1 users?
Just curious so I don't reinvent the wheel, I'd be happy to help with
On Mar 22, 2013, at 11:56 AM, Lance Parsons wrote:
I have been running a Galaxy server for our sequencing researchers for a while
now and it's become increasingly successful. The biggest resource challenge for
us has been, and continues to be, disk space. As such, I'd like to implement
some additional cleanup scripts. I thought I'd run a few questions by this list
before I got too far into things.
In general, I'm wondering how to implement updates/additions to the cleanup system that
will be in line with the direction that the Galaxy project is headed. The pgcleanup.py
script is the newest piece of code in this area (and even adds cleanup of exported
histories, which are absent from the older cleanup scripts). Also, the pgcleanup.py
script uses a "cleanup_event" table that I don't believe is used by the older
cleanup_datasets.py script. However, the new pgcleanup.py script only works for Postgres,
and worse, only for version 9.1+. I run my system on RedHat (CentOS) and thus we use
version 8.4 of Postgres. Are there plans to support other databases or older versions of
pgcleanup.py makes extensive use of Writable CTEs, so there is not really a way
to port it to older versions. For 8.4 or MySQL, you can still use the older
Right. It seems to make sense to me to focus on cleanup_datasets.py,
since that will work for everyone. I would like to essentially mimic
the user deleting a dataset. I'd then email them to let them know that
some old data had been marked for deletion and let the rest of the
scripts proceed as normal, cleaning that up if they don't undelete it.
I'd like to implement a script to delete (set the deleted flag on) certain
datasets (e.g. raw data imported from our archive, for old, inactive users,
etc.). I'm wondering if it would make sense to try and extend pgcleanup.py or
cleanup_datasets.py. Or perhaps it would be best to just implement a separate
script, though that seems like I'd have to re-implement a lot of boilerplate
code for configuration reading, connections, logging, etc. Any tips on
generally acceptable (supported) procedures for marking a dataset as deleted?
You could probably reuse a lot of the code from either of the cleanup scripts
It looks like I would want to mark the HistoryDatasetAssociations as
deleted? Is that correct? Would I need to do anything else to simulate
the user deleting the dataset?
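
To make the idea concrete, here is a rough sketch of the "mark as deleted, then notify" step. It assumes that setting the deleted flag on the HistoryDatasetAssociation is enough to mimic a user deletion, which is exactly the open question above. It runs against an in-memory SQLite stand-in, not a real Galaxy database: the three tables and their columns are a simplified, assumed schema, and mark_old_hdas_deleted is a hypothetical helper, not part of either cleanup script.

```python
import sqlite3
from collections import defaultdict
from datetime import datetime


def mark_old_hdas_deleted(conn, cutoff, now):
    """Flag undeleted HDAs last updated before `cutoff` as deleted.

    Returns {user email: [dataset names]} so the owners can be notified.
    Table and column names are assumptions about the schema.
    """
    rows = conn.execute(
        """
        SELECT hda.id, hda.name, u.email
        FROM history_dataset_association AS hda
        JOIN history AS h ON h.id = hda.history_id
        JOIN galaxy_user AS u ON u.id = h.user_id
        WHERE hda.deleted = 0 AND hda.update_time < ?
        ORDER BY hda.id
        """,
        (cutoff.isoformat(),),
    ).fetchall()

    notify = defaultdict(list)
    with conn:  # one transaction: commit on success, roll back on error
        for hda_id, name, email in rows:
            # Assumption: refreshing update_time restarts the grace period
            # that the later cleanup stages would apply.
            conn.execute(
                "UPDATE history_dataset_association "
                "SET deleted = 1, update_time = ? WHERE id = ?",
                (now.isoformat(), hda_id),
            )
            notify[email].append(name)
    return dict(notify)


def demo():
    """Build a tiny stand-in database and run the helper once."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE galaxy_user (id INTEGER PRIMARY KEY, email TEXT);
        CREATE TABLE history (id INTEGER PRIMARY KEY, user_id INTEGER);
        CREATE TABLE history_dataset_association (
            id INTEGER PRIMARY KEY, history_id INTEGER, name TEXT,
            deleted INTEGER DEFAULT 0, update_time TEXT);
        INSERT INTO galaxy_user VALUES (1, 'a@example.org');
        INSERT INTO history VALUES (1, 1);
        INSERT INTO history_dataset_association VALUES
            (1, 1, 'old raw reads', 0, '2011-06-01T00:00:00'),
            (2, 1, 'recent result', 0, '2013-03-01T00:00:00');
        """
    )
    notified = mark_old_hdas_deleted(
        conn, cutoff=datetime(2012, 1, 1), now=datetime(2013, 3, 22)
    )
    return conn, notified
```

Because the query skips rows that are already flagged, a second run would not re-notify anyone, and a user who undeletes a dataset before the regular cleanup runs simply clears the flag again.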
Thanks for the help,
Lance Parsons - Scientific Programmer
134 Carl C. Icahn Laboratory
Lewis-Sigler Institute for Integrative Genomics