Hi Greg,

Another small patch for cleanup_datasets.py that makes the output messages
slightly more informative.

The changes are:
1. If a dataset is skipped (because it is shared/cloned and was already
processed), no message is printed at all.
2. If a dataset cannot be deleted because it is shared and at least one
instance is not marked as "deleted", an explanatory message is printed.
3. If a dataset has metadata files, the message changes depending on the
"info_only" and "remove_from_disk" flags.
4. The final summary message is slightly changed.

The main goal is for the "delete_datasets.sh" script (with "-i" or without) to 
display exactly what's going on.

Comments are welcome,
 -gordon
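
For readers who want the behavior at a glance, here is a minimal standalone
sketch of the revised loop in delete_datasets(). The dict-based datasets, the
is_deletable callback, and process_datasets() itself are illustrative
stand-ins, not the real Galaxy API:

from __future__ import print_function  # so the sketch runs under Python 2 and 3

def process_datasets(datasets, is_deletable):
    # Sketch of the patched control flow: shared/cloned datasets that were
    # already handled are skipped silently (item 1), and non-deletable
    # datasets get an explanatory message instead of silence (item 2).
    seen = []
    deleted_dataset_count = 0
    for dataset in datasets:
        if dataset["id"] in seen:
            continue                      # item 1: no output at all
        seen.append(dataset["id"])
        print("######### Processing dataset id:", dataset["id"])
        if not is_deletable(dataset):
            print("Dataset is not deletable (shared between multiple "
                  "histories/libraries, at least one is not deleted)")
            continue                      # item 2: say why, then move on
        deleted_dataset_count += 1
    print("Examined %d datasets, marked %d datasets as deleted"
          % (len(seen), deleted_dataset_count))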


Example of items 1 and 2:
Without patch:
=====
...
######### Processing dataset id: 1782
######### Processing dataset id: 1782
######### Processing dataset id: 1782
######### Processing dataset id: 1783
Dataset 1783 will be deleted (without 'info_only' mode)
######### Processing dataset id: 1801
Dataset 1801 will be deleted (without 'info_only' mode)
=====

This shows dataset 1782 apparently being processed three times, yet it is
never deleted and no reason is given.

With this patch, the output will be:
======
######### Processing dataset id: 1782
Dataset is not deletable (shared between multiple histories/libraries, at least 
one is not deleted)
######### Processing dataset id: 1783
Dataset 1783 will be deleted (without 'info_only' mode)
######### Processing dataset id: 1801
Dataset 1801 will be deleted (without 'info_only' mode)
======
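
The new message is printed whenever _dataset_is_deletable() returns False. The
helper's actual body lives in cleanup_datasets.py; the stand-in below only
illustrates the rule the message describes, i.e. every history/library
instance of a shared dataset must already be marked deleted:

from __future__ import print_function

def dataset_is_deletable(instances):
    # Hypothetical stand-in for _dataset_is_deletable(): a shared dataset
    # may be deleted only once every HDA/LDDA pointing at it is itself
    # marked deleted.
    return all(instance["deleted"] for instance in instances)

# Dataset 1782 above: shared by three histories, one copy still live.
print(dataset_is_deletable([{"deleted": True},
                            {"deleted": True},
                            {"deleted": False}]))   # -> False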


Example of item 3:
Without patch:
==========
######### Processing dataset id: 1404
The following metadata files attached to associations of Dataset '1404' have 
been purged:
/localdata1/galaxy/database_prod/files/_metadata_files/000/metadata_102.dat
Dataset 1404 will be deleted (without 'info_only' mode)
==========

With patch:
==========
######### Processing dataset id: 1404
The following metadata files attached to associations of Dataset '1404' will be 
marked as deleted (without 'info_only' mode):
/localdata1/galaxy/database_prod/files/_metadata_files/000/metadata_102.dat
Dataset 1404 will be deleted (without 'info_only' mode)
==========
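
Isolated from the patch, the message selection introduced for item 3 amounts
to the following (the function name and the way it is called are illustrative;
the strings match the patch):

from __future__ import print_function

def metadata_file_message(dataset_id, info_only, remove_from_disk):
    # Build the operation description first, then choose the tense:
    # future ("will be") for a dry run, past ("have been") otherwise.
    op_description = "marked as deleted"
    if remove_from_disk:
        op_description += " and purged from disk"
    if info_only:
        return ("The following metadata files attached to associations of "
                "Dataset '%s' will be %s (without 'info_only' mode):"
                % (dataset_id, op_description))
    return ("The following metadata files attached to associations of "
            "Dataset '%s' have been %s:" % (dataset_id, op_description))

print(metadata_file_message(1404, info_only=True, remove_from_disk=False))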

Example of item 4:
Without patch:
====
Examined 606 datasets, marked 589 as deleted and purged 595 dataset instances
====

With patch:
===
Examined 606 datasets, marked 589 datasets and 595 dataset instances (HDA) as 
deleted
===

diff --git a/scripts/cleanup_datasets/cleanup_datasets.py b/scripts/cleanup_datasets/cleanup_datasets.py
--- a/scripts/cleanup_datasets/cleanup_datasets.py
+++ b/scripts/cleanup_datasets/cleanup_datasets.py
@@ -310,17 +310,21 @@ def delete_datasets( app, cutoff_time, r
     dataset_ids.extend( [ row.id for row in history_dataset_ids_query.execute() ] )
     # Process each of the Dataset objects
     for dataset_id in dataset_ids:
+        dataset = app.sa_session.query( app.model.Dataset ).get( dataset_id )
+        if dataset.id in skip:
+            continue
+        skip.append( dataset.id )
         print "######### Processing dataset id:", dataset_id
-        dataset = app.sa_session.query( app.model.Dataset ).get( dataset_id )
-        if dataset.id not in skip and _dataset_is_deletable( dataset ):
-            deleted_dataset_count += 1
-            for dataset_instance in dataset.history_associations + dataset.library_associations:
-                # Mark each associated HDA as deleted
-                _purge_dataset_instance( dataset_instance, app, remove_from_disk, include_children=True, info_only=info_only, is_deletable=True )
-                deleted_instance_count += 1
-        skip.append( dataset.id )
+        if not _dataset_is_deletable( dataset ):
+            print "Dataset is not deletable (shared between multiple histories/libraries, at least one is not deleted)"
+            continue
+        deleted_dataset_count += 1
+        for dataset_instance in dataset.history_associations + dataset.library_associations:
+            # Mark each associated HDA/LDDA as deleted
+            _purge_dataset_instance( dataset_instance, app, remove_from_disk, include_children=True, info_only=info_only, is_deletable=True )
+            deleted_instance_count += 1
     stop = time.time()
-    print "Examined %d datasets, marked %d as deleted and purged %d dataset instances" % ( len( skip ), deleted_dataset_count, deleted_instance_count )
+    print "Examined %d datasets, marked %d datasets and %d dataset instances (HDA) as deleted" % ( len( skip ), deleted_dataset_count, deleted_instance_count )
     print "Total elapsed time: ", stop - start
     print "##########################################" 
 
@@ -396,8 +400,13 @@ def _delete_dataset( dataset, app, remov
                                                .filter( app.model.MetadataFile.table.c.lda_id==ldda.id ):
                 metadata_files.append( metadata_file )
         for metadata_file in metadata_files:
-            print "The following metadata files attached to associations of Dataset '%s' have been purged:" % dataset.id
-            if not info_only:
+            op_description = "marked as deleted"
+            if remove_from_disk:
+                op_description = op_description + " and purged from disk"
+            if info_only:
+                print "The following metadata files attached to associations of Dataset '%s' will be %s (without 'info_only' mode):" % ( dataset.id, op_description )
+            else:
+                print "The following metadata files attached to associations of Dataset '%s' have been %s:" % ( dataset.id, op_description )
                 if remove_from_disk:
                     try:
                         print "Removing disk file ", metadata_file.file_name