Assaf Gordon wrote:
> Hello Sergei,
> 
> I'm experimenting with the clean-up scripts myself, so perhaps I can offer 
> some information (the galaxy team is welcome to correct me and/or explain 
> better).
> 
> 
> 1. If you look at the output of your query, you'll notice that the "purged" 
> field is 0 for all datasets (I assume 0 is "false" in MySQL).
> This means that the actual files were *not* purged (i.e. physically deleted), 
> at least not by the "purge_datasets.sh" or "cleanup_datasets.py -3" step.
> Since you did use the "-r" parameter, it means those datasets were not picked up 
> as possible deletion candidates by this script.
> 
> 
> 2. (The following I found by reading the source code, it's not really well 
> explained - so if I'm wrong - correct me).
> The "dataset" table has an "update_time" field, and this field is updated 
> automatically whenever the dataset record changes.
> This means that when you run the first cleanup script and set the "deleted" 
> flag to true, the update_time is updated to "now".
> When you run the next clean-up script and ask for anything that is older than 
> 1 day ("-d 1"), it looks for an update_time older than one day - so it will 
> *not* find the dataset that was just marked as "deleted" in the first step 
> (because the update_time is "now"). Only if you run the next clean-up script 
> tomorrow will that dataset be deleted.
> 
> So, for example, running the following in succession:
> cleanup_datasets.py universe_wsgi.ini -d 1 -6    ( => delete datasets )
> cleanup_datasets.py universe_wsgi.ini -d 1 -3 -r ( => purge datasets + delete physical files )
> 
> both run with "-d 1" - but by design, files from yesterday (1 day old) will 
> not be physically deleted.
> 
> Files that the user deleted yesterday (1 day old) will be marked as 
> "deleted", but their update_time will be "now".
> Only files that were marked as deleted yesterday will be deleted today 
> (meaning: they are 2 days old).
> 
> To really delete files now, use "-d 0" with all the scripts.
> Since this is quite scary, the "-i" (info only) mode will show what will 
> be deleted (but that requires a recent version, 5770:a5e0a5d3c0a1).
> 
> 
> 3. The file_size=NULL issue happens when a job fails - on some occasions (I 
> couldn't pinpoint exactly when) galaxy does not pick up the fact that an output 
> file was generated even if the job failed, and so you get "ghost" files which 
> exist on the disk but are NULL in the database.
> The "discarded" state means the job was discarded (by the galaxy user?) - not 
> that the dataset was deleted/purged by the clean-up scripts.
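
The update_time behaviour described in point 2 above can be illustrated with a
small toy model. This is only a sketch and not Galaxy's actual code: the
Dataset class and the mark_deleted, purge and run_both functions are
hypothetical stand-ins for the "-6" and "-3 -r" steps, and the only assumption
taken from the thread is that any change to a record resets its update_time to
"now".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Dataset:
    """Toy record holding only the fields discussed in the thread."""
    update_time: datetime
    deleted: bool = False
    purged: bool = False


def mark_deleted(ds, now, days):
    """Hypothetical stand-in for the "-6" step: flag old datasets as deleted."""
    if not ds.deleted and ds.update_time <= now - timedelta(days=days):
        ds.deleted = True
        ds.update_time = now  # assumption: any record change resets update_time


def purge(ds, now, days):
    """Hypothetical stand-in for "-3 -r": purge datasets deleted long enough ago."""
    if ds.deleted and not ds.purged and ds.update_time <= now - timedelta(days=days):
        ds.purged = True
        ds.update_time = now


def run_both(days):
    """Run both steps back to back, then run the purge again one day later."""
    today = datetime(2011, 7, 6, 12, 0)
    ds = Dataset(update_time=today - timedelta(days=2))
    mark_deleted(ds, today, days)
    purge(ds, today, days)
    purged_same_day = ds.purged
    purge(ds, today + timedelta(days=1), days)
    return purged_same_day, ds.purged


print(run_both(days=1))  # (False, True): purged only on the following day
print(run_both(days=0))  # (True, True): purged immediately
```

With "-d 1" the dataset survives the first pass because marking it as deleted
has just refreshed its update_time; with "-d 0" both steps take effect in the
same run.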

Also, datasets created prior to the addition of the total_size column in
changeset 5700:70e2b1c95a69 will have this unset - it can be set by
running the script:

    % python ./scripts/set_dataset_sizes.py

Also, Sergei, it's possible to allow users to force datasets to be
removed from disk after they "delete" them.  See the
'allow_user_dataset_purge' option in universe_wsgi.ini.  If set to
True, users can select "Show Deleted Datasets" from the History's
"Options" menu and then choose datasets to purge.  Entire histories can
be purged from the history list.

--nate

> 
> 
> Hope this helps,
>  -gordon
> 
> 
> 
> Sergei Ryazansky wrote, On 07/06/2011 12:15 PM:
> > Hi,
> > thank you for the answer.
> > I have tried to use the mentioned scripts, but it seems that the order in 
> > which I ran them the first time was incorrect. As a result, the metadata in 
> > the database tables was modified, but the dataset files corresponding to the 
> > deleted datasets in the history have not been removed. Subsequently calling 
> > the scripts in the right order (as indicated in the wiki) also didn't delete 
> > the unused dataset files. Is there any way to update the metadata in the 
> > tables according to the real state of the files?
> > I think that the order in which the scripts were called the first time was 
> > the following:
> > cleanup_datasets.py universe_wsgi.ini -d 1 -6 -r
> > cleanup_datasets.py universe_wsgi.ini -d 6 -1 -r
> > cleanup_datasets.py universe_wsgi.ini -d 2 -1 -r
> > cleanup_datasets.py universe_wsgi.ini -d 3 -1 -r
> > cleanup_datasets.py universe_wsgi.ini -d 4 -1 -r
> > cleanup_datasets.py universe_wsgi.ini -d 5 -1 -r
> > cleanup_datasets.py universe_wsgi.ini -d 1 -1 -r
> > cleanup_datasets.py universe_wsgi.ini -d 1 -2 -r
> > cleanup_datasets.py universe_wsgi.ini -d 1 -4 -r
> > cleanup_datasets.py universe_wsgi.ini -d 1 -5 -r
> > cleanup_datasets.py universe_wsgi.ini -d 1 -3 -r
> > cleanup_datasets.py universe_wsgi.ini -d 1 -6 -r
> > 
> > Also, there are some strange things (imho) in the galaxy.dataset table: there 
> > are a lot of dataset ids with a NULL total size:
> > 
> > mysql> select * from dataset where (id="148" or id="53" or id="86" or id="146" or id="330");
> > +-----+---------------------+---------------------+-----------+---------+--------+----------+-------------------+-------------------+-----------+------------+
> > | id  | create_time         | update_time         | state     | deleted | purged | purgable | external_filename | _extra_files_path | file_size | total_size |
> > +-----+---------------------+---------------------+-----------+---------+--------+----------+-------------------+-------------------+-----------+------------+
> > |  53 | 2011-03-29 16:21:58 | 2011-07-06 14:17:49 | error     |       1 |      0 |        1 | NULL              | NULL              |         0 |       NULL |
> > |  86 | 2011-03-29 20:35:44 | 2011-07-06 14:17:52 | discarded |       1 |      0 |        1 | NULL              | NULL              |      NULL |       NULL |
> > | 146 | 2011-05-26 01:38:14 | 2011-07-06 14:18:00 | error     |       1 |      0 |        1 | NULL              | NULL              |      NULL |       NULL |
> > | 148 | 2011-05-26 02:20:44 | 2011-07-06 14:18:00 | discarded |       1 |      0 |        1 | NULL              | NULL              |      NULL |       NULL |
> > | 330 | 2011-07-05 00:44:44 | 2011-07-05 00:44:44 | NULL      |       0 |      0 |        1 | NULL              | NULL              |      NULL |       NULL |
> > +-----+---------------------+---------------------+-----------+---------+--------+----------+-------------------+-------------------+-----------+------------+
> > 
> > I don't know what these records looked like before the cleanup scripts were 
> > called, but is it possible that this is because of the incorrect order in 
> > which they were called? Does the "discarded" state mean that the 
> > corresponding file should be deleted? In my case all these files are still 
> > in the database folder.
> > Please let me know if you need any other clarification of my questions.
> > 
> > 
> > 2011/7/6 Hans-Rudolf Hotz <h...@fmi.ch>
> > 
> >     Hi Sergei
> > 
> >     This is a question better asked on 'galaxy-...@bx.psu.edu', since you 
> >     refer to your local Galaxy installation.
> > 
> > 
> >     In order to remove the data from your file system, you need to run the 
> >     'cleanup scripts', as described on this wiki page:
> > 
> >     https://bitbucket.org/galaxy/galaxy-central/wiki/Config/PurgeHistoriesAndDatasets
> > 
> > 
> > 
> >     Regards, Hans
> > 
> > 
> > 
> >     On 07/06/2011 03:33 PM, Sergei Ryazansky wrote:
> > 
> > 
> > 
> >         -------- Original Message --------
> >         Subject: deleting datasets from history
> >         Date:    Tue, 5 Jul 2011 19:58:45 +0300
> >         From:    Sergei Ryazansky <s.ryazan...@gmail.com>
> >         To:      galaxy-user-requ...@lists.bx.psu.edu
> > 
> > 
> > 
> >         Hello all,
> > 
> > 
> >         After deleting datasets from the history panel in our Galaxy mirror,
> >         the indicator at the top right corner shows the same amount of used
> >         space as before the deletion. Also, the files corresponding to the
> >         datasets remain in the Galaxy database/files/000 directory. It seems
> >         that deleting datasets from the history only deletes the link to the
> >         file, but not the file itself. How can we configure the Galaxy mirror
> >         to delete not only the records in the history panel but also the
> >         corresponding files?
> >         Thank you in advance!
> > 
> > 
> > 
> >         ___________________________________________________________
> >         The Galaxy User list should be used for the discussion of
> >         Galaxy analysis and other features on the public server
> >         at usegalaxy.org.  Please keep all replies on the list by
> >         using "reply all" in your mail client.  For discussion of
> >         local Galaxy instances and the Galaxy source code, please
> >         use the Galaxy Development list:
> > 
> >           http://lists.bx.psu.edu/listinfo/galaxy-dev
> > 
> >         To manage your subscriptions to this and other Galaxy lists,
> >         please use the interface at:
> > 
> >           http://lists.bx.psu.edu/
> > 
> > 
> 
> 
> 
> ___________________________________________________________
> Please keep all replies on the list by using "reply all"
> in your mail client.  To manage your subscriptions to this
> and other Galaxy lists, please use the interface at:
> 
>   http://lists.bx.psu.edu/
> 