Re: [galaxy-dev] Datasets incorrectly flagged as deleted

Lance Parsons Mon, 05 Dec 2016 07:05:04 -0800

I do use that script, though I hope it’s not causing this. The intent behind it was to simulate deletion from the HDA by the user, and then let the cleanup process proceed as normal. Also, I only run it on certain datasets, and many (all?) of the specific examples of this issue I’ve seen were not ones touched by the script, but datasets copied from a Data Library.


Looking a bit further:


 * `d.deleted and not hda.purged` = 66845
     * `d.deleted and d.purged and not hda.purged` = 54818
     * `d.deleted and not d.purged and not hda.purged` = 12027
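
For anyone who wants to reproduce this kind of breakdown, here is a minimal sketch that uses Python's built-in `sqlite3` module on a toy copy of the two tables. The column names follow the queries in this thread, but the tiny schema and the rows are invented for illustration, and a real Galaxy database (PostgreSQL here) has many more columns. One thing worth keeping in mind: `count(d.id)` over this join counts (dataset, HDA) pairs, so a dataset with several HDAs is counted more than once, whereas `count(DISTINCT d.id)` counts datasets.

```python
import sqlite3

# Toy copy of the two tables, with only the columns used in this thread.
# The rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dataset (id INTEGER PRIMARY KEY, deleted BOOLEAN, purged BOOLEAN);
CREATE TABLE history_dataset_association (
    id INTEGER PRIMARY KEY, dataset_id INTEGER, deleted BOOLEAN, purged BOOLEAN);
INSERT INTO dataset VALUES (1, 1, 1), (2, 1, 0), (3, 0, 0);
INSERT INTO history_dataset_association VALUES
    (10, 1, 0, 0),   -- dataset 1: deleted and purged, yet its HDA is not purged
    (11, 2, 0, 0),   -- dataset 2: deleted, not purged, with two unpurged HDAs
    (12, 2, 1, 0),
    (13, 3, 0, 0);   -- dataset 3: consistent, not deleted
""")

# Split "d.deleted and not hda.purged" by d.purged. count(d.id) over the join
# counts (dataset, HDA) pairs; count(DISTINCT d.id) counts datasets.
BREAKDOWN = """
SELECT d.purged,
       count(d.id)          AS hda_rows,
       count(DISTINCT d.id) AS datasets
  FROM dataset d
  JOIN history_dataset_association hda ON d.id = hda.dataset_id
 WHERE d.deleted AND NOT hda.purged
 GROUP BY d.purged
 ORDER BY d.purged DESC
"""

for purged, hda_rows, datasets in conn.execute(BREAKDOWN):
    print(f"d.purged={bool(purged)}: {hda_rows} HDA rows, {datasets} datasets")
```

On the toy data this prints `d.purged=True: 1 HDA rows, 1 datasets` followed by `d.purged=False: 2 HDA rows, 1 datasets`. The same SQL should also run on PostgreSQL, where the columns are real booleans.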

Nate Coraor
December 2, 2016 at 1:09 PM
Lance,
Do you use the administrative deletion script you wrote? That would probably cause d.deleted and not hda.purged.
d.deleted isn't for queries; it's sort of a deletion buffer. After all HDAs with hda.dataset_id = d.id are hda.purged, the next cleanup step is to mark d.deleted. Then, after d.deleted, it's marked d.purged and the file(s) are actually removed from disk. It's a bit of a failsafe so that datasets can be intercepted if unintentionally deleted.
--nate
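
The cleanup order Nate describes can be written down as a small state check. This is a sketch under stated assumptions, not Galaxy's actual cleanup code: `next_cleanup_step` and `Step` are hypothetical names, and, following the description above, only HDAs are considered (other references to a dataset, such as library datasets, are ignored here).

```python
from enum import Enum


class Step(Enum):
    WAIT = "an HDA is not purged yet; leave the dataset alone"
    MARK_DELETED = "all HDAs purged; mark d.deleted"
    PURGE = "d.deleted; mark d.purged and remove the file(s) from disk"
    DONE = "d.purged; nothing left to do"
    INCONSISTENT = "dataset flagged deleted/purged while an HDA is not purged"


def next_cleanup_step(hda_purged, d_deleted, d_purged):
    """Return the next cleanup step for one dataset (hypothetical helper).

    hda_purged: the hda.purged flags of every HDA with hda.dataset_id = d.id.
    Only HDAs are considered here; other references to the dataset
    (e.g. library datasets) are deliberately ignored in this sketch.
    """
    all_hdas_purged = all(hda_purged)
    if (d_deleted or d_purged) and not all_hdas_purged:
        # The state discussed in this thread.
        return Step.INCONSISTENT
    if d_purged:
        return Step.DONE
    if d_deleted:
        return Step.PURGE
    return Step.MARK_DELETED if all_hdas_purged else Step.WAIT


print(next_cleanup_step([True, True], d_deleted=False, d_purged=False).name)
print(next_cleanup_step([True, False], d_deleted=True, d_purged=False).name)
```

The two calls print `MARK_DELETED` and `INCONSISTENT`. Treating `d.deleted` as a buffer between "all HDAs purged" and "file removed" is what gives an administrator the window to intercept an unintended deletion.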


Lance Parsons
December 2, 2016 at 12:39 PM
Thanks Nate, that makes sense. However it seems I still have an issue:

```
select count(d.id)
    from dataset d
    join history_dataset_association hda on d.id = hda.dataset_id
    where d.deleted = 't' and hda.purged = 'f';
 count
-------
 67464
(1 row)
```
Perhaps I'll need to write some script to check each of these to see if the data does, indeed, exist, and then set the flag appropriately... Hrmm.
Does anyone know what the `dataset.deleted` flag is used for? Is that just supposed to be set when all `hda.purged` are `t`, sort of like a shortcut for queries?
- Lance
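
The "check each of these to see if the data does exist" step could look roughly like the sketch below. It only reports and changes no flags. `galaxy_style_path` and `split_by_presence` are hypothetical helpers, and the path layout is an assumption: it mimics Galaxy's default disk object store as we understand it (ids bucketed 1000 per directory under your `file_path`), so verify it against a few datasets you know are present before trusting the results. Other object store types will need a different path function.

```python
import os
import tempfile


def galaxy_style_path(file_path, dataset_id):
    """Hypothetical helper: where a dataset's file would live on disk.

    ASSUMPTION: this mimics Galaxy's default disk object store layout as we
    understand it (ids bucketed 1000 per directory, e.g.
    <file_path>/001/dataset_1234.dat). Other object stores differ; verify
    against a few datasets you know are present before relying on it.
    """
    s = str(dataset_id)
    if len(s) < 4:
        dirs = ["000"]
    else:
        padded = "0" * (3 - len(s) % 3) + s    # left-pad with 1-3 zeros to a multiple of 3
        padded = padded[:-3]                   # drop the last 3 digits (1000 per dir)
        dirs = [padded[i:i + 3] for i in range(0, len(padded), 3)]
    return os.path.join(file_path, *dirs, f"dataset_{dataset_id}.dat")


def split_by_presence(dataset_ids, path_for):
    """Report which flagged datasets still have a file on disk.

    dataset_ids: ids returned by a query such as the ones in this thread.
    path_for: callable mapping a dataset id to its expected file path.
    Only reports; no database flags are changed here.
    """
    present, missing = [], []
    for dataset_id in dataset_ids:
        if os.path.exists(path_for(dataset_id)):
            present.append(dataset_id)
        else:
            missing.append(dataset_id)
    return present, missing


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Create a file for dataset 42 only, then check 42 and 1234.
        path = galaxy_style_path(root, 42)
        os.makedirs(os.path.dirname(path))
        open(path, "w").close()
        print(split_by_presence([42, 1234], lambda i: galaxy_style_path(root, i)))
```

The demo prints `([42], [1234])`. Keeping the path function as a parameter means the same report works if your instance uses a different object store layout.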

Nate Coraor
December 2, 2016 at 11:15 AM
Lance,
usegalaxy.org has 4,652,912 such datasets. The cause here is that deleting an entire history does not mark the HDAs deleted (so that if you view a deleted history you can see which datasets were deleted and which were not at the time of deletion). There is a separate hda.purged column that indicates that an HDA is no longer recoverable by the user. I have 699 datasets that are d.deleted but not hda.purged; this number should be 0.
--nate
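
The difference between filtering on `hda.deleted` and on `hda.purged` shows up clearly on a toy example. This sketch again uses `sqlite3` with invented rows, and it assumes (our reading of Nate's explanation, not something confirmed in the code) that purging a whole history marks its HDAs purged while leaving `hda.deleted` unset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dataset (id INTEGER PRIMARY KEY, deleted BOOLEAN, purged BOOLEAN);
CREATE TABLE history_dataset_association (
    id INTEGER PRIMARY KEY, history_id INTEGER, dataset_id INTEGER,
    deleted BOOLEAN, purged BOOLEAN);
INSERT INTO dataset VALUES (100, 1, 0), (200, 1, 0);
-- HDA 1 sat in a history that was deleted and then purged as a whole:
-- hda.deleted stays 0, hda.purged is 1, and cleanup then marked dataset 100
-- deleted. This is the expected state.
INSERT INTO history_dataset_association VALUES (1, 1, 100, 0, 1);
-- HDA 2 is live (not purged), yet dataset 200 is flagged deleted: a real problem.
INSERT INTO history_dataset_association VALUES (2, 2, 200, 0, 0);
""")

# {flag} is filled in below with a fixed column name from this script,
# never with user input.
QUERY = """
SELECT DISTINCT d.id
  FROM dataset d
  JOIN history_dataset_association hda ON d.id = hda.dataset_id
 WHERE d.deleted AND NOT hda.{flag}
 ORDER BY d.id
"""

# Filtering on hda.deleted also flags the legitimate case (dataset 100);
# filtering on hda.purged leaves only the genuinely inconsistent dataset.
by_deleted = [row[0] for row in conn.execute(QUERY.format(flag="deleted"))]
by_purged = [row[0] for row in conn.execute(QUERY.format(flag="purged"))]
print(by_deleted)  # [100, 200]
print(by_purged)   # [200]
```

This is why the `hda.deleted` query in the original message overcounts, and why the `hda.purged` version is the one that should come out at 0.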


Lance Parsons
November 30, 2016 at 2:20 PM
I've run into issues over the past year where some jobs would occasionally fail to start (stuck in a `new` state). I tracked them down to a situation where `dataset.deleted` is set to `t` yet `history_dataset_association.deleted` is `f`. Simply setting `dataset.deleted` to `f` in those instances resolved the issue and the jobs ran. The datasets have all still been on disk.
Since this is a pretty annoying situation, I thought I'd check to see if there are other datasets with this problem. Shockingly, I found many thousands of such datasets:
```
select count(d.id)
    from dataset d
    join history_dataset_association hda on d.id = hda.dataset_id
    where d.deleted = 't' and hda.deleted = 'f';
 count
-------
 76977
(1 row)
```
I'm hesitant to update so many rows in my database, so I thought I'd put this out there for comment. What do others see when running the above query? Has anyone run into this or a similar issue? Thanks.
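
One cautious way to approach such a bulk update is sketched below. It is not a vetted Galaxy procedure. It assumes the reading from later in this thread, that a dataset should only be flagged deleted once all of its HDAs are purged, and it only considers HDAs (datasets also referenced from a Data Library would probably need their library associations checked too). `undelete_live_datasets`, `make_demo_db`, and the `file_exists` callable are hypothetical; the demo uses `sqlite3`, where the PostgreSQL statement would use `deleted = false` instead of `0`. Back up the database before running anything like this for real.

```python
import sqlite3

# Datasets flagged deleted (but not yet purged) that still have an unpurged HDA.
CANDIDATES = """
SELECT DISTINCT d.id
  FROM dataset d
  JOIN history_dataset_association hda ON d.id = hda.dataset_id
 WHERE d.deleted AND NOT d.purged AND NOT hda.purged
 ORDER BY d.id
"""


def undelete_live_datasets(conn, file_exists, dry_run=True):
    """Clear dataset.deleted for candidates whose file is still on disk.

    file_exists: hypothetical callable(dataset_id) -> bool supplied by you.
    With dry_run=True (the default) nothing is written; the ids that would be
    updated are returned. Purged datasets are never touched.
    """
    ids = [row[0] for row in conn.execute(CANDIDATES) if file_exists(row[0])]
    if not dry_run:
        with conn:  # one transaction: commit on success, roll back on error
            conn.executemany(
                "UPDATE dataset SET deleted = 0"
                " WHERE id = ? AND deleted AND NOT purged",
                [(dataset_id,) for dataset_id in ids],
            )
    return ids


def make_demo_db():
    """Invented toy data: datasets 1 and 2 are candidates, 3 is purged."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE dataset (id INTEGER PRIMARY KEY, deleted BOOLEAN, purged BOOLEAN);
    CREATE TABLE history_dataset_association (
        id INTEGER PRIMARY KEY, dataset_id INTEGER, deleted BOOLEAN, purged BOOLEAN);
    INSERT INTO dataset VALUES (1, 1, 0), (2, 1, 0), (3, 1, 1);
    INSERT INTO history_dataset_association VALUES
        (10, 1, 0, 0), (11, 2, 0, 0), (12, 3, 0, 0);
    """)
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    on_disk = {1}  # pretend only dataset 1's file is still present
    print(undelete_live_datasets(conn, on_disk.__contains__))  # dry run: [1]
    undelete_live_datasets(conn, on_disk.__contains__, dry_run=False)
    print(conn.execute("SELECT id, deleted FROM dataset ORDER BY id").fetchall())
```

The demo prints `[1]` and then `[(1, 0), (2, 1), (3, 1)]`: only the dataset that is both unpurged and still on disk gets its flag cleared. Defaulting to a dry run and repeating the `deleted AND NOT purged` guard in the `UPDATE` keeps a mistaken run from touching anything it shouldn't.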


--
Lance Parsons - Scientific Programmer
Carl C. Icahn Laboratory - Room 136
Lewis-Sigler Institute for Integrative Genomics
Princeton University

___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  https://lists.galaxyproject.org/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/
