Hi, Galaxy Developers,

I apologize for resurrecting another old thread 
(http://dev.list.galaxyproject.org/delete-data-library-via-API-td4553000.html), 
and for this long-winded email...

First things first: I am trying to confirm what is suggested in the thread 
cited above (that data sets cannot be deleted from a Galaxy data library via 
the API).  I want to confirm this because I have a vested interest in 
performing this operation, to the extent that I've started investigating 
modifying the PostgreSQL tables directly if it can't be done via the API (more 
about this later). 

The reason I want to programmatically delete data from a data library is that 
I am trying to write some custom Python code that maintains consistency 
between a folder on the local filesystem and a corresponding Galaxy data 
library (i.e. bi-directional synchronization).  I need to do this in an 
automated fashion to account for the following two conditions:

1) The file on the filesystem gets deleted (i.e. the path that is referenced in 
the data library is no longer valid).
2) The MD5 of the file on the filesystem changes (i.e. the file was replaced or 
modified, and needs to be re-imported so that the correct metadata, e.g. file 
size, is reported via the Galaxy UI).
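
The two conditions above can be sketched as a small check run for each file 
tracked by the sync code.  This is only a minimal illustration: the function 
names (md5_of_file, classify) and the idea of keeping a previously recorded 
MD5 per path are assumptions made for the example, not part of Galaxy or of 
any existing script.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def classify(path, recorded_md5):
    """Compare a tracked path against the filesystem (hypothetical helper).

    Returns "deleted" if the path no longer exists (condition 1),
    "modified" if its MD5 differs from the recorded one (condition 2),
    and "unchanged" otherwise.
    """
    if not os.path.isfile(path):
        return "deleted"
    if md5_of_file(path) != recorded_md5:
        return "modified"
    return "unchanged"
```

A "deleted" result would call for removing the library entry, and a 
"modified" result for removing and re-importing it, which is why the delete 
operation matters here.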

Based on the limited amount of testing I have done, it doesn't appear to be 
possible to delete an actual data set from the data library via the API.  Here 
is the test that leads me to believe this:

1) I can delete a data library without issue.  Here is the output of me doing 
so:

--(galaxy@crigalaxy)-(/group/galaxy/galaxy-dist/scripts/api)--
> ./delete.py  11f3cb91acb2ab1677f8265bxxxxxxxx 
> http://localhost:8081/api/libraries/e85a3be143d5905b
Response
--------
{'synopsis': 'dansully', 'description': 'dansully', 'name': 'dansully'}
--(galaxy@crigalaxy)-(/group/galaxy/galaxy-dist/scripts/api)--


2) Whenever I try to delete an item in the data library, I get a 404 with the 
response "No action for ...":

--(galaxy@crigalaxy)-(/group/galaxy/galaxy-dist/scripts/api)--
> ./display.py  11f3cb91acb2ab1677f8265bxxxxxxxx 
> http://localhost:8081/api/libraries/e85a3be143d5905b/contents/62e564808c5368d4
Member Information
------------------
ldda_id: 62e564808c5368d4
misc_blurb: 2 lines
name: whatever.txt
data_type: txt
file_name: /group/galaxy/galaxy-dist/database/files/008/dataset_8938.dat
uploaded_by: dansu...@uchicago.edu
template_data: {}
genome_build: ?
model_class: LibraryDataset
misc_info: uploaded txt file
file_size: 329
metadata_data_lines: 2
message: 
id: 62e564808c5368d4
date_uploaded: 2012-08-29T15:48:38.335445
metadata_dbkey: ?
--(galaxy@crigalaxy)-(/group/galaxy/galaxy-dist/scripts/api)--
> ./delete.py  11f3cb91acb2ab1677f8265bxxxxxxxx 
> http://localhost:8081/api/libraries/e85a3be143d5905b/contents/62e564808c5368d4
HTTP Error 404: Not Found
404 Not Found
The resource could not be found.
No action for /api/libraries/e85a3be143d5905b/contents/62e564808c5368d4
--(galaxy@crigalaxy)-(/group/galaxy/galaxy-dist/scripts/api)--

My hope of being able to use the API for a delete rests on one of two 
possibilities: either I am not forming the URL correctly, or the data{} dict 
accepts a key that I am not aware of and that would make a delete operation 
for an individual data set feasible.  It is my understanding that the Galaxy 
API is still under active development, and so far I have not been able to 
locate any documentation of such keys or attributes, other than the example 
code distributed.  Would it be possible for somebody with specific knowledge 
of the Galaxy API to comment on whether or not this functionality is 
implemented?
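
For reference, the request attempted in test 2 amounts to an HTTP DELETE 
against the contents URL.  Here is a minimal sketch of building such a 
request with Python 3's standard library.  It assumes the API key is passed 
as a `key` query parameter, and build_delete_request is a hypothetical name 
for the example; it is not the code in delete.py.  The request is only built, 
not sent.

```python
import urllib.parse
import urllib.request


def build_delete_request(api_key, url):
    """Build (but do not send) an HTTP DELETE request for an API URL.

    Assumption: the API key travels as a `key` query parameter.
    """
    separator = "&" if "?" in url else "?"
    full_url = url + separator + urllib.parse.urlencode({"key": api_key})
    return urllib.request.Request(full_url, method="DELETE")


if __name__ == "__main__":
    request = build_delete_request(
        "11f3cb91acb2ab1677f8265bxxxxxxxx",
        "http://localhost:8081/api/libraries/e85a3be143d5905b"
        "/contents/62e564808c5368d4",
    )
    # Show the verb and URL; sending would be urllib.request.urlopen(request).
    print(request.get_method(), request.full_url)
```

In my tests this URL form is exactly what comes back with the 404 shown 
above, so the sketch only documents what was tried.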

All of this being said, if it is *not* possible to delete an individual data 
set via the Galaxy API, I am prepared to make a 'reasonable' effort to do this 
by modifying the Galaxy back-end SQL tables directly.  Based on the research I 
have done (I have enabled mod logging on the PostgreSQL database), here is 
what a delete operation from the Galaxy UI looks like in terms of database 
changes:

2012-08-29 08:44:20.361 
CDT,"galaxy_","galaxy",7952,"127.0.0.1:46174",503e1a8d.1f10,41,"idle in 
transaction",2012-08-29 ent: UPDATE library_dataset SET 
update_time='2012-08-29T13:44:20.361070', deleted=true WHERE library_dataset.id 
= 6013",$

So, at this point (assuming that I cannot delete the data set via the Galaxy 
API), I'm trying to work backwards from information I know (i.e. file_name or 
ldda_id) to discern the library_dataset.id.  I'm still digging through the 
database query logs to determine how this is done; it appears that more than 
one query is executed when a data library is rendered via the Galaxy UI.  
Determining the id of the data set in question continues to be an ongoing 
challenge.
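
To make the idea concrete, here is a rough sketch of the lookup and the 
soft delete.  It rests on assumptions I have not verified: that the number 
in dataset_8938.dat is the dataset's primary key, and that a table named 
library_dataset_dataset_association links dataset_id to library_dataset_id.  
The demo uses an in-memory SQLite database as a stand-in for PostgreSQL, and 
the library_dataset id 42 is made up for illustration.

```python
import re
import sqlite3
from datetime import datetime, timezone


def dataset_id_from_file_name(file_name):
    """Extract the numeric id from a path like .../dataset_8938.dat."""
    match = re.search(r"dataset_(\d+)\.dat$", file_name)
    if match is None:
        raise ValueError("unexpected file name: %r" % file_name)
    return int(match.group(1))


def make_demo_db():
    """Create a tiny stand-in schema (assumed, not Galaxy's real DDL)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE library_dataset (
            id INTEGER PRIMARY KEY, update_time TEXT, deleted INTEGER DEFAULT 0);
        CREATE TABLE library_dataset_dataset_association (
            id INTEGER PRIMARY KEY, library_dataset_id INTEGER, dataset_id INTEGER);
        INSERT INTO library_dataset (id) VALUES (42);
        INSERT INTO library_dataset_dataset_association
            (library_dataset_id, dataset_id) VALUES (42, 8938);
    """)
    return conn


def mark_deleted(conn, file_name):
    """Flag every library_dataset row linked to `file_name` as deleted."""
    dataset_id = dataset_id_from_file_name(file_name)
    rows = conn.execute(
        "SELECT library_dataset_id FROM library_dataset_dataset_association "
        "WHERE dataset_id = ?",
        (dataset_id,),
    ).fetchall()
    ids = [row[0] for row in rows]
    now = datetime.now(timezone.utc).isoformat()
    for library_dataset_id in ids:
        # Mirrors the UPDATE seen in the PostgreSQL log above.
        conn.execute(
            "UPDATE library_dataset SET update_time = ?, deleted = 1 "
            "WHERE id = ?",
            (now, library_dataset_id),
        )
    conn.commit()
    return ids
```

Against the real database the same two statements would go through a 
PostgreSQL driver, and only after the table and column names have been 
checked against the actual schema.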

Which leads me to one final question.  Could anybody tell me how the IDs 
(e.g. the ldda_id) that get returned by the Galaxy API are calculated?  Is it 
some sort of hash of composite or primary keys from the back-end tables?  The 
reason I am asking is that I did a full dump of the database and searched 
(using grep) for the ldda_id (i.e. 62e564808c5368d4), and it didn't exist 
anywhere in the database (I was surprised by this).

If anybody out there has programmatically deleted a data set from a Galaxy data 
library (via the API or otherwise), or could shed some light on how to solve my 
problem, I'd love to hear from you.  Thank you so much for your time, and 
again, I apologize for my lengthy email.

Dan Sullivan