How about a completely separate daemon that periodically monitors the Galaxy
database to determine which datasets belong to which user(s)? It would then
move the actual dataset to an area owned by the user and group-accessible to
Galaxy, replacing the dataset with a symlink. This would require no changes to
the Galaxy build, but it would require constant monitoring.
There is already a mechanism for users to move their files into a joint
user/galaxy directory, but it is (as far as I know) only allowed for libraries,
not histories. It would be better if there were a way for users to browse
through their own directories as a tool, and be able to load files directly
into their history.
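To make the daemon idea a bit more concrete, here is a rough Python sketch of
a single pass: move each dataset file into its owner's directory and leave a
symlink in its place. This is only an illustration under stated assumptions.
The owners mapping (file name to user name) is hypothetical and would have to
come from a query against the Galaxy database, which is not shown; the
function names and directory layout are made up for the example. It also does
not change ownership or group permissions, which the real thing would need.

```python
import os
import shutil


def relocate_dataset(dataset_path, user_dir):
    """Move one dataset file into user_dir and leave a symlink behind."""
    if os.path.islink(dataset_path):
        return None  # already relocated on an earlier pass
    os.makedirs(user_dir, exist_ok=True)
    target = os.path.abspath(
        os.path.join(user_dir, os.path.basename(dataset_path))
    )
    shutil.move(dataset_path, target)
    # The original path keeps working because it now points at the new file.
    os.symlink(target, dataset_path)
    return target


def relocate_all(owners, files_dir, user_root):
    """Run one pass over owners, a hypothetical {file name: user name} map."""
    moved = []
    for name, user in sorted(owners.items()):
        path = os.path.join(files_dir, name)
        if not os.path.exists(path):
            continue  # the database lists a file that is not on disk; skip it
        target = relocate_dataset(path, os.path.join(user_root, user))
        if target is not None:
            moved.append(target)
    return moved
```

A daemon would call relocate_all() on a timer with a freshly queried mapping.
Running it repeatedly is harmless, since files that are already symlinks are
skipped. One caveat: it should only touch datasets that are finished, because
moving a file while a job is still writing to it could corrupt the output.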
On May 15, 2012, at 7:40 PM, Josh Nielsen wrote:
> Please forgive the length of this proposition as I try to explain my
> reasoning behind this. Let me say first of all that I understand that Galaxy
> is not meant to be everything to everyone and that requests for features may
> not suit everyone who uses Galaxy. That being said, I have an idea or request
> that I think would be convenient for dealing with users' datasets from a
> file-system perspective.
> Galaxy has the obvious benefit and advantage (compared to manual
> job-submission for tools on a cluster) of providing an interface for using
> all the analysis tools, and the history of the operations done on your data,
> all in one place. However, I have found that putting all the output & datasets
> in one directory (the files/000/ directory) on the file-system causes a
> problem for the users if they specifically want to interact with it *on the
> file-system*, and not just through the Web interface - for whatever
> complicated or diverse reasons.
> Since Galaxy runs on a cluster of its own in our environment, and we do not
> allow users to connect to it remotely to submit manual jobs (and individually
> output to their separate home directories) as we do on our main cluster, it
> is essentially a black box beyond the Galaxy GUI. That is
> essentially what we want, except for how they can interact with the output.
> The issue is that our users would like an easy means of copying their files
> off of the Galaxy cluster to other servers from a command line (possibly even
> automated by scripts). Even if we allow an FTP share of the output directory
> for users to do that, the common [galaxy-dist]/database/files/000/ directory
> clumps all of the files for all users together in one directory and uses a
> sequential file-naming scheme (dataset_N++) that makes it hard to tell who
> owns each file.
> Is there a way that the dataset output directory locations could be designed
> (or set optionally?) like the FTP upload feature's expected directory
> structure: where the files are dropped into the corresponding subdirectory of
> the user who produced it? For example having under database/files/
> subdirectories named according to the user's Galaxy account id (like
> [galaxy-dist]/database/files/jsmith, [galaxy-dist]/database/files/sparker,
> etc.). If they could be segregated by user it would be much easier to keep
> track of what datasets belong to whom on the file-system. Then I could
> possibly set up a read-only FTP share to the files/ directory on the cluster,
> from which the users could directly copy the files in their personal
> subdirectory to other systems, and perhaps batch download them, rather than
> having to rely solely on the Web interface.
> I understand that the way Galaxy is currently designed is that the files are
> just generically named (the "behind-the-scenes" handling of data is a black
> box) and it is the database that keeps track of which files belong to whom,
> and which has the metadata for more meaningful dataset/job names, etc. But a
> file-system hierarchy alternative would also be welcome in a heavily
> command-line-oriented computational environment.
> Would setting up a more user-representative output directory hierarchy on the
> file-system like that be possible?
> Best Regards,
> Josh Nielsen
> Please keep all replies on the list by using "reply all"
> in your mail client. To manage your subscriptions to this
> and other Galaxy lists, please use the interface at: