On Jan 8, 2013, at 1:27 PM, Shantanu Pavgi (Campus) wrote:
> I am trying to understand how the FTP and data library upload options work
> in Galaxy. When a non-binary file is uploaded through the FTP option, it
> goes through three move operations:
> 1. First it is copied line-by-line to a temporary file, converting newlines
> 2. Then the temporary file is moved back to the FTP directory with the same
> name
> 3. Later the newline-sanitized FTP file is moved to the datasets directory
> These move operations in Python are carried out as copy and delete tasks. I don't
> see the same approach being taken with data libraries or other file-system
> import/upload options. I looked at library_common code, but I couldn't follow
> it. I was wondering if someone could help in understanding how file
> upload is implemented for different upload mechanisms and datatypes.
> Also, can the number of move operations for the FTP upload option be
> reduced? For example, can the original FTP file or temporary file be
> copied/moved directly to the datasets directory? This would be helpful in
> supporting FTP-type upload where the 'galaxy' user isn't the primary owner
> of the user's files (move operations perform chmod, which requires primary
> ownership).
The code in question is actually in the upload tool,
tools/data_source/upload.py. In general, you should be able to minimize the
number of copy and delete steps if you put new_file_path, file_path, and
ftp_upload_dir on the same filesystem.
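
To illustrate why the filesystem layout matters, here is a minimal sketch (not
Galaxy's actual upload code). It assumes moves are done with Python's
shutil.move, which renames the file in place when source and destination are
on the same filesystem and otherwise falls back to copying and then deleting.
The helper names same_filesystem and move_file are hypothetical, and comparing
st_dev is only a heuristic for "same filesystem".

```python
import os
import shutil


def same_filesystem(*paths):
    """Return True if all given (existing) paths report the same device ID."""
    devices = {os.stat(p).st_dev for p in paths}
    return len(devices) == 1


def move_file(src, dst):
    """Move src to dst and report whether a plain rename was likely possible.

    shutil.move tries a rename first; if source and destination are on
    different filesystems it copies the data and then removes the source.
    """
    dst_dir = os.path.dirname(os.path.abspath(dst))
    cheap = same_filesystem(src, dst_dir)
    shutil.move(src, dst)
    return cheap
```

Calling same_filesystem() with the directories configured as new_file_path,
file_path, and ftp_upload_dir is a quick way to check whether the moves can
avoid the copy and delete path.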
> Please keep all replies on the list by using "reply all"
> in your mail client. To manage your subscriptions to this
> and other Galaxy lists, please use the interface at: