Re: [galaxy-user] Identifying Tags - Galaxy Question

Jennifer Jackson Wed, 18 Sep 2013 15:20:58 -0700

Hello Dominique,

Yes, this can be done. Here is the process ->

Start by splitting up the data with the "NGS: QC and manipulation -> Barcode Splitter" tool. The result files will be available as links. These can be copied and pasted, in batch, into the text box of the "Get Data -> Upload File" tool, and each will be loaded as a dataset. Copying them into a simple text file, then pasting into the Upload tool all at once is a quick way to do this, or you can do them one by one.
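
To illustrate what the splitting step does, here is a minimal Python sketch that runs outside Galaxy. It is not the Barcode Splitter tool itself. It assumes the tag is exactly the first 8 nucleotides and that only exact matches count; the function name, the "unmatched" label, and the second record are hypothetical.

```python
TAG_LENGTH = 8  # from the question: the tag is the first 8 nucleotides


def split_by_tag(records, tags):
    """Group (identifier, sequence) pairs by their leading tag.

    tags maps a tag sequence to a label, e.g. {"CTGAGTCA": "Tag1"}.
    Sequences whose prefix is not in tags go to "unmatched".
    Exact matching only; this is an assumption of the sketch.
    """
    groups = {label: [] for label in tags.values()}
    groups["unmatched"] = []
    for identifier, sequence in records:
        prefix = sequence[:TAG_LENGTH].upper()
        label = tags.get(prefix, "unmatched")
        groups[label].append((identifier, sequence))
    return groups


# First record is from the original question; the second is made up
# to show what happens when no tag matches.
example_records = [
    ("GNJQDEZ01A940A", "CTGAGTCAGGTCAACAATCATAAGATATTGGCACC"),
    ("hypothetical_read", "AAAAAAAAGGTCAACAATCATAAG"),
]
example_groups = split_by_tag(example_records, {"CTGAGTCA": "Tag1"})
```

Each group then corresponds to one of the per-tag datasets described above.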

Once you have the individual files as datasets, you probably will want to rename them to better keep track of which barcode/tag they represent. Click on the pencil icon in the upper right corner of each dataset to do this on the Edit Attributes form.

Next, the idea is to convert the fasta dataset to tabular, add in a column with the "_Tag1" information, merge the original identifier column with the new tag column, cut the columns to rearrange (you want just the new merged identifier and the original fasta sequence, leaving behind the two columns with the original identifier + tag), then convert back from tabular to fasta format. Use the tools in 'Text Manipulation' and 'FASTA manipulation' to do these operations. I would normally suggest creating/using a workflow at this point, but as the tags will all be different, and the "Add column" step is in the middle of the processing, this is probably not worth it.
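
The same sequence of operations can be sketched in plain Python, in case it helps to see the data at each stage. This is only an illustration of the logic, not what the Galaxy tools run internally; all function names here are hypothetical.

```python
def fasta_to_tabular(text):
    """Return [identifier, sequence] rows, joining wrapped sequence lines."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            rows.append([line[1:], ""])
        elif rows:
            rows[-1][1] += line
    return rows


def add_tag_to_identifiers(rows, tag_label):
    """Add a tag column, merge it with the identifier, keep merged id + sequence."""
    # "Add column": every row gets the same tag label, e.g. "_Tag1".
    with_tag = [[identifier, sequence, tag_label] for identifier, sequence in rows]
    # "Merge" + "Cut": build the new identifier and drop the two source columns.
    return [[identifier + tag, sequence] for identifier, sequence, tag in with_tag]


def tabular_to_fasta(rows):
    """Write each row back as a two-line fasta record (sequence unwrapped)."""
    return "".join(">{}\n{}\n".format(identifier, sequence) for identifier, sequence in rows)


# Record taken from the original question.
example_fasta = (
    ">GNJQDEZ01A940A\n"
    "CTGAGTCAGGTCAACAATCATAAGATATTGGCACCATGTACCTGTGGTTCTCGTTTCC\n"
    "ATGTTA\n"
)
example_rows = add_tag_to_identifiers(fasta_to_tabular(example_fasta), "_Tag1")
example_output = tabular_to_fasta(example_rows)
```

Note that this sketch writes each sequence on a single line rather than preserving the original line wrapping.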


Hopefully this helps!

Jen
Galaxy team

On 9/18/13 7:36 AM, D. A. Cowart wrote:

Hello,
I need to perform an action (or series of actions) on a 454 dataset using Galaxy, and have not been able to figure out the necessary steps, even after looking through the toolbar expressions and using custom search.
My file is a fasta and has the standard format:

>GNJQDEZ01A940A
CTGAGTCAGGTCAACAATCATAAGATATTGGCACCATGTACCTGTGGTTCTCGTTTCC
ATGTTA
>GNJQDEZ01BJYQZ
CTGAGTCAGGTCAACAATCATAAGACATCGGCTCTCTATATTTAATATTGGT
Each of the 100,000 sequences within this file contains a specific tag, which is the first 8 nucleotides. There are 19 tags total. I would like to identify these tags and add an identifier of the tag to the sequence name. Therefore, if I am looking for the first tag (CTGAGTCA), the output would look like:
>GNJQDEZ01A940A_*Tag1*
*CTGAGTCA*GGTCAACAATCATAAGATATTGGCACCATGTACCTGTGGTTCTCGTTTCC
ATGTTA
Is it possible to achieve this using Galaxy? If possible, could you kindly suggest tools to use?
Thank you in advance,
Dominique Cowart


___________________________________________________________
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using "reply all" in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

   http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

   http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:

   http://galaxyproject.org/search/mailinglists/


--
Jennifer Hillman-Jackson
http://galaxyproject.org
