[galaxy-user] Text Editing

Jennifer Jackson Fri, 09 Dec 2011 10:37:23 -0800

Hello Luce,

I can explain how to use the "Text Manipulation" tools. For each file independently, the following steps will rename the "name" identifier in column 4. I don't believe that there is a more direct method, but you may discover one. This type of customization is why the tools are distinct - so they can be used in sequence to do many of the same text manipulations as on the Unix command line. There is a biosed command as part of EMBOSS, but that tool works on sequence text, not text files in general.

To save time in the future, these steps can be put into a workflow, with an edit of step 4 to customize the new ID text as needed when run.


Starting with a 5 column MACS BED file:

1 - Save the track header line with the tool "Select first lines from a dataset" with the option to save line 1.

2 - Create the working dataset that does not include the first line with the tool "Remove beginning of a file" with the option "Remove first: 1" lines.

3 - Split up the existing ID with the tool "Convert delimiters to TAB" using the "Underscores" option.

This will split the fourth "name" column into three distinct columns; the last new column will be used to create the new ID.

4 - Create a column in your file containing the text "treatment1_peak_" with the tool "Add column to an existing dataset".

This will create an extra column at the end of the BED file, to be used in the new ID.


The file should now be:
c1 - chrom
c2 - start
c3 - end
c4 - the text "MACS"
c5 - the text "peak"
c6 - a number, the second part of the new ID
c7 - score
c8 - the text "treatment1_peak_"
     (or "treatment2_peak_" if the second file)

5 - Merge the two ID portions with the tool "Merge Columns together" using the option of merging column c8 with c6.

This will create a new field, c9, with the text "treatment1_peak_N" (or "treatment2_peak_N" for the second file), where "N" is whatever the number in c6 was, per row.

6 - Create the new BED file, putting the new "name" column in the correct order and omitting the columns not needed, using the tool "Cut columns from a table" and pasting into the "Cut columns:" box this text (no quotes):


c1,c2,c3,c9,c7

7 - Add back the track line (removed in step 1) with the tool "Concatenate datasets tail-to-head" with the options set to concatenate the output of step 1 as the first file and the output of step 6 as the second file.

8 - Use the Edit Attributes form to change the file type back to BED and assign all five columns to the proper attribute (click on the pencil icon to reach the form).
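
For readers who want to check the result outside Galaxy, the net effect of steps 1 through 7 can be sketched in a few lines of Python. This is only an illustration, not what the Galaxy tools run internally: the function name rename_peaks, the track line, and the example values are all made up, and the sketch assumes a tab-separated 5-column file whose first line is the track header. Unlike step 3, it splits only the "name" column on underscores.

```python
def rename_peaks(lines, prefix="treatment1_peak_"):
    """Mimic steps 1-7 on the lines of a 5-column MACS BED file.

    `lines` is a list of strings; the first one is the track header.
    `prefix` is the new ID text (hypothetical default, as in step 4).
    """
    header = lines[0].rstrip("\n")      # step 1: save the track line
    body = lines[1:]                    # step 2: drop the first line
    out = [header]                      # step 7: track line goes back on top
    for line in body:
        if not line.strip():
            continue                    # ignore blank lines
        chrom, start, end, name, score = line.rstrip("\n").split("\t")
        # step 3: "MACS_peak_20" -> "MACS", "peak", "20"; keep the number
        number = name.split("_")[-1]
        # steps 4 and 5: add the prefix text and merge it with the number
        new_name = prefix + number
        # step 6: keep chrom, start, end, new name, score, in BED order
        out.append("\t".join([chrom, start, end, new_name, score]))
    return out


# Made-up example data, for illustration only.
example = [
    'track name="MACS peaks"',
    "chr1\t300000\t300500\tMACS_peak_20\t55.2",
]
result = rename_peaks(example, "treatment1_peak_")
print("\n".join(result))
```

For the second file, the same call with prefix "treatment2_peak_" would give the other label.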

Hopefully this will work (it did for my test) or is enough information for you to work out the exact steps for your particular datasets. Next time, please send data/tool questions directly "to" the [email protected] mailing list. Replies should be sent using "reply-all". The outreach account is for other purposes.


[email protected] wrote:

> I have two ChIPSeq datasets, and I am trying to find the common and distinct peaks between them and visualize them. I end up with a MACS bed file for each (listing a bunch of MACS_peaks). I then use the Intersect and Subtract tools from the Genomic Intervals tab and end up with the peaks I want. However, because of the way that MACS names its peaks, there can end up being some peaks named the same way in both files (because, for example, peak 20 in file 1 is from position 300,000-300,500 but peak 20 in file 2 is from position 320,000-320,500). So, I can end up with multiple peaks with the same name. Because all the peak names have the same form, it can also be difficult to tell them apart when visualizing them in the UCSC Genome Browser.

> What I would like to do is to be able to edit the bed file to change the text MACS_peak_<number> to, say, treatment1_peak_<number> so that peak 20 would now still be numbered 20 in both files, but would have a different label. This would be pretty easy to do using regular expressions and sed.

> I know there have been a few posts about text manipulation, and I know that there is a text manipulation tab, but I can't seem to find an easy way to do what I want to do.

>
> Any advice?
>
> Thanks, luce


--
Jennifer Jackson
http://usegalaxy.org
http://galaxyproject.org/wiki/Support
___________________________________________________________
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using "reply all" in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

 http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

 http://lists.bx.psu.edu/
