Hi Mike,

Yes, to get the .wig file out of the data, select the first 892177 lines. (Selecting 892178 would include the second track line, which you don't want.)


After looking one more time, not all of the data appears to be .wig. This is a multiple track group file labeled as .wig, but the second track is .bed, not .wig. The data didn't look right on the first pass (the second track line has no "type=wiggle_0" declaration), which is why in my original reply I suggested contacting the data authors rather than manipulating the data yourself: ask them to review and resubmit it, or at least confirm it. It is now pretty clear what the merge consists of: .wig + .bed. If you really want to use the data as-is, I would start by treating the first track as .wig and the second track as .bed (once split), and carefully examine the results of any research you perform with it.
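For anyone who wants to screen similar files outside Galaxy, here is a minimal sketch of the reasoning above: label each track as wig if its track line declares "type=wiggle_0" and its data contain a variableStep/fixedStep declaration, or as bed if every data line starts with chrom, start, end. The function name classify_tracks and these exact heuristics are my own assumptions, not a Galaxy or UCSC tool, so treat the output as a hint, not a verdict.

```python
import re

# Assumed heuristic: a BED-like data line begins with chrom, start, end,
# where start and end are integers separated by whitespace.
BED_LINE = re.compile(r"^\S+\s+\d+\s+\d+(\s|$)")


def classify_tracks(lines):
    """Return 'wig', 'bed' or 'unknown' for each track line in `lines`."""
    tracks = []  # each entry: [track_line, data_lines_that_follow_it]
    for line in lines:
        if line.startswith("track"):
            tracks.append([line, []])
        elif tracks:
            tracks[-1][1].append(line)

    labels = []
    for header, data in tracks:
        # Wiggle cue: declaration lines such as "variableStep chrom=chr11 span=25".
        has_decl = any(d.startswith(("variableStep", "fixedStep")) for d in data)
        if "type=wiggle_0" in header and has_decl:
            labels.append("wig")
        elif data and all(BED_LINE.match(d) for d in data if d.strip()):
            labels.append("bed")
        else:
            labels.append("unknown")
    return labels
```

On a cut-down copy of this file (first track plus a few BED rows) this should report ["wig", "bed"], matching what we saw by eye.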

Apologies for the complicated file analysis,

Jen
Galaxy team

---

Details about why the first track looks like a .wig file and the second track looks like a .bed file. NOTE: the example data below have line numbers appended for clarification. When you select the data to create working files, use the original dataset without line numbers.

All "variableStep" declaration lines come before the second track line at line 892,178; after that, the file continues in BED format to line 925,183.

- Select on "Step"
variableStep chrom=chr11 span=25	2
variableStep chrom=chr10 span=25	74501
variableStep chrom=chr13 span=25	119959
variableStep chrom=chr12 span=25	152353
variableStep chrom=chr15 span=25	185476
variableStep chrom=chr14 span=25	224351
variableStep chrom=chr17 span=25	253339
variableStep chrom=chr16 span=25	298007
variableStep chrom=chr19 span=25	325068
variableStep chrom=chr18 span=25	352583
variableStep chrom=chrM span=25	377622
variableStep chrom=chr1 span=25	378109
variableStep chrom=chr3 span=25	431654
variableStep chrom=chr2 span=25	468728
variableStep chrom=chr5 span=25	538115
variableStep chrom=chr4 span=25	600376
variableStep chrom=chr7 span=25	663953
variableStep chrom=chr6 span=25	726093
variableStep chrom=chr9 span=25	770436
variableStep chrom=chrX span=25	819431
variableStep chrom=chr8 span=25	830175
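The check described above (every variableStep declaration sits before the second track line) can be scripted roughly as follows. This is a sketch under my own assumptions: the helper name is hypothetical, it also counts fixedStep declarations, and line numbers are 1-based to match the numbers in the listing.

```python
def declarations_before_second_track(lines):
    """True if every variableStep/fixedStep line precedes the 2nd track line.

    Line numbers are 1-based, like the numbers appended in the listing above.
    """
    track_nums = [i for i, l in enumerate(lines, 1) if l.startswith("track")]
    decl_nums = [i for i, l in enumerate(lines, 1)
                 if l.startswith(("variableStep", "fixedStep"))]
    if len(track_nums) < 2:
        # Only one track: there is no second track line to compare against.
        return True
    return all(n < track_nums[1] for n in decl_nums)
```

If this returns True for a multi-track file, the wiggle part is confined to the first track, which is what makes a single head/tail split workable.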

- "Select first lines from a dataset", 10 lines
- http://genome.ucsc.edu/goldenPath/help/wiggle.html
track type=wiggle_0 visibility=full name="Smc3_mES" autoScale=on color=100,0,100 1
variableStep chrom=chr11 span=25        2
3000251 0.6     3
3000276 1.5     4
3000301 1.6     5
3000326 1.7     6
3000351 1.7     7
3000376 1.7     8
3000401 1.7     9
3000426 1.6     10

- "Select last lines from a dataset", 33006 lines (calculated as 925183 - 892178 + 1)
- http://genome.ucsc.edu/FAQ/FAQformat.html#format1
track visibility=dense name="Smc3_mES enriched regions - 1e-09" color=100,0,100 892178
chr11   3023275 3023700 892179
chr11   3028200 3028225 892180
chr11   3039225 3039275 892181
chr11   3040500 3040525 892182
chr11   3070325 3070375 892183
chr11   3080650 3080675 892184
chr11   3085850 3085950 892185
chr11   3097450 3097475 892186
chr11   3190200 3190275 892187
(...more until end of file...)
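The two line counts used with "Select first lines" and "Select last lines" follow from the position of the second track line and the total line count. A small sketch of that arithmetic (the function name is just for illustration):

```python
def split_counts(second_track_line, total_lines):
    """Line counts for the head (wig) and tail (bed) pieces, 1-based numbering."""
    # "Select first lines": stop just before the second track line.
    head = second_track_line - 1
    # "Select last lines": start at the second track line, through the end.
    tail = total_lines - second_track_line + 1
    return head, tail


print(split_counts(892178, 925183))  # (892177, 33006)
```

The two pieces add back up to 925183 lines, so nothing is lost or duplicated in the split.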


On 4/16/12 9:02 AM, Michael Sikes wrote:
Jen,

A couple of uninformed questions. I gather from your response that the
author lab submitted a multiple track group .wig file instead of a
single track group .wig file, and that I need to generate a single track
group file before the bigwig conversion will work. So, with regard to
the instructions below, I am to run the text manipulation on the
original author submitted .wig file. Then run "Filter and Sort -> Select
lines that match an expression" on the newly created file, matching the
pattern "track". This generates yet another file that
has the following info:

88: Select on data 87
1 line, 1 comments
format: wig, database: mm8
Info: Matching pattern: track


track type=wiggle_0 visibility=full name="Smc3_mES" autoScale=on color=100,0,100      1
track visibility=dense name="Smc3_mES enriched regions - 1e-09" color=100,0,100      892178

Is the number 892178 the number of track lines? If so, do I then run
"Remove beginning of a file" using 892178 on the original author .wig file?
Mike



On Apr 16, 2012, at 10:35 AM, Jennifer Jackson wrote:

Hi Mike,

I apologize if I wasn't clear, but the 'Select' was to show you how
to identify the multi-track group wig files. I wanted to give you a
way to screen similar files going forward.

The wig-to-bigWig program in Galaxy comes from UCSC. It accepts .wig
files with a single track group as input:
http://genome.ucsc.edu/goldenPath/help/bigWig.html (see step #1)
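Before running Wig-to-bigWig, a quick pre-flight check along these lines could catch both problems seen in this thread: more than one track line, and an input that holds no data lines at all (the "Select" output that produced the "stdin is empty of data" error contained only track lines). The helper and its messages are hypothetical, and passing it does not guarantee wigToBigWig will accept the file.

```python
def check_single_track_wig(lines):
    """Return a list of likely problems for wigToBigWig input (empty = none found).

    Assumed rules: at most one track line, and at least one line that is not
    a track, browser, comment, or blank line.
    """
    problems = []
    track_lines = [l for l in lines if l.startswith("track")]
    data_lines = [l for l in lines
                  if l.strip() and not l.startswith(("track", "browser", "#"))]
    if len(track_lines) > 1:
        problems.append(f"{len(track_lines)} track lines; expected at most 1")
    if not data_lines:
        problems.append("no data lines")
    return problems
```

The original merged file would be flagged for its two track lines; the "Select" output alone would be flagged on both counts.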

Either the data author lab can resubmit the data as single track group
.wig files, or, if you are confident that the multiple track group
.wig format is expected and OK from this source, you can split the file
yourself. There are no specific tools in Galaxy to do this, but something
like this would work:

- Text Manipulation -> "Add column", "1", Iterate? = yes
- "Select", "track"
- note the line numbers of the track lines
- "Remove beginning of a file", using those line numbers and the -original- .wig file, to break it up into individual .wig files.
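The recipe above (number the lines, find the track lines, then cut the original file at each one) can be sketched in a few lines of Python. This is my own illustration, not a Galaxy tool: each chunk runs from one track line up to just before the next, and any header lines before the first track line are kept with the first chunk.

```python
def split_at_track_lines(lines):
    """Split a multi-track file into one list of lines per track."""
    starts = [i for i, l in enumerate(lines) if l.startswith("track")]  # 0-based
    if not starts:
        return [lines]  # no track lines: return the file as one chunk
    starts[0] = 0  # keep any leading browser/comment lines with the first track
    bounds = starts + [len(lines)]
    return [lines[a:b] for a, b in zip(bounds, bounds[1:])]
```

For this dataset the first chunk would be the 892177-line wig part and the second the 33006-line bed part; each chunk can then be written to its own file and labeled with the right format.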

Good luck!

Jen
Galaxy team

On 4/16/12 6:57 AM, Michael Sikes wrote:
Jennifer,

Thanks for your help. I ran the filter and sort tool as advised, and
then ran the wig to bigwig on the new history item generated by the
filter. This time I got a different error:
84: Wig-to-bigWig on data 83
0 bytes
An error occurred running this job: stdin is empty of data
Error running wigToBigWig.
83: Select on data 49
1 line, 1 comments
format: wig, database: mm8
Info: Matching pattern: track


Again, I'm sure I left off something obvious. Could you tell me what I
did wrong?

Thanks,
Mike

On Apr 13, 2012, at 1:27 PM, Jennifer Jackson wrote:

Hi Michael,

This particular .wig file has a data format problem that is the root
cause of the conversion error. Specifically, there is an extra track
line in the file. This can be found with Unix tools such as grep, or in
Galaxy with the tool "Filter and Sort -> Select" by matching the
pattern "track".
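If grep is not handy, the same search can be done with a short Python function that reports each track line with its 1-based line number, roughly like `grep -n '^track' file.wig`. Note one deliberate difference from matching "track" anywhere: this sketch only matches lines that start with "track", so a data line or name that merely contains the word is not reported.

```python
def find_track_lines(lines):
    """Return (line_number, text) for every line that starts with "track".

    Accepts any iterable of lines, including an open file handle.
    """
    hits = []
    for num, line in enumerate(lines, 1):
        if line.startswith("track"):
            hits.append((num, line.rstrip("\n")))
    return hits
```

Usage, with a hypothetical file name: `with open("data.wig") as fh: print(find_track_lines(fh))`. More than one hit means the file is a multiple track group file.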

Ideally this would be corrected and resubmitted by the data author
before use, since how/why this was inserted and what impact it has
would need to be examined.

Since you noticed problems with other GEO files (conversion problems),
verifying the .wig format and making any necessary corrections would
also be advised.

Hopefully this helps!

Best,

Jen
Galaxy team


Michael Sikes, Ph.D.
Associate Professor of Immunology
North Carolina State University
Microbiology Department
4524A Gardner Hall
Campus Box 7615
Raleigh, NC 27695
Ph: 919-513-0528
Fax: 919-515-7867
email: mlsi...@ncsu.edu


--
Jennifer Jackson
http://galaxyproject.org
___________________________________________________________
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using "reply all" in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

 http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

 http://lists.bx.psu.edu/
