Re: [galaxy-user] [galaxy-bugs] Galaxy tool error report from taka.nakada@nifty.com

Ross Lazarus Fri, 18 Feb 2011 07:16:12 -0800

Hi, Taka,

I noticed that the full Manhattan plot looks odd in the history I have
shared with you, and I think it's because the offsets for some of your
SNPs are wrong.

For example, the very last marker in chr1 in your data is rs11488669.
In your data, the offset is 2147483647, which is way beyond the end of
chr1 - the whole genome is only about 3 billion base pairs - so the
Manhattan plot looks clumpy instead of uniform.

According to genome.ucsc.edu, it is at chr1:153517269-153517769.

I'm going to guess that your data (e.g. the map file) has at some stage
been edited in spreadsheet software such as Excel, which can easily
do strange things to numeric columns.
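
A quick way to spot offsets like this before plotting is to scan the
position column for values that cannot be real. The sketch below is only
an illustration and is not part of any Galaxy tool: the function name
find_suspect_offsets, the default column numbers (0-based, matching the
CHR and BP columns) and the lookup table are my own assumptions, and the
only length filled in is chr1 on hg19 (249,250,621 bp). Note that
2147483647 is the largest signed 32-bit integer, which suggests the
value was clipped or substituted at some stage rather than being a real
position.

```python
INT32_MAX = 2**31 - 1  # 2147483647, the largest signed 32-bit integer

# Assumed lookup of chromosome lengths; only chr1 (hg19) is filled in here.
CHROM_LENGTHS = {"1": 249250621}


def find_suspect_offsets(rows, chrom_col=0, offset_col=2, lengths=CHROM_LENGTHS):
    """Return (line_number, fields) pairs whose offset looks impossible.

    rows is an iterable of tab-delimited lines, header row first.
    """
    suspects = []
    for line_number, line in enumerate(rows, start=1):
        if line_number == 1:
            continue  # skip the header row
        fields = line.rstrip("\n").split("\t")
        chrom = fields[chrom_col]
        offset = int(fields[offset_col])
        limit = lengths.get(chrom)  # None if we have no length for this chromosome
        if offset == INT32_MAX or (limit is not None and offset > limit):
            suspects.append((line_number, fields))
    return suspects


# Cut-down rows for illustration; the p value on the last row is made up.
example = [
    "CHR\tSNP\tBP\tP",
    "1\trs3094315\t742429\t0.3608",
    "1\trs11488669\t2147483647\t0.5",
]
for line_number, fields in find_suspect_offsets(example):
    print(line_number, fields[1], fields[2])
```

Any row this flags would need its offset looked up again (for example
at genome.ucsc.edu) rather than trusted.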

If all your processing is inside Galaxy, these kinds of errors can be
prevented. I can see you have tried unsuccessfully to upload some
plink lped files in the history you shared - here's some information
that might help you from a previous enquiry on galaxy-user a few weeks
ago:

==============================================
Hi, Sylvian,

The plink/rgenetics lped and pbed (compressed) formats are special
'composite' Galaxy datatypes because the map and pedigree/genotype
files need to be kept together correctly inside Galaxy. As a result,
the upload tool requires that the file type be specified so all of the
components can be properly uploaded and stored together.

For example, to upload pbed data from your local desktop, choose
'Upload file' from the Get Data tools.

When the upload form appears, the trick is that, as the very first
step, you *must* change the default 'Autodetect' in the first
(filetype) select box to the specific rgenetics datatype - either
'pbed' for compressed plink data or 'lped' for uncompressed plink
genotype data. Type the first few letters into the first box, and
select the right one from the list that appears.

Once this is done, you will see that the upload tool form will change
to show three separate file upload inputs - one each for the plink
xxx.bim, xxx.bed and xxx.fam files, where xxx is the name you set when
you ran plink to create the files. For uncompressed linkage format,
there are two separate file upload inputs - the plink .ped and .map
files.

Now you can browse for the corresponding file for each input box from
your local machine - be careful not to mix them up, as the upload tool
unfortunately cannot tell them apart.
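
Before starting the upload, it can help to confirm that all the pieces
of the dataset are actually sitting in one directory. This is a small
sketch under my own assumptions, not a Galaxy or plink feature: the
function name missing_components and the COMPONENTS table are
hypothetical, and the extensions are simply the ones listed above.

```python
import os

# Extensions that make up each plink composite datatype, as described above.
COMPONENTS = {
    "pbed": (".bed", ".bim", ".fam"),
    "lped": (".ped", ".map"),
}


def missing_components(directory, basename, datatype):
    """List the component files of a plink dataset that are not in directory."""
    expected = [basename + ext for ext in COMPONENTS[datatype]]
    return [name for name in expected
            if not os.path.isfile(os.path.join(directory, name))]


# An empty list means every expected file was found.
print(missing_components(".", "xxx", "pbed"))
```

This only checks that the files exist; it cannot tell whether a .bim
file really belongs with a given .bed file, so the care needed at the
upload form still applies.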

At the bottom of the form, I suggest you then change the genome build
to the appropriate one (e.g. hg18 or hg19).

Finally, I'd recommend that you change the 'metadata value for
basename' (which will be the new dataset name) to something that will
remind you what the data are - something more meaningful than the
default 'rgenetics'.

Click 'execute' to upload the data and create the new dataset in your
history.  Compressed (pbed) format is preferred so the upload is
quicker.

Note that some tools will autoconvert between lped and pbed, so there
is a delay the first time some tools are run on a new dataset. There
are also built-in converters (use the pencil icon) if you need them.

I hope this helps - thanks for using Galaxy and Rgenetics - please let
us know how you go and feel free to contact me if you have other
questions.

On Fri, Feb 18, 2011 at 9:26 AM, Ross Lazarus
<ross.laza...@channing.harvard.edu> wrote:
> Hi, Taka.
>
> Thanks for trying the tool. Sorry to hear you are having problems with
> your data.
> Unfortunately, the history associated with this error does not have
> any datasets with data in the format required for the Manhattan/QQ
> plot tool.
>
> The file you were attempting to use was a bed file. In fact it is not
> even a valid bed format file because it has spaces instead of tabs as
> delimiters. The tool is unable to parse the header row correctly so
> you have the error about an index out of range on the header row.
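
To make that failure concrete, here is a minimal sketch. It assumes the
header is split on tabs before the columns are looked up (an assumption
for illustration, not a copy of the tool's code), and it uses the column
numbers from the job command line (0 for chromosome, 2 for offset). The
header fields are the ones from the file, written here with single
spaces between them.

```python
# Header fields from the file, separated by single spaces (an assumption
# about the exact spacing; the point is that there are no tabs).
header = "CHR SNP BP A1 F_A F_U A2 CHISQ P OR"
chrom_col, offset_col = 0, 2  # column numbers from the job command line

ohead = header.split("\t")  # assumed parsing step: no tabs, so only one field
try:
    newhead = [ohead[chrom_col], ohead[offset_col]]
except IndexError as err:
    message = str(err)
    print("IndexError:", message)

# Once the delimiters are tabs, the same lookup succeeds.
fixed = "\t".join(header.split()).split("\t")
print(fixed[chrom_col], fixed[offset_col])
```

With spaces as delimiters the whole header is one field, so asking for
column 2 runs off the end of the list.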
>
> As the tool form explains, the input required is:
> "Tabular Data is a tab delimited header file with chromosome, offset
> and p values to be plotted"
>
> I tried changing the datatype from bed to tabular but discovered that
> you have spaces as delimiters! So I downloaded your dataset and
> repaired it by converting the delimiters to tabs interactively in
> Python, so it is now in the required format. I uploaded the first few
> thousand rows to the original history and plotted them to check that
> the tool works as expected. I also ran the plots for the entire
> million rows, and that plot is in the history.
>
> I have shared the new history with you and attached is a low-res
> version of the resulting output from the first few thousand p values.
> You should be able to find the history by choosing 'options' from your
> current history then 'histories shared with me'.
>
> If you can ensure that the data are in the correct format (only use
> tabs as delimiters and have a header row) then the tool should be able
> to perform correctly.
> Data in any other format is likely to cause the tool to crash.
>
> Thanks for using Galaxy - I hope it is useful for your research.
>
> In case you need to fix any other defective files, here's what I did:
>
> rerla@rosst61:~/Downloads$ python
> Python 2.6.6 (r266:84292, Sep 15 2010, 16:22:56)
> [GCC 4.4.5] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> f = 'nakada.tab'
> >>> bad = open(f,'r').readlines()
> >>> sbad = [x.split() for x in bad]
> >>> good = ['\t'.join(x) for x in sbad]
> >>> good[:3]
> ['CHR\tSNP\tBP\tA1\tF_A\tF_U\tA2\tCHISQ\tP\tOR',
> '1\trs28659788\t713170\tG\t0.04094\t0.03725\tC\t0.08434\t0.7715\t1.103',
> '1\trs3094315\t742429\tG\t0.1754\t0.1533\tA\t0.835\t0.3608\t1.175']
> >>> o = open(f,'w')
> >>> o.write('\n'.join(good))
> >>> o.close()
>
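
The same repair can be written as a small reusable script. This is a
sketch rather than anything shipped with Galaxy: the names
to_tab_delimited and convert_file are hypothetical, and unlike the
session above it writes to a separate output file and ends the file
with a newline. It splits on any run of whitespace, so it is only safe
when no field contains spaces and no field is empty.

```python
def to_tab_delimited(lines):
    """Split each line on runs of whitespace and rejoin the fields with tabs.

    Blank lines are dropped.
    """
    return ["\t".join(line.split()) for line in lines if line.strip()]


def convert_file(in_path, out_path):
    """Write a tab-delimited copy of in_path to out_path (hypothetical helper)."""
    with open(in_path) as src:
        fixed = to_tab_delimited(src)
    with open(out_path, "w") as dst:
        dst.write("\n".join(fixed) + "\n")


# Two cut-down lines in the style of the file, with irregular spacing.
sample = ["CHR SNP  BP\n", "1 rs3094315   742429\n"]
print(to_tab_delimited(sample))
```

Writing to a new file keeps the original untouched in case the result
needs checking.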
>
> On Fri, Feb 18, 2011 at 8:32 AM,  <galaxy-b...@bx.psu.edu> wrote:
>>
>> GALAXY TOOL ERROR REPORT
>> ------------------------
>>
>> This error report was sent from the Galaxy instance hosted on the server
>> "main.g2.bx.psu.edu"
>> -----------------------------------------------------------------------------
>> This is in reference to dataset id 2071026 from history id 485682
>> -----------------------------------------------------------------------------
>> You should be able to view the history containing the related history item
>>
>> 12: Manhattan_and_QQ_plots.html
>>
>> by logging in as a Galaxy admin user to the Galaxy instance referenced above
>> and pointing your browser to the following link.
>>
>> main.g2.bx.psu.edu/history/view?id=df22bcb1488553c7
>> -----------------------------------------------------------------------------
>> The user 'taka.nak...@nifty.com' provided the following information:
>>
>> Hi,
>>
>> I am trying to run Manhattan and QQ plots using PLINK file, but have this 
>> error.  Would you please let me know how to solve it.
>>
>> Thanks
>> Taka-aki Nakada
>> -----------------------------------------------------------------------------
>> job id: 1813212
>> tool id: rgManQQ1
>> -----------------------------------------------------------------------------
>> job command line:
>> python /galaxy/home/g2main/galaxy_main/tools/rgenetics/rgManQQ.py 
>> '/galaxy/main_database/files/002/070/dataset_2070806.dat' "Manhattan and QQ 
>> plots" 
>> '/galaxy/main_database/tmp/job_working_directory/1813212/galaxy_dataset_2071026.dat'
>>  
>> '/galaxy/main_database/tmp/job_working_directory/1813212/dataset_2071026_files'
>>  '0' '2' '8' 'false'
>> -----------------------------------------------------------------------------
>> job stderr:
>> Traceback (most recent call last):
>>  File "/galaxy/home/g2main/galaxy_main/tools/rgenetics/rgManQQ.py", line 318, in <module>
>>    main()
>>  File "/galaxy/home/g2main/galaxy_main/tools/rgenetics/rgManQQ.py", line 287, in main
>>    rlog,flist = doManQQ(input_fname,chrom_col,offset_col,p,title,grey,ctitle,outdir)
>>  File "/galaxy/home/g2main/galaxy_main/tools/rgenetics/rgManQQ.py", line 219, in doManQQ
>>    newhead = [ohead[chrom_col],ohead[offset_col]]
>> IndexError: list index out of range
>>
>> -----------------------------------------------------------------------------
>> job stdout:
>>
>> -----------------------------------------------------------------------------
>> job info:
>> None
>> -----------------------------------------------------------------------------
>> job traceback:
>> None
>> -----------------------------------------------------------------------------
>> (This is an automated message).
>> _______________________________________________
>> galaxy-bugs mailing list
>> galaxy-b...@lists.bx.psu.edu
>> http://lists.bx.psu.edu/listinfo/galaxy-bugs
>>
>
>
>
> --
> Ross Lazarus MBBS MPH
> Associate Professor, HMS; Director of Bioinformatics, Channing Laboratory;
> 181 Longwood Ave., Boston MA 02115, USA. Tel: +1 617 505 4850;
> Head, Medical Bioinformatics, BakerIDI;  PO Box 6492, St Kilda Rd Central;
> Melbourne, VIC 8008, Australia; Tel: +61 385321444
>

