Re: [galaxy-user] Barcode splitter on paired end data

Jennifer Jackson Tue, 09 Apr 2013 14:21:17 -0700

Hi Veranja,

I am going to try to address all questions in one go since they are all in the same thread. Next time, though, it would be best to send new questions as a brand new thread, not as a reply with just the subject line changed. This helps us greatly with tracking, and helps other users when searching prior posts.

In the first email you seemed to have some trouble with the format of your custom reference genome, but later, in the second email, this seems to be resolved, at least as far as format is concerned (SAM->BAM conversion is possible using this genome in Galaxy?). I am going to point you to our help for custom reference genomes; if you click through to the main page, there is a table with detailed format troubleshooting help. But I will tell you first that I do not believe this is going to be helpful for your overall goals, if I am understanding correctly.


But, here is the link:
http://wiki.galaxyproject.org/Support#Custom_reference_genome
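For anyone hitting similar indexing or SAM->BAM errors with a multi-sequence reference, a few format problems can be checked locally before uploading. The sketch below is not Galaxy's own validation and does not reproduce the wiki's troubleshooting table; it assumes that duplicate identifiers (compared on the first word of each header, since many tools keep only that part as the sequence name), empty records, blank lines, and characters outside the IUPAC nucleotide codes are worth flagging. The function name `check_reference_fasta` is hypothetical.

```python
import io
from collections import Counter

# Upper-case IUPAC nucleotide codes, used here as the allowed alphabet (an assumption).
ALLOWED = set("ACGTUNRYSWKMBDHV")


def check_reference_fasta(handle):
    """Return a list of problems found in a FASTA text stream (hypothetical helper)."""
    problems = []
    ids = []
    seq_len = None  # None means no record has started yet

    for lineno, raw in enumerate(handle, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            problems.append(f"line {lineno}: blank line")
            continue
        if line.startswith(">"):
            if seq_len == 0:
                problems.append(f"{ids[-1]}: empty sequence")
            fields = line[1:].split()
            # Keep only the first whitespace-delimited word as the name.
            name = fields[0] if fields else ""
            if not name:
                problems.append(f"line {lineno}: header without an identifier")
            ids.append(name)
            seq_len = 0
        elif seq_len is None:
            problems.append(f"line {lineno}: sequence before any '>' header")
        else:
            bad = set(line.upper()) - ALLOWED
            if bad:
                problems.append(f"{ids[-1]}: unexpected characters {sorted(bad)}")
            seq_len += len(line)

    if seq_len == 0:
        problems.append(f"{ids[-1]}: empty sequence")
    for name, count in Counter(ids).items():
        if count > 1:
            problems.append(f"{name}: identifier used {count} times")
    return problems


example = io.StringIO(">ampA desc\nACGTNN\n>ampB\n>ampA\nACGX\n")
for problem in check_reference_fasta(example):
    print(problem)
```

An empty result does not prove the file will work with every tool; it only rules out these particular problems.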

Your reference genome sounds as if it is not really a reference genome but more of a collection of short read sequences? If this number is very large and the sequences are very short, you will likely run into memory or related indexing problems with many tools. There really isn't an easy way around this. You could try taking the analysis to a cloud version of Galaxy and scaling up the memory to see if that helps. You might also try breaking the job up into smaller jobs: you mentioned that the data is from multiple genomes, so perhaps split by genome. But you will have to test this, since I don't know the actual profile of your data. I can tell you that using purely a short read dataset, in particular one that has redundancy, will be problematic, likely no matter what is attempted. Some assembly or other strategy is likely required to move forward.
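Splitting the job by genome presupposes that the reference records can be grouped. As a minimal sketch outside Galaxy, the snippet below groups FASTA records by the part of the identifier before an underscore. That naming convention (`<genome>_<amplicon>`) and the function name `split_fasta_by_prefix` are assumptions for illustration; the grouping key would need to match however the real identifiers encode the source genome.

```python
from collections import defaultdict


def split_fasta_by_prefix(lines, sep="_"):
    """Group FASTA lines by the identifier text before `sep`.

    Hypothetical convention: identifiers look like '<genome>_<amplicon>'.
    Returns a dict mapping each group name to its FASTA lines, in input order.
    """
    groups = defaultdict(list)
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue  # skip blank lines rather than copying them
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else "unnamed"
            current = name.split(sep, 1)[0]
        if current is not None:
            groups[current].append(line)
    return dict(groups)


reference = [
    ">genomeA_amp1", "ACGT",
    ">genomeA_amp2", "GGCC",
    ">genomeB_amp1 note", "TTAA",
]
for group, group_lines in split_fasta_by_prefix(reference).items():
    # Each group could be written to its own file, e.g. f"{group}.fasta".
    n_records = sum(line.startswith(">") for line in group_lines)
    print(group, n_records)
```

Each resulting file would then be used as a separate, smaller reference, with the reads mapped against each one in its own job.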


Galaxy CloudMan:
http://usegalaxy.org/cloud

For the last question, different tools are probably expected to vary a bit in their results, since they use different methods. If you want to compare datasets, using identifiers would be a good way: convert the files to tabular, cut out the identifiers, compare these to find differences, then adjust the tabular files as needed and convert back to fastq/fasta. Tools for these sorts of functions are in the tool groups "Text Manipulation", "FASTA manipulation", "Filter and Sort", "Join, Subtract and Group", and "NGS: QC and manipulation". I know that seems like a lot of places to look, but use the tool search at the top of the tool panel and search by data type or tool name to make finding these easier, for example "Cut", "Join" or "Tabular". These tools have the names you would probably expect them to have, and tool help is directly on each form. Our 101 tutorial would also be a good introduction for an overview: https://main.g2.bx.psu.edu/u/aun1/p/galaxy101
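The identifier-comparison approach described above (cut out identifiers, compare, filter) can be sketched in a few lines of Python for anyone working outside the Galaxy tool panel. This is a stand-in for, not a description of, the Galaxy tools named above. It assumes plain 4-line FASTQ records and that mates share the first word of the header, either directly (newer Illumina headers put ` 1:N:...` or ` 2:N:...` after a space) or after removing an old-style `/1` or `/2` suffix. The helper names are hypothetical.

```python
def parse_fastq(text):
    """Split FASTQ text into (header, sequence, plus, quality) tuples.

    Assumes plain 4-line records (no wrapped sequence or quality lines).
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) % 4:
        raise ValueError("line count is not a multiple of 4; file may be truncated")
    return [tuple(lines[i:i + 4]) for i in range(0, len(lines), 4)]


def pair_key(header):
    """Identifier shared by both mates: first word, minus any /1 or /2 suffix."""
    first = header[1:].split()[0]
    if first.endswith(("/1", "/2")):
        first = first[:-2]
    return first


def keep_shared_pairs(r1_records, r2_records):
    """Keep only reads whose identifier occurs in both files, preserving input order."""
    shared = {pair_key(r[0]) for r in r1_records} & {pair_key(r[0]) for r in r2_records}
    kept1 = [r for r in r1_records if pair_key(r[0]) in shared]
    kept2 = [r for r in r2_records if pair_key(r[0]) in shared]
    return kept1, kept2


r1 = parse_fastq("@r1 1:N:0:1\nACGT\n+\nIIII\n@r2 1:N:0:1\nGGGG\n+\nIIII\n@r3 1:N:0:1\nTTTT\n+\nIIII\n")
r2 = parse_fastq("@r1 2:N:0:1\nCCCC\n+\nIIII\n@r3 2:N:0:1\nAAAA\n+\nIIII\n")
kept1, kept2 = keep_shared_pairs(r1, r2)
print([pair_key(r[0]) for r in kept1], [pair_key(r[0]) for r in kept2])
```

Because both outputs keep the original order, mates stay in step only if the two input files were in the same order to begin with.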


Hopefully this gives you some helpful information to work with,

Jen
Galaxy team

On 4/8/13 7:21 PM, Veranja Liyanapathirana wrote:

Dear all,
I was using the barcode splitter on Miseq paired end reads; however, I am not sure if I did it correctly, as the number of reads allocated to each barcode does not tally with the results obtained by our service provider using one of their in-house script based methods. I use it for splitting some in-house barcodes. I need to make sure that read 1 and read 2 are split into the same group, and drop the sequences where this criterion is not met. I am not sure how to go about doing this. Would using FASTQ joiner on the two reads and subsequent splitting work?
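The requirement described above (both reads must land in the same barcode group, otherwise the pair is dropped) can be expressed as a small pair-aware splitter. This is a minimal sketch, not the Galaxy Barcode Splitter: it assumes the in-house barcode sits at the start of both read 1 and read 2, uses exact prefix matching with no mismatches, treats a read matching more than one barcode as unassigned, and works on bare sequences rather than full FASTQ records. The barcode names and sequences are made up, and `match_barcode` and `split_pairs` are hypothetical helpers.

```python
from collections import defaultdict


def match_barcode(seq, barcodes):
    """Return the barcode name that prefixes `seq`, or None if zero or several match."""
    hits = [name for name, bc in barcodes.items() if seq.startswith(bc)]
    return hits[0] if len(hits) == 1 else None


def split_pairs(pairs, barcodes):
    """Assign (read1_seq, read2_seq) pairs to a barcode only when both reads agree.

    Pairs where either read has no unique match, or the two reads disagree,
    go to an 'unmatched' bin so they can be inspected or dropped.
    """
    bins = defaultdict(list)
    for r1, r2 in pairs:
        b1 = match_barcode(r1, barcodes)
        b2 = match_barcode(r2, barcodes)
        if b1 is not None and b1 == b2:
            bins[b1].append((r1, r2))
        else:
            bins["unmatched"].append((r1, r2))
    return dict(bins)


barcodes = {"BC1": "ACGT", "BC2": "TGCA"}  # hypothetical in-house barcodes
pairs = [
    ("ACGTAAAA", "ACGTCCCC"),  # both reads carry BC1: kept
    ("TGCAGGGG", "ACGTTTTT"),  # reads disagree: unmatched
    ("TGCATTTT", "TGCAAAAA"),  # both carry BC2: kept
    ("NNNNAAAA", "TGCACCCC"),  # read 1 has no barcode: unmatched
]
for name, members in sorted(split_pairs(pairs, barcodes).items()):
    print(name, len(members))
```

If the real barcode sits elsewhere in the read, or only on one mate, the matching rule would need to change accordingly; the point of the sketch is that both mates are checked explicitly before a pair is assigned.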
Thank you,
Kind Regards,
Veranja
*From:* Veranja Liyanapathirana <[email protected]>
*To:* galaxy-user <[email protected]>
*Sent:* Saturday, 6 April 2013, 23:13
*Subject:* Error in creating Depth of Coverage files after Bowtie for Illumina alignment
Dear Galaxy team/ users,
I am sorry to spam the thread again, but I still could not figure out what is wrong with my workflow and need some help. As mentioned earlier, I use Miseq reads, demultiplex for an in-house barcode using barcode splitter, re-upload, and map with a reference sequence that consists of multiple short reference sequences. The workflow goes well up to this stage, and conversion from SAM to BAM after filtering the SAM files is also fine, but I cannot use the GATK Depth of Coverage tool to get the alignment data or create pileups. An error comes up in all instances.
I would really appreciate any input on this.
Thanks a lot,
Veranja Liyanapathirana
Graduate Student (Microbiology)

*From:* Veranja Liyanapathirana <[email protected]>
*To:* galaxy-user <[email protected]>
*Sent:* Thursday, 4 April 2013, 6:39
*Subject:* Using segments of sequences as a reference genome - Bowtie for Illumina
Dear all,
My problem seems like something that should have a very simple solution on my end, and due to my lack of knowledge in bioinformatics, I am probably messing up the workflows. The experiment I run is one where we used Miseq to sequence amplicons of a multiplex PCR. We introduced an in-house barcode to our PCR products via an adaptor. Miseq data was demultiplexed for the Illumina barcodes using the Miseq Reporter on-instrument software by our service provider, and I am trying to run the rest of the process on the Galaxy web portal with no command prompt programming. The data for R1 and R2 was imported, and then I used barcode splitter to de-multiplex the amplicons after quality trimming. (I did not use FASTQ Groomer, as Miseq data is supposed to be Sanger FASTQ rather than Illumina.) Then the sequence trimmer was used to trim the barcode+adaptor sequences. The results of this were re-uploaded and designated as FASTQ for alignment. Now, for the reference genome: as our amplicons are from different sequences, we have segmented FASTA sequences in one file with different FASTA identifiers. When this file was input as the reference genome and mapping was performed using Bowtie for Illumina, the mapping went on with no errors. I could filter the alignment file using SAM filters too. But I cannot do any more downstream visualizations, not even SAM to BAM conversion. I suspect that this may be due to an error in the way that the reference genome was formulated, but I cannot figure it out. I would be extremely grateful if you could help me with this issue. I think if I string together the sequences as one it would work, but converting this back for interpretation then becomes an issue.
Thank you,
Kind Regards,
Veranja
Veranja Liyanapathirana
Graduate Student (Microbiology)






___________________________________________________________
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using "reply all" in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

   http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

   http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:

   http://galaxyproject.org/search/mailinglists/


--
Jennifer Hillman-Jackson
Galaxy Support and Training
http://galaxyproject.org
