Hiram and All,

Granted, the sequences that were obviously not human were not of interest to the assemblers, but they do represent a wealth of information on a variety of levels. So the general consensus is that these non-human contaminant sequences are unavailable, correct?
As a rule, I keep all of my data regardless of my own interest, as it is often useful for other reasons much later. I would be disappointed if the assemblers of the human genome had trashed this data. In the meantime I will do as you suggest, Hiram, and wade through the old ChrUn for the odd contaminant sequence, but so far (I've analyzed approximately 1/3 of the total data) they seem to be human-derived sequences.

Thank you again for the assistance,
Jeremy

On Apr 2, 2009, at 10:27 AM, Hiram Clawson wrote:

> Good Morning Jeremy:
>
> I believe you have answered your own question. The contamination
> sequences have been removed by the assemblers. They do this by
> checking a sequence in question against the contents of all sequences
> in genbank. The chrUn sequences in earlier human assemblies should
> also be free of contamination, or else it would be unknown
> contamination.
> The newer human assemblies are free of chrUn since the sequence has
> either been localized to at least a chromosome, or it was identified
> as contamination and been thrown away. I guess you could take the
> previous chrUn sequences, break them up into small pieces, and then
> blat them against current assemblies. Bits that do not match could be
> the contamination you are looking for.
>
> The latest assembly, currently under construction here:
> http://genome-test.cse.ucsc.edu/cgi-bin/hgGateway?db=hg19
> has a number of unplaced and unlocalized bits that normally would
> have been put together into the chrUn. For hg19 we are not
> going to place them in chrUn. You will see their names as:
> chrUn_gl000nnn
> take a look here:
> http://genome-test.cse.ucsc.edu/cgi-bin/hgTracks?db=hg19&chromInfoPage=
>
> --Hiram
>
> Jeremy Ellis wrote:
>> Hiram and All;
>> I appreciate the responses.
>> I am interested in the bacterial
>> contamination sequences as indicated here in section VI:
>>> http://www.ncbi.nlm.nih.gov/genome/assembly/assembly.shtml
>> It states:
>> "contamination: All assemblies should be screened for foreign and
>> vector sequences. The source of these foreign sequences can range
>> from bacterial genome contamination (due to propagating clones in
>> bacteria) to contamination from other projects being sequenced at
>> a particular sequencing center."
>> Not all of these contaminant sequences would be from the bacteria
>> that the clones were propagated in; there are likely also
>> sequences from normal bacterial/organismal flora from the donor
>> human that were cloned and sequenced as well (not to mention
>> purely random genomic fragments from a wide variety of sources,
>> such as pollen, water contamination, etc.).
>> I have looked through ChrUn from both hg16 and hg15 (hg17 and 18
>> do not have the ChrUn data) and there do not appear to be any
>> bacterial contaminant sequences in this data (it looks like it
>> is information from rare PCR products and other cloning
>> artifacts). So, my question is simply, "Where are the non-human
>> contaminant sequences?".
>> I hope this clarifies my question.
>> Jeremy
>> On Apr 1, 2009, at 4:57 PM, Hiram Clawson wrote:
>>> Good Afternoon Jeremy:
>>>
>>> You may find the following discussion of interest:
>>>
>>> http://www.ncbi.nlm.nih.gov/genome/assembly/assembly.shtml
>>>
>>>
>>>> Jeremy Ellis wrote:
>>>>> Hello all again. I appreciate the responses I had for my
>>>>> first question and they helped. I have been looking through
>>>>> ChrUn from the earlier assemblies and I now realize that this
>>>>> isn't quite what I expected. Most of these sequences (so far)
>>>>> appear to be odd human-like sequences due to a variety of
>>>>> probable reasons (PCR/cloning artifacts, etc).
>>>>> I think that
>>>>> the sequences I am interested in are the ones that might have
>>>>> been thrown out because they appeared to be contaminant sequences
>>>>> from bacteria, fungi, or water-borne protozoa, etc. Would
>>>>> these sequences have been long since disposed of and ignored,
>>>>> or could there still be hope for me in finding a treasure
>>>>> trove of "garbage" sequence?
>>>>>
>>>>> Thank you again for your help!
>>>>>
>>>>> J.
>>>
>> Jeremy Ellis
>> [email protected]
>> 949-824-1223
>> Arora Lab
>> Developmental and Cell Biology
>> University of California, Irvine
>

Jeremy Ellis
[email protected]
949-824-1223
Arora Lab
Developmental and Cell Biology
University of California, Irvine
_______________________________________________
Genome maillist - [email protected]
http://www.soe.ucsc.edu/mailman/listinfo/genome
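
Hiram's suggestion in the thread (break the old chrUn sequence into small pieces, blat the pieces against a current assembly, and look at whatever fails to match) could be sketched roughly as below. This is only an illustration under stated assumptions, not how anyone on the list actually did it: the function names, the 500-base chunk size, and the chunk naming scheme are all hypothetical, and it assumes BLAT was run separately to produce standard tab-separated PSL output in which the query name is the 10th column.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name = None
    parts: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(parts)
            fields = line[1:].split()
            # Use the first word of the header as the record name.
            name = fields[0] if fields else "unnamed"
            parts = []
        elif name is not None:
            parts.append(line)
    if name is not None:
        yield name, "".join(parts)


def chunk_sequence(name: str, seq: str, size: int = 500) -> Iterator[Tuple[str, str]]:
    """Yield (chunk_name, chunk) pieces of at most `size` bases.

    The chunk name records the 0-based start offset (hypothetical scheme).
    """
    for start in range(0, len(seq), size):
        yield f"{name}_{start}", seq[start:start + size]


def unmatched_chunks(chunk_names: Iterable[str], psl_lines: Iterable[str]) -> List[str]:
    """Return chunk names that never appear as a query in the PSL output."""
    hit = set()
    for line in psl_lines:
        fields = line.rstrip("\n").split("\t")
        # Alignment rows have 21 columns and start with a match count;
        # header lines fail this check and are skipped.
        if len(fields) >= 21 and fields[0].isdigit():
            hit.add(fields[9])  # column 10 is the query name
    return [n for n in chunk_names if n not in hit]
```

The chunks would be written out as FASTA, aligned with BLAT against the assembly, and the resulting PSL file passed to unmatched_chunks. Pieces with no hit at all are only candidates for non-human sequence; a real analysis would also want to filter weak or partial hits and then check the leftovers against GenBank.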
