You can find all sequences in the NCBI trace archives: http://www.ncbi.nlm.nih.gov/Traces/home/?cmd=show&f=overview&m=main&s=overview
Everything is there. Nothing has been thrown away. You can freely search it for anything you would like to find. The human assembly is supposed to contain human sequence. All other sequences are in the trace archives.

--Hiram

Jeremy Ellis wrote:
> Hiram and All,
>
> Granted that the sequences that were obviously not human were not of
> interest to the assemblers, but they do represent a wealth of
> information on a variety of levels. So the general consensus is that
> these non-human contaminant sequences are unavailable, correct?
>
> As a rule, I keep all of my data regardless of my own interest, as it
> is often useful for other reasons much later. I would be
> disappointed if the assemblers of the human genome had trashed
> this data.
>
> In the meantime I will do as you suggest, Hiram, and wade through the
> old chrUn for the odd contaminant sequence, but so far (I've analyzed
> approximately 1/3 of the total data) they seem to be human-derived
> sequences.
>
> Thank you again for the assistance,
>
> Jeremy
>
> On Apr 2, 2009, at 10:27 AM, Hiram Clawson wrote:
>
>> Good Morning Jeremy:
>>
>> I believe you have answered your own question. The contamination
>> sequences have been removed by the assemblers. They do this by
>> checking a sequence in question against the contents of all sequences
>> in GenBank. The chrUn sequences in earlier human assemblies should
>> also be free of contamination, or else it would be unknown
>> contamination. The newer human assemblies are free of chrUn since the
>> sequence has either been localized to at least a chromosome, or it was
>> identified as contamination and thrown away. I guess you could take
>> the previous chrUn sequences, break them up into small pieces, and
>> then BLAT them against current assemblies. Bits that do not match
>> could be the contamination you are looking for.
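Hiram's suggestion (chop the old chrUn sequence into small pieces, BLAT them against a current assembly, and look at what fails to match) can be sketched in Python. This is a minimal sketch under stated assumptions, not a UCSC tool: the 500 bp piece size, the `<name>:<start>-<end>` piece naming, and all function names are hypothetical. The only external format it relies on is BLAT's PSL output, where the query name (qName) is the 10th tab-separated column.

```python
"""Sketch: split old chrUn sequence into pieces for BLAT, then report
pieces that produced no alignment in BLAT's PSL output.

Assumptions (hypothetical, not from the thread): FASTA input, 500 bp
non-overlapping pieces, names of the form "<seq>:<start>-<end>"
(0-based start, half-open end).
"""
from typing import Iterable, Iterator, List, Optional, Set, Tuple

Record = Tuple[str, str]


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (name, sequence) pairs; the name is the first word after '>'."""
    name: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def split_pieces(name: str, seq: str, size: int = 500) -> List[Record]:
    """Cut seq into non-overlapping pieces of at most `size` bases.

    Pieces made entirely of N (assembly gaps) are skipped, since they
    cannot align to anything.
    """
    pieces = []
    for start in range(0, len(seq), size):
        piece = seq[start:start + size]
        if set(piece.upper()) <= {"N"}:
            continue
        pieces.append((f"{name}:{start}-{start + len(piece)}", piece))
    return pieces


def write_fasta(records: Iterable[Record], width: int = 60) -> str:
    """Format records as FASTA text with fixed-width sequence lines."""
    out = []
    for name, seq in records:
        out.append(f">{name}")
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


def aligned_query_names(psl_lines: Iterable[str]) -> Set[str]:
    """Collect qName (10th column) from PSL alignment lines.

    Header lines (written unless blat is run with -noHead) are skipped
    by requiring 21 tab-separated fields with a numeric first field.
    """
    names = set()
    for line in psl_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 21 and fields[0].isdigit():
            names.add(fields[9])
    return names


def unmatched(pieces: List[Record], psl_lines: Iterable[str]) -> List[str]:
    """Names of pieces with no alignment at all: contamination candidates."""
    hit = aligned_query_names(psl_lines)
    return [name for name, _ in pieces if name not in hit]


if __name__ == "__main__":
    # Made-up demo input: 1200 bp of sequence followed by a 600 bp gap.
    fasta = [">chrUn_demo", "ACGT" * 300, "N" * 600]
    pieces = [p for n, s in read_fasta(fasta) for p in split_pieces(n, s)]
    # One made-up PSL alignment line covering only the first piece.
    psl = ["\t".join(["500"] + ["0"] * 7 + ["+", "chrUn_demo:0-500",
           "500", "0", "500", "chr1", "1000000", "0", "500",
           "1", "500,", "0,", "0,"])]
    print(write_fasta(pieces[:1])[:40])
    print(unmatched(pieces, psl))
```

In practice the pieces would be written to a file and aligned with something like `blat hg19.2bit pieces.fa pieces.psl`, and the lines of the resulting PSL file passed to `unmatched`. Unmatched pieces are only candidates: some may still be human sequence that BLAT's default settings missed, so they would need checking against GenBank before being called contamination.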
>>
>> The latest assembly, currently under construction here:
>> http://genome-test.cse.ucsc.edu/cgi-bin/hgGateway?db=hg19
>> has a number of unplaced and unlocalized bits that normally would
>> have been put together into chrUn. For hg19 we are not going to
>> place them in chrUn. You will see their names as:
>> chrUn_gl000nnn
>> Take a look here:
>> http://genome-test.cse.ucsc.edu/cgi-bin/hgTracks?db=hg19&chromInfoPage=
>>
>> --Hiram
>>
>> Jeremy Ellis wrote:
>>> Hiram and All;
>>> I appreciate the responses. I am interested in the bacterial
>>> contamination sequences as indicated here in section VI:
>>>> http://www.ncbi.nlm.nih.gov/genome/assembly/assembly.shtml
>>> It states:
>>> "contamination: All assemblies should be screened for foreign and
>>> vector sequences. The source of these foreign sequences can range
>>> from bacterial genome contamination (due to propagating clones in
>>> bacteria) to contamination from other projects being sequenced at a
>>> particular sequencing center."
>>> Not all of these contaminant sequences would be from the bacteria
>>> that the clones were propagated with, but there are likely sequences
>>> from normal bacterial/organismal flora from the donor human that were
>>> cloned and sequenced as well (not to mention purely random genomic
>>> fragments from a wide variety of sources: pollen, water
>>> contamination, etc.).
>>> I have looked through chrUn from both hg16 and hg15 (hg17 and hg18
>>> do not have the chrUn data) and there do not appear to be any of the
>>> bacterial contaminant sequences in this data (it looks like it is
>>> information from rare PCR products and other cloning artifacts). So,
>>> my question is simply, "Where are the non-human contaminant
>>> sequences?"
>>> I hope this clarifies my question.
>>> Jeremy
>>> On Apr 1, 2009, at 4:57 PM, Hiram Clawson wrote:
>>>> Good Afternoon Jeremy:
>>>>
>>>> You may find the following discussion of interest:
>>>>
>>>> http://www.ncbi.nlm.nih.gov/genome/assembly/assembly.shtml
>>>>
>>>>
>>>>> Jeremy Ellis wrote:
>>>>>> Hello all again. I appreciate the responses I had for my first
>>>>>> question and they helped. I have been looking through chrUn from
>>>>>> the earlier assemblies and I now realize that this isn't quite
>>>>>> what I expected. Most of these sequences (so far) appear to be
>>>>>> odd human-like sequences due to a variety of probable reasons
>>>>>> (PCR/cloning artifacts, etc.). I think that the sequences I am
>>>>>> interested in are the ones that might have been thrown out
>>>>>> because they appeared to be contaminant sequence from bacteria,
>>>>>> fungi, waterborne protozoa, etc. Would these sequences have long
>>>>>> since been disposed of and ignored, or could there still be hope
>>>>>> for me of finding a treasure trove of "garbage" sequence?
>>>>>>
>>>>>> Thank you again for your help!
>>>>>>
>>>>>> J.
>>>>
>>> Jeremy Ellis
>>> [email protected]
>>> 949-824-1223
>>> Arora Lab
>>> Developmental and Cell Biology
>>> University of California, Irvine
>>
>
> Jeremy Ellis
> [email protected]
> 949-824-1223
> Arora Lab
> Developmental and Cell Biology
> University of California, Irvine

_______________________________________________
Genome maillist  -  [email protected]
http://www.soe.ucsc.edu/mailman/listinfo/genome
