Hello Andreas,

We have put together some tools for you. These instructions should help your bioinformatics team get things set up.
General idea: get the program, compile it, and run it using a ".gcg" file, which is an expanded version of your regular expression. You can create similar .gcg files as needed and run them against any genome of your choosing.

Location of the .gcg file for McrBC:
  http://hgwdev.cse.ucsc.edu/~aamp/

Sequence (genomic) files (inside each genome's sub-folder):
  http://hgdownload.cse.ucsc.edu/downloads.html

Ftp:
  http://genome.ucsc.edu/FAQ/FAQdownloads#download1
  http://genome.ucsc.edu/FAQ/FAQdownloads#download32

Downloading our source:
  http://genome.ucsc.edu/FAQ/FAQdownloads#download27

Creating/loading a custom track:
  http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#CustomTracks
  http://genome.ucsc.edu/goldenPath/customTracks/custTracks.html

Program to do the search:

  findCutters - Find REBASE restriction enzymes using their GCG file
  usage:
     findCutters rebase.gcg sequence output.bed
  where "sequence" is a .fa, .nib, or .2bit file
  options:
     -justThis=enzyme     Only search for this enzyme.
     -justThese=file      File of enzymes (one per line) to restrict search.
     -countsOnly          Only output the # of times each enzyme is found
                          in the sequence in a simple 2 column file.
     -consolidateCounts   This option is used in the situation that a bunch of
                          output files have been created and cat'ed together
                          (like after a cluster run). The program usage then
                          changes to:
                            findCutters -consolidateCounts input.counts output.counts
  NOTE: a proper GCG file is the one available from NEB, using a command like:
     curl -A "Mozilla/4.0" http://rebase.neb.com/rebase/link_gcgenz > rebase.gcg

To compile, first build the kent libraries:

  cd kent/src
  make libs

Then compile findCutters:

  cd hg/utils/findCutters
  make

Note: the program expects your Unix environment to have a bin/x86_64 directory in your home directory, with a path to this location set in your shell startup file.
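If your team wants a quick sanity check of the pattern on a small sequence before building findCutters, the following is a minimal Python sketch of the same idea. It is not how findCutters works internally. It assumes the pattern from your message, "RCG(N*(55-103))RCG", read as two half sites separated by 55 to 103 bases. It scans the forward strand only and reports, for each starting half site, the nearest partner. The function names (mcrbc_regex, find_sites) and the "chrTest" label are made up for illustration.

```python
import re

# IUPAC codes needed for the half site (R = purine, N = any base).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "N": "[ACGT]"}


def mcrbc_regex(half_site="RCG", min_gap=55, max_gap=103):
    """Build a regex for half_site, a gap of min_gap..max_gap bases, half_site."""
    half = "".join(IUPAC[base] for base in half_site)
    # The lookahead lets overlapping sites be reported; the lazy quantifier
    # picks the nearest partner half site. The gap accepts A/C/G/T only, so
    # runs of unknown bases (N) in an assembly never count as a gap.
    return re.compile("(?=(%s[ACGT]{%d,%d}?%s))" % (half, min_gap, max_gap, half))


def find_sites(chrom, seq, pattern=None):
    """Return (chrom, start, end) tuples using 0-based, half-open BED coordinates."""
    pattern = pattern or mcrbc_regex()
    seq = seq.upper()  # soft-masked (lowercase) sequence is treated like uppercase
    return [(chrom, m.start(), m.start() + len(m.group(1)))
            for m in pattern.finditer(seq)]


if __name__ == "__main__":
    # Toy sequence: one half site, 60 filler bases, a second half site.
    demo = "ACG" + "T" * 60 + "GCG"
    for chrom, start, end in find_sites("chrTest", demo):
        print("%s\t%d\t%d" % (chrom, start, end))
```

Passing "RC" as the half site gives the non-CpG form of the pattern. For whole genomes, findCutters with the .gcg file remains the intended route; this sketch is only meant for checking small test sequences.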
If you run into any problems, please let us know.

Thanks,
Jennifer Jackson
UCSC Genome Bioinformatics Group

Weinhäusel Andreas wrote:
> Dear Colleagues,
>
> is there an possibility to visualise potentially McrBC enzyme recognition
> sites and density within UCSC-GB?
>
> McrBC has the preferred recognition seq. "RC(N*(55-103))RC" - in case if I
> would search a vertebrate genome for CpG methylation the preferred
> recognition seq. would be "RCG(N*(55-103))RCG" .
>
> Would be nice to get this visualized....
>
> Greetings ANDREAS
>
> DIDr Andreas Weinhäusel
> _____________________________________
>
> Austrian Research Centers GmbH - ARC
> Life Sciences
> Research Center: 2444 Seibersdorf, Austria
> T +43 (0) 50 550-3402, F +43 (0) 50 550-3653
> [email protected] <mailto:[email protected]>
> http://www.arcs.ac.at <http://www.arcs.ac.at/>
> _____________________________________
>
> http://www.lifesciences.at <http://www.lifesciences.at/>
> _____________________________________
>
> FBN: 115980i HG Wien, UID: ATU14703506
>
> _______________________________________________
> Genome maillist - [email protected]
> http://www.soe.ucsc.edu/mailman/listinfo/genome
