Dear Hiram

Many thanks for the detailed information.  We will test your recommended 
commands and give you feedback. 

Best regards 

Jacques van Helden

Université d'Aix-Marseille (AMU). 
Lab. Technological Advances for Genomics and Clinics (TAGC)
INSERM Unit U1090, 163, Avenue de Luminy, 13288 MARSEILLE cedex 09. France
Fax: +33 4 91 82 87 01
Web:  http://jacques.van-helden.perso.luminy.univmed.fr/
Email: [email protected]




On 20 Mar 2012, at 19:17, Hiram Clawson wrote:

> Good Morning Jacques:
> 
> The data files for all multiple alignments are currently just
> under 1 TB in size (uncompressed).  The best way for you to access that data
> in an efficient manner is to actually have the .maf files at
> your site and use the maf selection tools from the kent source
> code to extract information from those files.  It would be
> very difficult to access this information via the DAS or
> table browser interface due to the immense amount of data in the
> answer sets and the processing time to extract an answer.
> 
> There are several mechanisms you can use to obtain the maf
> files for local use.  The rsync server at hgdownload can be
> used to obtain a list of files.  For example, to obtain
> a list of the uncompressed maf files used by the genome browser:
> 
> rsync -navP --exclude 'genbank/' rsync://hgdownload.cse.ucsc.edu/gbdb/ 2>&1 \
>   | grep multiz | grep -v "^d" | egrep 'maf$' > /tmp/gbdb.maf.file.list
> 
> Alternatively, the gzipped compressed maf files from the goldenPath downloads:
> 
> rsync -navP rsync://hgdownload.cse.ucsc.edu/goldenPath/ 2>&1 \
>   | grep "multiz" | grep "maf.gz" | grep -v upstream \
>   > /tmp/goldenPath.maf.gz.file.list
> 
> To select the file names from those listings:
> 
> awk '{print $NF}' /tmp/goldenPath.maf.gz.file.list > /tmp/fetch.maf.list
> 
> And then to transfer just those files:
> 
> rsync -avP --files-from=/tmp/fetch.maf.list \
>   rsync://hgdownload.cse.ucsc.edu/goldenPath/  ./
> 
> The hierarchy of those files will be constructed in ./
> 
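
Fetching every listed file means transferring close to the full data set. If
only one assembly is needed, the listing can be narrowed before the transfer.
A minimal sketch, assuming each listing line ends with a relative path that
begins with the assembly directory (for example hg19/...); the helper name
select_maf_paths is illustrative and not part of any UCSC tool:

```shell
# Print the last field (the path) of each rsync listing line read from
# stdin whose path starts with the given assembly directory, e.g. "hg19/".
# select_maf_paths is a hypothetical helper name.
select_maf_paths() {
  assembly="$1"
  awk -v prefix="${assembly}/" 'index($NF, prefix) == 1 { print $NF }'
}
```

It could be used in place of the plain awk step, for instance
select_maf_paths hg19 < /tmp/goldenPath.maf.gz.file.list > /tmp/fetch.maf.list
followed by the rsync --files-from command quoted above.
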
> You can now work directly with the maf files to answer all questions about
> the alignment, for example, to extract a list of species in the alignment:
> 
> mafSpeciesList file.maf.gz stdout
> 
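
For a quick check on a machine where the kent tools are not installed, the
species names can also be read straight from the "s" lines of a MAF file. A
rough sketch, assuming the usual convention that the source field has the
form database.chromosome (e.g. hg19.chr1); this is not the mafSpeciesList
implementation, and the function name is made up for illustration:

```shell
# List the distinct database/species prefixes found in the "s" lines of
# an uncompressed MAF stream read from stdin.
# maf_species_from_stdin is a hypothetical helper name.
maf_species_from_stdin() {
  awk '$1 == "s" { split($2, parts, "."); print parts[1] }' | sort -u
}
```

For a compressed file, something like
gzip -dc file.maf.gz | maf_species_from_stdin
should give one name per line.
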
> Note the maf utilities in the kent source tree:
> 
> mafAddIRows mafAddQRows mafCoverage mafFetch mafFilter
> mafFrag mafFrags mafGene mafMeFirst mafOrder mafRanges
> mafSpeciesList mafSpeciesSubset mafSplit mafSplitPos
> mafToAxt mafToPsl mafsInRegion
> 
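
Two of these utilities seem to cover the "MAFs under a bed file" and "subset
of organisms" requests from the original question. A sketch under stated
assumptions: the argument orders below are as recalled from the tools' usage
messages and should be verified by running each tool with no arguments; the
wrapper name maf_under_peaks and its parameters are hypothetical:

```shell
# Extract the alignment blocks overlapping a set of BED regions, then
# keep only the species named in a list file (one name per line).
# Assumed argument order (verify against each tool's usage message):
#   mafsInRegion regions.bed out.maf in.maf
#   mafSpeciesSubset in.maf species.lst out.maf
maf_under_peaks() {
  bed="$1"; in_maf="$2"; species_list="$3"; out_maf="$4"
  tmp_maf=$(mktemp) || return 1
  mafsInRegion "$bed" "$tmp_maf" "$in_maf" &&
    mafSpeciesSubset "$tmp_maf" "$species_list" "$out_maf"
  status=$?
  rm -f "$tmp_maf"   # remove the intermediate file in all cases
  return "$status"
}
```

A call might look like
maf_under_peaks peaks.bed chr1.maf species.lst peaks.chr1.maf
with one call per chromosome file.
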
> --Hiram
> 
> Jacques van Helden wrote:
>> Dear UCSC team,
>> First of all, thank you very much for developing and maintaining the UCSC 
>> Genome Browser, which is a great resource for the whole community. Since 
>> 1997, we have been developing a software suite called Regulatory Sequence 
>> Analysis Tools (RSAT, http://rsat.ulb.ac.be/rsat/). For a list of supported 
>> functionalities, see  http://www.ncbi.nlm.nih.gov/pubmed/18495751
>> and the 2011 update
>>      http://www.ncbi.nlm.nih.gov/pubmed/21715389
>> We recently developed a new tool called peak-motifs, which detects 
>> transcription factor binding motifs in full collections of ChIP-seq peaks.   
>>      http://www.ncbi.nlm.nih.gov/pubmed/22156162
>> We are now extending the approach to analyze conserved motifs under the 
>> peaks. We are currently using the MAF files produced by multiz, but this 
>> requires us to maintain a local copy of all the multiz alignments, which 
>> makes it hard to stay consistent with updates of the supported genomes.
>> We would thus like to establish a programmatic connection to UCSC Genome 
>> Browser, in order to dynamically retrieve multi-genome alignments of the 
>> conserved regions covered by a set of peaks (more generally, we would like 
>> to obtain the MAFs under a set of genomic coordinates specified as a bed 
>> file). We already saw how to use your DAS interface for retrieving 
>> single-organism sequences under the peaks, but we did not find the 
>> equivalent for retrieving the MAFs and the related taxonomic information. 
>> Could you tell us whether there is programmatic access to UCSC (DAS, 
>> SOAP/WSDL, Perl modules, Python modules or anything else) that would allow 
>> us to run the following queries?
>> 1) Return the list of organisms for which a multi-z alignment is available. 
>> Currently, we must first get (with DAS) the list of all supported organisms, 
>> and then send one request for each organism in order to know if it contains 
>> one or several multizNway attributes.
>> 2) Given the name of a reference organism, obtain the list of other 
>> organisms aligned with its genome in the multizNway alignments (the list 
>> varies from organism to organism).
>> 3) Given a clade, obtain the list of included organisms.
>> 4) Given a set of genomic coordinates (bed file), retrieve the subset of 
>> MAFs intersecting these coordinates.
>> It would be even better if the method allowed the client to specify a 
>> subset of organisms for which the aligned sequences would be returned (much 
>> in the same way as the UCSC viewer allows users to select a subset of 
>> organisms to be displayed in the multiz track).
>> Many thanks for your help.
>> Pr. Jacques van Helden
>> Université d'Aix-Marseille (AMU). Lab. Technological Advances for Genomics 
>> and Clinics (TAGC)
>> INSERM Unit U1090, 163, Avenue de Luminy, 13288 MARSEILLE cedex 09. France
>> Fax: +33 4 91 82 87 01
>> Web:  http://jacques.van-helden.perso.luminy.univmed.fr/
>> Email: [email protected]

_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
