On Tue, Jun 22, 2010 at 3:46 PM, Hiram Clawson wrote:
> pslSort is used when you have 100's of thousands of psl files from
> a massive blat run on a supercomputer and you need to get all the
> results put back together into a single file.  It is sorted by
> chrom name (qName) and chromStart (qStart).  You can perform the same sort
> with the unix 'sort' command:
>    sort -k10,10 -k12,12n
> which also functions in a two stage procedure in exactly the
> same manner.  See also:
>    http://en.wikipedia.org/wiki/Sort_algorithm
>
> The sort works by making temporary sorted larger files in a temporary
> directory (stage 1)
> then continuing to put those files together into the final result (stage 2).

I'm still confused. Why is a temp file created? What is the difference
between the temp files and the final result file? Is the final file just a
combination of the temp files? I can't find where the wiki page mentions
the "2 stages". Would you please be a bit more specific about which part
you are referring to?

> Please note the complete usage message explains this procedure:
>>
>> pslSort - merge and sort psCluster .psl output files
>> usage:
>>   pslSort dirs[1|2] outFile tempDir inDir(s)
>> This will sort all of the .psl files in the directories
>> inDirs in two stages - first into temporary files in tempDir
>> and second into outFile.  The device on tempDir needs to have
>> enough space (typically 15-20 gigabytes if processing whole genome)
>>   pslSort g2g[1|2] outFile tempDir inDir(s)
>> This will sort a genome to genome alignment, reflecting the
>> alignments across the diagonal.
>>
>> Adding 1 or 2 after the dirs or g2g will limit the program to
>> only the first or second pass respectively of the sort

This explanation is confusing to me. It seems that
"pslSort g2g[1|2] outFile tempDir inDir(s)" does the sorting just like
"sort -k10,10 -k12,12n". But what does the
"pslSort dirs[1|2] outFile tempDir inDir(s)" option do?

--
Regards,
Peng
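
For readers following the thread, the two stages named in the usage message can be illustrated with the standard sort command alone. This is only a rough sketch of the general idea (sort each input into a temporary file, then merge the sorted temporary files into one output); it is not pslSort's actual implementation. The directory names, the row helper, and the sample rows are all made up for the example, and the input files are assumed to have no PSL header lines.

```shell
#!/bin/sh
# Sketch of a two-stage sort on headerless PSL files using only 'sort'.
# Everything named here (directories, row helper, sample data) is hypothetical.
set -eu
LC_ALL=C; export LC_ALL

work=$(mktemp -d)
mkdir "$work/in" "$work/tmp"

# Hypothetical helper: print one 21-column PSL-like row with the given
# qName (column 10) and qStart (column 12).
row() {
    printf '50\t0\t0\t0\t0\t0\t0\t0\t+\t%s\t1000\t%s\t%s\tchr1\t5000\t0\t50\t1\t50,\t%s,\t0,\n' \
        "$1" "$2" "$(($2 + 50))" "$2"
}

# Three small unsorted input files, standing in for many blat outputs.
{ row chr2 300; row chr1 200; } > "$work/in/a.psl"
{ row chr1 5;   row chr2 10;  } > "$work/in/b.psl"
row chr1 100                    > "$work/in/c.psl"

# Stage 1: sort each input file on its own into a temporary file.
for f in "$work"/in/*.psl; do
    sort -k10,10 -k12,12n "$f" > "$work/tmp/$(basename "$f").sorted"
done

# Stage 2: merge the already-sorted temporary files into the final result.
# 'sort -m' only merges; it relies on each input being sorted already.
sort -m -k10,10 -k12,12n "$work"/tmp/*.sorted > "$work/out.psl"

# Show qName and qStart of the merged result.
cut -f10,12 "$work/out.psl"
```

The temporary files are each sorted but cover only part of the data; the final file contains every row in one global order. Going by the usage message quoted above, running with dirs1 or dirs2 would stop after, or start from, the corresponding pass.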
