On Tue, Jun 22, 2010 at 3:46 PM, Hiram Clawson wrote:
> pslSort is used when you have hundreds of thousands of psl files from
> a massive blat run on a supercomputer and you need to put all the
> results back together into a single file.  The output is sorted by chrom
> name (qName, column 10) and chromStart (qStart, column 12).  You can
> perform the same sort with the unix 'sort' command:
> sort -k10,10 -k12,12n
> which also works in two stages, in exactly the
> same manner.  See also:
> http://en.wikipedia.org/wiki/Sort_algorithm
>
> The sort works by first writing sorted temporary files, each larger than
> the individual inputs, into a temporary directory (stage 1), then
> merging those files together into the final result (stage 2).
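To illustrate the ordering that 'sort -k10,10 -k12,12n' produces, here is a minimal Python sketch (not pslSort's own code). It assumes headerless, tab-separated PSL lines, where column 10 is qName and column 12 is qStart, counted from 1 as 'sort -k' does. qName is compared as plain text, which matches 'sort' run with LC_ALL=C; other locales may order names differently. The helper fake_psl is hypothetical and only fills in the two columns that matter.

```python
def psl_sort_key(line: str) -> tuple:
    """Sort key equivalent to LC_ALL=C sort -k10,10 -k12,12n on one PSL line."""
    fields = line.rstrip("\n").split("\t")
    q_name = fields[9]           # column 10: qName, compared as text
    q_start = int(fields[11])    # column 12: qStart, compared as a number
    # When both keys tie, sort(1) falls back to comparing the whole line.
    return (q_name, q_start, line)


def sort_psl_lines(lines):
    """Return the PSL lines ordered by qName, then qStart."""
    return sorted(lines, key=psl_sort_key)


def fake_psl(q_name: str, q_start: int) -> str:
    """Hypothetical helper: a 21-column PSL line where only qName/qStart matter."""
    fields = ["0"] * 21
    fields[8] = "+"              # column 9: strand
    fields[9] = q_name
    fields[11] = str(q_start)
    return "\t".join(fields) + "\n"


if __name__ == "__main__":
    rows = [fake_psl("chr2", 50), fake_psl("chr1", 1000),
            fake_psl("chr10", 5), fake_psl("chr1", 200)]
    for line in sort_psl_lines(rows):
        cols = line.split("\t")
        print(cols[9], cols[11])
    # prints: chr1 200 / chr1 1000 / chr10 5 / chr2 50 (one per line)
```

Note that chr10 sorts before chr2 because qName is compared as text, just as 'sort -k10,10' compares it.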

I'm still confused. Why is a temp file created? What is the difference
between the temp files and the final result file? Is the final file
just a combination of the temp files? I can't pinpoint where on the
wiki page the "2 stages" are mentioned. Would you please be a little
more specific about which part you are referring to?

> Please note the complete usage message explains this procedure:
>>
>> pslSort - merge and sort psCluster .psl output files
>> usage:
>>  pslSort dirs[1|2] outFile tempDir inDir(s)
>> This will sort all of the .psl files in the directories
>> inDirs in two stages - first into temporary files in tempDir
>> and second into outFile.  The device on tempDir needs to have
>> enough space (typically 15-20 gigabytes if processing whole genome)
>>  pslSort g2g[1|2] outFile tempDir inDir(s)
>> This will sort a genome to genome alignment, reflecting the
>> alignments across the diagonal.
>>
>> Adding 1 or 2 after the dirs or g2g will limit the program to
>> only the first or second pass of the sort, respectively.
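The two stages in the usage message follow the general pattern of an external merge sort. The following is a minimal Python sketch of that pattern under stated assumptions, not pslSort's actual implementation: inputs are files ending in .psl, lines are ordered by qName then qStart (as with 'sort -k10,10 -k12,12n'), any line whose 12th column is not a number (such as psLayout header lines) is skipped, and the chunk size is a hypothetical value. Stage 1 reads the inputs a chunk at a time, sorts each chunk in memory and writes it to tempDir as a sorted run; stage 2 merges all runs into outFile. The final file therefore holds the same lines as the temp files, merged into a single globally sorted order.

```python
import glob
import heapq
import os
import sys
import tempfile


def psl_sort_key(line: str) -> tuple:
    """Order by qName (column 10), then qStart (column 12), then the whole line."""
    fields = line.rstrip("\n").split("\t")
    return (fields[9], int(fields[11]), line)


def is_psl_record(line: str) -> bool:
    """Skip header or blank lines: a data line has 21 columns, column 12 numeric."""
    fields = line.rstrip("\n").split("\t")
    return len(fields) >= 21 and fields[11].isdigit()


def stage1(in_dirs, temp_dir, chunk_lines=100_000):
    """Stage 1: write sorted runs of at most chunk_lines lines into temp_dir."""
    os.makedirs(temp_dir, exist_ok=True)
    run_paths, chunk = [], []

    def flush():
        if not chunk:
            return
        chunk.sort(key=psl_sort_key)
        fd, path = tempfile.mkstemp(suffix=".psl", dir=temp_dir)
        with os.fdopen(fd, "w") as out:
            out.writelines(chunk)
        run_paths.append(path)
        chunk.clear()

    for in_dir in in_dirs:
        for path in sorted(glob.glob(os.path.join(in_dir, "*.psl"))):
            with open(path) as f:
                for line in f:
                    if is_psl_record(line):
                        # Ensure every record ends with a newline before merging.
                        chunk.append(line if line.endswith("\n") else line + "\n")
                        if len(chunk) >= chunk_lines:
                            flush()
    flush()
    return run_paths


def stage2(run_paths, out_file):
    """Stage 2: k-way merge of the already sorted runs into out_file."""
    files = [open(p) for p in run_paths]
    try:
        with open(out_file, "w") as out:
            out.writelines(heapq.merge(*files, key=psl_sort_key))
    finally:
        for f in files:
            f.close()


def main(argv):
    # Hypothetical CLI loosely modelled on "pslSort dirs outFile tempDir inDir(s)".
    if len(argv) < 4:
        print("usage: two_stage_sort.py outFile tempDir inDir(s)")
        return
    out_file, temp_dir, *in_dirs = argv[1:]
    stage2(stage1(in_dirs, temp_dir), out_file)


if __name__ == "__main__":
    main(sys.argv)
```

Calling stage1() alone, and later stage2() on the run files left in tempDir (for example sorted(glob.glob(os.path.join(temp_dir, "*.psl")))), is roughly analogous to the 1 and 2 suffixes described above. A tool merging a very large number of runs would probably also need to merge them in batches to stay under the open-file limit.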

This explanation is confusing to me. It seems that "pslSort g2g[1|2]
outFile tempDir inDir(s)" does the sorting just like "sort -k10,10
-k12,12n". But what does the "pslSort dirs[1|2] outFile tempDir
inDir(s)" option do?

-- 
Regards,
Peng

_______________________________________________
Genome maillist
https://lists.soe.ucsc.edu/mailman/listinfo/genome
