Hi Adrian,
No, but sometimes the automatic detection of "outer distances" will fail if the
seeds are too short.
In that case, you just need to provide the information manually.
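If you have a sample of observed outer distances for the library (for example from mapping a few pairs onto a known reference), the two numbers to provide can be estimated as in the minimal sketch below. The function name and the sample values are hypothetical and only there for illustration.

```python
import statistics


def summarize_outer_distances(distances):
    """Return (average, standard deviation) of outer distances, rounded to integers."""
    average = statistics.mean(distances)
    deviation = statistics.stdev(distances)  # sample standard deviation
    return round(average), round(deviation)


# Hypothetical sample of outer distances, in nucleotides.
sample = [2900, 3100, 3000, 2950, 3050]
average, deviation = summarize_outer_distances(sample)
print(average, deviation)
```

As far as I recall, the two values go after the two file names of the -p option (average first, then standard deviation), but please check MANUAL_PAGE.txt for the exact syntax in your version.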
On Assemblathon-2/Bird:

Contigs >= 100 nt
 Number: 88826
 Total length: 1169161521
 Average: 13162
 N50: 41098
 Median: 3368
 Largest: 465622

Contigs >= 500 nt
 Number: 68550
 Total length: 1164709611
 Average: 16990
 N50: 41306
 Median: 6862
 Largest: 465622

Scaffolds >= 100 nt
 Number: 47279
 Total length: 1270995781
 Average: 26882
 N50: 567125
 Median: 725
 Largest: 3236250

Scaffolds >= 500 nt
 Number: 27408
 Total length: 1266700501
 Average: 46216
 N50: 571612
 Median: 2137
 Largest: 3236250
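
For readers who want to recompute this kind of summary from a list of sequence lengths, here is a minimal sketch. It assumes the usual definition of N50 (the length at which the running total of lengths, sorted from largest to smallest, reaches half of the total) and an integer average; the function name is hypothetical and the exact conventions used inside Ray may differ.

```python
import statistics


def length_summary(lengths, minimum_length=100):
    """Summarize the sequence lengths that are at least minimum_length nt long."""
    kept = sorted((x for x in lengths if x >= minimum_length), reverse=True)
    if not kept:
        raise ValueError("no sequence reaches the minimum length")
    total = sum(kept)
    running = 0
    n50 = kept[-1]
    for length in kept:
        running += length
        if 2 * running >= total:  # at least half of the total is covered
            n50 = length
            break
    return {
        "Number": len(kept),
        "Total length": total,
        "Average": total // len(kept),
        "N50": n50,
        "Median": int(statistics.median(kept)),
        "Largest": kept[0],
    }


# Hypothetical lengths; the 50 nt sequence is below the threshold and is ignored.
print(length_summary([50, 100, 200, 300, 400]))
```

Calling the function a second time with minimum_length=500 gives the second kind of block, where only the longer sequences are counted.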
Sébastien
http://github.com/sebhtml/ray
> ________________________________________
> From: Adrian Platts [[email protected]]
> Sent: October 3, 2011 11:36
> To: Sébastien Boisvert
> Subject: Re: [Denovoassembler-users] Ray v1.7
>
> Hi Sebastien,
>
> Looking forward to trying 1.7. One question - when inputting Illumina MP
> reads do you have any special command line option to tell the assembler these
> are mates and their orientation? Or do we rev-comp the ends ourselves to
> make them look like Illumina PE orientations?
>
> Adrian
>
>
> On 2011-10-03, at 11:19 AM, Sébastien Boisvert wrote:
>
>> Dear assemblers,
>>
>>
>> Ray v1.7 is now available.
>>
>>
>> Summary of what changed:
>>
>> * MANUAL_PAGE.txt replaces the PDF manual.
>> * Output files are written to the directory specified by -o (previously it
>> was a file prefix)
>> * Round-robin reception of messages
>> * Bloom filter
>> * Illumina mate-pairs support
>> * Job checkpointing
>> * New scaffolding algorithm
>> * New assembly engine for the extension of seeds with mate-pairs (NovaEngine)
>> * Parallel file partitioning
>> * Network latency testing
>> * Compiles cleanly on 32-bit systems
>>
>>
>>
>> All the changes:
>>
>> v1.7 Mon Oct 3 10:42:01 2011 -0400 1 commits
>>
>> d28e76a Removed data files in unit tests.
>>
>> v1.7.0 Mon Oct 3 10:37:25 2011 -0400 6 commits
>>
>> 2943c44 Updated the manual page.
>> 1f712cd Fixed PATH problems in unit tests.
>> ddc513c Simplified the release procedure.
>> 41f9603 Fixed some PATH issues in system tests.
>> b388cfc Migrated the version in the Makefile.
>> 28b2a69 Added granularity summary for option -run-profiler.
>>
>> v1.7-rc2 Wed Sep 28 22:40:40 2011 -0400 7 commits
>>
>> f49a434 Added the compilation option CONFIG_CLOCK_GETTIME for the
>> profiler.
>> 95f2488 Remove expired reads from the list of unmated reads to reduce
>> the computation granularity.
>> 9bf6e69 Reduced the granularity in call_RAY_SLAVE_MODE_EXTENSION() by
>> cleaning expiration positions.
>> d705e55 Reduced the computation granularity for the code that computes
>> reverse-complement extensions.
>> f76e565 Added timer warnings with -run-profiler.
>> 0ee6fcc Added comments in the communication layer of Ray.
>> 57099e3 Disabled persistent communication in the round-robin reception.
>>
>> v1.7.0-rc1 Mon Sep 26 16:59:17 2011 -0400 58 commits
>>
>> f75bfa6 Removed inline code because compilers optimize the code anyway.
>> 6b345f9 Changed mpirun to mpiexec as mpiexec is in the standard.
>> ec32cf4 Merged the persistent communication layer with the round-robin
>> reception.
>> 714875d Implemented a round-robin algorithm for the reception of
>> messages.
>> 63a3131 Write raw data for network tests if -test-network-only is
>> provided.
>> dc196b2 Added option -write-network-test-raw-data.
>> fd3c033 Added a time period during which other messages are more
>> important than urgent messages.
>> 7c2d6f6 Fixed a bug in the code that loads the checkpoint GenomeGraph.
>> 58733de Enabled the communication optimizer for the network test too.
>> a08888b The option -show-communication-events now shows all messages
>> with overlays too.
>> c97ea8e Added a communication optimizer with urgent messages.
>> b63035e Added overlays for option -show-communication-events.
>> 9779ced Added option -show-communication-events.
>> 7619e84 Added option -show-read-placement.
>> d6b1f5d Added more details in the output of -run-profiler.
>> 6aa7b59 Removed dependency for clock_gettime.
>> 078148a Added option -debug-scaffolder.
>> 9181dd8 Added assertions and fixed a bug in GridTable.
>> 0df6381 Fixed some divisions by 0 in the scaffolder.
>> 0554af1 Regression bug on phix system test fixed.
>> 76b0b0d Fixed a bug in JoinerWorker in which two overlapping paths
>> would not be joined together.
>> bc7b314 Added some debugging information for -debug-fusions.
>> 783d522 Fixed a communication problem in MessageProcessor.
>> 4e2b599 The number of enabled MPI ranks can be changed during the
>> network test by changing a variable in the source code.
>> 95d825d Merge branch 'master' of [email protected]:sebhtml/ray
>> bdf49a2 Fixed an integer overflow in the computation of standard
>> deviations.
>> 53eb012 Fixed scripts to accommodate new prefix directory option.
>> 2828d85 Added more debugging information in the scaffolding test.
>> 2afa3cf Updated for ScaffoldLinks.txt format to v2.0.
>> 5cd9848 Added some documentation for Infiniband.
>> 1dab4e3 Restored the default number of words in the network test to
>> 500.
>> 21e312c Modified some scaffolding code to obtain the correct side of a
>> contig when it allows both.
>> a908567 Implemented a new greedy scaffolding algorithm as discussed
>> with François Laviolette.
>> 81be6d9 Added the standard deviation in ScaffoldLinks.txt
>> 32646f2 Added non-persistent MPI communication just to compare.
>> 06c7695 Added information in ScaffoldLinks.txt
>> f0fcdd3 Changed the default message size for network testing.
>> eb9f8c7 Limiting scaffolding links to vertices that have one parent,
>> one child and a coverage value near the peak.
>> b5a98d2 RayVersion and RayCommand are now written (a bug was
>> introduced).
>> 8dc6766 Fixed the code that counts the number of extended seeds.
>> 862b5fc Added option -write-contig-paths to write contig paths with
>> coverage values. This is enabled by default.
>> 1db2865 The checkpoint ContigPaths is now fully operational (read and
>> write).
>> edbfcd7 The checkpoint ContigPaths is now written on demand.
>> 9150bc1 Removed the minimum number of raw scaffolding links.
>> 77e4a01 Modified the scaffolder routines to check the vertex coverage
>> values in paths.
>> 18edfd0 Fixed the content of a displayed text in the fusion task
>> creator.
>> eaadc7c Modified the Sun Grid Engine job template to erase the
>> directory before running the whole thing.
>> 672b5f3 Changed --oneline to --pretty=oneline for compatibility with
>> older versions of git.
>> fdad4e9 Cleaned some code in the task creator routines for edge
>> purging.
>> b9fe00b Cleaned some code in the task creator routines for edge
>> purging.
>> e6a4fd8 Fixed a bug in the virtual processor wherein it was not
>> force-flushing messages when needed.
>> 32c1384 Corrected the number of flowed vertices in the seed extension.
>> 6538859 Improved heuristics for selection.
>> 221e7b4 Implemented the reverse strand case in JoinerWorker.
>> 2754a54 Added 3 unit tests for NovaEngine and improved the heuristics.
>> f704b0f Corrected positions in JoinerWorker when on the other strand.
>> ef06930 The default is now ASSERT=y in the Makefile.
>> e77388e All output files are written in a directory provided with
>> option -o.
>>
>> v1.7.0-beta1 Tue Sep 6 12:02:43 2011 -0400 50 commits
>>
>> db6b8d3 reset() must be called in the constructor.
>> d946b29 Fixed compilation warning.
>> 8200a28 Merge branch 'master' of github.com:sebhtml/ray
>> 78af599 Adding new files in Documentation/.
>> c6fb5d0 Added INSTALL.txt.
>> 151028b Migrated some code only utilised by the scaffolder.
>> 345d267 Added \author tag to all classes.
>> 952e1a2 Updated Documentation files.
>> 068ce0e Added VirtualProcessor initialization.
>> 8a2be4a Removed MyForest and its iterator minion.
>> 4eebe61 Introducing the VirtualProcessor class.
>> 4897465 Fixed compilation warnings for 32-bit systems.
>> 91799e1 Fixed an argument name.
>> 1039128 Added documentation for the network latency.
>> f11cb16 Added documentation for the virtual processor.
>> 8152dbb Added documentation for the virtual communicator.
>> 7e04a0a Fixed compilation warnings.
>> e5f0ac7 Joiner software stack now joins otherwise un-joined paths in
>> the distributed graph.
>> e53c0cf Now printing hit information in JoinerWorker.
>> 3d49b71 Added selected hit in standard output for JoinerWorker.
>> 1ef78fb Updated a threshold in FusionWorker.
>> 9595b0d Added debugging information in JoinerWorker.
>> 441d133 Added Joiner code.
>> b3b8dde Disabled the reverse-complement copies of extensions.
>> e42c2d1 Workers push virtual messages, not real messages.
>> 80d8fc8 Fixed a state-machine bug in TaskCreator/FusionTaskCreator.
>> ae4af42 Fixed a machine-state bug in FusionWorker.
>> 15fc7d1 Added an AUTHORS file.
>> 6a5954b Changed the default algorithm in VirtualProcessor -- now using
>> a minimum work unit.
>> 3cbc5fd Added some debugging information for FusionTaskCreator.
>> 45280ff Removed OperatingSystem dependency in unit tests.
>> f778c5b Implemented a new better and simpler merger module --
>> FusionTaskCreator/FusionWorker.
>> 57753e7 Fixed some unit tests by moving scaffolder methods.
>> 84656ae Using the VirtualProcessor for edge purge.
>> c801302 Added some debugging information for fusions.
>> 53085bb Restored worker codes.
>> d05a902 Added interface Worker for worker classes.
>> 3547950 Added method hasWorkToDo to VirtualProcessor.
>> 75f29f1 Added debugging messages in FusionData.
>> b826e68 Added TaskCreator and Merger classes.
>> 173a2f3 Removed hard-coded parameter -debug-fusions.
>> 778e0d2 Changed the maximum number of cycles to 16 in merging code.
>> 10d3346 Merge branch 'master' of [email protected]:sebhtml/ray
>> 1048b06 Added scaffolder cases in Documentation/
>> 45ecca1 The ChangeLog file will not be maintained anymore, use
>> ./scripts/dump-ChangeLog.sh
>> b01fe11 Added option -version to Ray.
>> 58e1a6d Modified the behavior of Ray when fusions are generated.
>> 26ee554 Updated the path to Ray in system tests.
>> c920e1e Fixed a compilation warning.
>> 7dea079 Added a function to create directories.
>>
>> v1.6.2-rc2 Wed Aug 24 20:50:07 2011 -0400 71 commits
>>
>> 70c8e92 Changed where the Ray binary is written.
>> b14a5b7 Removed the manual target from the Makefile.
>> a5ff5c4 Added a Documentation directory.
>> 6b6fd7c Removed logo from source.
>> fa809fa Testing symbolic links.
>> 6a290ce Added additional debugging information.
>> b6aa616 Restored original state.
>> 7f8708f Added an explicit flush.
>> 1a11ee5 Added checkpoint Sequences.
>> ffef055 Updated the output of -help option.
>> 89ed32b Added option -debug-fusions.
>> 5e09388 Updated the MANUAL.
>> 797baef Added -read-write-checkpoints in the changes.
>> d15a0ca Added gmane link in the README
>> 98dbbd4 Removed unused scripts.
>> 0034730 Skip a seed if within it during flow 1 and a vertex is already
>> processed.
>> 37aa1ff Limiting seeds to probably unique vertices.
>> 03cb71f Don't write a checkpoint if it was just read.
>> dcf3eb6 Added a file describing checkpoints.
>> ee971f5 Read checkpoint before writing it.
>> aec738a Changed 1 hash function because it was a copy.
>> 6c4fac9 Fixed a hanging problem.
>> 0094fba Added checkpoint Extensions.
>> 5dd5470 Added checkpoint Partition.
>> ddf2125 Fixed a bug when no sequence files are provided.
>> 5a7b56e Improved checkpointing message.
>> a6596ea Added option -test-network-only to only test the network
>> and return.
>> a72d0ec Improved checkpointing messages.
>> 6d29d43 Fixed a messaging bug occurring very rarely.
>> e2a2bb6 Added a MANUAL file.
>> 218546f Checkpoint files are now written in a binary format.
>> f63590a Checkpoints are now operational.
>> adf7222 -read-checkpoints works with checkpoints
>> <CoverageDistribution>, <GenomeGraph> and <Seeds>.
>> 890ba11 Option -write-checkpoints writes all checkpoints.
>> 4822527 Added options -read-checkpoints and -write-checkpoints, this
>> is still in development.
>> 23f7218 Preparing code for a change.
>> 60d1ebe Reduced the number of messages with tag
>> RAY_MPI_TAG_REQUEST_VERTEX_COVERAGE in SeedExtender.cpp
>> c510d96 Added tag counts for option -run-profiler.
>> 63beff4 Fixed a display problem.
>> 1e3490f Modified the order of the steps performed when merging
>> identical paths.
>> 5fc8c13 Changed the prototype of
>> VirtualCommunicator::getMessageResponseElements.
>> 237339a Only send a RAY_MPI_TAG_ASK_IS_ASSEMBLED message if starting
>> on a seed on flow 1.
>> 6fa20a9 Don't fetch read markers when not needed, use less memory to
>> know if a vertex was assembled.
>> 7c2a084 Added skipping events.
>> d4e5a25 Fixed N50 when there is only 1 scaffold.
>> c17c1fd RayCommands file is written correctly now.
>> 3ece690 Ray merger will merge more things now.
>> b6e342c Fixed a segmentation fault that occurs in rare cases.
>> 6208303 Fixed a bug in the scaffolder, now more vertices should be
>> investigated.
>> eb13301 Changed the precision of things that go together.
>> 0f78f4e Fixed which arguments are picked up by opcodes -p and -i.
>> dd5c796 Input opcodes are now shuffled before being utilised.
>> 44d2bc8 Extension of seeds is done endlessly until growth stops.
>> ac01c27 Modified NovaEngine to pass a new unit test (as well as the
>> old unit tests).
>> f23b52d Changed the default number of persistent requests.
>> fe05cee Added list of working C++ compilers.
>> 6a1854a The paired read simulator is now a separate project, see
>> https://github.com/sebhtml/paired-read-simulator
>> 6ccac7a Flag invalid choices before doing the actual selection.
>> 82f6107 Fixed a compilation Warning.
>> 5a99139 Fixed an integer overflow.
>> c63ba4a Fixed a compilation warning with gcc.
>> 0d1aab0 Fixed a compilation warning with Intel compiler.
>> f84d308 ExtensionElement objects now contains reads in 2-bit format.
>> ec0f6d1 Fixed a memory problem in the computation of optimal read
>> markers.
>> 5a7c941 Implemented a Bloom filter to reduce the memory usage. This is
>> ridiculously good and the false positive rate has no effect whatsoever on
>> Ray thanks to the KmerAcademy.
>> 27f24d5 Added a few things in the README.
>> a1233c0 Fixed a recently introduced regression in heuristics (should
>> not choose an invalid choice).
>> ed67f25 Updates in the README.
>> 83f0e0a Fixed the maximum length of input reads to 65535.
>> dbba521 Added some documentation files.
>> 40d1d8a Fixed a bug.
>>
>> v1.6.2-rc1 Wed Aug 3 13:39:46 2011 -0400 87 commits
>>
>> c9aa372 fixing a segfault when no contigs are found
>> e6c66af Added N50, median, average and largest contig and scaffold
>> lengths in PREFIX.OutputNumbers.txt
>> 1aac7a6 Removed the coverage threshold from the algorithm that finds
>> seeds in the distributed graph. Suggested by David Eccles(gringer) from
>> Max-Planck-Gesellschaft, München.
>> a6244b0 Modified the selection engine so that the new unit tests also
>> pass.
>> 0811707 Removed email of contributor.
>> 91f3251 Added David Eccles in the README
>> 798f262 Added debugging option -show-distance-summary.
>> 34d1af2 New development option: -show-distance-summary.
>> 454d5ee Added a test for open addressing.
>> bffce62 Merging of similar paths has been modified.
>> 3b83d6c Added read placement freezing.
>> dac7de5 Modified the extension algorithm to avoid collapsing of
>> repeated k-mers that are near each other in the genome.
>> c667133 Added a unit test and fixed NovaEngine to handle it.
>> 7a3443c Improved the manual.
>> d842360 Added a section on how to launch Ray in the manual.
>> 9c449be Fixed a compilation warning.
>> 182fe2c Added some unit tests for the Ray NovaEngine (for mate-pair
>> reads). Seems to work quite well so far.
>> a5e631b New option: -write-seeds which is useful for debugging the
>> code.
>> 1902636 Only use NovaEngine when paired information is available.
>> 91359b0 Added options -use-NovaEngine and -show-NovaEngine for
>> debugging purposes.
>> 8bc58aa Fixed compilation errors with HAVE_LIBZ=y and HAVE_LIBBZ2=y
>> af7557e New output files: PREFIX.SequencePartition.txt and
>> PREFIX.NumberOfSequences. Improved the content shown with -help.
>> 9598fda Now using the NovaEngine.
>> ddeca62 Fixed a bug in the NovaEngine.
>> 13e5e76 Removed by default the reporting of libraries in stdout.
>> 5e5edae Option use:NovaEngine enables experimental NovaEngine.
>> 73c61bd Fixed a bug in the recently introduced peer-to-peer parallel
>> Partitioner.
>> 0bbd8d3 Removed options in 2 system tests.
>> 1c53c6e Added comments at random places.
>> 3696be0 Added 2 scripts for code editing.
>> 1c135af Disabling by default the experimental NovaEngine. Results so
>> far are promising !
>> e1efa6e Fixed a compilation warning.
>> 3fe9fab Removed roughly half of the messages with the MPI tag
>> RAY_MPI_TAG_KMER_ACADEMY_DATA.
>> 4b3639f Added a unit test and modified CoverageDistribution.cpp to
>> handle low-coverage datasets.
>> b76e9dd * Added information on unknown nucleotides in the instruction
>> manual.
>> 9847b9e Added a TODO item.
>> 20ad18f Added an abstraction layer for the operating system.
>> 9706603 Improved the README for system tests.
>> 4372845 Only show nova choices if -show-extension-choice is provided.
>> 67a87ed Improved the document about patching.
>> 231dbc9 Added a file describing how to submit a patch.
>> 88b13a0 Unit tests are now files named test_<test_name>.sh
>> 43200b5 Fixed a typo.
>> efda9d9 Moved Kmer routines in the class Kmer.
>> 207a193 Added an entry in the changelog.
>> 19b3905 Added a symbolic link.
>> f7addb4 Added 35 unit tests.
>> e704aa1 Simplified the algorithm that finds peak and added 35 unit
>> tests to test it (for various datasets).
>> 1a8b0e0 Improved the NovaEngine according to unit tests.
>> 412052b Improved the unit tests for the NovaEngine.
>> 862e6dd Removed some messages.
>> 1bd4d5a Removed assertion.
>> 4471c37 Improved the NovaEngine, but not using it yet. It needs more
>> testing.
>> 668ad2f Added a unit test for the NovaEngine.
>> 9c8b02d New experimental heuristics: The Ray NovaEngine.
>> 1ef868d Created a heuristics module and moved related bits in it.
>> cd8c1af Improved the peak finder when there is no deviation.
>> 15ea481 Fixed a compilation error.
>> a2470f2 Moved the configuration of the virtual communicator in
>> Machine.cpp
>> 810187d Corrected a code comment.
>> 982ff04 Updated the coding style.
>> f7d9087 Added a coding style file.
>> 3f15314 Don't compute or update peaks for libraries with
>> manually-provided information.
>> d386173 When the extension is finished, show library peak usage.
>> 952dfb8 Don't print the tree.
>> 5a49add Changed to 64 slots.
>> 8c15d79 Corrected the peak finder.
>> fdc1bde Merge branch 'master' of [email protected]:sebhtml/ray
>> 2b5f716 Changed the behavior for repeats.
>> 0d6565e Fixed a system test.
>> 857c207 Fixed some compilation warnings with gcc 4.1.2.
>> 4bb20ba Added an entry in the change log for 1.6.2.
>> 5fab076 Now working on v1.6.2
>> 8354455 Fixed a compilation error due to the algorithm library.
>> 8ffee2b Merge branch 'master' of github.com:sebhtml/ray
>> 29ab020 Fixed a bug in the incremental resizing algorithm of
>> MyHashTable.
>> bacc0c7 Fixed comments and assertions for new correct code.
>> d48fa33 Fixed an implementation bug for double hashing.
>> c9e8a5e Added a TODO item.
>> 8549758 File partitioning is now performed in parallel.
>> 9825ecf Implemented parallel file partitioning.
>> 4a4b739 Don't set NSLOTS if already defined.
>> d652a1c Now Ray uses the maximum peak of a paired library to compute
>> the expiry position of a read.
>> 065df3a Choosing the good peak for a paired library if a mate is
>> already available.
>> bd324ea Now selecting the correct peak to choose the next vertex.
>> 7a72aac Ray can find more than one peak in any paired library.
>> 7d92f06 Ported the prototype for finding peaks from Python to C++.
>>
>>
>> Sébastien
>>
>> ------------------------------------------------------------------------------
>> All the data continuously generated in your IT infrastructure contains a
>> definitive record of customers, application performance, security
>> threats, fraudulent activity and more. Splunk takes this data and makes
>> sense of it. Business sense. IT sense. Common sense.
>> http://p.sf.net/sfu/splunk-d2dcopy1
>> _______________________________________________
>> Denovoassembler-users mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
>
>