My users are starting to take comparability seriously and have begun to
downsample. However, the random_lines_two_pass.py tool seems to be very slow
with large input files, e.g. when downsampling a BED file from 40 million
reads to 33 million reads.
I don't understand the rationale behind deleting the positions from the
array. In most programming languages, deleting from the middle of an array is
slow, because all later elements have to be shifted.
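
To illustrate the concern, here is a minimal sketch of the two approaches. It
is not the tool's actual code: the function names sample_by_deletion and
sample_by_selection are hypothetical, and the list of line offsets is assumed
to be sorted in file order. The first function removes random entries one at
a time, and each removal from a Python list costs time proportional to the
list length. The second picks all entries in one call to random.sample and
sorts them back into file order.

```python
import random


def sample_by_deletion(offsets, k):
    """Hypothetical: drop random entries until only k remain.

    Every `del` shifts all later elements, so each removal is O(n).
    """
    offsets = list(offsets)  # work on a copy
    while len(offsets) > k:
        del offsets[random.randrange(len(offsets))]
    return offsets  # survivors keep their original (file) order


def sample_by_selection(offsets, k):
    """Hypothetical: pick k entries at once, then restore file order."""
    return sorted(random.sample(list(offsets), k))
```

Which one is faster presumably depends on how many lines are removed relative
to how many are kept, so timing both (e.g. with the timeit module) on a
realistic input would be the way to settle it.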
Benchmarking the two random sampling methods was too difficult for me, so I
removed the get_random_by_subtraction method,
and my users are happy.
Did anybody really benchmark this?
Thank you very much,