On 02/23/2015 11:05 AM, Leonard Goldstein wrote:
Hi Michael and Thomas,

I ran into the same problem in the past (i.e. when I started working
with functions like scanBam I expected them not to return the same
alignment multiple times).

One thing to consider might be that returning alignments multiple
times is consistent with the behavior of the samtools view command.
Quoting from the samtools manual:

“Important note: when multiple regions are given, some alignments may
be output multiple times if they overlap more than one of the
specified regions.”

Thanks Leonard for pointing this out. This is indeed the reason why all
the functions in Rsamtools and GenomicAlignments that take a 'which'
argument (via a ScanBamParam object) treat it "the samtools way", that
is, as a vector of meaningful loci.

Most of them track the index of the individual loci via a "which_label"
metadata column. See for example Rsamtools::pileup() and all the
readGAlignment*() functions in the GenomicAlignments package.
FWIW the man page for the readGAlignment*() functions contains the
following note:

     Note that a given record is loaded one time for each region it
     belongs to (this is a scanBam() feature, readGAlignments()
     is based on scanBam()).

but maybe this should be emphasized a little bit more.
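
To make the note above concrete, here is a minimal sketch. It assumes the
ex1.bam example file shipped with Rsamtools; the two overlapping regions on
seq1 are arbitrary and chosen only for illustration. The 'which_label'
column is requested through the with.which_label argument of
readGAlignments().

```r
library(Rsamtools)
library(GenomicAlignments)

# Example BAM file shipped with Rsamtools (reference sequences seq1, seq2).
bam <- system.file("extdata", "ex1.bam", package = "Rsamtools")

# Two overlapping regions on seq1 (coordinates chosen only for illustration).
which <- GRanges("seq1", IRanges(start = c(1, 50), end = c(100, 150)))
param <- ScanBamParam(which = which)

# with.which_label = TRUE adds a 'which_label' metadata column that records
# the region each record was loaded for.
gal <- readGAlignments(bam, param = param, with.which_label = TRUE)

# Records per region; a record overlapping both regions is counted in both.
table(mcols(gal)$which_label)
```

A record that overlaps both regions shows up once under each label, so
length(gal) can exceed the number of distinct alignments in the union of
the regions.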

Cheers,
H.


Maybe there is an argument for keeping things consistent with
samtools? As you said, if documented properly, the user can decide
whether to reduce regions specified in which or not.

Leonard


On Mon, Feb 23, 2015 at 10:52 AM, Michael Lawrence
<lawrence.mich...@gene.com> wrote:
We should at least try to avoid surprising the user. Seems like most
people expect "which" to be a simple restriction, so I think for now I will
just reduce the which, and if someone has a use case for separate queries,
we can address it in the future.

On Mon, Feb 23, 2015 at 10:41 AM, Thomas Sandmann <sandmann.tho...@gene.com>
wrote:

Personally, I don't have a use case with "meaningful loci" worth tracking,
so keeping it simple would work for me.

Incidentally, would it be good to deal with the 'which' parameter in a
consistent way across different methods? I just saw this recent post on
the mailing list in which a user got confused by duplicate counts returned
after passing 'which' to ScanBamParam:

https://stat.ethz.ch/pipermail/bioc-devel/2015-February/006978.html


---

Thomas Sandmann, PhD
Computational biologist

Genentech, Inc.
1 DNA Way
South San Francisco, CA 94080
USA

Phone: +1 650 225 6273
Fax: +1 650 225 5389
Email: sandmann.tho...@gene.com

"If a man will begin with certainties, he shall end in doubts; but if he
will be content to begin with doubts he shall end in certainties." -- Sir
Francis Bacon


On Mon, Feb 23, 2015 at 10:37 AM, Michael Lawrence <
lawrence.mich...@gene.com> wrote:

We just have to decide which is the more useful interpretation of 'which'
-- as a simple restriction, or as a vector of meaningful loci, which will
be analyzed individually? I would actually favor the first one (the same as
yours), just because it's simpler. To keep track of the query ranges, we
would need to add a new column to the returned object, which will more
often than not just be clutter. I guess we could introduce a new parameter,
"reduceWhich" which defaults to TRUE and reduces the which. If FALSE, it
instead adds the column mapping back to the original which ranges.
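
The two interpretations can be sketched with plain GenomicRanges
operations. This is only an illustration under stated assumptions: the
ranges, the single-base positions, and the 'which_index' column name are
all made up, and nothing here reflects how any package implements it.

```r
library(GenomicRanges)

# Hypothetical query ranges; they overlap between positions 150 and 200.
which <- GRanges("chr1", IRanges(start = c(100, 150), end = c(200, 250)))

# Hypothetical single-base positions for which a tally would be reported.
positions <- GRanges("chr1", IRanges(start = c(120, 180, 240), width = 1))

# Interpretation 1: 'which' as a simple restriction.
# Each position is kept at most once, however many ranges it falls in.
restricted <- subsetByOverlaps(positions, reduce(which))

# Interpretation 2: 'which' as a vector of individually meaningful loci.
# Each position is reported once per range it falls in, with an extra
# column ('which_index', a made-up name) mapping back to 'which'.
hits <- findOverlaps(positions, which)
per_locus <- positions[queryHits(hits)]
mcols(per_locus)$which_index <- subjectHits(hits)

length(restricted)  # one entry per position
length(per_locus)   # position 180 appears twice, once for each range
```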


On Sun, Feb 22, 2015 at 2:36 PM, Thomas Sandmann <
sandmann.tho...@gene.com> wrote:

Hi Michael,

Ah, I see. I hadn't realized that returning the pileups separately for
each region could be a desired feature, but that makes sense. I agree: as
it is easy for the user to 'reduce' the ranges beforehand, your first
option (e.g. returning the ID of the range) would be more flexible.

Perhaps you would consider adding a sentence to the documentation of
'which' on BamTallyParam's help page explaining that users might want to
'reduce' their ranges beforehand if they are only interested in a single
tally for each base?
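
For reference, reducing the ranges beforehand could look like the sketch
below. The regions are hypothetical; reduce() is the standard GenomicRanges
function that merges overlapping and adjacent ranges.

```r
library(GenomicRanges)

# Hypothetical regions of interest; the first two overlap.
which <- GRanges("chr1",
                 IRanges(start = c(100, 150, 500), end = c(200, 250, 600)))

# reduce() merges overlapping ranges, so no base is covered twice.
which_reduced <- reduce(which)
which_reduced
```

The reduced object can then be supplied as 'which' in place of the
original ranges, so that each base is tallied only once.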

Thanks a lot!
Thomas






_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
