Re: [Bioc-sig-seq] the as.matrix method of the RangesMatchingList

Michael Lawrence Thu, 14 May 2009 06:04:24 -0700

On Thu, May 14, 2009 at 5:19 AM, Michael Lawrence <[email protected]> wrote:


>
>
> On Thu, May 14, 2009 at 5:07 AM, Nicolas Delhomme <[email protected]> wrote:
>
>> Hi Michael,
>>
>> Thanks a lot. This is working very nicely. However, the user has to make
>> sure that the query and subject are ordered in the same way in order to
>> use the generated index properly.
>>
>> For example, my subject has the names: "2L" "2R" "3L" "3R" "4"  "X" and
>> the query: "4"  "X"  "3R" "2R" "2L" "3L". The overlap function takes care of
>> this and compares the right spaces. It returns a RangesMatchingList with
>> names:
>> "4"  "X"  "3R" "2R" "2L" "3L". This means that when I export the result as
>> a matrix, the indices will be corrupted.
>>
>
> Yes, I noted this in the documentation. I am not sure I would say the
> results are "corrupted".
>
>
>>
>> I can think of two solutions:
>> Either, there should be a warning emitted (when doing the overlap if the
>> names are not ordered the same)
>> Or, and that would be my preferred solution, have an additional slot in
>> the RangesMatchingList holding the mapping index from the query names to the
>> subject names. This could then be used by the as.matrix method to return the
>> "correct" indices. This should make it safe for the case where the user does
>> not provide a query and a subject ordered in the same way. And it should be
>> robust to the cases where the query and subject spaces are not entirely
>> identical.
>>
>
> I was thinking of doing something like this. Thanks for giving me the
> motivation to actually do it.
>

Just checked this into the devel branch (trunk).
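
For readers following the thread, the name mapping suggested above can be
sketched in base R as follows. This is only an illustration under stated
assumptions, not the code that was checked in: the variable names
(subject_names, query_names, query_to_subject, partial) are hypothetical,
and only the space names come from the example earlier in the thread.

```r
# Space names taken from the example earlier in the thread.
subject_names <- c("2L", "2R", "3L", "3R", "4", "X")
query_names   <- c("4", "X", "3R", "2R", "2L", "3L")

# For each query space, find the position of the same space in the subject.
# (Hypothetical variable names; this is not the IRanges implementation.)
query_to_subject <- match(query_names, subject_names)
names(query_to_subject) <- query_names
print(query_to_subject)

# match() yields NA for a query space that is absent from the subject, so
# the same approach also covers query and subject spaces that are not
# entirely identical.
partial <- match(c("4", "Y"), subject_names)
print(partial)
```

A mapping like this lets per-space results be reported against the subject's
own ordering, regardless of the order in which the query spaces were given.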


>
> Michael
>
>
>>
>> Well, this is just my two cents' worth as I'm not (yet) so familiar with
>> the code.
>>
>> Best,
>>
>> ---------------------------------------------------------------
>> Nicolas Delhomme
>>
>> High Throughput Functional Genomics Center
>>
>> European Molecular Biology Laboratory
>>
>> Tel: +49 6221 387 8426
>> Email: [email protected]
>> Meyerhofstrasse 1 - Postfach 10.2209
>> 69102 Heidelberg, Germany
>> ---------------------------------------------------------------
>>
>>
>>
>> On 13 May 2009, at 22:20, Michael Lawrence wrote:
>>
>>
>>>
>>> On Wed, May 13, 2009 at 7:05 AM, Nicolas Delhomme <[email protected]>
>>> wrote:
>>> Hi all,
>>>
>>> I've got the impression that the as.matrix method of the
>>> RangesMatchingList does not work as it should.
>>>
>>> I have a RangesMatchingList which I obtained by using the overlap (from
>>> the RangesList class) function that takes two RangesList as input. When I
>>> apply as.matrix() on the RangesMatchingList, it gives me the following
>>> error:
>>>
>>> Error in .Method(..., deparse.level = deparse.level) :
>>>  number of rows of matrices must match (see arg 2)
>>>
>>> The function is pretty easy:
>>>
>>> setMethod("as.matrix", "RangesMatchingList", function(x) {
>>>  cbind(space = space(x), do.call(cbind, lapply(x, as.matrix)))
>>> })
>>>
>>> When I replace the cbind in the do.call by an rbind, it's already better
>>>
>>> Thanks, yes this was a bug. As the documentation states,
>>> RangesMatchingList was considered experimental and not something that was
>>> really tested. But I should have done a better job.
>>>
>>>
>>> Warning message:
>>> In .Method(..., deparse.level = deparse.level) :
>>>  number of rows of result is not a multiple of vector length (arg 1)
>>>
>>> This is due to the fact that space(x) returns many more spaces than there
>>> are overlaps.
>>>
>>> This is a bug in space().
>>>
>>>
>>> I could solve that by changing the function into:
>>>
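>>> setMethod("as.matrix", "RangesMatchingList", function(x) {
>>>  # Convert each space's matches, label the rows with the space name,
>>>  # then stack the per-space blocks.
>>>  do.call(rbind, lapply(seq_along(x), function(i) {
>>>    mat <- as.matrix(x[[i]])
>>>    cbind(space = rep(names(x)[[i]], nrow(mat)), mat)
>>>  }))
>>> })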
>>>
>>> Now, I do not know if I might have a particular use-case (having a
>>> RangesMatchingList coming from the RangesList overlap function) that you
>>> guys did not think of.
>>>
>>> It turns out that I had to rethink this method. As above, the user will
>>> receive a character matrix, which probably isn't very useful. One could
>>> translate the space names into integer IDs, but in order to use that, one
>>> would have to split the matrix and loop over each block. In that case, it
>>> would just be easier to loop over the RangesMatchingList. Thus, I changed
>>> the function to return a doublet matrix, just like RangesMatching, where the
>>> indices are adjusted so that they are aligned with the result of calling
>>> 'unlist' on the subject and query RangesLists (i.e., the index is global). I
>>> think this will satisfy more use cases, but I'm not sure.
>>>
>>> These changes were applied in both trunk and release.
>>>
>>> Thanks for the feedback, and I'd appreciate more if you have any,
>>> Michael
>>>
>>>
>>> Just let me know,
>>>
>>> Best,
>>>
>>> ---------------------------------------------------------------
>>> Nicolas Delhomme
>>>
>>> High Throughput Functional Genomics Center
>>>
>>> European Molecular Biology Laboratory
>>>
>>> Tel: +49 6221 387 8426
>>> Email: [email protected]
>>> Meyerhofstrasse 1 - Postfach 10.2209
>>> 69102 Heidelberg, Germany
>>>
>>> _______________________________________________
>>> Bioc-sig-sequencing mailing list
>>> [email protected]
>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>>
>>>
>>
>

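The "global index" idea described in the quoted reply (per-space indices
shifted so that they line up with the unlisted query and subject) can be
sketched in base R. This is a sketch under stated assumptions rather than
the IRanges implementation: hits, query_len, subject_len and offsets() are
hypothetical stand-ins, and the spaces are assumed to be in the same order
on both sides.

```r
# Hypothetical per-space hit matrices holding local (within-space) indices.
hits <- list(
  "2L" = cbind(query = c(1L, 2L),     subject = c(1L, 3L)),
  "2R" = cbind(query = 1L,            subject = 2L),
  "3L" = cbind(query = c(2L, 3L, 3L), subject = c(1L, 1L, 2L))
)
# Hypothetical number of ranges per space in the query and in the subject.
query_len   <- c("2L" = 2L, "2R" = 4L, "3L" = 3L)
subject_len <- c("2L" = 3L, "2R" = 2L, "3L" = 2L)

# Number of elements that precede each space once the list is unlisted.
offsets <- function(len) {
  setNames(cumsum(c(0L, head(len, -1))), names(len))
}
query_off   <- offsets(query_len)
subject_off <- offsets(subject_len)

# Shift each block by its space's offsets, then stack the blocks.
# rbind (not cbind) is required because the blocks differ in row count.
global_hits <- do.call(rbind, lapply(names(hits), function(sp) {
  m <- hits[[sp]]
  m[, "query"]   <- m[, "query"]   + query_off[[sp]]
  m[, "subject"] <- m[, "subject"] + subject_off[[sp]]
  m
}))
print(global_hits)
```

The result stays a plain two-column integer matrix, so it can be used to
index directly into the unlisted query and subject without splitting it by
space first.
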

