[ https://issues.apache.org/jira/browse/MAHOUT-1597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071016#comment-14071016 ]

ASF GitHub Bot commented on MAHOUT-1597:
----------------------------------------

Github user avati commented on the pull request:

    https://github.com/apache/mahout/pull/33#issuecomment-49808739
  
    Hmm, I think moving canHaveMissingRows to DrmRddInput should work. Unlike 
nrow and ncol, which can signal an error, canHaveMissingRows silently fixes it 
(i.e. "take an extra step" instead of "assert consistency"), so I don't think 
it has to be known upfront. fixIntConsistency() is called within a physical 
operator anyway, so we just need to guarantee that the physical operator can 
see a reliable canHaveMissingRows value.
    
    Since the plan is always evaluated bottom-up at the physical layer, even if 
intermediate operators are optimized out by the logical optimizer, the flag 
still propagates from DrmRddInput to DrmRddInput as long as the physical 
operators propagate it. So AewScalar would see a trustworthy 
srcA.canHaveMissingRows.
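
    The propagation idea can be sketched in plain Scala (a minimal sketch: 
ordinary collections stand in for the Spark RDD, and `RddInput` and 
`aewScalar` are illustrative names, not Mahout's actual DrmRddInput or 
physical operators):

```scala
// Illustrative sketch, not Mahout's actual code: an RddInput-like wrapper
// carries canHaveMissingRows, and the physical operator takes the extra
// fix-up step only when the flag says missing rows may exist, then clears
// the flag on its output.
final case class RddInput(rows: Seq[(Int, Vector[Double])],
                          canHaveMissingRows: Boolean)

object FlagPropagation {
  // Element-wise "A + scalar" over an Int-keyed matrix of shape nrow x ncol.
  def aewScalar(src: RddInput, scalar: Double, nrow: Int, ncol: Int): RddInput = {
    val fixed: Seq[(Int, Vector[Double])] =
      if (src.canHaveMissingRows) {
        // The "take extra step" branch: materialize implied rows as zeros.
        val present = src.rows.toMap
        (0 until nrow).map(i => i -> present.getOrElse(i, Vector.fill(ncol)(0.0)))
      } else src.rows
    // After the fix-up the result is dense in its keys, so the flag is false.
    RddInput(fixed.map { case (k, v) => k -> v.map(_ + scalar) },
             canHaveMissingRows = false)
  }
}
```

    Each operator only has to copy the flag from its input to its output (or 
clear it after fixing), so the value stays reliable regardless of what the 
logical optimizer removed.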


> A + 1.0 (element-wise scalar operation) gives wrong result if rdd is missing 
> rows, Spark side
> --------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-1597
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1597
>             Project: Mahout
>          Issue Type: Bug
>    Affects Versions: 0.9
>            Reporter: Dmitriy Lyubimov
>            Assignee: Dmitriy Lyubimov
>             Fix For: 1.0
>
>
> {code}
>     // Concoct an rdd with missing rows
>     val aRdd: DrmRdd[Int] = sc.parallelize(
>       0 -> dvec(1, 2, 3) ::
>           3 -> dvec(3, 4, 5) :: Nil
>     ).map { case (key, vec) => key -> (vec: Vector)}
>     val drmA = drmWrap(rdd = aRdd)
>     val controlB = inCoreA + 1.0
>     val drmB = drmA + 1.0
>     (drmB -: controlB).norm should be < 1e-10
> {code}
> should not fail.
> It was failing because the element-wise scalar operator only evaluates rows 
> actually present in the dataset.
> In the case of Int-keyed row matrices, there may be implied rows that are 
> not present in the RDD.
> Our goal is to detect the condition and evaluate missing rows prior to 
> physical operators that don't work with missing implied rows.
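
The failure mode above can be reproduced without Spark; a minimal sketch, with 
plain Scala collections standing in for the DrmRdd, `naivePlus` standing in 
for the buggy element-wise operator, and `fixedPlus` for the intended fix 
(names are illustrative, not Mahout's):

```scala
object MissingRowsBug {
  type Rows = Seq[(Int, Vector[Double])]

  // The buggy behavior: "A + s" touches only rows physically present.
  def naivePlus(rows: Rows, s: Double): Rows =
    rows.map { case (k, v) => k -> v.map(_ + s) }

  // The intended behavior: materialize implied rows (as zero vectors)
  // before applying the operator.
  def fixedPlus(rows: Rows, s: Double, nrow: Int, ncol: Int): Rows = {
    val present = rows.toMap
    (0 until nrow).map { i =>
      i -> present.getOrElse(i, Vector.fill(ncol)(0.0)).map(_ + s)
    }
  }

  def main(args: Array[String]): Unit = {
    // Keys 0 and 3 imply rows 1 and 2, which are absent from the data.
    val a: Rows = Seq(0 -> Vector(1.0, 2.0, 3.0), 3 -> Vector(3.0, 4.0, 5.0))
    println(naivePlus(a, 1.0).size)       // 2: rows 1 and 2 never become 1s
    println(fixedPlus(a, 1.0, 4, 3).size) // 4: implied rows are included
  }
}
```

The naive result differs from the in-core control exactly on the implied rows, 
which is why the `(drmB -: controlB).norm` check in the test fails.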



--
This message was sent by Atlassian JIRA
(v6.2#6252)
