[ 
https://issues.apache.org/jira/browse/MAHOUT-1597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071043#comment-14071043
 ] 

ASF GitHub Bot commented on MAHOUT-1597:
----------------------------------------

Github user avati commented on the pull request:

    https://github.com/apache/mahout/pull/33#issuecomment-49810561
  
    OK. If we were looking at just ensuring correctness in the spark engine
    then it could have been restricted to just physical operators and
    DrmRddInput to propagate up the missingness-flag in the RDD it represents.
    However, If we are looking at doing cost estimation in the optimizer in the
    future with this info, then you may want to put it into DrmLike.
    
    +1


> A + 1.0 (element-wise scala operation) gives wrong result if rdd is missing 
> rows, Spark side
> --------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-1597
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1597
>             Project: Mahout
>          Issue Type: Bug
>    Affects Versions: 0.9
>            Reporter: Dmitriy Lyubimov
>            Assignee: Dmitriy Lyubimov
>             Fix For: 1.0
>
>
> {code}
>     // Concoct an rdd with missing rows
>     val aRdd: DrmRdd[Int] = sc.parallelize(
>       0 -> dvec(1, 2, 3) ::
>           3 -> dvec(3, 4, 5) :: Nil
>     ).map { case (key, vec) => key -> (vec: Vector)}
>     val drmA = drmWrap(rdd = aRdd)
>     val controlB = inCoreA + 1.0
>     val drmB = drmA + 1.0
>     (drmB -: controlB).norm should be < 1e-10
> {code}
> should not fail.
> it was failing due to elementwise scalar operator only evaluates rows 
> actually present in dataset. 
> In case of Int-keyed row matrices, there are implied rows that yet may not be 
> present in RDD. 
> Our goal is to detect the condition and evaluate missing rows prior to 
> physical operators that don't work with missing implied rows.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to