[
https://issues.apache.org/jira/browse/MAHOUT-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582402#comment-14582402
]
ASF GitHub Bot commented on MAHOUT-1660:
----------------------------------------
Github user andrewmusselman commented on a diff in the pull request:
https://github.com/apache/mahout/pull/135#discussion_r32257679
--- Diff: math-scala/src/main/scala/org/apache/mahout/math/drm/DistributedEngine.scala ---
@@ -73,20 +75,39 @@ trait DistributedEngine {
   def drmDfsRead(path: String, parMin: Int = 0)(implicit sc: DistributedContext): CheckpointedDrm[_]
   /** Parallelize in-core matrix as spark distributed matrix, using row ordinal indices as data set keys. */
-  def drmParallelizeWithRowIndices(m: Matrix, numPartitions: Int = 1)
-    (implicit sc: DistributedContext): CheckpointedDrm[Int]
+  def drmParallelizeWithRowIndices(m: Matrix, numPartitions: Int = 1)(implicit sc: DistributedContext):
+    CheckpointedDrm[Int]
   /** Parallelize in-core matrix as spark distributed matrix, using row labels as a data set keys. */
-  def drmParallelizeWithRowLabels(m: Matrix, numPartitions: Int = 1)
-    (implicit sc: DistributedContext): CheckpointedDrm[String]
+  def drmParallelizeWithRowLabels(m: Matrix, numPartitions: Int = 1)(implicit sc: DistributedContext):
+    CheckpointedDrm[String]
   /** This creates an empty DRM with specified number of partitions and cardinality. */
-  def drmParallelizeEmpty(nrow: Int, ncol: Int, numPartitions: Int = 10)
-    (implicit sc: DistributedContext): CheckpointedDrm[Int]
+  def drmParallelizeEmpty(nrow: Int, ncol: Int, numPartitions: Int = 10)(implicit sc: DistributedContext):
+    CheckpointedDrm[Int]
   /** Creates empty DRM with non-trivial height */
-  def drmParallelizeEmptyLong(nrow: Long, ncol: Int, numPartitions: Int = 10)
-    (implicit sc: DistributedContext): CheckpointedDrm[Long]
+  def drmParallelizeEmptyLong(nrow: Long, ncol: Int, numPartitions: Int = 10)(implicit sc: DistributedContext):
+    CheckpointedDrm[Long]
+
+  /**
+   * Convert a non-int-keyed matrix to an int-keyed one, optionally computing a mapping from the old keys
+   * to row indices in the new one. The mapping, if requested, is returned as a 1-column matrix.
+   */
+  def drm2IntKeyed[K: ClassTag](drmX: DrmLike[K], computeMap: Boolean = false): (DrmLike[Int], Option[DrmLike[K]])
+
+  /**
+   * (Optional) Sampling operation. Consistent with Spark semantics of the same.
+   * @param drmX
+   * @param fraction
+   * @param replacement
+   * @tparam K
+   * @return
+   */
+  def drmSampleRows[K: ClassTag](drmX: DrmLike[K], fraction: Double, replacement: Boolean = false): DrmLike[K]
+
+  def drmSampleKRows[K: ClassTag](drmX: DrmLike[K], numSamples: Int, replacement: Boolean = false): Matrix
--- End diff ---
Why does this return a Matrix whereas the previous one returns DrmLike[K],
and is there a default number of samples in the previous one?
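The asymmetry the comment asks about can be sketched in plain Scala (the function names and signatures below are hypothetical stand-ins, not the Mahout API). Spark-style fraction sampling keeps each row independently with probability `fraction`, so the result size is random and unbounded, which argues for keeping it distributed (`DrmLike[K]`); exact-k sampling fixes the result size up front, so returning the sample in-core (a `Matrix`) is at least feasible:

```scala
import scala.util.Random

// Bernoulli-style fraction sampling: each row is kept independently with
// probability `fraction`, so the result size is random (roughly fraction * n).
def sampleByFraction[A](rows: Vector[A], fraction: Double, rng: Random): Vector[A] =
  rows.filter(_ => rng.nextDouble() < fraction)

// Exact-k sampling: the caller fixes the number of rows, so the sample is
// small and bounded, making an in-core result representation plausible.
def sampleKRows[A](rows: Vector[A], k: Int, replacement: Boolean, rng: Random): Vector[A] =
  if (replacement) Vector.fill(k)(rows(rng.nextInt(rows.length)))
  else rng.shuffle(rows).take(k)

val rng  = new Random(42)
val data = (1 to 1000).toVector

val byFraction = sampleByFraction(data, 0.1, rng)               // size varies around 100
val exactK     = sampleKRows(data, 5, replacement = false, rng) // exactly 5 rows
```

Note also that neither sketch supplies a default sample count for the fixed-k variant, mirroring the second part of the question: `fraction` and `numSamples` are fundamentally different parameters, and only `replacement` defaults.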
> Hadoop1HDFSUtil.readDRMHEader should be taking Hadoop conf
> ----------------------------------------------------------
>
> Key: MAHOUT-1660
> URL: https://issues.apache.org/jira/browse/MAHOUT-1660
> Project: Mahout
> Issue Type: Bug
> Components: spark
> Affects Versions: 0.10.0
> Reporter: Suneel Marthi
> Assignee: Dmitriy Lyubimov
> Priority: Minor
> Fix For: 0.10.2
>
>
> Hadoop1HDFSUtil.readDRMHEader should be taking Hadoop configuration from
> Context and not ignore it
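The shape of the fix the issue describes can be sketched as follows (all class and method names here are simplified stand-ins, not the real Mahout or Hadoop signatures): the header reader should accept the configuration held by the caller's context instead of silently constructing a fresh default one, which discards caller settings such as `fs.defaultFS`.

```scala
// Hypothetical stand-ins for a Hadoop-style configuration and a context.
final case class Conf(entries: Map[String, String]) {
  def get(key: String, default: String): String = entries.getOrElse(key, default)
}
final case class Context(conf: Conf)

// Buggy shape: a fresh default config is built internally, so any
// configuration the caller holds (e.g. fs.defaultFS) is ignored.
def readDrmHeaderOld(path: String): String = {
  val conf = Conf(Map.empty)
  conf.get("fs.defaultFS", "file://") + path
}

// Fixed shape: the context's configuration is threaded through.
def readDrmHeader(path: String, ctx: Context): String =
  ctx.conf.get("fs.defaultFS", "file://") + path
```

With a context configured for `hdfs://nn:8020`, the fixed version resolves paths against the caller's filesystem while the old version always falls back to the local default.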
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)