[
https://issues.apache.org/jira/browse/MAHOUT-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250307#comment-15250307
]
ASF GitHub Bot commented on MAHOUT-1833:
----------------------------------------
Github user dlyubimov commented on the pull request:
https://github.com/apache/mahout/pull/224#issuecomment-212522188
@resec there's a sort-of-authoritative original doc in this branch:
https://github.com/apache/mahout/tree/gh-pages/doc
You may take it and submit a PR against it (it requires a lyx/latex
editor, assuming you are on ubuntu).
Good thing is that you can take it and modify it with a usual pull request
process.
The bad things about doing it that way are that, first, it is sort of an
authored document I originally contributed from elsewhere (I can remove any
attributions if it really becomes community-maintained).
But the real reason is, secondly, that it is currently being migrated to ASF
CMS, and it needs to be changed here:
http://mahout.apache.org/users/environment/in-core-reference.html which
requires apache committer status.
So you really would need to change the ASF CMS page, and if you don't have
privileges to do it, somebody would have to do it on your behalf after this PR
is in.
> Enhance svec function to accept cardinality as a parameter
> ------------------------------------------------------------
>
> Key: MAHOUT-1833
> URL: https://issues.apache.org/jira/browse/MAHOUT-1833
> Project: Mahout
> Issue Type: Improvement
> Affects Versions: 0.12.0
> Environment: Mahout Spark Shell 0.12.0,
> Spark 1.6.0 Cluster on Hadoop Yarn 2.7.1,
> Centos7 64bit
> Reporter: Edmond Luo
>
> It will be nice to add one more wrapper function like below to
> org.apache.mahout.math.scalabindings
> {code}
> /**
>  * Create a sparse vector out of a list of tuple2's with a specific
>  * cardinality (size).
>  * Throws IllegalArgumentException if cardinality is smaller than the
>  * cardinality required by sdata (max index + 1).
>  * @param cardinality size of the vector to create
>  * @param sdata list of (index, value) tuples
>  * @return a RandomAccessSparseVector of the given cardinality
>  */
> def svec(cardinality: Int, sdata: TraversableOnce[(Int, AnyVal)]) = {
>   // Materialize first: a TraversableOnce may only be traversed once.
>   val data = sdata.toSeq
>   val required = if (data.nonEmpty) data.map(_._1).max + 1 else 0
>   if (cardinality < required) {
>     throw new IllegalArgumentException(
>       s"Cardinality[$cardinality] must be at least required[$required]!")
>   }
>   val sv = new RandomAccessSparseVector(cardinality, data.size)
>   data.foreach(t => sv.setQuick(t._1, t._2.asInstanceOf[Number].doubleValue()))
>   sv
> }
> {code}
> So the user can specify the cardinality of the created sparse vector.
> This is very useful and convenient if the user wants to create a DRM from
> many sparse vectors whose actual sizes differ but whose logical size is the
> same (e.g. rows of a sparse matrix).
> Below code should demonstrate the case:
> {code}
> val cardinality = 20
> val rdd = sc.textFile("/some/file.txt")
>   .map(_.split(","))
>   .map(line => (line(0).toInt, Array((line(1).toInt, 1))))
>   .reduceByKey((v1, v2) => v1 ++ v2)
>   .map(row => (row._1, svec(cardinality, row._2)))
> val drm = drmWrap(rdd.map(row => (row._1, row._2.asInstanceOf[Vector])))
> // All element-wise operations will fail for a DRM whose SparseVectors do
> // not have consistent cardinality
> val drm2 = drm + drm
> val drm3 = drm - drm
> val drm4 = drm * drm
> val drm5 = drm / drm
> {code}
> Notice that in the last map, svec accepted one more cardinality parameter,
> so the cardinality of the created SparseVectors can be made consistent.
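The cardinality check the proposal describes can be sketched standalone. This is a simplified sketch, not Mahout API: a plain Scala `Map` stands in for Mahout's `RandomAccessSparseVector`, and `SvecSketch` is an illustrative name, so the snippet runs without Mahout on the classpath.

```scala
// Simplified sketch of the proposed cardinality check. A Map with a 0.0
// default stands in for a sparse vector; names mirror the proposal above.
object SvecSketch {
  def svec(cardinality: Int, sdata: Seq[(Int, Double)]): Map[Int, Double] = {
    // The highest index + 1 is the minimum cardinality the data requires.
    val required = if (sdata.nonEmpty) sdata.map(_._1).max + 1 else 0
    if (cardinality < required)
      throw new IllegalArgumentException(
        s"Cardinality[$cardinality] must be at least required[$required]!")
    // Absent indices read as 0.0, like a sparse vector.
    sdata.toMap.withDefaultValue(0.0)
  }

  def main(args: Array[String]): Unit = {
    val v = svec(20, Seq(3 -> 1.0, 7 -> 2.0)) // ok: max index 7 requires >= 8
    println(v(7))  // 2.0
    println(v(19)) // 0.0 (within cardinality, never set)
    // svec(5, Seq(9 -> 1.0)) would throw: required cardinality is 10
  }
}
```

The check rejects only vectors whose data does not fit, so rows of differing actual size but the same logical size all come out with the same cardinality.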
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)