[ https://issues.apache.org/jira/browse/MAHOUT-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15305990#comment-15305990 ]
ASF GitHub Bot commented on MAHOUT-1866:
----------------------------------------
Github user andrewpalumbo commented on a diff in the pull request:
https://github.com/apache/mahout/pull/237#discussion_r65007959
--- Diff: math-scala/src/main/scala/org/apache/mahout/math/drm/package.scala ---
@@ -148,6 +148,42 @@ package object drm {
def drmSampleKRows[K](drmX: DrmLike[K], numSamples: Int, replacement: Boolean = false): Matrix =
drmX.context.engine.drmSampleKRows(drmX, numSamples, replacement)
+ /**
+ * Convert a sampled DRM into a Tab Separated Vector (TSV) to be loaded into an R-DataFrame
+ * for plotting and sketching
+ * @param drmX - DRM
+ * @param samplePercent - Percentage of Sample elements from the DRM to be fished out for plotting
+ * @tparam K
+ * @return TSV String
+ */
+ def sampleMatrixToTSV[K](drmX: DrmLike[K], samplePercent: Double = 1): String = {
+
--- End diff --
Minor point: maybe rename to `drmSampleToTSV` or something along those
lines so that it is obvious that it's a DRM and not a Matrix? Other than that,
+1 from me.
> Add matrix-to-tsv string function
> ---------------------------------
>
> Key: MAHOUT-1866
> URL: https://issues.apache.org/jira/browse/MAHOUT-1866
> Project: Mahout
> Issue Type: Sub-task
> Components: visiualization
> Affects Versions: 0.12.1
> Reporter: Trevor Grant
> Assignee: Suneel Marthi
> Fix For: 0.13.0
>
>
> Need a function to convert a matrix to a TSV string, which can then be
> - plotted by Zeppelin %table visualization packages
> - passed to R / Python via the Zeppelin Resource Manager
> It has been noted that a matrix can be registered as an RDD and passed across
> contexts directly in Spark; however, this breaks the 'backend agnostic'
> philosophy. Until H2O and Flink both also support Python / R environments, it
> is more reasonable to use tab-separated-value strings.
> Further, matrices might be extremely large and unfit for direct
> conversion to TSVs. It may be wise to introduce some sort of safety valve to
> prevent excessively large matrices from being materialized into local
> memory (e.g. if the user hasn't called their own sampling method on a
> matrix).
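For illustration only, the conversion and the "safety valve" described in the issue could be sketched roughly as follows. This is not the code from the pull request: plain Scala collections stand in for a Mahout matrix, and the names `TsvSketch`, `matrixToTsv` and `maxRows` are hypothetical. The cap here simply truncates to the first `maxRows` rows, which is an assumption; the pull request samples the DRM instead.

```scala
object TsvSketch {
  // Hypothetical helper, not part of Mahout: renders an in-memory matrix
  // (a sequence of rows of doubles) as tab-separated values, one row per line.
  // maxRows is an assumed safety valve: rows beyond it are dropped so that
  // an oversized matrix is never fully rendered into a single string.
  def matrixToTsv(rows: Seq[Seq[Double]], maxRows: Int = 1000): String =
    rows
      .take(maxRows)            // keep at most maxRows rows (truncation, not sampling)
      .map(_.mkString("\t"))    // join the cells of each row with tabs
      .mkString("\n")           // join the rows with newlines
}
```

The resulting string could then be printed in a Zeppelin paragraph after a `%table` directive, or handed to R / Python as plain text, without depending on any particular backend.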