[
https://issues.apache.org/jira/browse/MAHOUT-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13977439#comment-13977439
]
Dmitriy Lyubimov commented on MAHOUT-1518:
------------------------------------------
There's a set of fundamental differences as I see it.
(1) A table can have cells of any type; matrices have numeric (double) cells.
(2) Dimensions in matrices are logically equal. In tables, columns and rows
usually have very different implications (put simply, tables are mostly
"tall and skinny" structures).
(3) Internally, matrices are represented by in-core math matrix blocks; data
frames are probably better represented by columnar array fragments.
(4) In-core and DRM algebra operators are semantically homogeneous. In-core
matrices and data frames simply don't support the same set of operations (e.g.
R doesn't define a "norm" of a data frame, or an eigendecomposition of a data
frame).
(5) Data frames have a specific set of operators. Going by the dplyr package,
for example, those are select(), filter(), mutate(), summarize(), merge() and
group_by().
MLI provides a very similar semantic alphabet for MLTable.
(6) I don't believe in transpose for data frames. I don't think R has such an
operation, and even if it does, I have never used it, and it is certainly not
t(). MLI doesn't have such a notion either. More likely, a table is like a
relational table (which doesn't assume transposition either). There are
concepts of data frame tabulation and cross-tabulation, which are special cases
of the cube pivoting used in OLAP. While this type of UI-based online
exploration is useful, I don't want to touch it in this particular project;
there are much better-suited projects for this (I worked on one of those too).
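
To make points (3) and (5) a bit more concrete, here is a minimal sketch in
Python (not Mahout code) of a table held as independently typed columns with a
few relational-style operators. The class name ColumnarTable and the methods
select, filter and group_count are hypothetical names chosen for illustration
only; they are not part of Mahout, MLI or dplyr.

```python
from collections import defaultdict


class ColumnarTable:
    """Toy data frame: a dict of equally long, independently typed columns."""

    def __init__(self, columns):
        lengths = {len(values) for values in columns.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        self.columns = {name: list(values) for name, values in columns.items()}

    def num_rows(self):
        # Every column has the same length, so any one of them gives the row count.
        return len(next(iter(self.columns.values()), []))

    def select(self, *names):
        # Projection: keep only the named columns.
        return ColumnarTable({name: self.columns[name] for name in names})

    def filter(self, predicate):
        # Row selection: predicate receives each row as a {column: value} dict.
        names = list(self.columns)
        keep = [i for i in range(self.num_rows())
                if predicate({n: self.columns[n][i] for n in names})]
        return ColumnarTable({n: [self.columns[n][i] for i in keep] for n in names})

    def group_count(self, key):
        # Grouped aggregation: number of rows per distinct value of one column.
        counts = defaultdict(int)
        for value in self.columns[key]:
            counts[value] += 1
        return dict(counts)


table = ColumnarTable({
    "userId": ["u1", "u2", "u1"],
    "itemId": ["i1", "i1", "i2"],
    "action": ["view", "like", "view"],
})
views = table.filter(lambda row: row["action"] == "view").select("userId", "itemId")
views_per_user = views.group_count("userId")
```

Note that nothing here resembles a transpose, a norm or a decomposition: the
columns carry different types and meanings, so the operator set is relational
rather than algebraic.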
> Preprocessing for collaborative filtering with the Scala DSL
> ------------------------------------------------------------
>
> Key: MAHOUT-1518
> URL: https://issues.apache.org/jira/browse/MAHOUT-1518
> Project: Mahout
> Issue Type: New Feature
> Components: Collaborative Filtering
> Reporter: Sebastian Schelter
> Assignee: Sebastian Schelter
> Fix For: 1.0
>
> Attachments: MAHOUT-1518.patch
>
>
> The aim here is to provide some easy-to-use machinery to enable the usage of
> the new Cooccurrence Analysis code from MAHOUT-1464 with datasets represented
> as follows in a CSV file with the schema _timestamp, userId, itemId, action_,
> e.g.
> {code}
> timestamp1, userIdString1, itemIdString1, "view"
> timestamp2, userIdString2, itemIdString1, "like"
> {code}
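
As a rough illustration of the kind of preprocessing the issue describes, the
sketch below (plain Python, not the attached patch and not the Scala DSL) reads
rows in the _timestamp, userId, itemId, action_ schema, assigns integer indexes
to the string ids, and collects one set of (user, item) pairs per action. The
function name index_interactions and the returned structures are assumptions
made for this example.

```python
import csv
import io
from collections import defaultdict


def index_interactions(csv_text):
    """Map string ids to dense integer indexes and group interactions by action."""
    user_ids, item_ids = {}, {}
    interactions = defaultdict(set)
    # skipinitialspace lets the reader handle the ", " separators in the sample.
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    for row in reader:
        if not row:
            continue  # ignore blank lines
        # Rows that do not have exactly four fields raise ValueError here.
        _timestamp, user, item, action = (field.strip() for field in row)
        u = user_ids.setdefault(user, len(user_ids))
        i = item_ids.setdefault(item, len(item_ids))
        interactions[action].add((u, i))
    return user_ids, item_ids, dict(interactions)


sample = (
    'timestamp1, userIdString1, itemIdString1, "view"\n'
    'timestamp2, userIdString2, itemIdString1, "like"\n'
)
user_ids, item_ids, interactions = index_interactions(sample)
```

Each per-action set of index pairs could then be used to populate one sparse
user-by-item matrix, with the two dictionaries kept around to translate
indexes back to the original string ids.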