[ https://issues.apache.org/jira/browse/MAHOUT-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004275#comment-14004275 ]
Anand Avati commented on MAHOUT-1490:
-------------------------------------
[~dlyubimov] it is true that unless you iterate over the data multiple times,
type-compression (scaling, biasing, reducing bit-width) does not give much
benefit. However, if random, mixed read/write is the expected access pattern,
the overhead of inflation can be minimized by choosing a smaller chunk size
(which will not worsen the compression), since a write that does not fit the
compressed encoding only inflates the chunk it lands in. It really depends on
the use case of these R-like data frame bindings in Mahout, which I do not know
much about.
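To make the chunk-size trade-off concrete, here is a minimal sketch in Python. It is not Mahout (or any library) code: the `Chunk` and `ChunkedColumn` classes are hypothetical, and the encoding is an assumption chosen for illustration: each double v is stored as an unsigned 8-bit code c with v == bias + c * scale, and any write that cannot be encoded that way inflates only the chunk it touches back to 64-bit doubles.

```python
from array import array


class Chunk:
    """Hypothetical type-compressed chunk of doubles (illustration only).

    While compressed, each value v is stored as an unsigned 8-bit code c
    such that v == bias + c * scale (biasing + scaling + reduced bit-width).
    """

    def __init__(self, values, scale=1.0):
        self.scale = scale
        self.bias = min(values)  # assumes a non-empty chunk
        codes = [self._encode(v) for v in values]
        if None in codes:
            # Some value is off the bias/scale grid: keep raw 64-bit doubles.
            self.codes, self.doubles = None, array('d', values)
        else:
            self.codes, self.doubles = array('B', codes), None

    def _encode(self, v):
        """Return the 8-bit code for v, or None if v is not exactly representable."""
        c = round((v - self.bias) / self.scale)
        if 0 <= c <= 255 and self.bias + c * self.scale == v:
            return c
        return None

    def get(self, i):
        if self.doubles is not None:
            return self.doubles[i]
        return self.bias + self.codes[i] * self.scale  # decode on read

    def set(self, i, v):
        """Write one value; return how many values had to be inflated."""
        if self.doubles is not None:
            self.doubles[i] = v
            return 0
        c = self._encode(v)
        if c is not None:
            self.codes[i] = c
            return 0
        # Inflation: decode the whole chunk to doubles, O(chunk size) work.
        decoded = [self.bias + code * self.scale for code in self.codes]
        self.doubles, self.codes = array('d', decoded), None
        self.doubles[i] = v
        return len(self.doubles)


class ChunkedColumn:
    """Hypothetical column split into fixed-size, independently compressed chunks."""

    def __init__(self, values, chunk_size, scale=1.0):
        self.chunk_size = chunk_size
        self.chunks = [Chunk(values[k:k + chunk_size], scale)
                       for k in range(0, len(values), chunk_size)]

    def get(self, i):
        return self.chunks[i // self.chunk_size].get(i % self.chunk_size)

    def set(self, i, v):
        return self.chunks[i // self.chunk_size].set(i % self.chunk_size, v)


data = [float(x % 100) for x in range(10_000)]
big = ChunkedColumn(data, chunk_size=10_000)
small = ChunkedColumn(data, chunk_size=100)
print(big.set(5, 0.5), small.set(5, 0.5))  # 10000 100
```

The compression ratio per chunk is the same in both columns (one byte per value plus a bias and scale), but one unencodable write costs 10,000 decoded values with a single large chunk versus 100 with small chunks. That is the sense in which a smaller chunk size limits inflation overhead under random writes.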
Type-compression aside, sparse compression is probably still applicable, if only
to scale to larger dimensions.
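As a rough illustration of why sparse compression matters for larger dimensions, the sketch below stores only the non-zero entries of a chunk, so memory grows with the number of non-zeros rather than with the logical length. `SparseChunk` and its methods are hypothetical names for this example, not a Mahout API; lookups use a binary search over the sorted stored indices.

```python
from array import array
from bisect import bisect_left


class SparseChunk:
    """Hypothetical sparse chunk: stores only non-zero (index, value) pairs.

    Memory is O(nnz) instead of O(length), so a logically huge vector
    with few non-zeros stays small.
    """

    def __init__(self, length, nonzeros):
        """nonzeros: mapping index -> value; explicit zeros are dropped."""
        items = sorted((i, v) for i, v in nonzeros.items() if v != 0.0)
        if any(not 0 <= i < length for i, _ in items):
            raise IndexError("non-zero index outside [0, length)")
        self.length = length
        self.indices = array('q', (i for i, _ in items))  # sorted indices
        self.values = array('d', (v for _, v in items))

    def nnz(self):
        return len(self.indices)

    def get(self, i):
        if not 0 <= i < self.length:
            raise IndexError(i)
        k = bisect_left(self.indices, i)  # binary search over stored indices
        if k < len(self.indices) and self.indices[k] == i:
            return self.values[k]
        return 0.0  # anything not stored is an implicit zero

    def dot(self, dense):
        """Dot product with a dense sequence, touching only stored entries."""
        return sum(v * dense[i] for i, v in zip(self.indices, self.values))


wide = SparseChunk(10**9, {5: 2.0, 999_999_999: -1.5})
print(wide.nnz(), wide.get(5), wide.get(6))  # 2 2.0 0.0
```

A billion-wide chunk here holds two stored entries; a dense layout of the same length would need 8 GB of doubles.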
Naive question: are these "data frame" bindings really just for the interactive
use case? Or do we expect ML algorithms to be implemented on top of data frames
(instead of just DRM/matrix)?
> Data frame R-like bindings
> --------------------------
>
> Key: MAHOUT-1490
> URL: https://issues.apache.org/jira/browse/MAHOUT-1490
> Project: Mahout
> Issue Type: New Feature
> Reporter: Saikat Kanjilal
> Assignee: Dmitriy Lyubimov
> Fix For: 1.0
>
> Original Estimate: 20h
> Remaining Estimate: 20h
>
> Create Data frame R-like bindings for spark