[
https://issues.apache.org/jira/browse/MAHOUT-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15186096#comment-15186096
]
ASF GitHub Bot commented on MAHOUT-1802:
----------------------------------------
GitHub user andrewpalumbo opened a pull request:
https://github.com/apache/mahout/pull/185
MAHOUT-1802: Capture attached checkpoints (if cached)
Currently, the optimizer generates checkpoints and attaches them to actual
logical elements of the DAG via CheckpointAction$cp.
i.e.:
```
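val drmC = drmA + drmB
val cp1 = drmC.checkpoint() // creates the checkpoint and attaches it to drmC
val cp2 = drmC.checkpoint() // cp2 == cp1: the attached checkpoint is reused
val drmD = cp1 + drmE // explicitly builds on the checkpoint: cp1 + drmE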
```
but in:
`drmD = drmC + drmE // recomputes drmA + drmB all over again`
`drmC` already has `cp1` attached to it, so we should assume that the common
computational path is the intent here and use it, instead of building plans
that recompute it. That is, `drmD = drmC + drmE` should imply `cp1 + drmE`
as well, even if the checkpoint is not used explicitly.
This PR allows us to avoid excessive declarations like
```
val drmAcp = drmA.checkpoint()
val drmB = drmAcp %*% ...
```
and instead just use
```
drmA.checkpoint()
val drmB = drmA %*% ...
```
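The intended behavior can be sketched with a toy model. This is only an illustration under stated assumptions, not Mahout's implementation: `Drm`, `Leaf`, `Plus`, `Checkpointed`, `Planner` and `plusEvaluations` are hypothetical names made up for this sketch, and plain `Vector[Double]` values stand in for distributed matrices. The point it shows is that evaluation uses a checkpoint already attached to a logical node instead of recomputing that node's subplan.

```scala
// Toy model only: these types are hypothetical and are NOT Mahout's classes.
object AttachedCheckpointSketch {

  // Counts how many times an element-wise sum is actually evaluated.
  var plusEvaluations = 0

  sealed trait Drm {
    // Checkpoint attached to this logical node, if one was ever requested.
    private var attached: Option[Checkpointed] = None

    def attachedCheckpoint: Option[Checkpointed] = attached

    // The first call evaluates and attaches; later calls return the same checkpoint.
    def checkpoint(): Checkpointed = attached.getOrElse {
      val cp = Checkpointed(Planner.evaluate(this))
      attached = Some(cp)
      cp
    }

    def +(that: Drm): Drm = Plus(this, that)
  }

  final case class Leaf(values: Vector[Double]) extends Drm
  final case class Plus(a: Drm, b: Drm) extends Drm
  final case class Checkpointed(values: Vector[Double]) extends Drm {
    // A checkpoint is already materialized, so it is its own checkpoint.
    override def checkpoint(): Checkpointed = this
  }

  object Planner {
    // Prefer an attached checkpoint over recomputing the node's subplan.
    def evaluate(node: Drm): Vector[Double] = node.attachedCheckpoint match {
      case Some(cp) => cp.values
      case None =>
        node match {
          case Leaf(v)         => v
          case Checkpointed(v) => v
          case Plus(a, b) =>
            plusEvaluations += 1
            evaluate(a).zip(evaluate(b)).map { case (x, y) => x + y }
        }
    }
  }

  def main(args: Array[String]): Unit = {
    val drmA = Leaf(Vector(1.0, 2.0))
    val drmB = Leaf(Vector(10.0, 20.0))
    val drmE = Leaf(Vector(100.0, 200.0))

    val drmC = drmA + drmB
    drmC.checkpoint()        // evaluates drmA + drmB once and attaches the result
    val drmD = drmC + drmE   // no explicit reference to the checkpoint

    println(Planner.evaluate(drmD)) // Vector(111.0, 222.0)
    println(plusEvaluations)        // 2: one for drmC's checkpoint, one for drmD
  }
}
```

Without the attached-checkpoint lookup in `Planner.evaluate`, the same program would evaluate three sums, because `drmA + drmB` would be recomputed while evaluating `drmD`.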
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/andrewpalumbo/mahout MAHOUT-1802
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/mahout/pull/185.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #185
----
commit 8e28a6c41061a7f210e69804b977fbf3bff0fcc8
Author: Andrew Palumbo <[email protected]>
Date: 2016-03-08T23:04:32Z
Include CacheHint in CheckpointedDrm trait. Check for a logical
CheckpointAction in physical translation and use its caching policy for the
physical checkpoint
commit 302d34c2b10ff882344e04b9f6f17d4cde3f676f
Author: Andrew Palumbo <[email protected]>
Date: 2016-03-08T23:10:38Z
Merge branch 'master' into MAHOUT-1802
----
> Capture attached checkpoints (if cached)
> -----------------------------------------
>
> Key: MAHOUT-1802
> URL: https://issues.apache.org/jira/browse/MAHOUT-1802
> Project: Mahout
> Issue Type: Improvement
> Affects Versions: 0.11.1
> Reporter: Andrew Palumbo
> Assignee: Andrew Palumbo
> Fix For: 0.11.2
>
>
> Currently, the optimizer generates checkpoints and attaches them to actual
> logical elements of the DAG via CheckpointAction$cp.
> The way it works today is as follows:
> {code}
> drmC = drmA + drmB
> val cp1 = drmC.checkpoint() // checkpoint
> val cp2 = drmC.checkpoint() // cp2 == cp1
> drmD = cp1 + drmE // cp1 + drmE
> {code}
> but, in:
> {code}
> drmD = drmC + drmE // recomputes drmA + drmB all over again
> {code}
> {{drmC}} already has {{cp1}} attached to it so we should assume the common
> computational path is the intent here regardless and should be used, instead
> of building plans that recompute it. That is,
> {{drmD = drmC + drmE}} should imply {{cp1 + drmE}} as well even if checkpoint
> is not used explicitly.