[ 
https://issues.apache.org/jira/browse/MAHOUT-1817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15213618#comment-15213618
 ] 

Andrew Palumbo commented on MAHOUT-1817:
----------------------------------------

I merged #203 as a workaround for this.  I changing the {{checkpoint}} call to 
throw an exception for anything other than {{CacheHint.DISK_ONLY}} or 
{{CacheHint.NONE}} but some of the tests must have other cache hints so I had 
left it as is.  There's still probably room for improvement here.  But this 
should work for now.

As Implemented now, a {{checkpoint()}} call on a Drm will trigger a logical 
optimization and then create a physical Flink plan.  After the plan is created, 
the {{cache}} call will assign a name to the Drm (if {{cache}} has not been 
called before) persist the backing {{DataSet}} to either the directory as 
specified in the {{taskmanager.tmp.dirs}} properties if set in a 
{{$MAHOUT_HOME/conf/flink-config.yaml}} file or to {{/tmp}} if no file exists 
or if the property is not set.  This will trigger the evaluation of the 
physical plan.  The backing {{DataSet}} is then re-read from the given 
directory, wrapped into a Drm and returned by the {{cache}} function.

If {{cache}} has already been called on this Drm, the dataset simply reads the 
previously persisted {{DataSet}} from the filesystem, wraps that into a Drm and 
returns it.

There is some question about how to reset the parallelism degree of the cached 
{{DataSet}} remaining.

> Implement caching in Flink Bindings
> -----------------------------------
>
>                 Key: MAHOUT-1817
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1817
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Flink
>    Affects Versions: 0.11.2
>            Reporter: Andrew Palumbo
>            Assignee: Andrew Palumbo
>            Priority: Blocker
>             Fix For: 0.12.0
>
>
> Flink does not have in-memory caching analogous to that of Spark.  We need 
> find a way to honour the {{checkpoint()}} contract in Flink Bindings.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to