[ 
https://issues.apache.org/jira/browse/MAHOUT-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602510#comment-14602510
 ] 

Alexey Grigorev commented on MAHOUT-1755:
-----------------------------------------

Yes, each time something like rowSums is executed, a Flink job is created and 
executed, and the results are returned. These results are kept in memory, and 
there are no problems with that.  

Flushing is needed for checkpointing, but also when results of one calculation 
are needed in another, or for iterations. For example, here: 
https://github.com/alexeygrigorev/mahout/blob/flink-binding/flink/src/test/scala/org/apache/mahout/flinkbindings/UseCasesSuite.scala#L109

When I do `A = A - evdComponent`, it throws exceptions, which I think would be 
resolved if the results were flushed to the FS and then re-read.
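
To make the flush-and-re-read idea concrete, here is a minimal sketch in plain 
Scala (2.13+). It does not use the Mahout or Flink APIs: `FlushSketch`, `flush`, 
`reread`, `minus` and `iterate` are hypothetical names, the matrix is an 
in-memory Vector, and a temporary CSV file stands in for the FS. It only shows 
the pattern of materializing each iteration's result before the next one reads it.

```scala
import java.nio.file.{Files, Path}
import scala.jdk.CollectionConverters._

object FlushSketch {
  // Stand-in for a distributed matrix; purely illustrative.
  type Matrix = Vector[Vector[Double]]

  // Write the matrix to a file, one comma-separated row per line.
  def flush(m: Matrix, path: Path): Unit = {
    val lines = m.map(_.mkString(","))
    Files.write(path, lines.asJava)
  }

  // Read a matrix back from a file written by `flush`.
  def reread(path: Path): Matrix =
    Files.readAllLines(path).asScala.toVector
      .map(_.split(",").toVector.map(_.toDouble))

  // Element-wise subtraction, analogous to `A - evdComponent`.
  def minus(a: Matrix, b: Matrix): Matrix =
    a.zip(b).map { case (ra, rb) => ra.zip(rb).map { case (x, y) => x - y } }

  // Each step computes A - component, flushes the result to the file,
  // and re-reads it, so the next step starts from materialized data.
  def iterate(a0: Matrix, component: Matrix, steps: Int): Matrix = {
    val tmp = Files.createTempFile("intermediate-", ".csv")
    try {
      (1 to steps).foldLeft(a0) { (a, _) =>
        flush(minus(a, component), tmp)
        reread(tmp)
      }
    } finally Files.deleteIfExists(tmp)
  }
}
```

Calling `FlushSketch.iterate(a, component, n)` applies the subtraction `n` times, 
going through the file on every step. In the real binding the write and read 
would presumably go through the DSL's own DFS operations rather than a local 
file; that part is an assumption and is not shown here.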


> Mahout DSL for Flink: Flush intermediate results to FS
> ------------------------------------------------------
>
>                 Key: MAHOUT-1755
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1755
>             Project: Mahout
>          Issue Type: Task
>          Components: Math
>    Affects Versions: 0.10.2
>            Reporter: Alexey Grigorev
>            Priority: Minor
>
> Currently, Flink (unlike Spark) doesn't keep intermediate results in memory, 
> so they should be flushed to a file system and read back when required. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
