Artem Aliev created TINKERPOP-2081:
--------------------------------------

             Summary: PersistedOutputRDD materialises rdd lazily with Spark 2.x
                 Key: TINKERPOP-2081
                 URL: https://issues.apache.org/jira/browse/TINKERPOP-2081
             Project: TinkerPop
          Issue Type: Bug
    Affects Versions: 3.3.4
            Reporter: Artem Aliev


PersistedOutputRDD does not actually persist the RDD in Spark memory; it only 
marks it for lazy caching at some later point. It looks like caching was eager 
in Spark 1.6, but in Spark 2.0 it is lazy.
Lazy caching looks wrong for this case: the source graph could be changed 
after the snapshot is created, and the snapshot should not be affected by those 
changes.

The fix itself is simple: PersistedOutputRDD should call any Spark action to 
trigger eager caching, for example count().
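
To illustrate why the ordering matters, here is a minimal toy model in plain 
Python. It is not Spark or TinkerPop code: the class LazyCachedDataset and its 
methods are hypothetical stand-ins that only mimic the "mark for caching now, 
compute on first action" behaviour described above. The real fix would call an 
action such as count() on the RDD right after persist().

```python
class LazyCachedDataset:
    """Hypothetical stand-in for an RDD: persist() only marks it for caching."""

    def __init__(self, compute):
        self._compute = compute          # reads the source when called
        self._persist_requested = False
        self._cache = None               # filled by the first action

    def persist(self):
        # Records the intent to cache; does not read the source yet.
        self._persist_requested = True
        return self

    def collect(self):
        # An action: computes on first use, then serves from the cache.
        if self._persist_requested and self._cache is not None:
            return list(self._cache)
        result = list(self._compute())
        if self._persist_requested:
            self._cache = result
        return list(result)

    def count(self):
        # Another action; as a side effect it fills the cache.
        return len(self.collect())


# Lazy case: the source changes before any action runs.
lazy_source = ["v1", "v2"]
lazy_snapshot = LazyCachedDataset(lambda: list(lazy_source)).persist()
lazy_source.append("v3")
lazy_result = lazy_snapshot.collect()    # the later change leaks into the snapshot

# Eager case: count() forces the cache to be filled before the change.
eager_source = ["v1", "v2"]
eager_snapshot = LazyCachedDataset(lambda: list(eager_source)).persist()
eager_snapshot.count()
eager_source.append("v3")
eager_result = eager_snapshot.collect()  # the snapshot is unaffected
```

In this model the lazily cached snapshot ends up containing "v3", while the one 
forced with count() keeps only the two original elements.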


