[ https://issues.apache.org/jira/browse/CRUNCH-678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772424#comment-16772424 ]
Andrew Olson commented on CRUNCH-678:
-------------------------------------

[~jwills] I'm working on it, will have a pull request open shortly.

> Avoid unnecessary retrieval of last modified time
> -------------------------------------------------
>
>                 Key: CRUNCH-678
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-678
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Andrew Olson
>            Assignee: Josh Wills
>            Priority: Major
>
> There is no assurance that the last modified time can be retrieved
> efficiently for all file systems. In particular, with object stores and large
> data sets it could be very slow. Since this information is actually not
> always needed, we should only retrieve it when necessary (i.e. when the write
> mode is checkpoint) for sources and targets.
> CRUNCH-658 expressed similar concerns for the getSize method. This would be a
> simpler and safer optimization to make.
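
For illustration only, the idea in the issue description (look up the last modified time only when the write mode is checkpoint) might be sketched roughly as follows. This is not the actual Crunch change: the `LazyLastModified` class, its `WriteMode` enum, the `UNKNOWN` sentinel, and the `lastModifiedIfNeeded` helper are all hypothetical names, and the `LongSupplier` simply stands in for whatever potentially slow file system call would be made.

```java
import java.util.function.LongSupplier;

public class LazyLastModified {
    // Hypothetical stand-in for a pipeline write mode; not the Crunch API.
    enum WriteMode { DEFAULT, OVERWRITE, APPEND, CHECKPOINT }

    // Hypothetical sentinel meaning "last modified time was not retrieved".
    static final long UNKNOWN = -1L;

    /**
     * Invokes the (possibly expensive) lookup only in checkpoint mode.
     * In every other mode the lookup is never called and UNKNOWN is returned.
     */
    static long lastModifiedIfNeeded(WriteMode mode, LongSupplier lookup) {
        return mode == WriteMode.CHECKPOINT ? lookup.getAsLong() : UNKNOWN;
    }

    public static void main(String[] args) {
        // Count how often the simulated slow lookup actually runs.
        int[] calls = {0};
        LongSupplier slowLookup = () -> {
            calls[0]++;
            return 1550000000000L; // arbitrary example timestamp in milliseconds
        };

        long skipped = lastModifiedIfNeeded(WriteMode.OVERWRITE, slowLookup);
        long fetched = lastModifiedIfNeeded(WriteMode.CHECKPOINT, slowLookup);

        System.out.println("skipped=" + skipped + " fetched=" + fetched
                + " lookups=" + calls[0]);
    }
}
```

Passing the lookup as a supplier keeps the slow call out of the common path entirely, which is the point of the proposed optimization for object stores with large data sets.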