[ https://issues.apache.org/jira/browse/CRUNCH-678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772421#comment-16772421 ]
Josh Wills commented on CRUNCH-678:
-----------------------------------
Totally on board with this one; [~noslowerdna] do you want to take a crack at
it, or do you want me to?
> Avoid unnecessary retrieval of last modified time
> -------------------------------------------------
>
> Key: CRUNCH-678
> URL: https://issues.apache.org/jira/browse/CRUNCH-678
> Project: Crunch
> Issue Type: Improvement
> Components: Core
> Reporter: Andrew Olson
> Assignee: Josh Wills
> Priority: Major
>
> There is no assurance that the last modified time can be retrieved
> efficiently for all file systems. In particular, with object stores and large
> data sets it could be very slow. Since this information is actually not
> always needed, we should only retrieve it when necessary (i.e. when the write
> mode is checkpoint) for sources and targets.
> CRUNCH-658 expressed similar concerns for the getSize method. This would be a
> simpler and safer optimization to make.
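
A minimal sketch of the idea in the description, assuming the lookup can be deferred behind a supplier. None of this is taken from the Crunch code base: the class name, the UNKNOWN sentinel, and the lastModifiedIfNeeded helper are hypothetical, and the WriteMode enum is a local stand-in for whatever write-mode type a pipeline actually uses. The point is only that the potentially slow call runs when the write mode is checkpoint and is skipped otherwise.

```java
import java.util.function.LongSupplier;

public class LazyLastModified {
    // Local stand-in for a pipeline's write modes (an assumption, not the Crunch type).
    enum WriteMode { DEFAULT, OVERWRITE, APPEND, CHECKPOINT }

    // Hypothetical sentinel meaning "the timestamp was not looked up".
    static final long UNKNOWN = -1L;

    // Invokes the (possibly expensive) lookup only when the write mode
    // is CHECKPOINT; every other mode returns UNKNOWN without calling it.
    static long lastModifiedIfNeeded(WriteMode mode, LongSupplier lookup) {
        return mode == WriteMode.CHECKPOINT ? lookup.getAsLong() : UNKNOWN;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Stands in for a slow file-system or object-store metadata call.
        LongSupplier expensiveLookup = () -> {
            calls[0]++;
            return 1550000000000L;
        };

        long skipped = lastModifiedIfNeeded(WriteMode.OVERWRITE, expensiveLookup);
        long fetched = lastModifiedIfNeeded(WriteMode.CHECKPOINT, expensiveLookup);

        System.out.println("skipped=" + skipped
                + " fetched=" + fetched
                + " lookups=" + calls[0]);
    }
}
```

In a real source or target, the supplier would wrap the file-system metadata call, so a run that never checkpoints would not pay for it.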
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)