[ https://issues.apache.org/jira/browse/SPARK-10965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947234#comment-14947234 ]
Mark Grover commented on SPARK-10965:
-------------------------------------

Thanks Sean. I haven't really decided on the approach yet but will keep you posted.

> Optimize filesEqualRecursive
> ----------------------------
>
> Key: SPARK-10965
> URL: https://issues.apache.org/jira/browse/SPARK-10965
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.5.2
> Reporter: Mark Grover
> Priority: Minor
>
> When we try to download dependencies, if there is a file at the destination
> already, we check whether the files are equal (recursively, if they are
> directories). For files, we compare their bytes. Now, these dependencies can
> be jars and be really large, and byte-by-byte comparisons can be super slow.
> I think it'd be better to do a checksum.
> Here's the code in question:
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/Utils.scala#L500
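
For illustration only, the checksum idea from the issue description might look roughly like the sketch below. It is not Spark's code and not a decided approach: the class name ChecksumCompare, the method names, the choice of SHA-256, and the 8 KiB buffer are all assumptions made for this example, and it covers regular files only, not the recursive directory case.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Spark: compares two regular files by digest.
final class ChecksumCompare {
    private ChecksumCompare() {}

    // Streams the file through SHA-256 in 8 KiB chunks, so a large jar
    // is never held in memory all at once.
    static byte[] sha256(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        return digest.digest();
    }

    // Returns true when both files have the same size and the same SHA-256 digest.
    static boolean filesEqualByChecksum(Path a, Path b)
            throws IOException, NoSuchAlgorithmException {
        // Files of different sizes cannot be equal; this check reads no file content.
        if (Files.size(a) != Files.size(b)) {
            return false;
        }
        return MessageDigest.isEqual(sha256(a), sha256(b));
    }
}
```

One caveat on the design: hashing still reads every byte of both files, so on its own it is not obviously cheaper than a buffered byte comparison. The likely savings come from the size check and from cases where the digest of one side is already known (for example, published alongside the dependency), which is an assumption about the setup rather than something stated in the issue.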