[ https://issues.apache.org/jira/browse/SPARK-17633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15513013#comment-15513013 ]
Sean Owen commented on SPARK-17633:
-----------------------------------

It's not clear what you're reporting. textFile and wholeTextFiles do quite different things. Updating a file on disk does not necessarily change the result that was already computed and cached in an RDD. You could also have the wrong line separators.

> texFile() and wholeTextFiles() count difference
> -----------------------------------------------
>
> Key: SPARK-17633
> URL: https://issues.apache.org/jira/browse/SPARK-17633
> Project: Spark
> Issue Type: Bug
> Components: Input/Output
> Affects Versions: 1.6.2
> Environment: Unix/Linux
> Reporter: Anshul
>
> sc.textFile() creates an RDD of strings from a text file.
> When count is then performed, the line count is correct, but if more
> than one line is appended to the file manually, counting the same RDD of
> strings increments the output/result only by 1.
> In the case of sc.wholeTextFiles(), however, the output/result is correct.
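
For illustration only, here is a plain-Python sketch of the three points in the comment. It does not use Spark. The helper names (text_file_records, whole_text_files_records, demo) are hypothetical stand-ins that mimic one record per line versus one record per file. The sketch assumes the original file did not end with a newline, which is only one possible explanation for a count that grows by 1 after two lines are appended.

```python
import os
import re
import tempfile


def text_file_records(path):
    """Stand-in for a line-oriented read: one record per line."""
    with open(path, newline="") as f:  # newline="" leaves separators untranslated
        data = f.read()
    records = re.split(r"\r\n|\r|\n", data)
    if records and records[-1] == "":
        records.pop()  # a trailing separator does not start a new record
    return records


def whole_text_files_records(directory):
    """Stand-in for a whole-file read: one (path, content) record per file."""
    records = []
    for name in sorted(os.listdir(directory)):
        full = os.path.join(directory, name)
        with open(full, newline="") as f:
            records.append((full, f.read()))
    return records


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "data.txt")
        with open(path, "w", newline="") as f:
            f.write("a\nb\nc")  # assumption: no trailing newline
        cached = text_file_records(path)  # computed once, like a cached result
        with open(path, "a", newline="") as f:
            f.write("d\ne\n")  # "two lines" appended by hand
        recomputed = text_file_records(path)  # "c" and "d" merge into "cd"
        whole = whole_text_files_records(d)
        return len(cached), len(recomputed), len(whole)


print(demo())
```

The already computed list keeps its old length after the file changes, the recomputed line count grows by only 1 because the first appended line joins the unterminated last line, and the whole-file read always yields one record per file regardless of how many lines it holds.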