[ 
https://issues.apache.org/jira/browse/SPARK-17633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15513168#comment-15513168
 ] 

Anshul commented on SPARK-17633:
--------------------------------

What could be the possible reason for this? Since Spark's transformations are 
lazy, they are computed only when an action is performed on them. So in this 
case, counting again after modifying the file should give the correct result.
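
One possible explanation, offered as a guess rather than something confirmed in 
this thread: the file is re-read on every action, but the byte range of the 
input split may be computed once, on the first count, and reused afterwards. 
If the line reader also starts a new line whenever its position is at or before 
the end of that range, it would pick up exactly one appended line. The sketch 
below is a plain-Python toy model of that assumption. It is not Spark code, and 
CachedSplitLines, whole_file_count and demo are hypothetical names made up for 
illustration.

```python
import os
import tempfile


class CachedSplitLines:
    """Toy stand-in for an RDD whose input split is computed once and reused.

    Assumption for illustration only; this is not a Spark API.
    """

    def __init__(self, path):
        self.path = path
        self._split_end = None  # byte length captured on the first count

    def count(self):
        # The file is re-read on every count (the "lazy" part) ...
        if self._split_end is None:
            # ... but the split boundary is fixed the first time it is needed.
            self._split_end = os.path.getsize(self.path)
        lines = 0
        with open(self.path, "rb") as f:
            # A new line is started whenever the position is at or before the
            # cached boundary, so one full line past it can still be read.
            while f.tell() <= self._split_end:
                if not f.readline():
                    break  # end of file
                lines += 1
        return lines


def whole_file_count(path):
    """Count lines by reading the whole file, with no cached boundary."""
    with open(path, "rb") as f:
        return sum(1 for _ in f)


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "input.txt")
        with open(path, "wb") as f:
            f.write(b"a\nb\nc\n")
        lines = CachedSplitLines(path)
        first = lines.count()
        # Append three lines after the first count has fixed the boundary.
        with open(path, "ab") as f:
            f.write(b"d\ne\nf\n")
        return first, lines.count(), whole_file_count(path)


if __name__ == "__main__":
    print(demo())
```

In this model the second count grows by one even though three lines were 
appended, while the whole-file read sees all of them. Whether Spark 1.6.2 
behaves this way for the same reason would still need to be checked against 
its source.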

> textFile() and wholeTextFiles() count difference
> ------------------------------------------------
>
>                 Key: SPARK-17633
>                 URL: https://issues.apache.org/jira/browse/SPARK-17633
>             Project: Spark
>          Issue Type: Bug
>          Components: Input/Output
>    Affects Versions: 1.6.2
>         Environment: Unix/Linux
>            Reporter: Anshul
>
> sc.textFile() creates an RDD of strings from a text file.
> When count is then performed, the line count is correct. But if more than 
> one line is manually appended to the file and the same RDD of strings is 
> counted again, the result increases by only 1. 
> With sc.wholeTextFiles(), however, the result is correct.


