Technically, I guess, the docs only say "first item in this RDD", not
"first line in the source text file". AFAIK the only way to remove header
lines is to filter them out
<http://stackoverflow.com/a/24734612/877069>.
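
For what it's worth, here is a rough sketch of one way such filtering could
look. It is not necessarily what the linked answer does. The function name
drop_header and the sample data are made up for illustration, and the plain
lists stand in for an RDD's partitions so the snippet runs without Spark:

```python
from itertools import islice


def drop_header(partition_index, lines):
    """Skip the first line of partition 0; pass other partitions through."""
    if partition_index == 0:
        # Assumption: the header is the first line of the first partition.
        return islice(lines, 1, None)
    return lines


# Hypothetical stand-in for the partitions of a text-file RDD.
partitions = [["name,age", "alice,30"], ["bob,25"]]

kept = [
    line
    for index, part in enumerate(partitions)
    for line in drop_header(index, iter(part))
]
print(kept)
```

In PySpark, a function of this shape could be passed to
rdd.mapPartitionsWithIndex(drop_header). That relies on the header landing at
the start of partition 0, which I'd expect for a single input file but not
for a directory of files that each carry their own header.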

As long as first() always returns the same value for a given RDD, I think
it's fine, no?

Nick


On Sun Feb 22 2015 at 9:09:01 PM Michael Malak
<michaelma...@yahoo.com.invalid> wrote:

> Since RDDs are generally unordered, aren't things like textFile().first()
> not guaranteed to return the first row (such as looking for a header row)?
> If so, doesn't that make the example in
> http://spark.apache.org/docs/1.2.1/quick-start.html#basics misleading?
