Environment:
Apache Spark 1.6.2
Scala 2.10

I am currently using the spark-csv package from Databricks, and I would
like to add a pre-processing stage when reading a CSV file that assigns
a row number to each row of data as it is read. This would allow for
better traceability and data lineage in case of validation or
data-processing issues downstream.

From the research I have done, it seems the zipWithIndex API is the
right (or only) way to implement this pattern.
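For concreteness, here is a minimal sketch of the approach I am
considering, assuming the Spark 1.6 / spark-csv 1.x APIs; the input
path, header option, and "row_number" column name are placeholders:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

val sc = new SparkContext(new SparkConf().setAppName("csv-row-numbers"))
val sqlContext = new SQLContext(sc)

// Read the CSV via spark-csv; path and options are placeholders.
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .load("/path/to/input.csv")

// zipWithIndex assigns each row a Long index; my understanding is
// that each partition receives a non-overlapping range (at the cost
// of an extra job to count rows per partition), which is part of
// what I would like confirmed.
val indexedRows = df.rdd.zipWithIndex.map {
  case (row, idx) => Row.fromSeq(row.toSeq :+ idx)
}

// Extend the original schema with the new row-number column.
val schema = StructType(df.schema.fields :+
  StructField("row_number", LongType, nullable = false))
val indexedDf = sqlContext.createDataFrame(indexedRows, schema)

indexedDf.show()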

Would this be the preferred route? Is it safe for parallel operations,
i.e., is it guaranteed not to produce collisions (duplicate indices)?
Has anybody had a similar requirement and found a better solution you
can point me to?

Appreciate any help and responses anyone can offer.

Thanks
-a


