Hi,

Do you know any simple way to load multiple csv files (same schema) that are in 
different paths?
Wildcards are not a solution, as I want to load specific csv files from 
different folders.

I came across a solution
(https://stackoverflow.com/questions/37639956/how-to-import-multiple-csv-files-in-a-single-load)
that suggests something like

spark.read.format("csv")
    .option("header", "false")
    .schema(custom_schema)
    .option("delimiter", "\t")
    .option("mode", "DROPMALFORMED")
    .load(paths.split(","))
However, even though it mentions that this approach should work in Spark 2.x, I
can't find an overload of load that accepts an Array[String] as an input
parameter.
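
To make the question concrete, here is a sketch of what I would expect to write in Scala, assuming DataFrameReader has a varargs overload load(paths: String*) so that an Array[String] can be expanded with ": _*" (custom_schema and the paths below are just placeholders):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{StructField, StringType, StructType}

val spark = SparkSession.builder().appName("multi-csv").getOrCreate()

// Placeholder schema and paths, only for illustration
val custom_schema = StructType(Seq(
  StructField("col1", StringType),
  StructField("col2", StringType)
))
val paths = "folder1/a.csv,folder2/b.csv".split(",")

// If load(paths: String*) exists, the Array[String] can be expanded with : _*
val df = spark.read.format("csv")
  .option("header", "false")
  .schema(custom_schema)
  .option("delimiter", "\t")
  .option("mode", "DROPMALFORMED")
  .load(paths: _*)

Is this the intended way, or is there a different API for multiple explicit paths?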

Thanks in advance for your help.


Didac Gil de la Iglesia
PhD in Computer Science
didacg...@gmail.com
Spain:     +34 696 285 544
Sweden: +46 (0)730229737
Skype: didac.gil.de.la.iglesia
