Hi Rex,

If the CSV files are in the same folder and there are no other files,
specifying the directory to sc.textFile() (or equivalent) will pull in all
the files. If there are other files, you can pass in a glob pattern that
captures just the two files you care about (if that's possible). If neither
of these works for you, you can create individual RDDs for each file and
union them.
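
Something like this (an untested sketch; the directory and file names are
placeholders):

    from pyspark import SparkContext

    sc = SparkContext(appName="csv-union")

    # 1) point textFile at the directory: reads every file in it
    all_lines = sc.textFile("/data/csvs")

    # 2) or use a glob pattern to pick out just the files you want
    some_lines = sc.textFile("/data/csvs/sales_*.csv")

    # 3) or build one RDD per file and union them
    rdd1 = sc.textFile("/data/csvs/a.csv")
    rdd2 = sc.textFile("/data/csvs/b.csv")
    combined = rdd1.union(rdd2)

Note that textFile also accepts a comma-separated list of paths, so
sc.textFile("/data/csvs/a.csv,/data/csvs/b.csv") works as well.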

-sujit


On Fri, Jun 26, 2015 at 11:00 AM, Rex X <dnsr...@gmail.com> wrote:

> With Python Pandas, it is easy to do concatenation of dataframes
> by combining pandas.concat
> <http://pandas.pydata.org/pandas-docs/stable/generated/pandas.concat.html>
> and pandas.read_csv
>
> pd.concat([pd.read_csv(os.path.join(Path_to_csv_files, f))
>            for f in csvfiles])
>
> where "csvfiles" is the list of CSV files.
>
>
> How can we do this in Spark?
>
>
>
