I am not sure why you need to create an RDD first. You can create a
DataFrame directly from the CSV files, for instance:
spark.read.format("csv").option("header","true").schema(yourSchema).load(ftpUrl)
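In case it helps, here is a rough sketch of both routes, assuming the four columns from the schema in your message. The object name CsvSchemaSketch and the helpers readDirect, toRow and readViaRdd are made up for illustration, and none of this has been run against an FTP source. The likely reason the original createDataFrame call does not work is that createDataFrame(rdd, schema) expects an RDD[Row], whereas the quoted code passes an RDD[Seq[String]]. In addition, wholeTextFiles yields one record per file, so the content has to be split into lines before it is split into fields.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Hypothetical helper object; all names here are illustrative only.
object CsvSchemaSketch {

  // Same four columns as the schema in the quoted message.
  val schema: StructType = StructType(Seq(
    StructField("id", StringType, nullable = true),
    StructField("name", StringType, nullable = true),
    StructField("year", IntegerType, nullable = true),
    StructField("city", StringType, nullable = true)
  ))

  // Direct route: the CSV reader parses the files and applies the schema.
  // With header=true the first line of each file is treated as a header.
  def readDirect(spark: SparkSession, url: String): DataFrame =
    spark.read
      .format("csv")
      .option("header", "true")
      .schema(schema)
      .load(url)

  // Convert one CSV line into a Row whose value types match the schema.
  // Naive split: assumes exactly four fields, no quoted commas, and a
  // numeric "year"; toInt throws on anything else.
  def toRow(line: String): Row = {
    val f = line.split(",", -1).map(_.trim)
    Row(f(0), f(1), f(2).toInt, f(3))
  }

  // RDD route: createDataFrame(rowRDD, schema) needs an RDD[Row].
  // Assumes the files have no header line (the quoted code skips none).
  def readViaRdd(spark: SparkSession, url: String): DataFrame = {
    val rows = spark.sparkContext
      .wholeTextFiles(url)                                   // (path, whole file content)
      .flatMap { case (_, content) => content.split("\\r?\\n").toSeq }
      .filter(_.trim.nonEmpty)                               // drop blank lines
      .map(toRow)
    spark.createDataFrame(rows, schema)
  }
}
```

Something like CsvSchemaSketch.readDirect(spark, ftpUrl) should be enough in most cases. The RDD route is probably only worth the trouble if the files need custom parsing that the CSV reader cannot do.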
-- ND
On 8/5/21 3:14 AM, igyu wrote:
val ftpUrl ="ftp://test:test@ip:21/upload/test/_temporary/0/_temporary/task_20191211114756_0002_m_000000_0/*"
val rdd = spark.sparkContext.wholeTextFiles(ftpUrl)
val value = rdd.map(_._2).map(csv=>csv.split(",").toSeq)
val schemas = StructType(List(
  new StructField("id", DataTypes.StringType, true),
  new StructField("name", DataTypes.StringType, true),
  new StructField("year", DataTypes.IntegerType, true),
  new StructField("city", DataTypes.StringType, true)))
val DF = spark.createDataFrame(value,schemas)
How can I create a DataFrame from this?
------------------------------------------------------------------------
igyu