Hello all,
I need to process, in Scala, data generated by molecular dynamics
simulations and stored in NetCDF format, so I am trying to use Sedona to
read data spread across multiple files. Of the posts on this mailing list,
the closest to my requirement is
https://www.mail-archive.com/[email protected]/msg00684.html, but I have
not been able to adapt it to my case. Some other resources suggest
org.apache.sedona.core.format.WKBFileReader, which I could not find in the
current version; the closest class available is
org.apache.sedona.core.formatMapper.WkbReader, which does not seem to be
useful for NetCDF data.
I would be grateful for any pointer on how to create a single RDD from
multiple NetCDF files (say f1.nc, f2.nc, ...). The same could be done with
SciSpark, which is not compatible with Spark 3. What I am trying to
accomplish is shown below; the resulting RDD is then used for further
processing:
// Read the NetCDF files listed in ncDirectoryPath, keeping only the named variables
val ncFilesRDD: org.apache.spark.rdd.RDD[org.dia.core.SciTensor] =
  sc.netcdfFileList(
    "file://" + ncDirectoryPath,
    List("time", "cell_lengths", "coordinates")
  )

// Extract the data of interest from each file as a tuple of flat arrays
val ncFileCRDArrayRDD = ncFilesRDD.map(x =>
  (
    x.variables.get("time").get.data.toArray,
    x.variables.get("cell_lengths").get.data.toArray,
    x.variables.get("coordinates").get.data.toArray
  )
)
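
For comparison, one route that avoids both SciSpark and Sedona might be to
open each file with the netcdf-java library inside a plain Spark map. The
block below is only a rough sketch under several assumptions, not a tested
solution: it assumes netcdf-java 5.x (ucar.nc2.NetcdfFiles) is on the
classpath, that every file is readable at the same local path on all
executors, and that all three variables can be converted to Double. The
object NetcdfToRdd and the helpers ncFilesOnly and readVars are hypothetical
names introduced just for this illustration; the variable names are the ones
from my SciSpark snippet above.

```scala
import java.io.File

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession
import ucar.nc2.NetcdfFiles

// Hypothetical illustration object; not part of Sedona or SciSpark.
object NetcdfToRdd {
  // Variable names taken from the SciSpark snippet.
  val varNames: List[String] = List("time", "cell_lengths", "coordinates")

  // Keep only the *.nc entries of a directory listing, in sorted order.
  def ncFilesOnly(names: Seq[String]): Seq[String] =
    names.filter(_.endsWith(".nc")).sorted

  // Open one file and flatten each requested variable into an Array[Double].
  // Assumes netcdf-java 5.x; the file is always closed, even on failure.
  def readVars(path: String, names: List[String]): Map[String, Array[Double]] = {
    val nc = NetcdfFiles.open(path)
    try {
      names.map { name =>
        val v = nc.findVariable(name)
        require(v != null, s"variable '$name' not found in $path")
        val it = v.read().getIndexIterator()
        val buf = Array.newBuilder[Double]
        while (it.hasNext()) buf += it.getDoubleNext()
        name -> buf.result()
      }.toMap
    } finally nc.close()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("netcdf-to-rdd").getOrCreate()
    val dir = new File(args(0)) // directory holding f1.nc, f2.nc, ...
    val listing = Option(dir.list()).map(_.toSeq).getOrElse(Seq.empty[String])
    val paths = ncFilesOnly(listing).map(n => new File(dir, n).getPath)

    // One RDD element per file: (path, variable name -> flattened values).
    val ncRDD: RDD[(String, Map[String, Array[Double]])] =
      spark.sparkContext
        .parallelize(paths, math.max(1, paths.size))
        .map(p => p -> readVars(p, varNames))

    // Same shape as the tuple RDD in the SciSpark snippet.
    val crdRDD = ncRDD.map { case (_, vars) =>
      (vars("time"), vars("cell_lengths"), vars("coordinates"))
    }
    println(s"files read: ${crdRDD.count()}")
    spark.stop()
  }
}
```

The file list is parallelized rather than the file contents, so each task
opens its own file and only the extracted arrays travel through Spark. Note
that the arrays are flattened, so the shape of "coordinates" (frames x atoms
x 3 in my data) would have to be carried separately if it is needed later.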
Thanks in advance,
With regards,
-Sanjeev