swamirishi commented on pull request #536:
URL: https://github.com/apache/incubator-sedona/pull/536#issuecomment-896125604

> @swamirishi Thanks for your contribution. However, currently, Sedona DataFrame can be saved and loaded as Parquet files. I am not sure if we need this function for RDD. We have the adapter to travel between RDD and DataFrame.

@jiayuasu It is true that we can load Parquet files through a DataFrame, but as far as I understand this is done through WKB. Parquet stores metadata statistics at the row-group level, and we can use those statistics for predicate pushdown, spatial joins, and so on. For example, suppose I have a file with location data spread across the globe, but my query is only interested in data located in India. We can use the row-group statistics to identify only the row groups that contain India location data, which would reduce I/O costs to a great extent. You can take a look at the sample stats above for one of the dummy Parquet files created.

> Could you elaborate more about the purpose of this PR?

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
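
To make the row-group pruning idea in the comment concrete, here is a minimal sketch in plain Python. It is not the PR's implementation and does not use Sedona or any Parquet library. It assumes coordinates are stored in separate numeric columns, so that each row group's min/max statistics form a bounding box. The names `BBox`, `RowGroupStats`, and `prune_row_groups`, and all the statistics values, are hypothetical and made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BBox:
    """Axis-aligned bounding box in lon/lat degrees."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def intersects(self, other: "BBox") -> bool:
        # Two boxes overlap when their ranges overlap on both axes.
        return (
            self.min_lon <= other.max_lon
            and other.min_lon <= self.max_lon
            and self.min_lat <= other.max_lat
            and other.min_lat <= self.max_lat
        )


@dataclass(frozen=True)
class RowGroupStats:
    """Hypothetical per-row-group summary built from column min/max stats."""
    index: int
    bbox: BBox


def prune_row_groups(stats: List[RowGroupStats], query: BBox) -> List[int]:
    """Return the indices of row groups whose bounding box overlaps the query."""
    return [s.index for s in stats if s.bbox.intersects(query)]


# Made-up statistics for a file with globally distributed points.
ROW_GROUPS = [
    RowGroupStats(0, BBox(-125.0, 24.0, -66.0, 50.0)),  # roughly North America
    RowGroupStats(1, BBox(68.0, 6.0, 98.0, 36.0)),      # roughly India
    RowGroupStats(2, BBox(-10.0, 35.0, 30.0, 60.0)),    # roughly Europe
    RowGroupStats(3, BBox(60.0, 0.0, 80.0, 20.0)),      # partly overlaps India
]

# Approximate bounding box for India (an assumption, not an exact boundary).
INDIA_QUERY = BBox(68.0, 6.0, 98.0, 36.0)

selected = prune_row_groups(ROW_GROUPS, INDIA_QUERY)
print(selected)  # [1, 3]
```

Only the selected row groups would need to be read from disk. Rows inside them still need an exact spatial filter, since an overlapping bounding box does not guarantee that every row matches.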
