[GitHub] [incubator-sedona] swamirishi commented on pull request #536: [SEDONA-36] Parquet reader & Writers

GitBox Tue, 10 Aug 2021 09:28:03 -0700


swamirishi commented on pull request #536:
URL: https://github.com/apache/incubator-sedona/pull/536#issuecomment-896125604



   > @swamirishi Thanks for your contribution. However, currently, Sedona 
DataFrame can be saved and loaded as Parquet files. I am not sure if we need 
this function for RDD. We have the adapter to travel between RDD and DataFrame.
   > @jiayuasu It is true that we can load Parquet files through a DataFrame, but 
as far as I understand this is done through WKB. Parquet stores metadata 
statistics at the row-group level. We can use those statistics for predicate 
pushdown, spatial joins, etc. 
   E.g. suppose I have a file with location data spread across the globe, but 
while querying I am only interested in data located in India. We can use the 
row-group statistics to identify only the row groups containing India location 
data. This would reduce IO costs to a great extent. You can take a look at the 
sample stats above for one of the dummy Parquet files created.
   > Could you elaborate more about the purpose of this PR?
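
The row-group pruning idea described above can be sketched roughly as follows. This is only an illustration in plain Python, not Sedona or Parquet reader code: `BBox`, `prune_row_groups`, and the sample statistics are hypothetical names and made-up values. It assumes each row group's min/max statistics on the longitude and latitude columns have already been read from the file footer and combined into a bounding box.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BBox:
    """Axis-aligned bounding box built from per-column min/max statistics."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def intersects(self, other: "BBox") -> bool:
        # Two boxes overlap only if their ranges overlap on both axes.
        return (self.min_lon <= other.max_lon and other.min_lon <= self.max_lon
                and self.min_lat <= other.max_lat and other.min_lat <= self.max_lat)


def prune_row_groups(stats: List[BBox], query: BBox) -> List[int]:
    """Return the indices of row groups whose bounding box overlaps the query."""
    return [i for i, bbox in enumerate(stats) if bbox.intersects(query)]


# Hypothetical statistics for four row groups (values are illustrative only).
row_group_stats = [
    BBox(-125.0, 24.0, -66.0, 49.0),  # roughly North America
    BBox(68.0, 8.0, 97.0, 36.0),      # roughly India
    BBox(-10.0, 36.0, 30.0, 60.0),    # roughly Europe
    BBox(90.0, 20.0, 125.0, 45.0),    # East Asia, overlapping the query window
]

# Approximate query window around India (an assumption for this example).
india = BBox(68.0, 6.5, 97.5, 37.0)

selected = prune_row_groups(row_group_stats, india)
print(selected)  # [1, 3]
```

Only the selected row groups would need to be read from disk. The bounding-box check is conservative, so rows in those groups would still have to be tested against the exact spatial predicate.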
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
