csringhofer commented on PR #156:
URL: https://github.com/apache/parquet-site/pull/156#issuecomment-3876292387

   Reflecting on the discussion about incomplete statistics support.
   
   I checked a few implementations, and while writing statistics for geometries 
seems to be supported in general, I haven't found a single implementation of 
geography with any edge interpolation algorithm. The Rust 
[implementation](https://github.com/apache/arrow-rs/blob/7dbe58a6e0e18985861db1dfa71507174e838cae/parquet/src/geospatial/accumulator.rs#L151)
 seems to handle the stats for points (where edge interpolation is not needed) 
and allows users to inject their own implementation.
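   
   To illustrate why edge interpolation matters for geography bounding boxes, 
here is a minimal sketch. It is not taken from any Parquet library: the 
function names are hypothetical, it assumes a perfect sphere and non-antipodal 
endpoints, and it samples the edge rather than solving for the extremum 
analytically.

```python
import math


def to_xyz(lat_deg, lon_deg):
    """Convert a (lat, lon) pair in degrees to a unit vector on the sphere."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (
        math.cos(lat) * math.cos(lon),
        math.cos(lat) * math.sin(lon),
        math.sin(lat),
    )


def max_lat_on_great_circle(p, q, steps=1000):
    """Approximate the maximum latitude (degrees) reached along the
    great-circle edge from p to q, where p and q are (lat, lon) in degrees.

    Hypothetical helper for illustration; assumes p and q are not antipodal.
    """
    a, b = to_xyz(*p), to_xyz(*q)
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
    omega = math.acos(dot)  # angle between the two endpoints
    best = max(p[0], q[0])  # what a planar (geometry) bbox would report
    if omega == 0.0:
        return best
    for i in range(steps + 1):
        t = i / steps
        # Spherical linear interpolation weights for the two endpoints.
        wa = math.sin((1.0 - t) * omega) / math.sin(omega)
        wb = math.sin(t * omega) / math.sin(omega)
        z = max(-1.0, min(1.0, wa * a[2] + wb * b[2]))
        best = max(best, math.degrees(math.asin(z)))
    return best


if __name__ == "__main__":
    # Both endpoints sit at latitude 45, yet the edge bulges toward the pole.
    print(max_lat_on_great_circle((45.0, -60.0), (45.0, 60.0)))
```

   For the two endpoints at latitude 45 above, the edge reaches roughly 63.4 
degrees, so a bounding box built only from the vertices would be too small for 
a geography column, while it is correct for points.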
   
    >Maybe a more accurate summary is that the column statistics collection is 
not yet fully integrated into all engines.
   
   I agree in the case of geometry, but I think it would make things clearer 
to mention that for geography this is incomplete, at least in common open 
source libraries. The blog post mentions "Spatial statistics" as a core feature 
and generally mentions geometry and geography side by side, so the reader may 
assume that statistics support is widely available for both logical types. This 
also affects the choice of the best type to use: if bounding boxes 
are not yet available for geography and per-file skipping is critical, then the 
user should try to build their workload on geometry.
   
   I don't know the status of the statistics implementation for geography, but I 
haven't seen PRs about this, so my assumption is that it may take a significant 
amount of time before at least spherical interpolation is widely available in 
Parquet libraries (or extension libraries). I would be happy to be proven wrong :)
   
   Btw the blog was a great read!

