zhangfengcdt commented on PR #2005: URL: https://github.com/apache/sedona/pull/2005#issuecomment-3005566360
> > Looks great! I think for the first step, we could target covering whatever these scala tests cover in exact results. > > That made perfect sense originally, but now I see why it wasn't done in Scala. These test files we're using (`self.mixedWktGeometryInputLocation`) have 100 entries. Hard coding 100 polygons for a single test (e.g for ST_buffer) would make reading and navigating the test file a real pain. > > Another thought. Originally, I had a test checking if the sedona sql function's result matched our new sgpd result. However, we agreed to remove it because it seemed trivial that of course the results would match. Following that same logic, if we can assume the new sedona geopandas results match the sedona sql results, then why do we need to replicate the Scala tests again if we already know sedona sql passes it? Or maybe we should just not make that assumption (in that case maybe it is worth bringing back that old sgpd vs sedona test). Personally, I'm fine with making the assumption. If anything, maybe we should hard-code results for the `test_match_geopandas_series.py` tests since those results are smaller. WDYT @zhangfengcdt? I think the reason why we still need the hard-coded tests are that they are not simply replicating the scala expression tests, they should extend the scala cases. Basically, the goals are: - Verify the SQL created to convert geopandas -> sedona queries are expected; - Verify manually the results for matching or differences (reason) between geopandas origin and geopandas on Sedona. This means we will very possibly cover more cases than the scala expression tests. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
