[GitHub] [arrow] danepitkin commented on a diff in pull request #37097: GH-36730: [Python] Add support for Cython 3.0.0

via GitHub Tue, 22 Aug 2023 06:53:11 -0700


danepitkin commented on code in PR #37097:
URL: https://github.com/apache/arrow/pull/37097#discussion_r1301683589



##########
python/pyarrow/_dataset.pyx:
##########
@@ -1838,7 +1838,7 @@ cdef class FileFragment(Fragment):
             typ = ""
         partition_dict = get_partition_keys(self.partition_expression)
         partition = ", ".join(
-            [f"{key}={val}" for key, val in partition_dict.items()]
+            sorted([f"{key}={val}" for key, val in partition_dict.items()])

Review Comment:
   Yes, the `__repr__` test was failing. I thought it might be nice to have 
deterministic output for the `__repr__`. The key/value pair ordering was 
swapped when I compiled with Cython 3. Would it be better to update the test 
case to handle non-determinism?
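
A minimal sketch of why sorting makes the `__repr__` deterministic. It assumes the partition keys arrive as a plain dict whose iteration order can differ between builds. `format_partition` is a hypothetical stand-in for the list comprehension in the diff, not a pyarrow function.

```python
def format_partition(partition_dict, deterministic=True):
    """Build the "key=val, ..." text used inside a fragment repr.

    Hypothetical helper mirroring the list comprehension in the diff.
    With deterministic=True the formatted pairs are sorted, as in the
    proposed change; otherwise dict iteration order is kept.
    """
    pairs = [f"{key}={val}" for key, val in partition_dict.items()]
    if deterministic:
        pairs = sorted(pairs)
    return ", ".join(pairs)


# The same partition keys, seen in two different iteration orders
# (e.g. before and after recompiling with Cython 3).
order_a = {"key": "xxx", "group": 1}
order_b = {"group": 1, "key": "xxx"}

print(format_partition(order_a, deterministic=False))  # key=xxx, group=1
print(format_partition(order_b, deterministic=False))  # group=1, key=xxx
print(format_partition(order_a))  # group=1, key=xxx
print(format_partition(order_b))  # group=1, key=xxx
```

The sort compares the formatted `key=val` strings. With the proposed change, the expected text in `test_fragments_repr` would therefore become `partition=[group=1, key=xxx]`, so the existing assertion needs updating either way.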
   
   
https://github.com/apache/arrow/blob/main/python/pyarrow/tests/test_dataset.py#L1614
   ```
   def test_fragments_repr(tempdir, dataset):
       # partitioned parquet dataset
       fragment = list(dataset.get_fragments())[0]
       assert (
           repr(fragment) ==
           "<pyarrow.dataset.ParquetFileFragment path=subdir/1/xxx/file0.parquet "
           "partition=[key=xxx, group=1]>"
       )
   ```
   
   My test run was returning `"partition=[group=1, key=xxx]>"` instead.
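
If the test should instead tolerate either ordering, one option is to compare the partition pairs as a set while still checking the rest of the repr exactly. This is a minimal sketch that assumes the repr keeps the `... partition=[k=v, ...]>` shape shown above. `split_repr` is a hypothetical test helper, not part of pyarrow.

```python
def split_repr(fragment_repr):
    """Split a fragment repr into (prefix, set of "key=val" pairs).

    Hypothetical helper for an order-insensitive assertion. It assumes the
    repr ends with "partition=[...]>" as in test_fragments_repr.
    """
    head, sep, tail = fragment_repr.partition("partition=[")
    if not sep or not tail.endswith("]>"):
        raise ValueError(f"no partition section in {fragment_repr!r}")
    pairs = set(tail[: -len("]>")].split(", "))
    return head, pairs


expected = (
    "<pyarrow.dataset.ParquetFileFragment path=subdir/1/xxx/file0.parquet "
    "partition=[key=xxx, group=1]>"
)
observed = (
    "<pyarrow.dataset.ParquetFileFragment path=subdir/1/xxx/file0.parquet "
    "partition=[group=1, key=xxx]>"
)

# The raw strings differ only in pair order...
assert expected != observed
# ...so comparing (prefix, set of pairs) treats them as equal.
assert split_repr(expected) == split_repr(observed)
```

In the real test, this would become `assert split_repr(repr(fragment)) == split_repr(expected)`. Because the helper splits on `", "`, it would mis-parse partition values that themselves contain a comma followed by a space.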



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

