Re: [PR] Implement DataFrame methods `len`, `columns`, and `shape` [sedona-db]

via GitHub Thu, 02 Oct 2025 15:46:01 -0700


Copilot commented on code in PR #176:
URL: https://github.com/apache/sedona-db/pull/176#discussion_r2400298297



##########
python/sedonadb/python/sedonadb/dataframe.py:
##########
@@ -151,6 +151,29 @@ def count(self) -> int:
         """
         return self._impl.count()
 
+    def __len__(self) -> int:
+        """Compute the number of rows in the DataFrame"""
+        return self.count()
+
+    @property
+    def columns(self) -> list[str]:
+        """Return the column names in the DataFrame"""
+        columns = list()
+        field_index = 0
+        while True:
+            try:
+                columns.append(self._impl.schema().field(field_index).name)
+                field_index += 1
+            except IndexError:
+                break
+
+        return columns
+
+    @property
+    def shape(self) -> tuple[int, int]:
+        """Return the shape of the DataFrame as a tuple of integers 
corresponding to (rows, columns)"""
+        return self.count(), len(self.columns)

Review Comment:
   The `shape` property calls both `count()` and `len(self.columns)`, which 
results in two separate operations. Since `columns` property already performs 
schema iteration, consider caching or optimizing this to avoid redundant work 
when both dimensions are needed.
   ```suggestion
           columns = self.columns
           return self.count(), len(columns)
   ```



##########
python/sedonadb/python/sedonadb/dataframe.py:
##########
@@ -151,6 +151,29 @@ def count(self) -> int:
         """
         return self._impl.count()
 
+    def __len__(self) -> int:
+        """Compute the number of rows in the DataFrame"""
+        return self.count()
+
+    @property
+    def columns(self) -> list[str]:
+        """Return the column names in the DataFrame"""
+        columns = list()
+        field_index = 0
+        while True:
+            try:
+                columns.append(self._impl.schema().field(field_index).name)
+                field_index += 1
+            except IndexError:
+                break
+

Review Comment:
   The manual loop with exception handling to iterate through schema fields is 
unnecessarily complex. Consider using a more Pythonic approach with list 
comprehension or iterating over the schema directly if the API supports it.
   ```suggestion
           # Use a more Pythonic approach to extract column names
           columns = [field.name for field in self._impl.schema()]
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] Implement DataFrame methods `__len__`, `columns`, and `shape` [sedona-db]

Reply via email to

Re: [PR] Implement DataFrame methods `len`, `columns`, and `shape` [sedona-db]