Copilot commented on code in PR #730:
URL: https://github.com/apache/tsfile/pull/730#discussion_r2851460547


##########
python/tsfile/utils.py:
##########
@@ -164,8 +179,11 @@ def _gen(is_iterator: bool) -> Iterator[pd.DataFrame]:
                         continue
                     total_rows += len(dataframe)
                     if time_column is not None:
-                        if _column_names is None or time_column.lower() not in 
[c.lower() for c in _column_names]:
+                        if _column_names is None or time_column not in 
_column_names:
                             dataframe = 
dataframe.rename(columns={dataframe.columns[0]: time_column})
+                    if no_data_query and _column_names is not None:
+                        _column_names.insert(0, TIME_COLUMN)

Review Comment:
   This line inserts TIME_COLUMN (the constant "time") instead of the actual 
time_column variable. If the table schema defines a time column with a 
different name (e.g., "id"), and that column name is not explicitly requested 
by the user, the dataframe will have a column with the schema's time column 
name, not "time". When trying to filter the dataframe by _column_names at line 
186, pandas will raise a KeyError because "time" doesn't exist in the 
dataframe. This should use time_column instead of TIME_COLUMN. Additionally, 
this insertion should only happen if time_column is not already in 
_column_names.
   ```suggestion
                       if no_data_query and _column_names is not None and 
time_column is not None:
                           if time_column not in _column_names:
                               _column_names.insert(0, time_column)
   ```



##########
python/tsfile/utils.py:
##########
@@ -164,8 +179,11 @@ def _gen(is_iterator: bool) -> Iterator[pd.DataFrame]:
                         continue
                     total_rows += len(dataframe)
                     if time_column is not None:
-                        if _column_names is None or time_column.lower() not in 
[c.lower() for c in _column_names]:
+                        if _column_names is None or time_column not in 
_column_names:
                             dataframe = 
dataframe.rename(columns={dataframe.columns[0]: time_column})
+                    if no_data_query and _column_names is not None:
+                        _column_names.insert(0, TIME_COLUMN)
+                        dataframe = dataframe[_column_names]

Review Comment:
   This code modifies the caller's original list by calling insert on 
_column_names. Since _column_names is a reference to the column_names parameter 
(after lowercasing at line 128), this mutates the caller's list. This mutation 
will cause issues if the same list is reused across multiple calls or if the 
caller expects their list to remain unchanged. Additionally, this mutation 
happens inside the iterator loop, so it will occur for every batch of data, 
repeatedly inserting TIME_COLUMN at position 0, which will cause incorrect 
behavior. The insertion should be done on a copy of the list, not on the 
original reference.
   ```suggestion
                           columns_with_time = [TIME_COLUMN] + 
list(_column_names)
                           dataframe = dataframe[columns_with_time]
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to