Copilot commented on code in PR #730:
URL: https://github.com/apache/tsfile/pull/730#discussion_r2851460547
##########
python/tsfile/utils.py:
##########
@@ -164,8 +179,11 @@ def _gen(is_iterator: bool) -> Iterator[pd.DataFrame]:
continue
total_rows += len(dataframe)
if time_column is not None:
- if _column_names is None or time_column.lower() not in
[c.lower() for c in _column_names]:
+ if _column_names is None or time_column not in
_column_names:
dataframe =
dataframe.rename(columns={dataframe.columns[0]: time_column})
+ if no_data_query and _column_names is not None:
+ _column_names.insert(0, TIME_COLUMN)
Review Comment:
This line inserts TIME_COLUMN (the constant "time") instead of the actual
time_column variable. If the table schema defines a time column with a
different name (e.g., "id"), and that column name is not explicitly requested
by the user, the dataframe will have a column with the schema's time column
name, not "time". When trying to filter the dataframe by _column_names at line
186, pandas will raise a KeyError because "time" doesn't exist in the
dataframe. This should use time_column instead of TIME_COLUMN. Additionally,
this insertion should only happen if time_column is not already in
_column_names.
```suggestion
if no_data_query and _column_names is not None and
time_column is not None:
if time_column not in _column_names:
_column_names.insert(0, time_column)
```
##########
python/tsfile/utils.py:
##########
@@ -164,8 +179,11 @@ def _gen(is_iterator: bool) -> Iterator[pd.DataFrame]:
continue
total_rows += len(dataframe)
if time_column is not None:
- if _column_names is None or time_column.lower() not in
[c.lower() for c in _column_names]:
+ if _column_names is None or time_column not in
_column_names:
dataframe =
dataframe.rename(columns={dataframe.columns[0]: time_column})
+ if no_data_query and _column_names is not None:
+ _column_names.insert(0, TIME_COLUMN)
+ dataframe = dataframe[_column_names]
Review Comment:
This code modifies the caller's original list by calling insert on
_column_names. Since _column_names is a reference to the column_names parameter
(after lowercasing at line 128), this mutates the caller's list. This mutation
will cause issues if the same list is reused across multiple calls or if the
caller expects their list to remain unchanged. Additionally, this mutation
happens inside the iterator loop, so it will occur for every batch of data,
repeatedly inserting TIME_COLUMN at position 0, which will cause incorrect
behavior. The insertion should be done on a copy of the list, not on the
original reference.
```suggestion
columns_with_time = [TIME_COLUMN] +
list(_column_names)
dataframe = dataframe[columns_with_time]
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]