Xiao-zhen-Liu commented on code in PR #4488:
URL: https://github.com/apache/texera/pull/4488#discussion_r3170310892


##########
amber/src/main/python/core/models/state.py:
##########
@@ -15,58 +15,56 @@
 # specific language governing permissions and limitations
 # under the License.
 
-from dataclasses import dataclass
-from pandas import DataFrame
-from pyarrow import Table
-from typing import Optional
+import base64
+import json
+from typing import Any, Dict, TypeAlias
 
-from .schema import Schema, AttributeType
-from .schema.attribute_type import FROM_PYOBJECT_MAPPING
+from .schema import Schema
+from .tuple import Tuple
 
+State: TypeAlias = Dict[str, Any]

Review Comment:
   `State` being a type alias is fine for annotations, but here it is used as a 
runtime match pattern in `main_loop.py`:
   
   ```
   match(
       element,
       Tuple,
       self._process_tuple,
       State,
       self._process_state,
   )
   ```
   
   Since `State: TypeAlias = Dict[str, Any]`, it is not a concrete runtime 
class like `dict`; it is a subscripted `typing.Dict`.
   
   That means runtime checks against it can fail, e.g.:
   
   ```
   from typing import Any, Dict, TypeAlias
   State: TypeAlias = Dict[str, Any]
   isinstance({"x": 1}, State)
   # TypeError: Subscripted generics cannot be used with class and instance 
checks
   ```
   
   So an incoming `StateFrame` payload can reach this dispatch path and fail 
before `_process_state` is called. Could we match `dict` in `main_loop.py` 
instead, or make `State` a real runtime class if we need to distinguish state 
from ordinary dictionaries?
   



##########
amber/src/main/python/core/architecture/packaging/output_manager.py:
##########
@@ -248,7 +248,7 @@ def emit_state(
                         receiver,
                         (
                             StateFrame(payload)
-                            if isinstance(payload, State)
+                            if isinstance(payload, dict)

Review Comment:
   Using `dict` here avoids the `State` type-alias runtime issue, but it also 
makes the dispatch very broad: any dictionary returned from `flush_state` 
becomes a `StateFrame`. Is `flush_state` guaranteed to return only either a 
State dictionary or a list of Tuples? If so, could we make that contract 
explicit, or consider a concrete `State` runtime type/wrapper so this branch 
does not silently classify arbitrary dictionaries as state?
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to