fresh-borzoni opened a new issue, #152:
URL: https://github.com/apache/fluss-rust/issues/152

   ### Search before asking
   
   - [x] I searched in the 
[issues](https://github.com/apache/fluss-rust/issues) and found nothing similar.
   
   
   ### Motivation
   
    The Python bindings currently only expose `to_pandas()` and `to_arrow()` 
methods, which internally poll until **all** data is fetched and loaded into 
memory at once.
   
     ## Current Behavior
   
     ```python
     # Only batch API available - loads ALL data into memory
     scanner = await table.new_log_scanner()
     scanner.subscribe(None, None)
     df = scanner.to_pandas()  # Internally polls in a loop until complete
   ```
   
     Under the hood, `to_pandas()` polls repeatedly (see 
`bindings/python/src/table.rs:378-414`), but the loop is hidden from users and 
does not allow streaming consumption patterns.
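
To make the memory concern concrete, here is a minimal sketch of the "poll in a loop, return everything" pattern. It is not the code in `table.rs`: `Batch`, `StubScanner`, and `collect_all` are hypothetical stand-ins, and stopping on the first empty batch is an assumption made for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Batch:
    """Hypothetical stand-in for one polled batch of records."""
    rows: List[dict]

    @property
    def num_rows(self) -> int:
        return len(self.rows)


class StubScanner:
    """Hypothetical in-memory scanner; NOT the fluss-rust Python binding."""

    def __init__(self, batches: List[Batch]) -> None:
        self._pending = list(batches)

    def poll(self, timeout_ms: int) -> Batch:
        # Hand out the next queued batch, or an empty one once drained.
        # timeout_ms is accepted but unused by this stub.
        return self._pending.pop(0) if self._pending else Batch(rows=[])


def collect_all(scanner: StubScanner, timeout_ms: int = 500) -> List[dict]:
    """Batch-style read: poll until an empty batch, keeping every row."""
    rows: List[dict] = []
    while True:
        batch = scanner.poll(timeout_ms)
        if batch.num_rows == 0:  # assumed stop condition
            break
        rows.extend(batch.rows)  # memory grows with the whole result set
    return rows


scanner = StubScanner([Batch([{"id": 1}, {"id": 2}]), Batch([{"id": 3}])])
all_rows = collect_all(scanner)
print(len(all_rows))
```

The caller only sees the final list, so it cannot act on the first batch before the last one arrives, and peak memory is proportional to the full result.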
   
   
   ### Solution
   
     Add an explicit `poll()` method to `LogScanner` that matches the C++ API and 
the Rust core:
   
     ```python
     # Streaming API - poll repeatedly with timeout
     scanner = await table.new_log_scanner()
     scanner.subscribe(None, None)

     while True:
         records = await scanner.poll(timeout_ms=5000)
         if records.num_rows > 0:
             process_in_realtime(records)
         else:
             break
     ```
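
One way callers might wrap the proposed `poll()` is an async generator that yields non-empty batches and gives up after a few consecutive empty polls, since a single empty poll on a live log may only mean "nothing new yet". This is a sketch under assumptions: `StubAsyncScanner`, `Batch`, `stream_batches`, and the `max_idle_polls` parameter are hypothetical and not part of any existing fluss-rust API; only the `poll(timeout_ms=...)` call shape and `num_rows` are taken from the proposal above.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, List


@dataclass
class Batch:
    """Hypothetical stand-in for the object returned by poll()."""
    rows: List[dict]

    @property
    def num_rows(self) -> int:
        return len(self.rows)


class StubAsyncScanner:
    """Hypothetical scanner used only to exercise the loop below."""

    def __init__(self, batches: List[Batch]) -> None:
        self._pending = list(batches)

    async def poll(self, timeout_ms: int) -> Batch:
        await asyncio.sleep(0)  # yield control, as a real network poll would
        return self._pending.pop(0) if self._pending else Batch(rows=[])


async def stream_batches(
    scanner: StubAsyncScanner,
    timeout_ms: int = 5000,
    max_idle_polls: int = 2,  # hypothetical knob: empty polls tolerated in a row
) -> AsyncIterator[Batch]:
    idle = 0
    while idle < max_idle_polls:
        batch = await scanner.poll(timeout_ms=timeout_ms)
        if batch.num_rows > 0:
            idle = 0  # data arrived, reset the idle counter
            yield batch
        else:
            idle += 1


async def main() -> List[int]:
    scanner = StubAsyncScanner(
        [Batch([{"id": 1}]), Batch([]), Batch([{"id": 2}, {"id": 3}])]
    )
    sizes: List[int] = []
    async for batch in stream_batches(scanner):
        sizes.append(batch.num_rows)  # process each batch as it arrives
    return sizes


batch_sizes = asyncio.run(main())
print(batch_sizes)
```

Each batch is handed to the caller as soon as it is polled, so memory use is bounded by the batch size rather than the full log. Whether an empty poll should end the stream at all is a design choice left to the caller.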
   
   ### Anything else?
   
   _No response_
   
   ### Willingness to contribute
   
   - [x] I'm willing to submit a PR!

