hntd187 commented on issue #1544:
URL:
https://github.com/apache/arrow-datafusion/issues/1544#issuecomment-1013426317
> I've used kafka-streams, flink and beam professionally, the point of
streaming is to execute windowed functions, join aggregated data with in-memory
tables and distribute these computations (DAGs) according to partition keys, so
that the right value gets sent to the right thread.
>
> In terms of priority, it looks to me like this would not be a reasonable
thing to do before having methods like `DataFrame::map_partition` and traits
like `SortMergeJoin`
We can add that to the design if we have to implement it. We maybe able to
be dumb about it in the meantime.
```rust
let ec = ExecutionContext::new();
let dfs = DFSchema::new(
vec![
DFField::new(Some("topic1"), "key", DataType::Binary, false),
DFField::new(Some("topic1"), "value", DataType::Binary, false),
]
).unwrap();
let lp = LogicalPlan::StreamingScan(StreamScan {
topic_name: "topic1".to_string(),
source: Arc::new(KafkaExecutionPlan {
time_window: Duration::from_millis(200),
topic: "topic1".to_string(),
batch_size: 5,
conf: consumer_config("0", None),
}),
schema: Arc::new(dfs),
batch_size: Some(5),
});
let df = DataFrameImpl::new(ec.state, &lp);
timeout(Duration::from_secs(10), async move {
let mut b = df.execute_stream().await.unwrap();
let batch = b.next().await;
dbg!(batch);
}).await.unwrap();
```
Based on my awful awful base implementation this gets data back. Hooking
this up has taken me into the depths of DF more than I hoped, I don't know if
outside of a custom DataFrame impl this would be possible in a contrib module.
I guess the struggle I am facing here is it seems like the dataframe right now
has to get a "wait or end" of data reading notification, where obviously for
streaming this really never comes. Is anyone more readily aware of where this
occurs or if I'm on the right track?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]