kinow commented on issue #35508: URL: https://github.com/apache/arrow/issues/35508#issuecomment-1542061091
Hi @westonpace, I work with @kat-grayson on the same project, but on another component that does not use t-digest directly (so I am a bit lost and still learning the ropes around streaming, t-digest, etc.).

> You could choose to do this completely outside of Acero using the function registry directly. You would end up creating something that looks quite a bit like the aggregate node so I'd recommend starting by looking at that and getting familiar.

A little more context here: our project already has a streaming component, and Apache Arrow is not used anywhere in the project (I think, but I could be wrong… it is a large project). Still, it is a good suggestion to look at the [aggregate node in Acero](https://arrow.apache.org/docs/cpp/streaming_execution.html#aggregate).

From a brief look, I **think** the aggregate node [accepts functions](https://github.com/apache/arrow/blob/ec29c6ffc3cb1af4db4903d9877b2f0b548a3ad9/cpp/src/arrow/acero/aggregate_node.cc) to apply to the streaming data, one of them being an [aggregate function with tdigest](https://github.com/apache/arrow/blob/ec29c6ffc3cb1af4db4903d9877b2f0b548a3ad9/cpp/src/arrow/compute/kernels/aggregate_tdigest.cc#L81).

@kat-grayson I think that aggregate calls `NanAdd` in [the tdigest](https://github.com/apache/arrow/blob/ec29c6ffc3cb1af4db4903d9877b2f0b548a3ad9/cpp/src/arrow/util/tdigest.h#L73). So the C++ code seems to support adding to an existing t-digest. But I think the Python version was made simpler, without exposing some functions from the C++ t-digest object.
