Susmit07 commented on issue #44135:
URL: https://github.com/apache/arrow/issues/44135#issuecomment-2354517143
One option is on-demand data retrieval in an Apache Arrow Flight server: to handle scenarios where the returned data is potentially very large, fetch and stream it batch by batch so memory is never overwhelmed.
override def getStream(context: FlightProducer.CallContext, ticket: Ticket,
                       listener: FlightProducer.ServerStreamListener): Unit = {
  val data = fetchDataFromS3(ticket) // implement this; returns an InputStream over the S3 object
  val reader = new ArrowStreamReader(data, allocator)
  try {
    val root = reader.getVectorSchemaRoot // carries the schema
    listener.start(root)
    while (reader.loadNextBatch()) {
      listener.putNext() // sends the batch currently loaded into root
    }
    listener.completed()
  } finally {
    reader.close()
  }
}
One downside we see is that it makes an I/O call to S3 on every request, but the upside is that the data is always read fresh.
Do you see any other issues?
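The memory behavior the handler relies on can be illustrated without Arrow at all: read the payload in fixed-size chunks so only one chunk is resident at a time, analogous to loadNextBatch() followed by listener.putNext(). This is a minimal stdlib-Java sketch; BATCH_SIZE and the 10-byte payload are made-up stand-ins, not anything from the Flight API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ChunkedStream {
    // Stand-in for an Arrow record batch size; in Flight the batch size is
    // determined by how the stream was written, not chosen here.
    static final int BATCH_SIZE = 4;

    // Reads the stream one fixed-size chunk at a time. Peak memory is one
    // chunk, regardless of total payload size.
    static int streamInBatches(InputStream in) throws IOException {
        byte[] buf = new byte[BATCH_SIZE];
        int batches = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            // forward buf[0..n) downstream here (listener.putNext() in Flight)
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = new byte[10]; // pretend this came from S3
        int batches = streamInBatches(new ByteArrayInputStream(payload));
        System.out.println(batches); // 10 bytes in 4-byte chunks -> 3 batches
    }
}
```

The same bounded-memory argument applies to the Flight handler above: the VectorSchemaRoot is reused for each batch, so the full dataset is never materialized on the server.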