Susmit07 commented on issue #44135:
URL: https://github.com/apache/arrow/issues/44135#issuecomment-2354517143
One option is on-demand data retrieval in an Apache Arrow Flight server: to handle scenarios where the returned data is potentially very large, fetch and stream it batch by batch so memory is never overwhelmed.
override def getStream(context: FlightProducer.CallContext, ticket: Ticket,
                       listener: FlightProducer.ServerStreamListener): Unit = {
  val data = fetchDataFromS3(ticket) // implement this; returns an InputStream over the S3 object
  val reader = new ArrowStreamReader(data, allocator)
  try {
    val root = reader.getVectorSchemaRoot // carries the schema
    listener.start(root)
    while (reader.loadNextBatch()) {
      listener.putNext() // sends the batch currently loaded into root
    }
    listener.completed()
  } finally {
    reader.close()
  }
}
One downside we see is that it makes an I/O call to S3 on every request, but the upside is that the data is always read fresh.
Do you see any other issues?
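The memory behavior the handler relies on can be illustrated without Arrow at all: read the payload in fixed-size chunks so only one chunk is resident at a time, analogous to loadNextBatch() followed by listener.putNext(). This is a minimal stdlib-Java sketch; BATCH_SIZE and the 10-byte payload are made-up stand-ins, not anything from the Flight API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ChunkedStream {
    // Stand-in for an Arrow record batch size; in Flight the batch size is
    // determined by how the stream was written, not chosen here.
    static final int BATCH_SIZE = 4;

    // Reads the stream one fixed-size chunk at a time. Peak memory is one
    // chunk, regardless of total payload size.
    static int streamInBatches(InputStream in) throws IOException {
        byte[] buf = new byte[BATCH_SIZE];
        int batches = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            // forward buf[0..n) downstream here (listener.putNext() in Flight)
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = new byte[10]; // pretend this came from S3
        int batches = streamInBatches(new ByteArrayInputStream(payload));
        System.out.println(batches); // 10 bytes in 4-byte chunks -> 3 batches
    }
}
```

The same bounded-memory argument applies to the Flight handler above: the VectorSchemaRoot is reused for each batch, so the full dataset is never materialized on the server.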