jtuglu1 commented on PR #18880:
URL: https://github.com/apache/druid/pull/18880#issuecomment-3721354011

   > > 👍 A few questions:
   > > ```
   > > * Will this support reading a single column from a segment (instead of needing to download + scan entire segment)? I guess mapping offset ranges of a segment file is analogous to the row-group concept in Parquet.
   > > ```
   > 
   > Definitely supporting partial downloads at the level of columns and/or projections is a goal of this format, and something it would enable doing.
   > 
   > > ```
   > > * Are there any thoughts to make Druid formats Arrow-compatible? This would open up many more integrations with existing big data ecosystem externally, as well as making intra-cluster data transfer potentially much faster (send everything as RecordBatch).
   > > ```
   > 
   > For intra-cluster data transfer, the MSQ query paths (which to me are the ones I want to focus on 😄) are using Frames, which are similar to Arrow in efficiency. For integrating with the big data ecosystem in ways that require actually using Arrow, there is a question about whether we're doing something for data in flight (RPC) or for data at rest (in object storage). For RPC I think an API that returns Arrow streams can make sense in theory. It wouldn't be related to the segment format, it would be more related to the query side. For data at rest, I don't know how much sense that makes. I haven't heard much of people using Arrow for data at rest.
   
   Check out https://github.com/lancedb/lance!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


