Re: [I] Make it easier to treat `Rows` as bytes [arrow-rs]

via GitHub Mon, 15 Jul 2024 19:18:01 -0700


bkirwi commented on issue #6063:
URL: https://github.com/apache/arrow-rs/issues/6063#issuecomment-2229873330


   > Just out of curiosity, how do you go the other way?
   
   Currently, the idea is to go from `&[u8]` to an iterator of `Row` via 
`RowParser::parse` and then feed that to `convert_rows`. At a glance this 
doesn't seem to cost more than converting a `BinaryArray` to `Rows` and then 
passing that to `convert_rows`... and may be better if the binary array is 
shared, since `Rows` wouldn't be able to reuse the allocation.
   
   I do think it makes sense to have an API that goes the other way - it seems 
"natural" and easy to implement - but IIUC it's less important for performance.
   
   > What other advantages do you see?
   
   API consistency and code reuse, I suppose... you can imagine having an API 
like `impl Rows { fn binary(&self) -> &BinaryArray; }`, and then doing things 
like finding the maximum row by passing that to `arrow::compute::max_binary` or 
using any of the other functionality the ecosystem has on the array type. And 
the `Rows` impl itself may get to reuse some code - since it's structurally 
very similar to `BinaryArray`, and that code's been very heavily optimized.
   
   Might be worthwhile! But to me it feels a bit murkier than the other APIs 
under discussion.
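
   To make the "structurally very similar" point concrete, here is a std-only sketch of a container that stores rows as one flat byte buffer plus an offsets vector, which is the same general shape as a binary array. `ByteRows` and all of its methods are hypothetical and are not the arrow-rs types; the `max` method only mirrors the idea of finding the maximum row by comparing raw bytes.

```rust
/// Hypothetical stand-in for a row container: every row's bytes live in one
/// contiguous buffer, and `offsets[i]..offsets[i + 1]` marks row `i`.
struct ByteRows {
    buffer: Vec<u8>,
    offsets: Vec<usize>, // always holds `len() + 1` entries, starting at 0
}

impl ByteRows {
    fn new() -> Self {
        Self { buffer: Vec::new(), offsets: vec![0] }
    }

    /// Append one row by extending the buffer and recording its end offset.
    fn push(&mut self, row: &[u8]) {
        self.buffer.extend_from_slice(row);
        self.offsets.push(self.buffer.len());
    }

    fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    /// Borrow row `i` as a byte slice without copying.
    fn row(&self, i: usize) -> &[u8] {
        &self.buffer[self.offsets[i]..self.offsets[i + 1]]
    }

    /// Iterate over all rows as byte slices.
    fn iter(&self) -> impl Iterator<Item = &[u8]> + '_ {
        self.offsets.windows(2).map(move |w| &self.buffer[w[0]..w[1]])
    }

    /// Largest row under lexicographic byte order, or `None` if empty.
    fn max(&self) -> Option<&[u8]> {
        self.iter().max()
    }
}
```

   With this layout, anything that already works on a flat buffer plus offsets could in principle be reused for rows, which is the code-reuse argument made above.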


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

