Re: [I] Proposal: Alternate APIs with stronger typing [arrow-rs]

via GitHub Tue, 14 May 2024 22:15:03 -0700


HadrienG2 commented on issue #5700:
URL: https://github.com/apache/arrow-rs/issues/5700#issuecomment-2111601507


   Hi @gstvg .
   
   Right now I am working on array builders, because that seemed like a better 
starting point since there was already a stronger-typed API available to 
constrain the problem. But the intent is to eventually cover array access too.
   
   ---
   
   I have strong feelings against using proc macros for this, for reasons that 
I explained in the OP:
   
   > The design that I am proposing purposely uses a minimal amount of 
procedural macros. Procedural macros are hard to write, hard to test, hard to 
maintain, hard for users to reason about, and hard on IDEs. In my opinion, they 
should therefore be a last-resort tool that one only uses when no good 
alternative is available.
   
   In retrospect, one point that I should have added, although it was somewhat 
covered by "hard for users to reason about", is "hard to document". With proc 
macros (or indeed macros in general), it is hard to explain to users what 
methods the type generated by a macro will have, what these methods do, what 
their usage contract is if they are unsafe, etc. This is not to say that 
documenting sufficiently generic code is super-easy, but I think it is much 
easier, and my prototype is getting there.
   
   For this reason, my intent is that if I provide proc macros at all, it will 
only be as an optional extension for support of user-defined types like custom 
structs and enums.
   
   ---
   
   > The reason for returning an intermediary value with methods instead of a 
tuple or a struct with values is to not access any memory that the user may not 
want.
   
   My preferred way to handle this problem (and a few other issues of row-based 
access like memory access inefficiency caused by all the transposes and bad 
compiler optimizations of complex tuples) is to provide slice-based accessors. 
For example, the data from a builder of tuples `TypedBuilder<(T, U, V)>` will 
be accessible as a tuple of slices `(&[T], &[U], &[V])`. In the context of 
builders, there will also be an inner builder accessor following the same logic 
(morally an `impl AsMut<(&mut TypedBuilder<T>, &mut TypedBuilder<U>, &mut 
TypedBuilder<V>)>` with extra consistency checks), so that users can call 
type-specific builder methods. This may or may not be needed for array access 
after building as well, I'll see when I get there.
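   
   A minimal sketch of the slice-based accessor idea described above, not the 
prototype's actual code. `ColumnBuilder`, `TupleBuilder`, `as_slices` and 
`with_inner` are hypothetical names invented for illustration, and none of them 
are arrow-rs APIs. The closure-based `with_inner` is only one assumed way to 
get the "extra consistency checks"; it stands in for the `AsMut`-like accessor 
mentioned above.

```rust
// Hypothetical sketch: none of these types exist in arrow-rs.

/// Toy builder for one column of values (assumed stand-in for `TypedBuilder<T>`).
pub struct ColumnBuilder<T> {
    values: Vec<T>,
}

impl<T> ColumnBuilder<T> {
    pub fn new() -> Self {
        Self { values: Vec::new() }
    }

    pub fn push(&mut self, value: T) {
        self.values.push(value);
    }

    pub fn as_slice(&self) -> &[T] {
        &self.values
    }

    pub fn len(&self) -> usize {
        self.values.len()
    }
}

/// Toy builder of `(T, U, V)` rows, stored as one column per tuple field.
pub struct TupleBuilder<T, U, V> {
    first: ColumnBuilder<T>,
    second: ColumnBuilder<U>,
    third: ColumnBuilder<V>,
}

impl<T, U, V> TupleBuilder<T, U, V> {
    pub fn new() -> Self {
        Self {
            first: ColumnBuilder::new(),
            second: ColumnBuilder::new(),
            third: ColumnBuilder::new(),
        }
    }

    /// Row-based insertion: each field goes to its own column.
    pub fn push(&mut self, (t, u, v): (T, U, V)) {
        self.first.push(t);
        self.second.push(u);
        self.third.push(v);
    }

    /// Columnar view: one slice per field, so callers only touch the
    /// columns they actually read.
    pub fn as_slices(&self) -> (&[T], &[U], &[V]) {
        (
            self.first.as_slice(),
            self.second.as_slice(),
            self.third.as_slice(),
        )
    }

    /// Gives access to the inner builders, then panics if they were left
    /// with different lengths (the consistency check).
    pub fn with_inner<R>(
        &mut self,
        f: impl FnOnce(&mut ColumnBuilder<T>, &mut ColumnBuilder<U>, &mut ColumnBuilder<V>) -> R,
    ) -> R {
        let result = f(&mut self.first, &mut self.second, &mut self.third);
        assert!(
            self.first.len() == self.second.len() && self.second.len() == self.third.len(),
            "inner builders must end up with equal lengths"
        );
        result
    }
}
```

   In this sketch, rows pushed as tuples come back out as three independent 
slices, and a caller that pushes to only one inner builder inside `with_inner` 
gets a panic instead of a silently inconsistent builder.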
   
   ---
   
   Overall, my plan so far has been to submit a WIP prototype here once I'm 
confident that I have a design that can scale to all the current builder/array 
types that arrow offers. Right now I have nulls, booleans, and `Option<T>`; 
primitive types are mostly done, and lists are well on the way too, although I 
am cleaning up a few things at the moment. Next I want structs and unions. I 
believe that after that, all the remaining builders have APIs that can be 
considered a combination of the primitive, list, and struct APIs, modulo a 
few extra/missing constructor parameters, and thus the builder API design that 
I'm proposing should be robust/stabilized enough to warrant an external review.
   
   But if you want, I can push the code before that, so that you can have an 
early look at it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

