HadrienG2 commented on issue #5700: URL: https://github.com/apache/arrow-rs/issues/5700#issuecomment-2111601507
Hi @gstvg. Right now I am working on array builders, because that seemed like a better starting point: a stronger-typed API was already available there to constrain the problem. But the intent is to eventually cover array access too.

---

I have strong feelings against using proc macros for this, for reasons that I explained in the OP:

> The design that I am proposing purposely uses a minimal amount of procedural macros. Procedural macros are hard to write, hard to test, hard to maintain, hard for users to reason about, and hard on IDEs. In my opinion, they should therefore be a last-resort tool that one only uses for when no good alternative is available.

In retrospect, one point which I should have added, although it was somewhat covered by "hard for users to reason about", is "hard to document". With proc macros (or indeed macros in general), it is hard to explain to users what methods the type generated by a macro will have, what these methods do, what their usage contract is if they are unsafe, etc. This is not to say that documentation is super-easy with sufficiently generic code, but I think it is much easier, and my prototype is getting there.

For this reason, my intent is that if I provide proc macros at all, it will only be as an optional extension to support user-defined types like custom structs and enums.

---

> The reason for returning a intermediary value with methods instead of a tuple or a struct with values is to not access any memory that the user may not want.

My preferred way to handle this problem (and a few other issues of row-based access, like the memory access inefficiency caused by all the transposes and the bad compiler optimizations of complex tuples) is to provide slice-based accessors. For example, the data from a builder of tuples `TypedBuilder<(T, U, V)>` will be accessible as a tuple of slices `(&[T], &[U], &[V])`.
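
To make the slice-based idea concrete, here is a minimal sketch. It is not code from the prototype and it does not use any arrow-rs API: `TupleColumns`, `push`, `as_slices`, `sum_first` and `example` are hypothetical names chosen for illustration, and plain `Vec`s stand in for the real column buffers. It assumes that each tuple field is stored in its own column, so a reader of one field never touches the memory of the others.

```rust
/// Hypothetical stand-in for `TypedBuilder<(T, U, V)>` (illustration only):
/// each tuple field lives in its own column instead of a `Vec<(T, U, V)>`.
#[derive(Debug)]
pub struct TupleColumns<T, U, V> {
    first: Vec<T>,
    second: Vec<U>,
    third: Vec<V>,
}

impl<T, U, V> TupleColumns<T, U, V> {
    /// Creates an empty set of columns.
    pub fn new() -> Self {
        Self {
            first: Vec::new(),
            second: Vec::new(),
            third: Vec::new(),
        }
    }

    /// Row-based insertion: the tuple is split across the three columns.
    pub fn push(&mut self, (t, u, v): (T, U, V)) {
        self.first.push(t);
        self.second.push(u);
        self.third.push(v);
    }

    /// Slice-based access: one contiguous slice per tuple field.
    pub fn as_slices(&self) -> (&[T], &[U], &[V]) {
        (
            self.first.as_slice(),
            self.second.as_slice(),
            self.third.as_slice(),
        )
    }
}

/// Reads only the first column; the other two columns are never accessed.
pub fn sum_first(columns: &TupleColumns<i32, f64, bool>) -> i32 {
    let (ids, _, _) = columns.as_slices();
    ids.iter().sum()
}

/// Builds a small two-row example.
pub fn example() -> TupleColumns<i32, f64, bool> {
    let mut columns = TupleColumns::new();
    columns.push((1, 0.5, true));
    columns.push((2, 1.5, false));
    columns
}
```

With this layout, `sum_first(&example())` iterates over a single contiguous `&[i32]`, which is the access pattern that the tuple-of-slices accessor is meant to enable.
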

In the context of builders, there will also be an inner builder accessor following the same logic (morally an `impl AsMut<(&mut TypedBuilder<T>, &mut TypedBuilder<U>, &mut TypedBuilder<V>)>` with extra consistency checks), so that users can call type-specific builder methods. This may or may not be needed for array access after building as well; I'll see when I get there.

---

Overall, my plan so far has been to submit a WIP prototype here once I'm confident that I have a design that can scale to all the current builder/array types that arrow offers. Right now, nulls, booleans, `Option<T>` and primitive types are mostly done, and lists are well on the way too, although I am cleaning up a few things at the moment. Next I want structs and unions. I believe that after that, all the remaining builders have APIs that can be considered a combination of the primitive, list, and struct APIs, modulo a few extra/missing constructor parameters. At that point, the builder API design that I'm proposing should be robust and stable enough to warrant an external review.

But if you want, I can push the code before that, so that you can have an early look at it.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
