I'm not sure this makes sense as an external, stable API. I definitely think it is useful as an internal representation for use within a particular algorithm. I also think the representation can be informed by the particular algorithm that you're working on.
We definitely had this requirement in Dremio and came up with an internal representation that we are happy with for use in hash tables. I'll try to dig up the design docs we had around this, but the actual pivoting/unpivoting code that we developed can be seen here: [1], [2]. Our main model is two blocks: a fixed width block and a variable width block (with the fixed width block also carrying the address & length of the variable data). Fixed width is randomly accessible and variable width is randomly accessible through fixed width.

[1] https://github.com/dremio/dremio-oss/blob/master/sabot/kernel/src/main/java/com/dremio/sabot/op/common/ht2/Pivots.java
[2] https://github.com/dremio/dremio-oss/blob/master/sabot/kernel/src/main/java/com/dremio/sabot/op/common/ht2/Unpivots.java

On Tue, Jun 26, 2018 at 10:20 AM, Wes McKinney <wesmck...@gmail.com> wrote:

> hi Antoine,
>
> On Sun, Jun 24, 2018 at 1:06 PM, Antoine Pitrou <anto...@python.org>
> wrote:
> >
> > Hi Wes,
> >
> > On 24/06/2018 at 08:24, Wes McKinney wrote:
> >>
> >> If this sounds interesting to the community, I could help to kickstart
> >> a design process which would likely take a significant amount of time.
> >> The requirements could be complex (i.e. we might want to support
> >> variable-size record fields while also providing random access
> >> guarantees).
> >
> > What do you call "variable-sized" here? A scheme where the length of a
> > record's field is determined by the value of another field in the same
> > record?
>
> As an example, here is a fixed size record
>
> record foo {
>   a: int32;
>   b: float64;
>   c: uint8;
> }
>
> With padding suppose this is 16 bytes per record; so if we have a
> column of these, then random accessing any value in any record is
> simple.
>
> Here's a variable-length record:
>
> record bar {
>   a: string;
>   b: list<int32>;
> }
>
> What I've seen done to represent this in memory is to have a fixed
> size record followed by a sidecar containing the variable-length data,
> so the fixed size portion might look something like
>
> a_offset: int32;
> a_length: int32;
> b_offset: int32;
> b_length: int32;
>
> So from this, you can do random access into the record. If you wanted
> to do random access on a _column_ of such records, it is similar to
> our current variable-length Binary type. So it might be that the
> underlying Arrow memory layout would be FixedSizeBinary for fixed-size
> records and variable Binary for variable-size records.
>
> - Wes
>
> >
> > Regards
> >
> > Antoine.
>
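
To make the two-block model from the top of this message concrete, here is a minimal Python sketch of it, applied to the `record bar` example quoted above. This is an illustration only and is not the Dremio pivot code. The names `pivot`, `read_record` and `FIXED` are hypothetical, and the sketch assumes little-endian int32 fields, byte offsets into a single shared variable-width block, and lengths stored in bytes.

```python
import struct

# Assumed fixed-width entry for `record bar { a: string; b: list<int32>; }`:
# four little-endian int32 values: a_offset, a_length, b_offset, b_length.
FIXED = struct.Struct("<iiii")


def pivot(records):
    """Pack (str, list[int]) records into a fixed-width and a variable-width block."""
    fixed = bytearray()
    variable = bytearray()
    for a, b in records:
        a_bytes = a.encode("utf-8")
        b_bytes = struct.pack(f"<{len(b)}i", *b)
        a_offset = len(variable)
        variable += a_bytes
        b_offset = len(variable)
        variable += b_bytes
        # Offsets and lengths are in bytes (an assumption of this sketch).
        fixed += FIXED.pack(a_offset, len(a_bytes), b_offset, len(b_bytes))
    return bytes(fixed), bytes(variable)


def read_record(fixed, variable, i):
    """Random access to record i: one fixed-width lookup, then two slices."""
    a_off, a_len, b_off, b_len = FIXED.unpack_from(fixed, i * FIXED.size)
    a = variable[a_off:a_off + a_len].decode("utf-8")
    b_raw = variable[b_off:b_off + b_len]
    b = list(struct.unpack(f"<{b_len // 4}i", b_raw))
    return a, b


records = [("foo", [1, 2, 3]), ("", []), ("hello", [-7])]
fixed_block, variable_block = pivot(records)

# Jump straight to the last record without scanning the earlier ones.
print(read_record(fixed_block, variable_block, 2))
```

Since every fixed-width entry has the same size (16 bytes here), record i starts at byte 16 * i of the fixed block, and the variable-width data is only reachable through the offsets stored there.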