bkietz commented on a change in pull request #11607:
URL: https://github.com/apache/arrow/pull/11607#discussion_r743153046



##########
File path: docs/source/format/Columnar.rst
##########
@@ -693,6 +690,16 @@ Only the slot in the array corresponding to the type index is considered. All
 "unselected" values are ignored and could be any semantically correct array
 value.
 
+.. note:: Critically, the sparse union allows shared columns to be reused between union members
+   in the ubiquitous union-of-structs with non-overlapping-fields use case.  For example the union::
+
+       SparseUnion<m1: Struct<i: Int32>,
+                   m2: Struct<i: Int32, f: Float32, s: VarBinary>,
+                   m3: Struct<f: Float32, s: VarBinary>>
+
+   could be backed by just three columns (one for each type) since no union member requires more
+   than one of each.

Review comment:
       I've added a test, which passes locally. I don't think it would be
illegal for an IPC writer to reuse a buffer, though it's an optimization I
wouldn't expect to be useful very often. Should I just remove this note?
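
For context on the layout rule quoted in the hunk above ("only the slot in the array corresponding to the type index is considered"), here is a minimal sketch in plain Python. It is a toy model only: the `SparseUnion` class and `value_at` method are hypothetical names invented for illustration, not Arrow APIs, and Python lists stand in for Arrow buffers. It uses the `SparseUnion<i: Int32, f: Float, s: VarBinary>` example layout and does not model the buffer reuse discussed in this thread.

```python
# Toy model of a sparse union; plain Python lists stand in for Arrow buffers.
# SparseUnion and value_at are hypothetical names, not part of any Arrow API.
from dataclasses import dataclass
from typing import Any, List


@dataclass
class SparseUnion:
    type_ids: List[int]        # one type index per slot
    children: List[List[Any]]  # one child array per member, all equal length

    def __post_init__(self) -> None:
        # In a sparse union every child is as long as the types array.
        for child in self.children:
            if len(child) != len(self.type_ids):
                raise ValueError("child length must match the types array")

    def value_at(self, slot: int) -> Any:
        # Only the child selected by the type index is read for this slot;
        # whatever the other children hold at this slot is ignored.
        return self.children[self.type_ids[slot]][slot]


# SparseUnion<i: Int32, f: Float, s: VarBinary> with four slots.
union = SparseUnion(
    type_ids=[0, 1, 2, 0],
    children=[
        [5, 0, 0, 4],             # i: slots 1 and 2 are unselected
        [0.0, 1.2, 0.0, 0.0],     # f: only slot 1 is selected
        [b"", b"", b"joe", b""],  # s: only slot 2 is selected
    ],
)
values = [union.value_at(k) for k in range(len(union.type_ids))]
print(values)
```

Changing any unselected entry (for example `union.children[1][0]`) leaves `values` unchanged, which is the sense in which unselected values "could be any semantically correct array value".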

##########
File path: docs/source/format/Columnar.rst
##########
@@ -622,11 +619,11 @@ use cases:
 * A sparse union is more amenable to vectorized expression evaluation in some use cases.
 * Equal-length arrays can be interpreted as a union by only defining the types array.
 
-**Example layout: ``SparseUnion<u0: Int32, u1: Float, u2: VarBinary>``**
+**Example layout: ``SparseUnion<i: Int32, f: Float, s: VarBinary>``**

Review comment:
       will do

##########
File path: docs/source/format/Columnar.rst
##########
@@ -693,6 +690,16 @@ Only the slot in the array corresponding to the type index is considered. All
 "unselected" values are ignored and could be any semantically correct array
 value.
 
+.. note:: Critically, the sparse union allows shared columns to be reused between union members
+   in the ubiquitous union-of-structs with non-overlapping-fields use case.  For example the union::
+
+       SparseUnion<m1: Struct<i: Int32>,
+                   m2: Struct<i: Int32, f: Float32, s: VarBinary>,
+                   m3: Struct<f: Float32, s: VarBinary>>
+
+   could be backed by just three columns (one for each type) since no union member requires more
+   than one of each.

Review comment:
       I'll remove it. For the record, this note has been there since ARROW-3
and the first writing of the sparse/dense union format spec. @wesm, any comment?
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]