[
https://issues.apache.org/jira/browse/ARROW-590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16254801#comment-16254801
]
ASF GitHub Bot commented on ARROW-590:
--------------------------------------
jacques-n commented on a change in pull request #987: ARROW-590: Add
integration tests for Union types
URL: https://github.com/apache/arrow/pull/987#discussion_r151325934
##########
File path: java/vector/src/main/codegen/templates/UnionVector.java
##########
@@ -269,12 +373,14 @@ public void clear() {
public Field getField() {
List<org.apache.arrow.vector.types.pojo.Field> childFields = new ArrayList<>();
List<FieldVector> children = internalMap.getChildren();
- int[] typeIds = new int[children.size()];
for (ValueVector v : children) {
- typeIds[childFields.size()] = v.getMinorType().ordinal();
Review comment:
Yes, my perspective on Arrow types in the Java layer has been that one
should collapse to leaf nodes (in general, I've always been against the more
complex, arbitrary approach the spec allows). In the situation we usually see,
it never makes sense to have duplicate types within a union, and we wanted
a simplified path for this scenario.
For example:
v1_record = {
  date
  record_id
  measure1
}
v2_record = {
  date
  record_id
  measure2
}
I would suggest using this representation:
record = {
  date
  record_id
  measure1
  measure2
}
This typically works well because initial union values usually come from
separate streams (e.g. files), and processing is cleaner if we don't have
indirection for common fields (a frequent case).
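As a rough illustration of the collapsed representation above, here is a
minimal sketch in plain Java. It does not use the Arrow API: a record schema
is modeled simply as an ordered list of field names, and the class and method
names (CollapsedRecordSketch, mergeFields) are hypothetical, made up for this
example only.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not Arrow API: a record schema is modeled as an
// ordered list of field names.
final class CollapsedRecordSketch {

    // Merge several record schemas into one, keeping first-seen order and
    // listing each shared field (e.g. date, record_id) only once.
    static List<String> mergeFields(List<List<String>> schemas) {
        Set<String> merged = new LinkedHashSet<>();
        for (List<String> schema : schemas) {
            merged.addAll(schema);
        }
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        List<String> v1 = List.of("date", "record_id", "measure1");
        List<String> v2 = List.of("date", "record_id", "measure2");
        // prints [date, record_id, measure1, measure2]
        System.out.println(mergeFields(List.of(v1, v2)));
    }
}
```

With this shape, the common fields are read directly from the single record,
without first resolving which union member a value belongs to.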
I'll take a look at the changes. I want to understand if you're introducing
indirection or not.
> Add integration tests for Union types
> -------------------------------------
>
> Key: ARROW-590
> URL: https://issues.apache.org/jira/browse/ARROW-590
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++, Java - Vectors
> Reporter: Wes McKinney
> Assignee: Li Jin
> Labels: pull-request-available
> Fix For: 0.9.0
>
>