paul-rogers commented on a change in pull request #1383: DRILL-6613: Refactor MaterializedField
URL: https://github.com/apache/drill/pull/1383#discussion_r204166252
##########
File path:
exec/vector/src/main/java/org/apache/drill/exec/record/MaterializedField.java
##########
@@ -49,39 +54,79 @@ private MaterializedField(String name, MajorType type,
LinkedHashSet<Materialize
this.children = children;
}
+ private MaterializedField(String name, MajorType type, int size) {
+ this(name, type, new LinkedHashSet<>(size));
+ }
+
+ private <T> void copyFrom(Collection<T> source, Function<T, MaterializedField> transformation) {
+ Preconditions.checkState(children.isEmpty());
+ source.forEach(child -> children.add(transformation.apply(child)));
+ }
+
+ public static MaterializedField create(String name, MajorType type) {
+ return new MaterializedField(name, type, 0);
+ }
+
public static MaterializedField create(SerializedField serField) {
- LinkedHashSet<MaterializedField> children = new LinkedHashSet<>();
- for (SerializedField sf : serField.getChildList()) {
- children.add(MaterializedField.create(sf));
+ MaterializedField field = new MaterializedField(serField.getNamePart().getName(), serField.getMajorType(), serField.getChildCount());
+ if (OFFSETS_FIELD.equals(field)) {
+ return OFFSETS_FIELD;
}
- return new MaterializedField(serField.getNamePart().getName(), serField.getMajorType(), children);
+ field.copyFrom(serField.getChildList(), MaterializedField::create);
+ return field;
}
- /**
- * Create and return a serialized field based on the current state.
- */
- public SerializedField getSerializedField() {
- SerializedField.Builder serializedFieldBuilder = getAsBuilder();
- for(MaterializedField childMaterializedField : getChildren()) {
- serializedFieldBuilder.addChild(childMaterializedField.getSerializedField());
+ public MaterializedField copy() {
+ return copy(getName(), getType());
+ }
+
+ public MaterializedField copy(MajorType type) {
+ return copy(name, type);
+ }
+
+ public MaterializedField copy(String name) {
+ return copy(name, getType());
+ }
+
+ public MaterializedField copy(String name, final MajorType type) {
+ if (this == OFFSETS_FIELD) {
+ return this;
}
- return serializedFieldBuilder.build();
+ MaterializedField field = new MaterializedField(name, type, getChildren().size());
+ field.copyFrom(getChildren(), MaterializedField::copy);
Review comment:
My point is not how things are implemented. The point is: we almost never
want to copy children when copying a `MaterializedField`. Why? Because the
consumer of that copy may be code that creates a new vector. When it does, it
will add child vectors (copies of the source vectors), and that action will add
new child fields, so children copied up front would be redundant at best.
This is why I said that "copy" is misleading: we need to understand the
context in which we make the copy, and possibly name the method accordingly:
"copyForNewVector", "fullCopy", "copyIfNeeded", and so on. All of these should
have comments that explain the use case that they serve.
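To make the naming suggestion concrete, here is a minimal sketch of two differently named copy methods. It assumes a simplified, hypothetical `FieldSketch` class rather than Drill's actual `MaterializedField`; the method names are taken from the suggestions above, and the string-valued `type` is only a placeholder for `MajorType`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a schema field; NOT Drill's MaterializedField.
final class FieldSketch {
  private final String name;
  private final String type; // placeholder for MajorType
  private final List<FieldSketch> children = new ArrayList<>();

  FieldSketch(String name, String type) {
    this.name = name;
    this.type = type;
  }

  String name() { return name; }

  String type() { return type; }

  List<FieldSketch> children() { return Collections.unmodifiableList(children); }

  void addChild(FieldSketch child) { children.add(child); }

  /**
   * Use case: the copy will back a newly created vector. Only name and type
   * are copied; child fields are expected to be added as child vectors are added.
   */
  FieldSketch copyForNewVector() {
    return new FieldSketch(name, type);
  }

  /**
   * Use case: the caller needs an independent snapshot of the whole subtree,
   * so every child is copied recursively.
   */
  FieldSketch fullCopy() {
    FieldSketch copy = new FieldSketch(name, type);
    for (FieldSketch child : children) {
      copy.addChild(child.fullCopy());
    }
    return copy;
  }

  public static void main(String[] args) {
    FieldSketch map = new FieldSketch("m", "MAP");
    map.addChild(new FieldSketch("a", "INT"));
    map.addChild(new FieldSketch("b", "VARCHAR"));

    // The shallow copy starts with no children; the full copy keeps both.
    System.out.println(map.copyForNewVector().children().size());
    System.out.println(map.fullCopy().children().size());
  }
}
```

With names like these, each call site states which behavior it relies on, and the Javadoc on each method records the use case it serves.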
Sorry that this is so complex; it is just the way the code has evolved. As I
recently found, it is quite hard to change entrenched code behavior, even when
it is not quite right.