[jira] [Commented] (DRILL-4264) Allow field names to include dots

ASF GitHub Bot (JIRA) Wed, 23 Aug 2017 10:30:27 -0700

    [ 
https://issues.apache.org/jira/browse/DRILL-4264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138704#comment-16138704
 ]


ASF GitHub Bot commented on DRILL-4264:
---------------------------------------

Github user vvysotskyi commented on a diff in the pull request:

    https://github.com/apache/drill/pull/909#discussion_r134686740
  
    --- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java 
---
    @@ -359,30 +361,109 @@ public Mutator(OperatorExecContext oContext, 
BufferAllocator allocator, VectorCo
         public <T extends ValueVector> T addField(MaterializedField field,
                                                   Class<T> clazz) throws 
SchemaChangeException {
           // Check if the field exists.
    -      ValueVector v = fieldVectorMap.get(field.getPath());
    -      if (v == null || v.getClass() != clazz) {
    +      ValueVector vector = fieldVectorMap.get(field.getName());
    +      ValueVector childVector = vector;
    +      // if a vector does not exist yet, creates value vector, or if it 
exists and has map type, omit this code
    +      if (vector == null || (vector.getClass() != clazz
    +            && (vector.getField().getType().getMinorType() != MinorType.MAP
    +                || field.getType().getMinorType() != MinorType.MAP))) {
             // Field does not exist--add it to the map and the output 
container.
    -        v = TypeHelper.getNewVector(field, allocator, callBack);
    -        if (!clazz.isAssignableFrom(v.getClass())) {
    +        vector = TypeHelper.getNewVector(field, allocator, callBack);
    +        childVector = vector;
    +        // gets inner field if the map was created the first time
    +        if (field.getType().getMinorType() == MinorType.MAP) {
    +          childVector = getChildVectorByField(vector, field);
    +        } else if (!clazz.isAssignableFrom(vector.getClass())) {
               throw new SchemaChangeException(
                   String.format(
                       "The class that was provided, %s, does not correspond to 
the "
                       + "expected vector type of %s.",
    -                  clazz.getSimpleName(), v.getClass().getSimpleName()));
    +                  clazz.getSimpleName(), 
vector.getClass().getSimpleName()));
             }
     
    -        final ValueVector old = fieldVectorMap.put(field.getPath(), v);
    +        final ValueVector old = fieldVectorMap.put(field.getName(), 
vector);
             if (old != null) {
               old.clear();
               container.remove(old);
             }
     
    -        container.add(v);
    +        container.add(vector);
             // Added new vectors to the container--mark that the schema has 
changed.
             schemaChanged = true;
           }
    +      // otherwise, checks that field and existing vector have a map type
    +      // and adds child fields from the field to the vector
    +      else if (field.getType().getMinorType() == MinorType.MAP
    +                  && vector.getField().getType().getMinorType() == 
MinorType.MAP
    +                  && !field.getChildren().isEmpty()) {
    +        // an incoming field contains only single child since it determines
    +        // full name path of the field in the schema
    +        childVector = addNestedChildToMap((MapVector) vector, 
Iterables.getLast(field.getChildren()));
    +        schemaChanged = true;
    +      }
     
    -      return clazz.cast(v);
    +      return clazz.cast(childVector);
    +    }
    +
    +    /**
    +     * Finds and returns value vector which path corresponds to the 
specified field.
    +     * If required vector is nested in the map, gets and returns this 
vector from the map.
    +     *
    +     * @param valueVector vector that should be checked
    +     * @param field       field that corresponds to required vector
    +     * @return value vector whose path corresponds to the specified field
    +     *
    +     * @throws SchemaChangeException if the field does not correspond to 
the found vector
    +     */
    +    private ValueVector getChildVectorByField(ValueVector valueVector,
    +                                              MaterializedField field) 
throws SchemaChangeException {
    +      if (field.getChildren().isEmpty()) {
    +        if (valueVector.getField().equals(field)) {
    +          return valueVector;
    +        } else {
    +          throw new SchemaChangeException(
    +            String.format(
    +              "The field that was provided, %s, does not correspond to the 
"
    +                + "expected vector type of %s.",
    +              field, valueVector.getClass().getSimpleName()));
    +        }
    +      } else {
    +        // an incoming field contains only single child since it determines
    +        // full name path of the field in the schema
    +        MaterializedField childField = 
Iterables.getLast(field.getChildren());
    +        return getChildVectorByField(((MapVector) 
valueVector).getChild(childField.getName()), childField);
    +      }
    +    }
    +
    +    /**
    +     * Adds new vector with the specified field to the specified map 
vector.
    +     * If a field has few levels of nesting, corresponding maps will be 
created.
    +     *
    +     * @param valueVector a map where new vector will be added
    +     * @param field       a field that corresponds to the vector which 
should be created
    +     * @return vector that was created
    +     *
    +     * @throws SchemaChangeException if the field has a child and its type 
is not map
    +     */
    +    private ValueVector addNestedChildToMap(MapVector valueVector,
    --- End diff --
    
    This method was designed to handle the case when we have a map inside other 
map and we want to add a field into the nested map. But considering your second 
comment, this method also is not needed. 


> Allow field names to include dots
> ---------------------------------
>
>                 Key: DRILL-4264
>                 URL: https://issues.apache.org/jira/browse/DRILL-4264
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Codegen
>            Reporter: Alex
>            Assignee: Volodymyr Vysotskyi
>              Labels: doc-impacting
>             Fix For: 1.12.0
>
>
> If you have some json data like this...
> {code:javascript}
>     {
>       "0.0.1":{
>         "version":"0.0.1",
>         "date_created":"2014-03-15"
>       },
>       "0.1.2":{
>         "version":"0.1.2",
>         "date_created":"2014-05-21"
>       }
>     }
> {code}
> ... there is no way to select any of the rows since their identifiers contain 
> dots and when trying to select them, Drill throws the following error:
> Error: SYSTEM ERROR: UnsupportedOperationException: Unhandled field reference 
> "0.0.1"; a field reference identifier must not have the form of a qualified 
> name
> This must be fixed since there are many json data files containing dots in 
> some of the keys (e.g. when specifying version numbers etc)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (DRILL-4264) Allow field names to include dots

Reply via email to