[jira] [Comment Edited] (DRILL-4264) Dots in identifier are not escaped correctly

Paul Rogers (JIRA) Wed, 26 Jul 2017 18:10:51 -0700

    [ 
https://issues.apache.org/jira/browse/DRILL-4264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102548#comment-16102548
 ]


Paul Rogers edited comment on DRILL-4264 at 7/27/17 1:09 AM:
-------------------------------------------------------------

Wonderfully detailed analysis! You caught many subtle issues that my quick scan 
missed.

The solution for Parquet metadata seems good. I'm not an expert in that area, 
but a few unit tests will validate the change once you make it. Bumping the 
version number will solve the forward/backward compatibility issues (using the 
mechanism from DRILL-5660).

The {{MaterializedField}} issue is harder. Fortunately, some of the nested-name 
issues might not be actual issues.

For example, your example of 
[ScanBatch.Mutator:362|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet2/DrillParquetReader.java#L225]
 should be OK as long as the caller knows to call this method only for 
top-level columns. This line is used to build up a record batch during reading, 
such as in JSON or Parquet. The problem arises if the container is a map. In 
that case, the caller should call {{AbstractMapVector.addOrGet()}} to add the 
field rather than adding it at the top level using the {{Mutator}}.
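
The distinction can be sketched in plain Java. This is not Drill's 
implementation: {{MapNode}} and its {{addOrGet()}} are hypothetical stand-ins 
that only mimic the idea behind {{AbstractMapVector.addOrGet()}}, assuming each 
container stores a child under its literal leaf name and never parses that name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a map container; not a Drill class.
final class MapNode {
  private final Map<String, MapNode> children = new LinkedHashMap<>();

  // Returns the child stored under the literal leaf name, creating it if absent.
  // The name is never split, so dots in it carry no special meaning.
  MapNode addOrGet(String leafName) {
    return children.computeIfAbsent(leafName, n -> new MapNode());
  }

  boolean hasChild(String leafName) {
    return children.containsKey(leafName);
  }

  int childCount() {
    return children.size();
  }

  public static void main(String[] args) {
    MapNode root = new MapNode();              // plays the role of the top-level batch
    MapNode version = root.addOrGet("0.0.1");  // top-level column whose name contains dots
    version.addOrGet("date_created");          // nested field, added on its parent map
    System.out.println(root.childCount());
    System.out.println(root.hasChild("date_created"));
  }
}
```

In this sketch the nested field is reachable only through its parent, so the 
top level never needs to interpret a dotted name.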

Are there other cases where the code assembles a path then tears it down again? 
Or, parses a path?
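
To show why such round trips matter, here is a minimal sketch in plain Java, 
not Drill code. The names {{PathRoundTrip}}, {{joinPath}} and {{splitPath}}, as 
well as the parent column "versions", are made up for illustration. It assumes 
a path is assembled by joining segments with dots and parsed by splitting on 
dots, and shows that a segment such as "0.0.1" does not survive the round trip.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Drill.
final class PathRoundTrip {

  // Naive assembly: join the name segments with '.'.
  static String joinPath(List<String> segments) {
    return String.join(".", segments);
  }

  // Naive teardown: split on every '.'.
  static List<String> splitPath(String path) {
    return Arrays.asList(path.split("\\."));
  }

  public static void main(String[] args) {
    // A made-up map column "versions" holding a child named "0.0.1".
    List<String> original = Arrays.asList("versions", "0.0.1");
    String path = joinPath(original);
    List<String> reparsed = splitPath(path);
    System.out.println(path);
    System.out.println(original.size() + " segments in, " + reparsed.size() + " segments out");
  }
}
```

Any code path that does this kind of join followed by a split would need 
either escaping or a structured (list-of-segments) representation.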

Otherwise, we can find all uses of {{MaterializedField.getPath()}}, verify that 
they really use only the leaf name, and replace them with {{getName()}}. The 
same is true of {{getLastName()}}.


was (Author: paul-rogers):
Wonderful detailed analysis! You caught many detailed issues that my quick scan 
missed.

The solution for Parquet metadata seems good. I'm not an expert in that area, 
but a few unit tests will validate the change once you make it. Bumping the 
version number will solve the forward/backward compatibility issues (using the 
mechanism from DRILL-5660.)

The {{MaterializedField}} issue is harder. Fortunately, some of the nested-name 
issues might not be actual issues.

For example, your example of 
[ScanBatch.Mutator:362|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet2/DrillParquetReader.java#L225]
 should be OK as long as the caller knows to pass in only the leaf name. This 
line is used to build up a record batch during reading such as in JSON or 
Parquet. The problem is if the container is a map. In this case, the caller 
should be calling {{AbstractMapVector.addOrGet()}} to add the field rather than 
adding it at the top level using the {{Mutator}}.

Are there other cases where the code assembles a path then tears it down again? 
Or, parses a path?

Otherwise, we can find all uses of {{MaterializedField.getPath()}}, verify that 
they really use only the leaf name, and replace them with {{getName()}}. The 
same is true of {{getLastName()}}.

> Dots in identifier are not escaped correctly
> --------------------------------------------
>
>                 Key: DRILL-4264
>                 URL: https://issues.apache.org/jira/browse/DRILL-4264
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Codegen
>            Reporter: Alex
>            Assignee: Volodymyr Vysotskyi
>              Labels: doc-impacting
>
> If you have some json data like this...
> {code:javascript}
>     {
>       "0.0.1":{
>         "version":"0.0.1",
>         "date_created":"2014-03-15"
>       },
>       "0.1.2":{
>         "version":"0.1.2",
>         "date_created":"2014-05-21"
>       }
>     }
> {code}
> ... there is no way to select any of the rows, since their identifiers contain 
> dots. When trying to select them, Drill throws the following error:
> Error: SYSTEM ERROR: UnsupportedOperationException: Unhandled field reference 
> "0.0.1"; a field reference identifier must not have the form of a qualified 
> name
> This must be fixed, since many JSON data files contain dots in some of their 
> keys (e.g. when specifying version numbers).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
