[jira] [Commented] (DRILL-4679) CONVERT_FROM() json format fails if 0 rows are received from upstream operator

ASF GitHub Bot (JIRA) Fri, 20 May 2016 13:18:31 -0700

    [ https://issues.apache.org/jira/browse/DRILL-4679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15294108#comment-15294108 ]


ASF GitHub Bot commented on DRILL-4679:
---------------------------------------

Github user amansinha100 commented on a diff in the pull request:

    https://github.com/apache/drill/pull/504#discussion_r64101122
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectRecordBatch.java ---
    @@ -136,6 +145,10 @@ public VectorContainer getOutgoingContainer() {
     
       @Override
       protected IterOutcome doWork() {
    +    if (wasNone) {
    +      return IterOutcome.NONE;
    +    }
    +
         int incomingRecordCount = incoming.getRecordCount();
     
         if (first && incomingRecordCount == 0) {
    --- End diff --
    
    Actually, if the first batch is non-empty, the new changes don't apply
    because of the following check:
          if (first && incomingRecordCount == 0) { ... }
    If a later incoming batch is then empty, processing should continue to
    work, since we have already produced the schema from the first batch.
    On the other hand, if the first batch is empty and we then see a NONE
    iterator outcome, we want to make sure that a schema is still produced,
    but without calling next() again, since a NONE outcome has already been
    seen.
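
    The two cases above can be modeled with a small stand-alone sketch. This
    is not Drill's code: ProjectSketch, Outcome, pullUpstream(), and
    upstreamCalls are hypothetical names, and each upstream batch is reduced
    to its record count. Only the first and wasNone flags mirror the diff.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified model of the behavior discussed above.
// It is NOT ProjectRecordBatch; a batch is represented only by its record count.
class ProjectSketch {
    enum Outcome { OK_NEW_SCHEMA, OK, NONE }

    private static final int END = -1;        // stand-in for an upstream NONE outcome
    private final Iterator<Integer> upstream; // record counts of the incoming batches
    private boolean first = true;             // no schema has been produced yet
    private boolean wasNone = false;          // upstream has already reported NONE
    int upstreamCalls = 0;                    // how often upstream was polled

    ProjectSketch(List<Integer> batchRecordCounts) {
        this.upstream = batchRecordCounts.iterator();
    }

    // Stand-in for calling next() on the incoming batch.
    private int pullUpstream() {
        upstreamCalls++;
        return upstream.hasNext() ? upstream.next() : END;
    }

    Outcome next() {
        if (wasNone) {
            // NONE was already seen: report it without polling upstream again.
            return Outcome.NONE;
        }
        int count = pullUpstream();
        if (first) {
            first = false;
            // Skip leading empty batches until data arrives or upstream ends.
            while (count == 0) {
                count = pullUpstream();
            }
            if (count == END) {
                // Only empty input was seen: remember NONE for the next call,
                // but still emit a (default) schema first.
                wasNone = true;
            }
            return Outcome.OK_NEW_SCHEMA;
        }
        if (count == END) {
            wasNone = true;
            return Outcome.NONE;
        }
        // A later empty batch is fine: the schema already exists.
        return Outcome.OK;
    }
}
```

    With a single empty batch as input, this model returns OK_NEW_SCHEMA and
    then NONE, polling upstream only twice. With a non-empty batch followed by
    an empty one, it returns OK_NEW_SCHEMA, OK, and then NONE.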


> CONVERT_FROM() json format fails if 0 rows are received from upstream operator
> -------------------------------------------------------------------------------
>
>                 Key: DRILL-4679
>                 URL: https://issues.apache.org/jira/browse/DRILL-4679
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>    Affects Versions: 1.6.0
>            Reporter: Aman Sinha
>            Assignee: Jinfeng Ni
>
> CONVERT_FROM() json format fails as below if the underlying Filter produces 0 
> rows: 
> {noformat}
> 0: jdbc:drill:zk=local> select convert_from('{"abc":"xyz"}', 'json') as x 
> from cp.`tpch/region.parquet` where r_regionkey = 9999;
> Error: SYSTEM ERROR: IllegalStateException: next() returned NONE without 
> first returning OK_NEW_SCHEMA [#16, ProjectRecordBatch]
> Fragment 0:0
> {noformat}
> If the conversion uses the UTF8 format instead, the same query succeeds:
> {noformat}
> 0: jdbc:drill:zk=local> select convert_from('{"abc":"xyz"}', 'utf8') as x 
> from cp.`tpch/region.parquet` where r_regionkey = 9999;
> +----+
> | x  |
> +----+
> +----+
> No rows selected (0.241 seconds)
> {noformat}
> The reason is the special handling of JSON in ProjectRecordBatch. The
> output schema is not known until run time, and the ComplexWriter in the
> Project relies on seeing the input data to determine the output schema;
> this could be a MapVector, a ListVector, etc.
> If the input data has 0 rows because of a filter condition, we should at
> least produce a default output schema, e.g. an empty MapVector? We need to
> decide on a good default. Note that CONVERT_FROM(x, 'json') could occur on
> both branches of a UNION-ALL, and if one input is empty while the other is
> not, the default may still cause a schema incompatibility.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
