[ https://issues.apache.org/jira/browse/DRILL-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steven Phillips updated DRILL-1781:
-----------------------------------
    Attachment: DRILL-1781.patch

> For complex functions, don't return until schema is known
> ---------------------------------------------------------
>
>                 Key: DRILL-1781
>                 URL: https://issues.apache.org/jira/browse/DRILL-1781
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Steven Phillips
>            Priority: Blocker
>             Fix For: 0.7.0
>
>         Attachments: DRILL-1781.patch, DRILL-1781.patch
>
>
> In the case of complex output functions, it is impossible to determine the
> output schema until the actual data is consumed. For example, with
> convert_from(VARCHAR, 'json'), unlike most other functions, it is not
> sufficient to know that the incoming data type is VARCHAR; we actually need
> to decode the contents of the record before we can determine whether the
> output type is a map, a list, or a primitive type.
> For fast schema return, we worked around this problem by simply assuming the
> type was Map; if it turned out to be different, there would be a schema
> change. This solution is not satisfactory, as it ends up breaking other
> functions, such as flatten.
> The solution is to continue returning a schema whenever possible; when it
> is not possible, Drill will wait until it is.
> For non-blocking operators, Drill will immediately consume the incoming
> batch, and thus will not return empty schema batches if there is data to
> consume. Blocking operators will return an empty schema batch. If a flatten
> function occurs downstream from a blocking operator, it will not be able to
> return a schema, and thus fast schema return will not happen in this case.
> In the cases where the complex function is not downstream from a blocking
> operator, fast schema return should continue to work.
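
To make the behavior described above concrete, here is a minimal Python sketch. It is not Drill code: `infer_output_kind`, `schema_for_first_batch`, and the `wait_for_data` flag are hypothetical names chosen for illustration, and a batch is modeled as a plain list of JSON strings. It only shows why the output type cannot be known before a record is decoded, and how an empty schema-only batch (as a blocking operator would emit) leaves the type unknown.

```python
import json


def infer_output_kind(text):
    """Decode one JSON string and classify the result.

    Knowing the input is a string is not enough; the value must be decoded
    before we can say whether the output is a map, a list, or a primitive.
    """
    value = json.loads(text)
    if isinstance(value, dict):
        return "map"
    if isinstance(value, list):
        return "list"
    return "primitive"


def schema_for_first_batch(incoming, wait_for_data):
    """Return the output kind to report first, or None if it is unknown.

    incoming: iterable of batches, each a list of JSON strings (hypothetical
        model; an empty list stands for an empty schema-only batch).
    wait_for_data: True models the non-blocking case, where the operator keeps
        consuming until it sees data. False models sitting downstream of a
        blocking operator, where only the empty batch is available.
    """
    for batch in incoming:
        if batch:
            # Real data is available, so the type can be determined.
            return infer_output_kind(batch[0])
        if not wait_for_data:
            # Only an empty batch so far: no schema can be returned yet.
            return None
    return None
```

With an empty batch followed by a batch containing `'[1]'`, the sketch reports `"list"` when it is allowed to wait for data, and `None` when it must answer from the empty batch alone, which corresponds to the case where fast schema return does not happen.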