[
https://issues.apache.org/jira/browse/DRILL-8085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17471642#comment-17471642
]
ASF GitHub Bot commented on DRILL-8085:
---------------------------------------
paul-rogers commented on pull request #2419:
URL: https://github.com/apache/drill/pull/2419#issuecomment-1008495754
@luocooong, the 10 highlights of the latest version? There actually are not
that many. If we talk just about this PR, the key bits are:
* Simple integration with the Easy Format plugin.
* Integrated limit-push-down support.
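
To make the limit push-down idea concrete, here is a minimal, self-contained sketch. It does not use the real EVF or Easy plugin classes: `LimitedReader`, `nextBatch`, and the row source are hypothetical names invented for illustration. The point is only that a reader which knows the query's LIMIT can stop reading early instead of producing rows that a downstream operator throws away.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: none of these names are Drill or EVF classes.
public class LimitPushDownSketch {

  /** Reads rows from a source in batches, honoring a pushed-down LIMIT. */
  static final class LimitedReader {
    private final Iterator<String> source;
    private final int limit;      // negative means "no limit"
    private int rowsReturned;

    LimitedReader(Iterator<String> source, int limit) {
      this.source = source;
      this.limit = limit;
    }

    /** Returns the next batch; empty once the source or the limit is exhausted. */
    List<String> nextBatch(int batchSize) {
      List<String> batch = new ArrayList<>();
      while (batch.size() < batchSize && source.hasNext() && !limitReached()) {
        batch.add(source.next());
        rowsReturned++;
      }
      return batch;
    }

    private boolean limitReached() {
      return limit >= 0 && rowsReturned >= limit;
    }
  }

  public static void main(String[] args) {
    List<String> rows = List.of("a", "b", "c", "d", "e");
    LimitedReader reader = new LimitedReader(rows.iterator(), 3);
    System.out.println(reader.nextBatch(2)); // prints [a, b]
    System.out.println(reader.nextBatch(2)); // prints [c]
    System.out.println(reader.nextBatch(2)); // prints []
  }
}
```

Rows "d" and "e" are never pulled from the source, which is the whole benefit when the source is a large file.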
If we talk about EVF V2 compared with EVF V1:
* Simpler developer API.
* Full support for combining provided, table, and discovered schemas.
(Provided is the schema given by the planner. Table is the schema a reader can
infer at open time, as for Parquet, CSV, JDBC, etc. Discovered is the schema
found as the data is read, as in JSON.)
* Full schema reconciliation support for all data types, including the
complex ones such as nested maps and arrays.
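
As a rough illustration of what combining the three schema sources means, here is a self-contained sketch. It is not the EVF implementation: `reconcile` is a hypothetical helper, columns are modeled as plain name-to-type strings, and the precedence used here (provided wins over table, table wins over discovered) is an assumption made for the example. Real reconciliation also has to handle nested maps and arrays, which this sketch ignores.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: not the EVF schema classes.
public class SchemaMergeSketch {

  /**
   * Merges three column-name-to-type maps. On a name clash the earlier
   * argument wins (assumed precedence); new columns are appended in the
   * order they are first seen.
   */
  static Map<String, String> reconcile(Map<String, String> provided,
                                       Map<String, String> table,
                                       Map<String, String> discovered) {
    Map<String, String> result = new LinkedHashMap<>(provided);
    table.forEach(result::putIfAbsent);
    discovered.forEach(result::putIfAbsent);
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> provided = new LinkedHashMap<>();
    provided.put("a", "INT");

    Map<String, String> table = new LinkedHashMap<>();
    table.put("a", "VARCHAR");
    table.put("b", "VARCHAR");

    Map<String, String> discovered = new LinkedHashMap<>();
    discovered.put("b", "DOUBLE");
    discovered.put("c", "DOUBLE");

    // prints {a=INT, b=VARCHAR, c=DOUBLE}
    System.out.println(reconcile(provided, table, discovered));
  }
}
```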
And if we talk about EVF itself vs. "classic" roll-your-own vector code:
* Simple API to write to vectors.
* Control batch and individual vector sizes.
* Foundation for the `RowSet` family of testing tools.
* Highlights the limitations of "schema on read" and "schema evolution."
* Extensible support for type conversion from "native" reader types to Drill
types, and between Drill types.
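
To show why a writer layer that controls batch size is useful, here is a small self-contained sketch. It is not the EVF column writer API: `SizedBatchWriter`, `write`, and `harvest` are hypothetical names, and a "vector" is modeled as a list of strings with a byte budget. The idea it illustrates is that the writer, not the reader author, tracks memory use and signals when the current batch is full.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: not the EVF writer classes.
public class BatchSizeSketch {

  /** Accumulates values until a byte budget for the batch would be exceeded. */
  static final class SizedBatchWriter {
    private final int maxBytes;
    private final List<String> values = new ArrayList<>();
    private int bytesUsed;

    SizedBatchWriter(int maxBytes) {
      this.maxBytes = maxBytes;
    }

    /**
     * Appends the value if it fits and returns true. Returns false when the
     * batch is full; the caller should harvest and retry. The first value of
     * a batch is always accepted so that an oversize value still makes progress.
     */
    boolean write(String value) {
      int size = value.getBytes(StandardCharsets.UTF_8).length;
      if (!values.isEmpty() && bytesUsed + size > maxBytes) {
        return false;
      }
      values.add(value);
      bytesUsed += size;
      return true;
    }

    /** Hands the finished batch to the caller and starts a new, empty one. */
    List<String> harvest() {
      List<String> out = new ArrayList<>(values);
      values.clear();
      bytesUsed = 0;
      return out;
    }
  }

  public static void main(String[] args) {
    SizedBatchWriter writer = new SizedBatchWriter(10);
    System.out.println(writer.write("aaaa")); // prints true
    System.out.println(writer.write("bbbb")); // prints true
    System.out.println(writer.write("cccc")); // prints false (would use 12 bytes)
    System.out.println(writer.harvest());     // prints [aaaa, bbbb]
    System.out.println(writer.write("cccc")); // prints true
  }
}
```

With hand-written vector code, every reader has to repeat this bookkeeping; a shared writer layer does it once.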
Finally, if we talk about the Grand Plan for World Domination, the column
accessor and row set mechanisms allow:
* Single interface to all data in Drill, allowing us to eventually consider
evolving our storage layer.
Now, why do we need all this? Partly because working with value vectors
directly is tedious and error prone: it's like using assembly language. Partly
because Drill, for better or worse, went crazy with the complex data types it
supports: maps, arrays, arrays of maps that contain more arrays of more maps...
That is, Drill has full JSON support. Getting that right, three levels down in
nesting, when working directly with value vectors, is nearly impossible. We've had
years of bugs because it is so complex. The EVF and related mechanisms are
intended to bring some sanity to the complex types. If we can't convince
ourselves to get rid of them, then we have to make them actually work.
That's all behind the scenes. Most community contributions these days occur
in storage and format plugins. For that, EVF just makes the developer's job
easier by handling all the common boilerplate code.
--
> EVF V2 support in the "Easy" format plugin
> ------------------------------------------
>
> Key: DRILL-8085
> URL: https://issues.apache.org/jira/browse/DRILL-8085
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.19.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Major
>
> Add support for EVF V2 to the {{EasyFormatPlugin}} similar to how EVF V1
> support already exists. Provide examples for others to follow.