paul-rogers commented on pull request #2419: URL: https://github.com/apache/drill/pull/2419#issuecomment-1008495754
@luocooong, the 10 highlights of the latest version? There actually are not that many. If we talk just about this PR, the key bits are:

* Simple integration with the Easy Format plugin.
* Integrated limit-push-down support.

If we talk about EVF V2 compared with EVF V1:

* Simpler developer API.
* Full support for combining provided, table, and discovered schemas. (Provided is the schema given by the planner. Table is the schema a reader can infer at open time, as for Parquet, CSV, JDBC, etc. Discovered is the schema found as the data is read, as in JSON.)
* Full schema reconciliation support for all data types, including the complex ones such as nested maps and arrays.

And if we talk about EVF itself vs. "classic" roll-your-own vector code:

* Simple API to write to vectors.
* Control of batch and individual vector sizes.
* Foundation for the `RowSet` family of testing tools.
* Highlighted the limitations of "schema on read" and "schema evolution."
* Extensible support for type conversion from "native" reader types to Drill types, and between Drill types.

Finally, if we talk about the Grand Plan for World Domination, the column accessor and row set mechanisms allow:

* A single interface to all data in Drill, allowing us to eventually consider evolving our storage layer.

Now, why do we need all this? Partly because working with value vectors directly is tedious and error-prone: it's like using assembly language. Partly because Drill, for better or worse, went crazy with the complex data types it supports: maps, arrays, arrays of maps that contain more arrays of more maps... That is, Drill has full JSON support. Getting that right, three levels down in nesting, when working directly with value vectors is nearly impossible. We've had years of bugs because it is so complex.

The EVF and related mechanisms are intended to bring some sanity to the complex types. If we can't convince ourselves to get rid of them, then we have to make them actually work. That's all behind the scenes.
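To make the provided/table/discovered combination concrete, here is a minimal, hypothetical sketch. It is not EVF's API: `SchemaReconcileSketch`, `reconcile()`, and the plain string type names are invented for illustration, and it assumes provided types take precedence over table types, which take precedence over discovered ones. Where a lower-precedence source disagrees about a column's type, the sketch records the conversion that would be needed, in the spirit of the type-conversion point above.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch, not Drill's EVF API. Each schema is modeled as a map
 * from column name to a type name such as "INT" or "VARCHAR".
 * Assumed precedence: provided > table > discovered.
 */
class SchemaReconcileSketch {

  /** The merged output schema plus the type conversions it implies. */
  static final class Result {
    final Map<String, String> schema = new LinkedHashMap<>();
    final Set<String> conversions = new LinkedHashSet<>();
  }

  static Result reconcile(Map<String, String> provided,
                          Map<String, String> table,
                          Map<String, String> discovered) {
    Result result = new Result();
    // Visit sources from highest to lowest precedence so that putIfAbsent()
    // keeps the most authoritative type seen for each column.
    for (Map<String, String> source : List.of(provided, table, discovered)) {
      for (Map.Entry<String, String> col : source.entrySet()) {
        String chosen = result.schema.putIfAbsent(col.getKey(), col.getValue());
        // A non-null return means a higher-precedence source already typed this
        // column; if the types differ, values of the lower-precedence type must
        // be converted to the chosen one.
        if (chosen != null && !chosen.equals(col.getValue())) {
          result.conversions.add(col.getKey() + ": " + col.getValue() + " -> " + chosen);
        }
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Result r = reconcile(
        Map.of("a", "INT"),                    // provided: "a" must be INT
        Map.of("a", "VARCHAR", "b", "FLOAT8"), // table: known when the reader opens
        Map.of("c", "VARCHAR"));               // discovered: appears while reading
    System.out.println(r.schema);      // {a=INT, b=FLOAT8, c=VARCHAR}
    System.out.println(r.conversions); // [a: VARCHAR -> INT]
  }
}
```

Real reconciliation also has to handle nullability, arrays, and nested maps, which this flat sketch ignores; it only shows why a precedence order plus a conversion step is needed at all.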
Most community contributions these days occur in storage and format plugins. For those, EVF just makes the developer's job easier by handling all the common boilerplate code.
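A rough sketch of that division of labor, under stated assumptions: `ReaderFrameworkSketch`, `RowSource`, and `readAll()` are hypothetical names, not EVF classes, and batches here are capped by row count only, whereas EVF controls batch and individual vector sizes. The plugin author supplies only `nextRow()`; the shared loop does the batching and honors a pushed-down LIMIT, so the reader is never asked for rows past the limit.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch, not Drill's EVF API: a format reader only produces
 * rows, while a shared framework loop handles batching and an optional
 * pushed-down LIMIT.
 */
class ReaderFrameworkSketch {

  /** What a plugin author writes: return the next row, or null at end of input. */
  interface RowSource {
    Object[] nextRow();
  }

  /** Framework side: batches of at most batchSize rows, stopping at limit (negative = no limit). */
  static List<List<Object[]>> readAll(RowSource source, int batchSize, long limit) {
    List<List<Object[]>> batches = new ArrayList<>();
    List<Object[]> batch = new ArrayList<>();
    long rowCount = 0;
    // Check the limit before asking the reader for another row.
    while (limit < 0 || rowCount < limit) {
      Object[] row = source.nextRow();
      if (row == null) {
        break; // reader exhausted
      }
      batch.add(row);
      rowCount++;
      if (batch.size() == batchSize) {
        batches.add(batch); // batch is full: hand it downstream
        batch = new ArrayList<>();
      }
    }
    if (!batch.isEmpty()) {
      batches.add(batch); // final partial batch
    }
    return batches;
  }

  public static void main(String[] args) {
    // A toy reader over ten in-memory rows.
    int[] next = {0};
    RowSource source = () -> next[0] < 10 ? new Object[] {next[0]++, "row"} : null;
    List<List<Object[]>> batches = readAll(source, 4, 6);
    System.out.println(batches.size() + " batches, sizes " + batches.get(0).size()
        + " and " + batches.get(1).size() + ", rows read: " + next[0]);
    // 2 batches, sizes 4 and 2, rows read: 6
  }
}
```

With ten input rows, a batch size of 4, and a limit of 6, the loop produces batches of 4 and 2 rows and calls the reader exactly six times; none of that logic lives in the reader itself.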
