[GitHub] [drill] paul-rogers commented on pull request #2419: DRILL-8085: EVF V2 support in the "Easy" format plugin

GitBox Sun, 09 Jan 2022 18:28:43 -0800


paul-rogers commented on pull request #2419:
URL: https://github.com/apache/drill/pull/2419#issuecomment-1008495754



   @luocooong, the 10 highlights of the latest version? There actually are not 
that many. If we talk just about this PR, the key bits are:
   
   * Simple integration with the Easy Format plugin.
   * Integrated limit-push-down support.
   
   If we talk about EVF V2 compared with EVF V1:
   
   * Simpler developer API.
   * Full support for combining provided, table, and discovered schemas. 
(Provided is the schema given by the planner. Table is the schema a reader can 
infer at open time, as for Parquet, CSV, JDBC, etc. Discovered is the schema 
found as the data is read, as in JSON.)
   * Full schema reconciliation support for all data types, including the 
complex ones such as nested maps and arrays.
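   
   To make the three schema sources concrete, here is a minimal, hypothetical 
sketch of combining them. None of these names come from the EVF API, and the 
precedence rule (provided wins over table, which wins over discovered) is an 
assumption made only for illustration:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: none of these names come from Drill's EVF API.
final class SchemaSketch {

  // Merge column-name -> type-name maps. Schemas are passed in priority
  // order; the first schema to mention a column decides its type.
  // (This precedence rule is an assumption made for illustration.)
  @SafeVarargs
  static Map<String, String> reconcile(Map<String, String>... schemasByPriority) {
    Map<String, String> merged = new LinkedHashMap<>();
    for (Map<String, String> schema : schemasByPriority) {
      for (Map.Entry<String, String> column : schema.entrySet()) {
        merged.putIfAbsent(column.getKey(), column.getValue());
      }
    }
    return merged;
  }

  // Small helper to build an ordered schema from name/type pairs.
  static Map<String, String> schema(String... nameTypePairs) {
    Map<String, String> result = new LinkedHashMap<>();
    for (int i = 0; i + 1 < nameTypePairs.length; i += 2) {
      result.put(nameTypePairs[i], nameTypePairs[i + 1]);
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> provided = schema("id", "BIGINT");
    Map<String, String> table = schema("id", "INT", "name", "VARCHAR");
    Map<String, String> discovered = schema("name", "VARCHAR", "tags", "ARRAY<VARCHAR>");
    // The provided BIGINT overrides the table's INT for "id"; "tags" is
    // known only from the discovered schema.
    System.out.println(reconcile(provided, table, discovered));
  }
}
```

   The real reconciliation also has to handle nested maps and arrays, which 
this flat name-to-type map deliberately ignores.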
   
   And if we talk about EVF itself vs. "classic" roll-your-own vector code:
   
   * Simple API to write to vectors.
   * Control batch and individual vector sizes.
   * Foundation for the `RowSet` family of testing tools.
   * Highlighted the limitations of "schema on read" and "schema evolution."
   * Extensible support for type conversion from "native" reader types to Drill 
types, and between Drill types.
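   
   As a rough illustration of the first two points, the toy model below shows 
the general shape of a writer that owns the vector bookkeeping and enforces a 
batch size limit, so the reader only says "write this row." It is not Drill 
code: `ToyBatchWriter` and its methods are hypothetical, and plain Java arrays 
stand in for value vectors.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these are not Drill's classes. Plain arrays stand in for
// value vectors, and the writer hides the index bookkeeping from the reader.
final class ToyBatchWriter {
  private final int maxRows;
  private final int[] ids;       // stand-in for a fixed-width vector
  private final String[] names;  // stand-in for a variable-width vector
  private int rowCount;

  ToyBatchWriter(int maxRows) {
    this.maxRows = maxRows;
    this.ids = new int[maxRows];
    this.names = new String[maxRows];
  }

  boolean isFull() {
    return rowCount >= maxRows;
  }

  // Writes one row. Returns false, writing nothing, if the batch is full.
  boolean writeRow(int id, String name) {
    if (isFull()) {
      return false;
    }
    ids[rowCount] = id;
    names[rowCount] = name;
    rowCount++;
    return true;
  }

  // Returns the finished batch as "id:name" strings and resets the writer.
  List<String> harvest() {
    List<String> batch = new ArrayList<>();
    for (int i = 0; i < rowCount; i++) {
      batch.add(ids[i] + ":" + names[i]);
    }
    rowCount = 0;
    return batch;
  }

  public static void main(String[] args) {
    ToyBatchWriter writer = new ToyBatchWriter(2);
    String[] input = {"a", "b", "c"};
    for (int i = 0; i < input.length; i++) {
      if (!writer.writeRow(i, input[i])) {
        // Batch is full: hand it downstream, then retry the row.
        System.out.println(writer.harvest());
        writer.writeRow(i, input[i]);
      }
    }
    System.out.println(writer.harvest());
  }
}
```

   The point of the sketch is the division of labor: the reader never touches 
an index or a buffer, and the size limit lives in one place instead of in 
every reader.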
   
   Finally, if we talk about the Grand Plan for World Domination, the column 
accessor and row set mechanisms allow:
   
   * Single interface to all data in Drill, allowing us to eventually consider 
evolving our storage layer.
   
   Now, why do we need all this? Partly because working with value vectors 
directly is tedious and error-prone: it's like using assembly language. Partly 
because Drill, for better or worse, went crazy with the complex data types it 
supports: maps, arrays, arrays of maps that contain more arrays of more maps... 
That is, Drill has full JSON support. Getting that right, three levels down in 
nesting, when working directly with value vectors is nearly impossible. We've 
had years of bugs because it is so complex. The EVF and related mechanisms are 
intended to bring some sanity to the complex types. If we can't convince 
ourselves to get rid of them, then we have to make them actually work.
   
   That's all behind the scenes. Most community contributions these days occur 
in storage and format plugins. For that, EVF just makes the developer's job 
easier by handling all the common boilerplate code.


