Drill Hangout 2015-10-06

Attendees: Aman, Andries, Daniel, Kris, Charlie, Julien, Jacques, Jason, Jinfeng, Matt, Parth, Sudheesh, Venki
1. Matt is hitting issues with Information Schema queries against Hive. He will connect with Venki on Slack to resolve them.

2. Julien reported that he is working on speeding up building and running tests, noting that build-time code generation runs twice and that local Drillbits used for testing take 3 seconds to shut down.

3. Parth mentioned an off-by-one bug in Parquet reading and said he will add more Parquet reading tests as part of the fix.

4. Aman reported a performance regression while trying metadata caching with 400K files. This is being investigated.

5. Daniel, Jacques, and Sudheesh discussed issues underlying DRILL-2288, such as the ScanBatch.next() return value (IterOutcome) contract, handling empty JSON files, handling zero-row sources that still have schemas, and how to limit the DRILL-2288 fix to avoid reworking lots of downstream code.

6. Sudheesh had various updates. On Limit 0 and Limit 1 queries, Jacques suggested moving the handling of Limit 0 queries on schema-aware systems to the planning phase. Perf tests on the RPC processing offloading seem to show higher memory consumption. This may simply be due to allowing more concurrent queries as a result of the patch. Perf tests also reveal issues with the local data tunnel changes, but these may be existing problems that are now showing up as a result of faster local data processing. Question to be resolved: should we merge these anyway?

7. Jason is helping address some recent issues with flatten involving a large number of repeated values.

8. We unanimously volunteered Sudheesh to work on the performance cluster.

On Tue, Oct 6, 2015 at 10:06 AM, Parth Chandra <[email protected]> wrote:

> Join us here:
> https://plus.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc
