How to proceed with IMPALA-4086 (Benchmark for SimpleScheduler)

Lars Volker Fri, 11 Nov 2016 04:51:02 -0800

Hi all,

Here is a change <https://gerrit.cloudera.org/4554> that implements a
benchmark for SimpleScheduler::ComputeScanRangeAssignment() to address
IMPALA-4086 <https://issues.cloudera.org/browse/IMPALA-4086>.


I would like to discuss whether it is possible to run the benchmark against
the Schedule() method instead. This would require changes to the scheduler
test utility classes in simple-scheduler-test-util.h to create a
TQueryExecRequest message suitable for calling Schedule().

Currently we compute the following inputs before calling
ComputeScanRangeAssignment(). Together they are essentially what a single
plan node contains:

  BackendConfig
  vector<TScanRangeLocations>
  vector<TNetworkAddress>
  TQueryOptions


To build a schedule object we need to build a TQueryExecRequest, which has
14 fields. The complex ones are:

  optional Descriptors.TDescriptorTable desc_tbl
  optional list<Planner.TPlanFragment> fragments
  optional list<i32> dest_fragment_idx
  optional map<Types.TPlanNodeId, list<Planner.TScanRangeLocations>> per_node_scan_ranges
  optional list<TPlanExecInfo> mt_plan_exec_info
  optional Results.TResultSetMetadata result_set_metadata
  optional TFinalizeParams finalize_params
  required ImpalaInternalService.TQueryCtx query_ctx
  optional string query_plan
  required list<Types.TNetworkAddress> host_list
  optional LineageGraph.TLineageGraph lineage_graph


Some of these members have other dependencies, for example the fragments
have the plan inside, which has all plan nodes:

TQueryExecRequest:
  list<Planner.TPlanFragment> fragments
    partition.type
    plan.nodes[node_id]
      node_id (for dcheck)
      node.hdfs_scan_node (can be unset)
    idx (for sorting in query-schedule)
  TQueryCtx query_ctx (only for query options, which we already have)
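
To give a feel for how much setup this dependency chain implies, here is a rough C++ sketch that fills in only the fields listed above. All types and names in it (the Fake* structs, MakeMinimalRequest) are hypothetical stand-ins invented for illustration, not the Thrift-generated Impala classes, and it assumes exactly one HDFS scan node per fragment.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the Thrift-generated types. Only the fields
// named in the outline above are modeled.
struct FakePlanNode {
  int32_t node_id = 0;
  bool has_hdfs_scan_node = false;  // models "hdfs_scan_node can be unset"
};

struct FakePlan {
  std::vector<FakePlanNode> nodes;
};

enum class FakePartitionType { UNPARTITIONED, RANDOM };

struct FakePlanFragment {
  int32_t idx = 0;  // used for sorting fragments
  FakePartitionType partition_type = FakePartitionType::UNPARTITIONED;
  FakePlan plan;
};

struct FakeQueryExecRequest {
  std::vector<FakePlanFragment> fragments;
};

// Builds a request with `num_fragments` fragments, each holding a single
// scan node. Node ids are assigned sequentially across fragments.
FakeQueryExecRequest MakeMinimalRequest(int num_fragments) {
  assert(num_fragments >= 0);
  FakeQueryExecRequest request;
  int32_t next_node_id = 0;
  for (int i = 0; i < num_fragments; ++i) {
    FakePlanFragment fragment;
    fragment.idx = i;
    fragment.partition_type = FakePartitionType::RANDOM;
    FakePlanNode node;
    node.node_id = next_node_id++;
    node.has_hdfs_scan_node = true;
    fragment.plan.nodes.push_back(node);
    request.fragments.push_back(fragment);
  }
  return request;
}
```

A test helper along these lines would let a benchmark create requests of varying size, e.g. MakeMinimalRequest(100), without going through the planner.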


I think it makes sense to benchmark ComputeScanRangeAssignment() in
isolation, since its implementation is reasonably complex, i.e. not just
linear in the input size. In order to benchmark Schedule(), we should first
consider writing proper unit tests for the SimpleScheduler and extending the
test utility code where necessary to do so.
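
As an illustration of what benchmarking a single function in isolation across input sizes could look like, here is a minimal, generic C++ timing sketch. It does not use Impala's benchmark utilities or the change under review. QuadraticStandIn is a made-up workload standing in for a function whose cost is not linear in its input, and MeanNanosPerCall and MeasureScaling are hypothetical helper names.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Calls `fn(size)` `iterations` times and returns the mean wall-clock time
// per call in nanoseconds.
double MeanNanosPerCall(const std::function<void(std::size_t)>& fn,
                        std::size_t size, int iterations) {
  assert(iterations > 0);
  using Clock = std::chrono::steady_clock;
  const auto start = Clock::now();
  for (int i = 0; i < iterations; ++i) fn(size);
  const auto elapsed = Clock::now() - start;
  return std::chrono::duration<double, std::nano>(elapsed).count() / iterations;
}

// Hypothetical stand-in workload with O(n^2) cost. It counts the pairs
// (i, j) in [0, n) x [0, n) whose parities differ.
std::size_t QuadraticStandIn(std::size_t n) {
  std::size_t acc = 0;
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = 0; j < n; ++j) acc += (i ^ j) & 1;
  }
  return acc;
}

// Measures the stand-in workload at each input size and returns one mean
// timing per size, in the same order as `sizes`.
std::vector<double> MeasureScaling(const std::vector<std::size_t>& sizes,
                                   int iterations) {
  std::vector<double> mean_nanos;
  volatile std::size_t sink = 0;  // keeps the compiler from dropping the work
  for (std::size_t size : sizes) {
    mean_nanos.push_back(MeanNanosPerCall(
        [&sink](std::size_t n) { sink = sink + QuadraticStandIn(n); }, size,
        iterations));
  }
  return mean_nanos;
}
```

Comparing the timings returned for, say, sizes 100, 1000 and 10000 shows whether the cost grows faster than linearly, which is the property that makes an isolated benchmark worthwhile.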

I am curious to hear any feedback. Thanks, Lars
