I’m trying out the Kudu storage plugin with Kudu 1.4 without success. I always get
this Kudu error: “Invalid scan start key: Error decoding composite key component
‘my-key’: key too short: <redacted>”
I know this storage plugin is experimental, but I’m hoping to get it to work.
Looking at the code, I noticed it relies on a deprecated way of locating Kudu
tablets for scans: it builds a general mapping of tablet locations to Tablet
Servers, I suppose so that Drillbit scans can be colocated with the data. Kudu
now recommends the KuduScanToken API instead:
“A scan token describes a partial scan of a Kudu table limited to a single
contiguous physical location. Using the {@link KuduScanTokenBuilder}, clients can
describe the desired scan, including predicates, bounds, timestamps, and
caching, and receive back a collection of scan tokens.
Each scan token may be separately turned into a scanner using
{@link #intoScanner}, with each scanner responsible for a disjoint section
of the table.
Scan tokens may be serialized using the {@link #serialize} method and
deserialized back into a scanner using the {@link #deserializeIntoScanner}
method. This allows use cases such as generating scan tokens in the planner
component of a query engine, then sending the tokens to execution nodes based
on locality, and then instantiating the scanners on those nodes.
Scan token locality information can be inspected using the {@link #getTablet}
method.”
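To make the locality part of that description concrete, here is a rough sketch of the planner-side step ("sending the tokens to execution nodes based on locality"), written in plain Java so it runs without a Kudu cluster. `TokenLocality`, `TokenInfo` and `assign` are hypothetical names of my own. In the real Kudu client, the bytes would come from `KuduScanToken.serialize()`, the hosts from `token.getTablet().getReplicas()` (each replica's `getRpcHost()`), and the executor would rebuild a scanner with `KuduScanToken.deserializeIntoScanner(bytes, client)`.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TokenLocality {

    /** Hypothetical stand-in for one serialized scan token plus its tablet's replica hosts. */
    public static final class TokenInfo {
        final byte[] serialized;          // in Kudu: token.serialize()
        final List<String> replicaHosts;  // in Kudu: hosts of token.getTablet().getReplicas()

        public TokenInfo(byte[] serialized, List<String> replicaHosts) {
            this.serialized = serialized;
            this.replicaHosts = replicaHosts;
        }
    }

    /**
     * Groups tokens by the Drillbit host that should run them. The first replica
     * host that also runs a Drillbit wins; tokens with no local Drillbit are
     * dealt round-robin so remote reads are spread across the cluster.
     */
    public static Map<String, List<TokenInfo>> assign(List<TokenInfo> tokens, List<String> drillbitHosts) {
        if (drillbitHosts.isEmpty()) {
            throw new IllegalArgumentException("no Drillbits to assign tokens to");
        }
        Map<String, List<TokenInfo>> byHost = new LinkedHashMap<>();
        for (String host : drillbitHosts) {
            byHost.put(host, new ArrayList<>());
        }
        int next = 0; // round-robin cursor for tokens without a local Drillbit
        for (TokenInfo token : tokens) {
            String target = null;
            for (String replica : token.replicaHosts) {
                if (byHost.containsKey(replica)) {
                    target = replica;
                    break;
                }
            }
            if (target == null) {
                target = drillbitHosts.get(next % drillbitHosts.size());
                next++;
            }
            byHost.get(target).add(token);
        }
        return byHost;
    }

    public static void main(String[] args) {
        List<TokenInfo> tokens = Arrays.asList(
            new TokenInfo("t1".getBytes(StandardCharsets.UTF_8), Arrays.asList("node1", "node2")),
            new TokenInfo("t2".getBytes(StandardCharsets.UTF_8), Arrays.asList("node3")),
            new TokenInfo("t3".getBytes(StandardCharsets.UTF_8), Arrays.asList("node9")));
        Map<String, List<TokenInfo>> plan = assign(tokens, Arrays.asList("node1", "node3"));
        for (Map.Entry<String, List<TokenInfo>> e : plan.entrySet()) {
            List<String> names = new ArrayList<>();
            for (TokenInfo t : e.getValue()) {
                names.add(new String(t.serialized, StandardCharsets.UTF_8));
            }
            System.out.println(e.getKey() + " -> " + names);
        }
        // node1 -> [t1, t3]
        // node3 -> [t2]
    }
}
```

Picking the first replica that hosts a Drillbit is just the simplest rule; preferring the leader replica or balancing by token count would be reasonable alternatives.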
I’m new to Drill, but it seemed to me that this API could be retrofitted into
KuduGroupScan to good effect. I didn’t get very far, though, because I couldn’t
even work out where the predicates live in the Drill code. Unless I completely
misunderstand the concept, it seems the Kudu storage plugin passes predicates
down to lower levels of the code and so never sees them itself. As the quoted
Javadoc suggests, my hope is that Kudu scan tokens could be serialized and sent
to Drillbits to be used as scanners. Does anyone know whether this is possible
in the Drill execution path? If so, could someone point me to documentation,
examples, or tests I can consult to clear up my muddled understanding of Drill?
-Cliff