[ https://issues.apache.org/jira/browse/IMPALA-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sahil Takiar resolved IMPALA-1618.
----------------------------------
    Fix Version/s: Impala 3.4.0
       Resolution: Fixed

> Impala server should always try to fulfill requested fetch size
> ---------------------------------------------------------------
>
>                 Key: IMPALA-1618
>                 URL: https://issues.apache.org/jira/browse/IMPALA-1618
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Backend
>    Affects Versions: Impala 2.0.1
>            Reporter: casey
>            Priority: Minor
>              Labels: usability
>             Fix For: Impala 3.4.0
>
>
> The Thrift fetch request specifies the number of rows the client would like, but
> the Impala server may return fewer rows even though more results are available.
> For example, with the default row batch size of 1024, if the client requests
> 1023 rows, the first response contains 1023 rows but the second response
> contains only 1 row. This is because the server internally fetches one row batch
> (1024 rows), returns the requested count (1023), and caches the remaining row.
> On the next request it serves only what is in the cache.
> In general the end user should set both the row batch size and the Thrift
> request size. In practice, the query writer setting batch_size and the
> driver/programmer setting the fetch size are often different people.
> One case works fine today, though: setting the batch size to less than the
> Thrift request size. In that case the size of the Thrift response is always
> the same as the batch size.
> Code example:
> {noformat}
> dev@localhost:~/impyla$ git diff
> diff --git a/impala/_rpc/hiveserver2.py b/impala/_rpc/hiveserver2.py
> index 6139002..31fdab7 100644
> --- a/impala/_rpc/hiveserver2.py
> +++ b/impala/_rpc/hiveserver2.py
> @@ -265,6 +265,7 @@ def fetch_results(service, operation_handle, hs2_protocol_version, schema=None,
>      req = TFetchResultsReq(operationHandle=operation_handle,
>                             orientation=orientation,
>                             maxRows=max_rows)
> +    print("req: " + str(max_rows))
>      resp = service.FetchResults(req)
>      err_if_rpc_not_ok(resp)
>
> @@ -273,6 +274,7 @@ def fetch_results(service, operation_handle, hs2_protocol_version, schema=None,
>                for (i, col) in enumerate(resp.results.columns)]
>      num_cols = len(tcols)
>      num_rows = len(tcols[0].values)
> +    print("rec: " + str(num_rows))
>      rows = []
>      for i in xrange(num_rows):
>          row = []
> dev@localhost:~/impyla$ cat test.py
> from impala.dbapi import connect
> conn = connect()
> cur = conn.cursor()
> cur.set_arraysize(1024)
> cur.execute("set batch_size=1025")
> cur.execute("select * from tpch.lineitem")
> while True:
>     rows = cur.fetchmany()
>     if not rows:
>         break
> cur.close()
> conn.close()
> dev@localhost:~/impyla$ python test.py | head
> Failed to import pandas
> req: 1024
> rec: 1024
> req: 1024
> rec: 1
> req: 1024
> rec: 1024
> req: 1024
> rec: 1
> req: 1024
> {noformat}
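
For readers who want to reason about the alternating response sizes without a
running cluster, the caching behavior described in the issue can be modeled in
a few lines of Python. This is only a sketch of the behavior as the reporter
describes it, not Impala's actual backend code: the function name
simulate_fetches and its parameters are hypothetical, and it assumes the server
pulls exactly one row batch whenever its cache is empty and never tops up a
response from a second batch.

```python
def simulate_fetches(total_rows, batch_size, max_rows):
    """Return the row count of each fetch response under the described behavior.

    Hypothetical model only: one row batch is pulled when the cache is empty,
    and each response is served from the cache alone.
    """
    remaining = total_rows  # rows the query has not produced yet
    cached = 0              # leftover rows from the most recent row batch
    responses = []
    while remaining > 0 or cached > 0:
        if cached == 0:
            # Cache is empty: pull exactly one row batch from the query.
            produced = min(batch_size, remaining)
            remaining -= produced
            cached = produced
        # Serve the request from the cache only; do not fetch another batch
        # even if the cache holds fewer rows than the client asked for.
        returned = min(max_rows, cached)
        cached -= returned
        responses.append(returned)
    return responses


if __name__ == "__main__":
    # batch_size=1025 with a fetch size of 1024, as in test.py.
    print(simulate_fetches(total_rows=2050, batch_size=1025, max_rows=1024))
    # Batch size below the fetch size: every response is one batch.
    print(simulate_fetches(total_rows=2048, batch_size=512, max_rows=1024))
```

Under these assumptions the first call yields the same 1024, 1, 1024, 1
pattern that the "rec:" lines show, and the second call yields responses of
512 rows each, matching the case the description says already works.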