[ 
https://issues.apache.org/jira/browse/IMPALA-6964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16768943#comment-16768943
 ] 

ASF subversion and git services commented on IMPALA-6964:
---------------------------------------------------------

Commit f0a47ab2ca6e5c19f74c55af67927f446785f23c in impala's branch 
refs/heads/master from Thomas Tauber-Marshall
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=f0a47ab ]

IMPALA-8199: Fix stress test: 'No module named RuntimeProfile.ttypes'

A recent commit (IMPALA-6964) broke the stress test by adding a
module-level import of a generated thrift value to a Python file that
the stress test includes. The stress test is meant to run without a
full build of Impala, but without a full build the generated thrift
code isn't available, so the import fails.

The solution is to only import the thrift value in the function where
it is used, which is not called by the stress test.
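
As a rough illustration of the approach described above (not the actual
Impala change), a function-local import might look like the sketch
below. The module path hypothetical_gen.RuntimeProfile.ttypes and both
function names are assumptions made up for this example.

```python
# Minimal sketch of a deferred (function-local) import.
# All names here are hypothetical; this is not the Impala source.


def format_bytes(num_bytes):
    """Helper with no dependency on generated code.

    Callers that only need this function can import the enclosing file
    even when the generated thrift modules have not been built.
    """
    return "%.2f KB" % (num_bytes / 1024.0)


def counter_unit_names():
    """The only function that needs the generated thrift module.

    The import statement runs when this function is called, not when
    the enclosing file is imported, so an ImportError can only surface
    for callers that actually reach this code path.
    """
    # Assumed module path, standing in for generated thrift code.
    from hypothetical_gen.RuntimeProfile.ttypes import TUnit
    return TUnit._VALUES_TO_NAMES
```

With this layout, importing the file and calling format_bytes() works
without generated code, while counter_unit_names() raises ImportError
only if it is called in an environment where the module is missing.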

Testing:
- Ran the stress test manually without doing a full build and
  confirmed that it works now.

Change-Id: I7a3bd26d743ef6603fabf92f904feb4677001da5
Reviewed-on: http://gerrit.cloudera.org:8080/12472
Reviewed-by: Thomas Marshall <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Track stats about column and page sizes in Parquet reader
> ---------------------------------------------------------
>
>                 Key: IMPALA-6964
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6964
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>            Reporter: Tim Armstrong
>            Assignee: Sahil Takiar
>            Priority: Major
>              Labels: observability, parquet, ramp-up
>             Fix For: Impala 3.2.0
>
>
> It would be good to have stats for scanned parquet data about page sizes. We 
> currently can't tell much about the "shape" of the parquet pages from the 
> profile. Some questions that are interesting:
> * How big is each column? I.e. total compressed and decompressed size read.
> * How big are pages on average? Either compressed or decompressed size.
> * What is the compression ratio for pages? Could be inferred from the above 
> two.
> I think storing all the stats in the profile per-column would be too much 
> data, but we could probably infer most useful things from higher-level 
> aggregates.


