[ 
https://issues.apache.org/jira/browse/HIVE-6449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904009#comment-13904009
 ] 

Remus Rusanu commented on HIVE-6449:
------------------------------------

[~prasanth_j] thanks for the guidance. Since the difference also reproduces on
ORC files, I am focusing on them for now to rule out any Parquet-related
problem. For my test ORC file, created as
{code}
CREATE TABLE decimal_mapjoin STORED AS ORC AS 
  SELECT cdouble, CAST (((cdouble*22.1)/37) AS DECIMAL(20,10)) AS cdecimal1, 
  CAST (((cdouble*9.3)/13) AS DECIMAL(23,14)) AS cdecimal2,
  cint
  FROM alltypesorc;
{code}
I get the following stats in describe extended:
{code}
describe extended decimal_mapjoin;
...
Windows: {numFiles=1, COLUMN_STATS_ACCURATE=true, transient_lastDdlTime=1392727196, numRows=0, totalSize=126087, rawDataSize=0}
Linux:   {numFiles=1, transient_lastDdlTime=1392722507, COLUMN_STATS_ACCURATE=true, totalSize=126087, numRows=12288, rawDataSize=2165060} ...
{code}
So the problem is that on Windows neither ROW_COUNT nor RAW_DATA_SIZE is
initialized properly: numRows and rawDataSize are both reported as 0, while
totalSize matches the Linux value. I'm investigating.
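
To make the comparison above explicit, here is a minimal sketch in Python that parses the two parameter maps from the describe extended output and lists the keys whose values differ. The helpers `parse_params` and `diff_params` are hypothetical names used only for illustration and are not part of Hive. `transient_lastDdlTime` is ignored on the assumption that it is a timestamp expected to change between runs.

```python
def parse_params(text):
    """Parse a '{k=v, k=v}' parameter map as printed by describe extended."""
    body = text.strip().strip("{}")
    pairs = (item.strip().split("=", 1) for item in body.split(",") if item.strip())
    return dict(pairs)


# Parameter maps copied from the describe extended output above.
windows = parse_params(
    "{numFiles=1, COLUMN_STATS_ACCURATE=true, "
    "transient_lastDdlTime=1392727196, numRows=0, totalSize=126087, rawDataSize=0}"
)
linux = parse_params(
    "{numFiles=1, transient_lastDdlTime=1392722507, "
    "COLUMN_STATS_ACCURATE=true, totalSize=126087, numRows=12288, "
    "rawDataSize=2165060}"
)

# Assumption: this key is a timestamp and always differs between runs.
IGNORED = {"transient_lastDdlTime"}


def diff_params(a, b, ignored=IGNORED):
    """Return {key: (value_in_a, value_in_b)} for keys whose values differ."""
    keys = sorted((set(a) | set(b)) - ignored)
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


diffs = diff_params(windows, linux)
for key, (win, lin) in diffs.items():
    print(f"{key}: Windows={win} Linux={lin}")
```

With the values above, only numRows and rawDataSize are reported as different, which matches the ROW_COUNT and RAW_DATA_SIZE observation.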

> EXPLAIN has diffs in Statistics in tests generated on Windows vs. test 
> generated on Linux
> -----------------------------------------------------------------------------------------
>
>                 Key: HIVE-6449
>                 URL: https://issues.apache.org/jira/browse/HIVE-6449
>             Project: Hive
>          Issue Type: Bug
>          Components: Tests
>            Reporter: Remus Rusanu
>            Assignee: Remus Rusanu
>            Priority: Critical
>
> When .q.out files are generated on Windows, the statistics in EXPLAIN differ
> from the ones generated on Linux. E.g.:
> {code}
> Running: diff -a /root/hive/itests/qtest/../../itests/qtest/target/qfile-results/clientpositive/vectorized_parquet.q.out /root/hive/itests/qtest/../../ql/src/test/results/clientpositive/vectorized_parquet.q.out
> 72c72
> <             Statistics: Num rows: 12288 Data size: 73728 Basic stats: COMPLETE Column stats: NONE
> ---
> >             Statistics: Num rows: 2072 Data size: 257046 Basic stats: COMPLETE Column stats: NONE
> 75c75
> <               Statistics: Num rows: 6144 Data size: 36864 Basic stats: COMPLETE Column stats: NONE
> ---
> >               Statistics: Num rows: 1036 Data size: 128523 Basic stats: COMPLETE Column stats: NONE
> {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
