[ https://issues.apache.org/jira/browse/HIVE-16919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matt McCline updated HIVE-16919:
--------------------------------
Description:
Jason spotted differences in the query results for
vectorization_short_regress.q.out: when vectorization is turned off and a base
.q.out file is created, there are 2 differences.
Both seem to be related to negation. For example, in the first one, MAX(cint)
and MIN(cint) appear earlier as columns and match between the non-vectorized
and vectorized runs, so aggregation does not appear to be failing. It seems
that a bug is exposed now that the Reducer is vectorizing: even though MAX and
MIN are the same, the expression with negation returns different results.
19th field of the query below: Vectorized 511 vs. Non-Vectorized -58
{noformat}
SELECT MAX(cint),
(MAX(cint) / -3728),
(MAX(cint) * -3728),
VAR_POP(cbigint),
(-((MAX(cint) * -3728))),
STDDEV_POP(csmallint),
(-563 % (MAX(cint) * -3728)),
(VAR_POP(cbigint) / STDDEV_POP(csmallint)),
(-(STDDEV_POP(csmallint))),
MAX(cdouble),
AVG(ctinyint),
(STDDEV_POP(csmallint) - 10.175),
MIN(cint),
((MAX(cint) * -3728) % (STDDEV_POP(csmallint) - 10.175)),
(-(MAX(cdouble))),
MIN(cdouble),
(MAX(cdouble) % -26.28),
STDDEV_SAMP(csmallint),
(-((MAX(cint) / -3728))),
((-((MAX(cint) * -3728))) % (-563 % (MAX(cint) * -3728))),
((MAX(cint) / -3728) - AVG(ctinyint)),
(-((MAX(cint) * -3728))),
VAR_SAMP(cint)
FROM alltypesorc
WHERE (((cbigint <= 197)
AND (cint < cbigint))
OR ((cdouble >= -26.28)
AND (csmallint > cdouble))
OR ((ctinyint > cfloat)
AND (cstring1 RLIKE '.*ss.*'))
OR ((cfloat > 79.553)
AND (cstring2 LIKE '10%')))
{noformat}
Column expression is: ((-((MAX(cint) * -3728))) % (-563 % (MAX(cint) *
-3728))),
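A reduced query may make the first difference easier to isolate. The following
is only a sketch: it keeps the WHERE clause from the query above, selects just
MAX(cint) and the sub-expressions that feed the reported column, and is meant
to be run once per setting. The column aliases are illustrative and not part of
the test file, and whether the reduced form still reproduces the difference has
not been verified.
{noformat}
-- Run once with both settings false, then again with both set to true,
-- and compare the single output row.
SET hive.vectorized.execution.enabled=false;
SET hive.vectorized.execution.reduce.enabled=false;

SELECT MAX(cint) AS max_cint,
(MAX(cint) * -3728) AS prod,
(-((MAX(cint) * -3728))) AS neg_prod,
(-563 % (MAX(cint) * -3728)) AS inner_mod,
((-((MAX(cint) * -3728))) % (-563 % (MAX(cint) * -3728))) AS outer_mod
FROM alltypesorc
WHERE (((cbigint <= 197)
AND (cint < cbigint))
OR ((cdouble >= -26.28)
AND (csmallint > cdouble))
OR ((ctinyint > cfloat)
AND (cstring1 RLIKE '.*ss.*'))
OR ((cfloat > 79.553)
AND (cstring2 LIKE '10%')))
{noformat}
If the runs first diverge at neg_prod, inner_mod, or outer_mod, that would
point at the corresponding operator rather than at the aggregation.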
-----------------------------------------------
This is a previously existing issue and now filed as HIVE-16919:
"Vectorization: vectorization_short_regress.q has query result differences with
non-vectorized run"
10th field of the query below: Non-Vectorized -6432.000015344526 vs.
Vectorized -6432.0
Column expression is (-(cdouble)) as c4,
{noformat}
SELECT ctimestamp1,
cstring2,
cdouble,
cfloat,
cbigint,
csmallint,
(cbigint / 3569) as c1,
(-257 - csmallint) as c2,
(-6432 * cfloat) as c3,
(-(cdouble)) as c4,
(cdouble * 10.175) as c5,
((-6432 * cfloat) / cfloat) as c6,
(-(cfloat)) as c7,
(cint % csmallint) as c8,
(-(cdouble)) as c9,
(cdouble * (-(cdouble))) as c10
FROM alltypesorc
WHERE (((-1.389 >= cint)
AND ((csmallint < ctinyint)
AND (-6432 > csmallint)))
OR ((cdouble >= cfloat)
AND (cstring2 <= 'a'))
OR ((cstring1 LIKE 'ss%')
AND (10.175 > cbigint)))
{noformat}
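To look at the second difference in isolation, a similar sketch can select
cdouble next to its negation under the same filter. The alias neg_cdouble is
illustrative, and the session setting is toggled by hand between runs; this is
an assumption about how one might narrow the comparison, not a query taken from
the test file.
{noformat}
-- Compare the output of this query with the setting true vs. false.
SET hive.vectorized.execution.enabled=true;

SELECT cdouble,
(-(cdouble)) AS neg_cdouble
FROM alltypesorc
WHERE (((-1.389 >= cint)
AND ((csmallint < ctinyint)
AND (-6432 > csmallint)))
OR ((cdouble >= cfloat)
AND (cstring2 <= 'a'))
OR ((cstring1 LIKE 'ss%')
AND (10.175 > cbigint)))
{noformat}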
was:
Query result for vectorization_short_regress.q.out -- that is when
vectorization is turned off and a base .q.out file created.
-----------------------------------------------
10th field of the query below: Non-Vectorized -6432.000015344526 vs. Vectorized
-6432.0
Column expression is (-(cdouble)) as c4,
{noformat}
SELECT ctimestamp1,
cstring2,
cdouble,
cfloat,
cbigint,
csmallint,
(cbigint / 3569) as c1,
(-257 - csmallint) as c2,
(-6432 * cfloat) as c3,
(-(cdouble)) as c4,
(cdouble * 10.175) as c5,
((-6432 * cfloat) / cfloat) as c6,
(-(cfloat)) as c7,
(cint % csmallint) as c8,
(-(cdouble)) as c9,
(cdouble * (-(cdouble))) as c10
FROM alltypesorc
WHERE (((-1.389 >= cint)
AND ((csmallint < ctinyint)
AND (-6432 > csmallint)))
OR ((cdouble >= cfloat)
AND (cstring2 <= 'a'))
OR ((cstring1 LIKE 'ss%')
AND (10.175 > cbigint)))
{noformat}
> Vectorization: vectorization_short_regress.q has query result differences
> with non-vectorized run. Vectorized unary function broken?
> -------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-16919
> URL: https://issues.apache.org/jira/browse/HIVE-16919
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: Matt McCline
> Assignee: Matt McCline
> Priority: Critical
>
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)