The unrelated change (the opencsv upgrade) has been removed from this PR and
was done separately, but the differences in the Hive test results are still
present.

I did a few additional checks:

   - The same differences are present whether Hive statistics are enabled
   or disabled.
   - When the decimal_vgby source table used by the test is stored in
   Parquet format instead of Orc, the differences disappear (which further
   suggests that the cause is related to Orc).
   - I dumped the content of the decimal_vgby table with and without the Orc
   upgrade and the two dumps were identical, which is odd.
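
One possibility I have not verified in either Hive or Orc: identical table content can still produce different trailing digits if the rows reach the aggregation in a different order or batching, because double-precision addition is not associative. Below is a minimal sketch of that effect in Python. The formula, the helper names and the generated data are all assumptions for illustration; this is not Hive's STDDEV_POP code and not the decimal_vgby data.

```python
import math
import random
from fractions import Fraction


def stddev_pop_double(values):
    """Population stddev from running double-precision sums.

    Illustrative formula only; not Hive's actual STDDEV_POP code.
    """
    count = 0
    total = 0.0
    total_sq = 0.0
    for v in values:
        x = float(v)        # each decimal value is rounded to a double here
        count += 1
        total += x          # double addition is not associative
        total_sq += x * x
    variance = (total_sq - total * total / count) / count
    return math.sqrt(max(variance, 0.0))


def variance_pop_exact(values):
    """Same formula in exact rational arithmetic, so row order cannot matter."""
    count = len(values)
    total = sum(values, Fraction(0))
    total_sq = sum((v * v for v in values), Fraction(0))
    return (total_sq - total * total / count) / count


def make_rows(n=1000, seed=42):
    """Hypothetical stand-in for a decimal column with 10 fractional digits."""
    rng = random.Random(seed)
    return [Fraction(rng.randint(-10**13, 10**13), 10**10) for _ in range(n)]


if __name__ == "__main__":
    rows = make_rows()
    reordered = list(reversed(rows))  # same content, different read order
    a = stddev_pop_double(rows)
    b = stddev_pop_double(reordered)
    print(repr(a))
    print(repr(b))
    print("double results identical:", a == b)
    print("exact variances identical:",
          variance_pop_exact(rows) == variance_pop_exact(reordered))
```

The exact-arithmetic variance is the same for both orders by construction, while the double-based results are only guaranteed to agree to within rounding error. If something like this is happening, the diffs in the test output would be confined to the last few fractional digits.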


Can you please suggest what might be causing these differences?

On 2024/05/06 15:59:58 Dongjoon Hyun wrote:
> Your PR seems to have an irrelevant change.
>
> For example, upgrading `net.sf.opencsv:opencsv:2.3` to
> `com.opencsv:opencsv:5.9`.
>
> Could you remove `opencsv` change first to narrow down?
>
> Dongjoon.
>
> On 2024/05/06 14:55:20 Dmitriy Fingerman wrote:
> > Hello ORC Devs,
> >
> > I am working on upgrading the Orc version in Hive from 1.8.5 to 1.9.3.
> > (To 1.9.3 and not to 2.0.0 because Hive still supports Java 8 and Orc
> > 2.0.0 doesn't).
> > Link to the Hive pull request: https://github.com/apache/hive/pull/5218
> > One of the tests in this Hive PR is failing because some queries'
> > results have changed.
> > The differences are in the fractional parts of the results of Hive's
> > functions STDDEV_POP and STDDEV_SAMP.
> > (These functions are implemented with operations that include sqrt, sum,
> > division, multiplication, count, etc).
> > Here is the link to the differences with and without the upgraded Orc
> > version:
> >
> > https://github.com/apache/hive/pull/5218/files#diff-7c779589551c5224644bfe786d1f03a5e3aa18b219b28ae18f89fffea01ef483
> >
> > Can you please advise if Orc had some changes around precision that
> > could explain these differences in query results?
> >
> > Thanks,
> > Dmitriy Fingerman
> >
>
