The unrelated changes (opencsv upgrade) were removed from this PR and were done separately, but the differences in the results of Hive's test are still present.

I did a few additional checks:

- The same differences are present whether Hive statistics are enabled or disabled.
- When the decimal_vgby table used as the source in the test is stored in Parquet format instead of Orc, the differences are absent (which further confirms that the issue is related to Orc).
- I dumped the content of the decimal_vgby table with and without the Orc upgrade, and the dumps were identical, which is odd.

Can you please suggest what could be the reason for these differences?

On 2024/05/06 15:59:58 Dongjoon Hyun wrote:
> Your PR seems to have irrelevant change.
>
> For example, upgrading `net.sf.opencsv:opencsv:2.3` to `com.opencsv:opencsv:5.9`.
>
> Could you remove `opencsv` change first to narrow down?
>
> Dongjoon.
>
> On 2024/05/06 14:55:20 Dmitriy Fingerman wrote:
> > Hello ORC Devs,
> >
> > I am working on upgrading the Orc version in Hive from 1.8.5 to 1.9.3.
> > (To 1.9.3 and not to 2.0.0 because Hive still supports Java 8 and Orc 2.0.0
> > doesn't).
> > Link to the Hive pull request: https://github.com/apache/hive/pull/5218
> > One of the tests in this Hive PR is failing because some queries' results
> > have changed.
> > The differences are in the fractional parts of the results of Hive's
> > functions STDDEV_POP and STDDEV_SAMP.
> > (These functions are implemented with operations that include sqrt, sum,
> > division, multiplication, count, etc).
> > Here is the link to the differences with and without the upgraded Orc
> > version:
> > https://github.com/apache/hive/pull/5218/files#diff-7c779589551c5224644bfe786d1f03a5e3aa18b219b28ae18f89fffea01ef483
> >
> > Can you please advise if Orc had some changes around precision that could
> > explain these differences in query results?
> >
> > Thanks,
> > Dmitriy Fingerman
> >
>