sivabalan narayanan created HUDI-8909:
-----------------------------------------

             Summary: Support/Fix Byte data type w/ partition stats 
                 Key: HUDI-8909
                 URL: https://issues.apache.org/jira/browse/HUDI-8909
             Project: Apache Hudi
          Issue Type: Improvement
          Components: metadata
            Reporter: sivabalan narayanan


While working towards making partition stats the default, we ran into an issue 
with the Byte data type:

[https://github.com/apache/hudi/pull/12671]

 

When multiple values are merged, the resulting min and max values do not match 
the manually computed stats. See column "c7" in the tests in TestColumnStatsIndex.
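
The ticket does not state the root cause. As a purely illustrative sketch, the snippet below shows one way merged min/max values for a Byte column could diverge from a manual computation: merging with signed ordering versus an ordering that treats each byte as unsigned. ByteColStats and ByteStatsMerge are hypothetical names, not Hudi classes, and the unsigned comparison is an assumption made only for this example.

```scala
// Hypothetical per-file stats for a Byte column (not a Hudi class).
final case class ByteColStats(min: Byte, max: Byte)

object ByteStatsMerge {
  // Merge with the default signed Byte ordering (-128..127), which is what
  // a manual min/max aggregation over a signed byte column would produce.
  // Assumes a non-empty input.
  def mergeSigned(stats: Seq[ByteColStats]): ByteColStats =
    ByteColStats(stats.map(_.min).min, stats.map(_.max).max)

  // Merge after reinterpreting each byte as unsigned (0..255).
  // Assumption for illustration only: negative bytes then sort above
  // positive ones, so the merged result can differ from the signed merge.
  def mergeUnsigned(stats: Seq[ByteColStats]): ByteColStats = {
    val byUnsigned: Ordering[Byte] = Ordering.by[Byte, Int](b => b & 0xFF)
    ByteColStats(stats.map(_.min).min(byUnsigned), stats.map(_.max).max(byUnsigned))
  }
}
```

For two files with stats (min = -3, max = 10) and (min = 5, max = 120), mergeSigned yields (-3, 120) while mergeUnsigned yields (5, 120), the kind of min/max mismatch a validation against manually computed stats would flag.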

 

To reproduce:

Switch the data type of column "c7" to "Byte", then run 

TestColumnStatsIndex.testMetadataColumnStatsIndex. 

 

Comment out the following assertion in ColumnStatIndexTestBase:

```
assertEquals(asJson(sort(expectedColStatsIndexTableDf, validationSortColumns)),
  asJson(sort(transposedColStatsDF.drop("fileName"), validationSortColumns)))
```

 

Run the test for a COW table. The failure shows up in the validation below:

```
assertEquals(asJson(sort(manualColStatsTableDF.drop(colsToDrop: _*), pValidationSortColumns)),
  asJson(sort(pTransposedColStatsDF.drop(colsToDrop: _*), pValidationSortColumns)))
```


--
This message was sent by Atlassian Jira
(v8.20.10#820010)
