sivabalan narayanan created HUDI-8586:
-----------------------------------------

             Summary: Fix partition stats index to only index supported types
                 Key: HUDI-8586
                 URL: https://issues.apache.org/jira/browse/HUDI-8586
             Project: Apache Hudi
          Issue Type: Improvement
          Components: metadata
            Reporter: sivabalan narayanan


It looks like there are data type mismatches between base files and log files 
when we generate column stats. As a result, merging the two sets of stats fails 
with: 
{code:java}
java.lang.ClassCastException: class java.lang.Integer cannot be cast to class 
java.time.chrono.ChronoLocalDate (java.lang.Integer and 
java.time.chrono.ChronoLocalDate are in module java.base of loader 'bootstrap') 
{code}
ref patch: [https://github.com/apache/hudi/pull/12331] 

For example, for the "current_date" column: 

Data type in the Parquet base file: 
required int32 current_date (DATE)

In the log files, the data type is: 

{"type":"int","logicalType":"date"}
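
The failure mode can be reproduced outside Hudi. The sketch below is only an illustration: `DateStatsMismatch` and `maxOf` are hypothetical names, not Hudi code, and which side (base file or log file) ends up as which Java type is an assumption. It shows that comparing a date decoded to a `LocalDate` against the same logical value left as a raw `Integer` fails in the same way as the stack trace above.

```java
import java.time.LocalDate;

class DateStatsMismatch {
    // Hypothetical helper: keeps the larger of two column-stat values,
    // as a min/max merge might, via an unchecked Comparable cast.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static Object maxOf(Object a, Object b) {
        return ((Comparable) a).compareTo(b) >= 0 ? a : b;
    }

    public static void main(String[] args) {
        // Assumption: one reader decodes the DATE value to a LocalDate ...
        Object decodedDate = LocalDate.of(2024, 11, 25);
        // ... while the other leaves it as the underlying int (days since epoch).
        Object rawDays = 20052;
        try {
            maxOf(decodedDate, rawDays);
        } catch (ClassCastException e) {
            // LocalDate.compareTo casts its argument to ChronoLocalDate.
            System.out.println("merge failed: " + e.getMessage());
        }
    }
}
```

Two values of the same Java type merge without error; the exception only appears when the two representations are mixed.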

For now, let's support partition stats only for scalar/primitive types. For 
other data types, we can skip generating partition stats. 

This keeps the user experience seamless and avoids random errors, even at the 
cost of leaving some columns unindexed. 
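
A minimal sketch of the proposed filtering, assuming a simplified view of the schema. `PartitionStatsTypeFilter`, `ColumnType`, and `columnsToIndex` are hypothetical names for illustration, not Hudi APIs, and the exact set of supported types is an assumption (plain primitives only, with logical and nested types skipped).

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class PartitionStatsTypeFilter {
    // Hypothetical, simplified stand-in for a column's schema type.
    enum ColumnType {
        BOOLEAN, INT, LONG, FLOAT, DOUBLE, STRING,
        DATE, TIMESTAMP, DECIMAL, ARRAY, MAP, RECORD
    }

    // Assumption: only these plain primitives are indexed.
    static final Set<ColumnType> SUPPORTED = EnumSet.of(
        ColumnType.BOOLEAN, ColumnType.INT, ColumnType.LONG,
        ColumnType.FLOAT, ColumnType.DOUBLE, ColumnType.STRING);

    // Returns the columns whose type is supported, preserving schema order.
    static List<String> columnsToIndex(Map<String, ColumnType> schema) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, ColumnType> entry : schema.entrySet()) {
            if (SUPPORTED.contains(entry.getValue())) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, ColumnType> schema = new LinkedHashMap<>();
        schema.put("id", ColumnType.LONG);
        schema.put("name", ColumnType.STRING);
        schema.put("current_date", ColumnType.DATE); // logical type: skipped
        schema.put("tags", ColumnType.ARRAY);        // nested type: skipped
        System.out.println(columnsToIndex(schema));  // prints [id, name]
    }
}
```

Using an allow-list rather than a deny-list means any type added later is skipped by default until it is explicitly verified to merge cleanly.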


--
This message was sent by Atlassian Jira
(v8.20.10#820010)