sivabalan narayanan created HUDI-8586:
-----------------------------------------
Summary: Fix partition stats index to only index supported types
Key: HUDI-8586
URL: https://issues.apache.org/jira/browse/HUDI-8586
Project: Apache Hudi
Issue Type: Improvement
Components: metadata
Reporter: sivabalan narayanan
Looks like there are data type mismatches between base files and log files
when we generate column stats. When we try to merge the two sets of stats,
the merge fails:
{code:java}
java.lang.ClassCastException: class java.lang.Integer cannot be cast to class
java.time.chrono.ChronoLocalDate (java.lang.Integer and
java.time.chrono.ChronoLocalDate are in module java.base of loader 'bootstrap')
{code}
ref patch: [https://github.com/apache/hudi/pull/12331]
For example, for the "current_date" column, the type in the parquet base file is:
required int32 current_date (DATE)
whereas in log files the data type is:
{"type":"int","logicalType":"date"}
For now, let's support partition stats only for scalar/primitive types. For
other data types, we can skip generating partition stats.
This keeps the user experience seamless and avoids random errors, even at the
cost of not indexing those columns.
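
A rough sketch of the proposed filtering follows. The type model (BaseType, Column) and the exact set of supported types are assumptions made for illustration; they are not Hudi or Avro classes, and the real list of supported types may differ. Here a column is indexed only if it has a plain scalar base type and no logical type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

public class PartitionStatsTypeFilterSketch {

    // Hypothetical, simplified type model; not Hudi's or Avro's schema classes.
    enum BaseType { BOOLEAN, INT, LONG, FLOAT, DOUBLE, STRING, BYTES, ARRAY, MAP, RECORD }

    static final class Column {
        final String name;
        final BaseType baseType;
        final String logicalType; // null when the column has no logical type

        Column(String name, BaseType baseType, String logicalType) {
            this.name = name;
            this.baseType = baseType;
            this.logicalType = logicalType;
        }
    }

    // Assumption: "scalar/primitive" means these base types with no logical type.
    private static final Set<BaseType> SUPPORTED = EnumSet.of(
            BaseType.BOOLEAN, BaseType.INT, BaseType.LONG,
            BaseType.FLOAT, BaseType.DOUBLE, BaseType.STRING);

    static boolean isSupported(Column column) {
        return column.logicalType == null && SUPPORTED.contains(column.baseType);
    }

    // Keeps only the columns whose stats would go into partition stats.
    static List<String> columnsToIndex(List<Column> columns) {
        List<String> result = new ArrayList<>();
        for (Column column : columns) {
            if (isSupported(column)) {
                result.add(column.name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Column> columns = Arrays.asList(
                new Column("id", BaseType.INT, null),
                new Column("name", BaseType.STRING, null),
                new Column("current_date", BaseType.INT, "date"),
                new Column("tags", BaseType.ARRAY, null));
        System.out.println(columnsToIndex(columns)); // prints [id, name]
    }
}
```

With this shape, the "current_date" column from the example is skipped silently instead of failing at merge time.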
--
This message was sent by Atlassian Jira
(v8.20.10#820010)