[
https://issues.apache.org/jira/browse/HUDI-8178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sagar Sumit updated HUDI-8178:
------------------------------
Description:
There are two issues that need to be fixed before we can enable partition stats
by default:
1. There is a [class cast exception|https://dev.azure.com/apachehudi/hudi-oss-ci/_build/results?buildId=151&view=logs&j=b1544eb9-7ff1-5db9-0187-3e05abf459bc&t=e0ae894b-41c9-5f4b-7ed2-bdf5243b02e7]
under certain circumstances while collecting stats for log files. This happens
even for colstats (to reproduce, run the
`[testReadPathsForOnlyLogFiles|https://github.com/apache/hudi/blob/33a02987b7fc385253dc0e0efbf112066b8cf190/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestMORDataSource.scala#L885]`
test with column stats enabled).
{code:java}
Caused by: java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.UnsafeRow cannot be cast to org.apache.avro.generic.IndexedRecord
    at org.apache.avro.generic.GenericData.getField(GenericData.java:858)
    at org.apache.avro.generic.GenericData.compare(GenericData.java:1229)
    at org.apache.avro.generic.GenericData.compare(GenericData.java:1148)
    at org.apache.hudi.metadata.HoodieTableMetadataUtil.lambda$null$1(HoodieTableMetadataUtil.java:239)
    at java.util.ArrayList.forEach(ArrayList.java:1259)
    at org.apache.hudi.metadata.HoodieTableMetadataUtil.lambda$collectColumnRangeMetadata$2(HoodieTableMetadataUtil.java:222)
    at java.util.ArrayList.forEach(ArrayList.java:1259)
    at org.apache.hudi.metadata.HoodieTableMetadataUtil.collectColumnRangeMetadata(HoodieTableMetadataUtil.java:219)
    at org.apache.hudi.io.HoodieAppendHandle.processAppendResult(HoodieAppendHandle.java:424)
    at org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:475)
{code}
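The trace above suggests that a value in an engine-native representation reached a comparison routine that expects an Avro-style indexed record. The following is a minimal, stdlib-only sketch of that failure mode and of one possible guard. It is not Hudi code and not the actual fix: `IndexedLike`, `AvroLikeRecord`, `EngineRow`, `maxUnguarded` and `maxGuarded` are hypothetical names introduced only for illustration.

```java
import java.util.List;

public class MixedRecordStatsSketch {

    /** Hypothetical stand-in for an Avro-style record with positional fields. */
    public interface IndexedLike {
        Object get(int pos);
    }

    /** Hypothetical Avro-like record implementation. */
    public static final class AvroLikeRecord implements IndexedLike {
        private final Object[] values;

        public AvroLikeRecord(Object... values) {
            this.values = values;
        }

        @Override
        public Object get(int pos) {
            return values[pos];
        }
    }

    /** Hypothetical stand-in for an engine-native row that is NOT IndexedLike. */
    public static final class EngineRow {
        public final Object[] values;

        public EngineRow(Object... values) {
            this.values = values;
        }
    }

    /** Unguarded: blindly casts every record, so an EngineRow triggers ClassCastException. */
    public static int maxUnguarded(List<Object> records, int pos) {
        int max = Integer.MIN_VALUE;
        for (Object r : records) {
            IndexedLike rec = (IndexedLike) r; // fails for EngineRow
            max = Math.max(max, (Integer) rec.get(pos));
        }
        return max;
    }

    /** Guarded: checks the representation before reading the field. */
    public static int maxGuarded(List<Object> records, int pos) {
        int max = Integer.MIN_VALUE; // returned unchanged for an empty list
        for (Object r : records) {
            Object value;
            if (r instanceof IndexedLike) {
                value = ((IndexedLike) r).get(pos);
            } else if (r instanceof EngineRow) {
                value = ((EngineRow) r).values[pos];
            } else {
                throw new IllegalArgumentException("Unsupported record type: " + r.getClass());
            }
            max = Math.max(max, (Integer) value);
        }
        return max;
    }
}
```

With a list holding one `AvroLikeRecord` and one `EngineRow`, `maxUnguarded` throws `ClassCastException`, while `maxGuarded` reads both. Whether the real fix should convert records up front or branch on type inside stats collection is left open here.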
was:
There are two issues that need to be fixed before we can enable partition stats
by default:
1. There is a [class cast exception|https://dev.azure.com/apachehudi/hudi-oss-ci/_build/results?buildId=151&view=logs&j=b1544eb9-7ff1-5db9-0187-3e05abf459bc&t=e0ae894b-41c9-5f4b-7ed2-bdf5243b02e7]
under certain circumstances. This happens even for colstats.
2. We need to limit the number of columns for which stats are aggregated by
default; otherwise building and updating the index can be slow if we aggregate
stats for all columns.
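One way to picture the column limit in item 2 is sketched below. This is an assumption-laden illustration, not Hudi's configuration or API: `StatsColumnLimitSketch`, `columnsToIndex` and the cap of 32 are hypothetical. The sketch honors an explicitly configured column list if present and otherwise falls back to the first N columns of the schema.

```java
import java.util.ArrayList;
import java.util.List;

public class StatsColumnLimitSketch {

    /** Hypothetical default cap; the real default (if any) is not stated in this issue. */
    public static final int DEFAULT_MAX_COLUMNS = 32;

    /**
     * Returns the columns to aggregate stats for.
     * An explicit, non-empty configured list wins; otherwise the first
     * maxColumns schema columns are used.
     */
    public static List<String> columnsToIndex(List<String> schemaColumns,
                                              List<String> configured,
                                              int maxColumns) {
        if (maxColumns < 0) {
            throw new IllegalArgumentException("maxColumns must be non-negative");
        }
        if (configured != null && !configured.isEmpty()) {
            return new ArrayList<>(configured);
        }
        int limit = Math.min(maxColumns, schemaColumns.size());
        return new ArrayList<>(schemaColumns.subList(0, limit));
    }
}
```

Capping by schema order is only one possible policy; picking columns by type or by query patterns would be equally valid alternatives.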
> Enable partition stats by default
> ---------------------------------
>
> Key: HUDI-8178
> URL: https://issues.apache.org/jira/browse/HUDI-8178
> Project: Apache Hudi
> Issue Type: Task
> Reporter: Sagar Sumit
> Assignee: Lin Liu
> Priority: Blocker
> Fix For: 1.0.0
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)