[ https://issues.apache.org/jira/browse/HUDI-8178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sagar Sumit updated HUDI-8178:
------------------------------
    Description: 
There are two issues that need to be fixed before we can enable partition stats 
by default:
1. There is a [class cast exception|https://dev.azure.com/apachehudi/hudi-oss-ci/_build/results?buildId=151&view=logs&j=b1544eb9-7ff1-5db9-0187-3e05abf459bc&t=e0ae894b-41c9-5f4b-7ed2-bdf5243b02e7]
 under certain circumstances while collecting stats for log files. This happens 
even with column stats (colstats) alone; to reproduce, run the 
`[testReadPathsForOnlyLogFiles|https://github.com/apache/hudi/blob/33a02987b7fc385253dc0e0efbf112066b8cf190/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestMORDataSource.scala#L885]`
 test with column stats enabled.

 
{code:java}
Caused by: java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.UnsafeRow cannot be cast to org.apache.avro.generic.IndexedRecord
        at org.apache.avro.generic.GenericData.getField(GenericData.java:858)
        at org.apache.avro.generic.GenericData.compare(GenericData.java:1229)
        at org.apache.avro.generic.GenericData.compare(GenericData.java:1148)
        at org.apache.hudi.metadata.HoodieTableMetadataUtil.lambda$null$1(HoodieTableMetadataUtil.java:239)
        at java.util.ArrayList.forEach(ArrayList.java:1259)
        at org.apache.hudi.metadata.HoodieTableMetadataUtil.lambda$collectColumnRangeMetadata$2(HoodieTableMetadataUtil.java:222)
        at java.util.ArrayList.forEach(ArrayList.java:1259)
        at org.apache.hudi.metadata.HoodieTableMetadataUtil.collectColumnRangeMetadata(HoodieTableMetadataUtil.java:219)
        at org.apache.hudi.io.HoodieAppendHandle.processAppendResult(HoodieAppendHandle.java:424)
        at org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:475)
 {code}
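The trace shows `collectColumnRangeMetadata` running two nested `forEach` loops and calling Avro's `GenericData.compare`, which reads fields through `GenericData.getField` and casts its argument to `IndexedRecord`. When the append handle holds Spark `UnsafeRow`s instead of Avro records, that cast fails. One possible direction, shown here as a minimal sketch under stated assumptions and not necessarily how this issue was resolved: keep the min/max loop independent of the record representation by passing in a field accessor instead of assuming Avro. `ColumnRangeSketch`, `ColumnRange`, and `collectRanges` are hypothetical names for illustration, not Hudi's API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

public class ColumnRangeSketch {

    // Hypothetical per-column stats holder; not Hudi's HoodieColumnRangeMetadata.
    static final class ColumnRange {
        Object min;
        Object max;
        long nullCount;
    }

    // Collects min/max/null counts per column. Field access goes through the
    // accessor, so the loop never casts a record to a concrete type such as
    // Avro's IndexedRecord; the caller supplies an accessor matching its records.
    @SuppressWarnings("unchecked")
    static <R> Map<String, ColumnRange> collectRanges(
            List<R> records, List<String> columns, BiFunction<R, String, Object> accessor) {
        Map<String, ColumnRange> ranges = new LinkedHashMap<>();
        for (String column : columns) {
            ranges.put(column, new ColumnRange());
        }
        for (R record : records) {
            for (String column : columns) {
                ColumnRange range = ranges.get(column);
                Object value = accessor.apply(record, column);
                if (value == null) {
                    range.nullCount++;
                    continue;
                }
                // Assumes all values of one column are mutually Comparable (e.g. all Integer).
                Comparable<Object> v = (Comparable<Object>) value;
                if (range.min == null || v.compareTo(range.min) < 0) {
                    range.min = value;
                }
                if (range.max == null || v.compareTo(range.max) > 0) {
                    range.max = value;
                }
            }
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Plain maps stand in for engine-specific rows in this demo.
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(Map.of("id", 3, "name", "c"));
        rows.add(Map.of("id", 1, "name", "a"));
        rows.add(Map.of("id", 2)); // no "name" key, so it is counted as a null
        collectRanges(rows, List.of("id", "name"), Map::get).forEach((col, r) ->
            System.out.println(col + ": min=" + r.min + " max=" + r.max + " nulls=" + r.nullCount));
    }
}
```

With plain maps as rows, `main` prints `id: min=1 max=3 nulls=0` and `name: min=a max=c nulls=1`. In Hudi the accessor would have to match whatever record type the append handle actually holds; that wiring is not shown here.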
 

 

  was:
There are two issues that need to be fixed before we can enable partition stats 
by default:
1. There is a [class cast exception|https://dev.azure.com/apachehudi/hudi-oss-ci/_build/results?buildId=151&view=logs&j=b1544eb9-7ff1-5db9-0187-3e05abf459bc&t=e0ae894b-41c9-5f4b-7ed2-bdf5243b02e7]
 under certain circumstances. This happens even for colstats.

2. We need to limit, by default, the number of columns for which stats are 
aggregated; otherwise, building and updating the index can be slow if we do it for all columns.
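Item 2 amounts to capping the default set of indexed columns. A minimal sketch, assuming a hypothetical `columnsToIndex` helper and a hypothetical cap `DEFAULT_MAX_COLUMNS = 32` (an illustrative number, not a Hudi default or config): an explicit user column list is used as given; otherwise only the first N schema columns are indexed, which bounds index build and update cost on wide tables.

```java
import java.util.ArrayList;
import java.util.List;

public class ColumnSelectionSketch {

    // Hypothetical default cap; not a Hudi configuration value.
    static final int DEFAULT_MAX_COLUMNS = 32;

    // If the user listed columns explicitly, index exactly those; otherwise fall
    // back to the first maxColumns columns of the schema.
    static List<String> columnsToIndex(List<String> schemaColumns,
                                       List<String> userColumns,
                                       int maxColumns) {
        if (userColumns != null && !userColumns.isEmpty()) {
            return List.copyOf(userColumns);
        }
        int n = Math.min(maxColumns, schemaColumns.size());
        return List.copyOf(schemaColumns.subList(0, n));
    }

    public static void main(String[] args) {
        List<String> schema = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            schema.add("c" + i);
        }
        // Wide table, no explicit list: only the first 32 columns are indexed.
        System.out.println(columnsToIndex(schema, List.of(), DEFAULT_MAX_COLUMNS).size());
        // Explicit list wins over the cap.
        System.out.println(columnsToIndex(schema, List.of("c7", "c9"), DEFAULT_MAX_COLUMNS));
    }
}
```

This prints `32` and then `[c7, c9]`. Taking the first N columns is only one possible policy; whether an explicit list should also be capped or validated against the schema is left open here.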


> Enable partition stats by default
> ---------------------------------
>
>                 Key: HUDI-8178
>                 URL: https://issues.apache.org/jira/browse/HUDI-8178
>             Project: Apache Hudi
>          Issue Type: Task
>            Reporter: Sagar Sumit
>            Assignee: Lin Liu
>            Priority: Blocker
>             Fix For: 1.0.0
>
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
