[ 
https://issues.apache.org/jira/browse/HUDI-8178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17889803#comment-17889803
 ] 

sivabalan narayanan edited comment on HUDI-8178 at 10/15/24 6:39 PM:
---------------------------------------------------------------------

Fixes we had to make in the source code: 
1. FileSystemBackedTableMetadata.getPartitionPathWithPathPrefixUsingFilterExpression:
 - We should fix partition stats to add the hoodie partition meta file.

2. HoodieTableMetadataUtil.castAndCompare: account for data type changes
(type promotion) of fields. This impacts schema evolution cases.

3. MetadataPartitionType: UTF-to-String casting. This needs to go in.

4. ParquetUtils, line 450: java.sql.Date -> java.time.LocalDate casting issues.
Since partition stats need to merge ranges, we are hitting this issue.
We might be missing column stats pruning tests on the query engine side for
columns of the "Date" data type.
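Items 2 and 4 both come down to comparing min/max values whose Java types differ when per-file ranges are merged into partition-level stats. The sketch below assumes a hypothetical helper, `StatsValueNormalizer`, which is not Hudi's actual `castAndCompare` or `ParquetUtils` code: it maps each value to one canonical type before comparing. It also folds in item 3's UTF-to-String casting, on the assumption (not confirmed above) that it serves the same purpose.

```java
import java.time.LocalDate;

/**
 * Hypothetical helper, NOT part of Hudi: normalizes column-stats values so that
 * min/max values written under different but compatible types can be compared
 * and merged into one partition-level range.
 */
final class StatsValueNormalizer {

  private StatsValueNormalizer() {}

  /** Maps a raw stats value to a canonical representation (null stays null). */
  static Object normalize(Object value) {
    if (value instanceof java.sql.Date) {
      // java.sql.Date and java.time.LocalDate cannot be compared with each other; unify on LocalDate.
      return ((java.sql.Date) value).toLocalDate();
    }
    if (value instanceof CharSequence) {
      // Assumption: UTF-8 wrapper types (e.g. Avro's Utf8) are compared by content as String.
      return value.toString();
    }
    if (value instanceof Byte || value instanceof Short
        || value instanceof Integer || value instanceof Long) {
      // Integral promotion (e.g. int -> long after schema evolution): widen to Long.
      return ((Number) value).longValue();
    }
    if (value instanceof Float || value instanceof Double) {
      // Floating-point promotion (float -> double): widen to Double.
      return ((Number) value).doubleValue();
    }
    return value;
  }

  /** Compares two non-null stats values after normalization. */
  @SuppressWarnings("unchecked")
  static int compare(Object a, Object b) {
    Object x = normalize(a);
    Object y = normalize(b);
    if (x instanceof Long && y instanceof Long) {
      return Long.compare((Long) x, (Long) y);
    }
    if (x instanceof Number && y instanceof Number) {
      // Mixed integral/floating values (e.g. int -> double promotion).
      // Comparing as double may lose precision for very large longs.
      return Double.compare(((Number) x).doubleValue(), ((Number) y).doubleValue());
    }
    if (!(x instanceof Comparable) || x.getClass() != y.getClass()) {
      throw new IllegalArgumentException(
          "Cannot compare " + x.getClass().getName() + " with " + y.getClass().getName());
    }
    return ((Comparable<Object>) x).compareTo(y);
  }

  /** Null-safe minimum, used when merging per-file ranges into a partition range. */
  static Object min(Object a, Object b) {
    if (a == null) return normalize(b);
    if (b == null) return normalize(a);
    return compare(a, b) <= 0 ? normalize(a) : normalize(b);
  }

  /** Null-safe maximum, used when merging per-file ranges into a partition range. */
  static Object max(Object a, Object b) {
    if (a == null) return normalize(b);
    if (b == null) return normalize(a);
    return compare(a, b) >= 0 ? normalize(a) : normalize(b);
  }
}
```

For example, merging a range written before an int -> long promotion with one written after it, `min(Integer.valueOf(5), Long.valueOf(3L))`, yields `3L`. Merging a `java.sql.Date` min with a `LocalDate` min yields the earlier `LocalDate` instead of throwing a ClassCastException. Normalizing once, at the comparison boundary, keeps the promotion and date rules in one place rather than scattering casts across readers.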

 


was (Author: shivnarayan):
Fixes we had to make in the source code: 
1. FileSystemBackedTableMetadata.getPartitionPathWithPathPrefixUsingFilterExpression:
- We should fix partition stats to add the hoodie partition meta file.

2. HoodieTableMetadataUtil.castAndCompare: account for data type changes
(type promotion) of fields.

3. MetadataPartitionType: UTF-to-String casting. This needs to go in.

4. ParquetUtils, line 450: java.sql.Date -> java.time.LocalDate casting issues.
Since partition stats need to merge ranges, we are hitting this issue.
We might be missing column stats pruning tests on the query engine side for
columns of the "Date" data type.

 

> Enable partition stats by default
> ---------------------------------
>
>                 Key: HUDI-8178
>                 URL: https://issues.apache.org/jira/browse/HUDI-8178
>             Project: Apache Hudi
>          Issue Type: Task
>            Reporter: Sagar Sumit
>            Assignee: Lin Liu
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 1.0.0
>
>
> There are two issues that need to be fixed before we can enable partition 
> stats by default:
> 1. There is a [class cast 
> exception|https://dev.azure.com/apachehudi/hudi-oss-ci/_build/results?buildId=151&view=logs&j=b1544eb9-7ff1-5db9-0187-3e05abf459bc&t=e0ae894b-41c9-5f4b-7ed2-bdf5243b02e7]
>  under certain circumstances while collecting stats for log files. This
> happens even for colstats (run 
> `[testReadPathsForOnlyLogFiles|https://github.com/apache/hudi/blob/33a02987b7fc385253dc0e0efbf112066b8cf190/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestMORDataSource.scala#L885]`
>  test with column stats enabled).
>  
> {code:java}
> Caused by: java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.UnsafeRow cannot be cast to org.apache.avro.generic.IndexedRecord
>       at org.apache.avro.generic.GenericData.getField(GenericData.java:858)
>       at org.apache.avro.generic.GenericData.compare(GenericData.java:1229)
>       at org.apache.avro.generic.GenericData.compare(GenericData.java:1148)
>       at org.apache.hudi.metadata.HoodieTableMetadataUtil.lambda$null$1(HoodieTableMetadataUtil.java:239)
>       at java.util.ArrayList.forEach(ArrayList.java:1259)
>       at org.apache.hudi.metadata.HoodieTableMetadataUtil.lambda$collectColumnRangeMetadata$2(HoodieTableMetadataUtil.java:222)
>       at java.util.ArrayList.forEach(ArrayList.java:1259)
>       at org.apache.hudi.metadata.HoodieTableMetadataUtil.collectColumnRangeMetadata(HoodieTableMetadataUtil.java:219)
>       at org.apache.hudi.io.HoodieAppendHandle.processAppendResult(HoodieAppendHandle.java:424)
>       at org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:475)
>  {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
