lei-su-awx opened a new issue, #10816:
URL: https://github.com/apache/hudi/issues/10816
**_Tips before filing an issue_**

- Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)? Yes.
- Join the mailing list to engage in conversations and get faster support at [email protected].
- If you have triaged this as a bug, then file an [issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.

**Describe the problem you faced**

When I `readStream` a Hudi table, apply some transformations, and then `writeStream` the result to a new Hudi table, the `schemacommit` file keeps growing, and the OkHttp dispatcher throws an OOM error when handling it. I found that the `schemacommit` file stores the same schema repeatedly in one JSON array, each copy with a different `version_id`.

**To Reproduce**

Steps to reproduce the behavior:

1. Enable `hoodie.datasource.write.reconcile.schema`.
2. Enable `hoodie.schema.on.read.enable`.
3. `readStream` from a Hudi table, then `writeStream` to a new Hudi table.

**Expected behavior**

The `schemacommit` file should store only the latest schema instead of all historical schemas.

**Environment Description**

* Hudi version : 0.14.1
* Spark version : 3.4.1
* Hive version :
* Hadoop version :
* Storage (HDFS/S3/GCS..) : GCS
* Running on Docker?
(yes/no) : yes

**Additional context**

Write Hudi configuration:

```
write_streaming_hudi_options = {
    'hoodie.table.name': table_name,
    'hoodie.datasource.write.recordkey.field': f'{primary_keys}, __region',
    'hoodie.datasource.write.precombine.field': precombine_field,
    'hoodie.datasource.write.table.name': table_name,
    'hoodie.datasource.write.partitionpath.field': '__date',
    'hoodie.datasource.write.operation': 'upsert',
    'hoodie.datasource.write.table.type': 'COPY_ON_WRITE',
    'hoodie.datasource.write.reconcile.schema': 'true',
    'hoodie.schema.on.read.enable': 'true',
    'hoodie.parquet.compression.codec': 'snappy',
    'hoodie.datasource.write.hive_style_partitioning': 'true',
    'hoodie.datasource.write.drop.partition.columns': 'true',
    'hoodie.insert.shuffle.parallelism': '1000',
    'hoodie.upsert.shuffle.parallelism': '1000',
    'hoodie.datasource.write.payload.class': 'org.apache.hudi.common.model.DefaultHoodieRecordPayload',
}
```

Schema file on GCS:

Schema file detail:

**Stacktrace**

Driver error log:
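The duplication described in this report (the same schema repeated in one JSON array, differing only in `version_id`) can be quantified offline with a short script. This is a minimal sketch, not a Hudi tool: it assumes the `schemacommit` file is JSON holding either a bare array of schema objects or an object whose `schemas` key holds that array (the key name is an assumption), and that each entry carries a `version_id` field, as the report states. The function name `count_duplicate_schemas` is hypothetical.

```python
import json
from collections import Counter
from typing import Any


def _schema_entries(doc: Any) -> list:
    """Return the list of schema objects from a parsed schema history file.

    Assumption: the history is either a bare JSON array or an object whose
    "schemas" key holds the array; the key name is not confirmed by the report.
    """
    if isinstance(doc, list):
        return doc
    if isinstance(doc, dict) and isinstance(doc.get("schemas"), list):
        return doc["schemas"]
    raise ValueError("unrecognized schema history layout")


def count_duplicate_schemas(raw_json: str, ignore_keys=("version_id",)) -> dict:
    """Count history entries that are identical once ignore_keys are dropped."""
    entries = _schema_entries(json.loads(raw_json))
    # Serialize each entry without the ignored keys, with sorted keys, so that
    # entries differing only in version_id produce the same fingerprint.
    fingerprints = Counter(
        json.dumps({k: v for k, v in e.items() if k not in ignore_keys}, sort_keys=True)
        for e in entries
    )
    return {
        "total_entries": len(entries),
        "distinct_schemas": len(fingerprints),
        "redundant_entries": len(entries) - len(fingerprints),
    }


if __name__ == "__main__":
    # Hypothetical sample: one record schema committed three times,
    # each time with a new version_id.
    fields = [{"id": 1, "name": "id", "type": "long"}]
    sample = json.dumps({"schemas": [
        {"version_id": v, "type": "record", "fields": fields} for v in (101, 102, 103)
    ]})
    print(count_duplicate_schemas(sample))
    # prints {'total_entries': 3, 'distinct_schemas': 1, 'redundant_entries': 2}
```

To use it, pass the text of a `schemacommit` file copied from GCS. A `redundant_entries` count that rises with every streaming micro-batch while `distinct_schemas` stays constant would match the growth described above.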
