Vino1016 opened a new issue, #15215:
URL: https://github.com/apache/iceberg/issues/15215
### Query engine
Engine: Flink 2.0.0
Iceberg: 1.10.0
### Question
Description:
Hi, I’m encountering a performance warning in Iceberg:
"Performance degraded as records with different schema are generated for the
same table. Likely the DynamicRecord.schema is not reused. Reuse the same
instance if the record schema is the same to improve performance."
This warning occurs for every data record.
My Code Logic:
I process upstream data in a process function. The schema of the test data
is constant and never changes. Within this function, I construct a
DynamicRecord for each piece of incoming data.
// Data processing (simplified)
DynamicRecord dynamicRecord = new DynamicRecord(
    tableIdentifier,
    ICEBERG_BRANCH,
    this.newestSchema,
    genericRowData,
    this.newestPartitionSpec,
    DistributionMode.NONE,
    1);
dynamicRecord.setEqualityFields(xxxx.getEqualityFieldColumns());
dynamicRecord.setUpsertMode(true);
out.collect(dynamicRecord);
The tableIdentifier, newestSchema, and newestPartitionSpec are initialized
once in the open method and remain unchanged afterward.
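For reference, the reuse pattern described above can be sketched in plain Java. `Schema`, `Record`, and `RecordProcessor` below are hypothetical stand-ins, not the real Iceberg/Flink types; the only point the sketch makes is that one schema object is created once and shared by every record:

```java
// Hypothetical sketch of the reuse pattern: Schema, Record, and
// RecordProcessor are stand-ins, NOT the real Iceberg/Flink classes.
final class Schema {
  final String definition;
  Schema(String definition) { this.definition = definition; }
}

final class Record {
  final Schema schema;
  final Object payload;
  Record(Schema schema, Object payload) {
    this.schema = schema;
    this.payload = payload;
  }
}

final class RecordProcessor {
  private Schema newestSchema;

  // Analogous to Flink's open(): build the schema exactly once.
  void open() {
    this.newestSchema = new Schema("id:long, data:string");
  }

  // Analogous to processElement(): reuse the cached instance for every
  // record instead of constructing a new Schema per record.
  Record process(Object payload) {
    return new Record(this.newestSchema, payload);
  }
}

public class SchemaReuseSketch {
  public static void main(String[] args) {
    RecordProcessor p = new RecordProcessor();
    p.open();
    Record r1 = p.process("row-1");
    Record r2 = p.process("row-2");
    System.out.println(r1.schema == r2.schema); // true: same instance
  }
}
```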
Sink Configuration:
DynamicIcebergSink.forInput(dynamicRecordSingleOutputStreamOperator)
.cacheMaxSize(10)
.cacheRefreshMs(600000L)
.inputSchemasPerTableCacheMaxSize(10)
.immediateTableUpdate(true)
.overwrite(false)
.setAll(getSinkConfig(writeFormat))
.catalogLoader(TestS3Sink.getCatalogLoader(xxxx))
.generator(new DefaultDynamicRecordGenerator())
.append()
.setParallelism(1);
Custom Generator:
public class DefaultDynamicRecordGenerator
    implements DynamicRecordGenerator<DynamicRecord> {

  @Override
  public void open(OpenContext openContext) throws Exception {
    DynamicRecordGenerator.super.open(openContext);
  }

  @Override
  public void generate(DynamicRecord dynamicRecord,
      Collector<DynamicRecord> collector) throws Exception {
    collector.collect(dynamicRecord);
  }
}
Expected Behavior:
Since the schema never changes and the same Schema object is passed for every
record, Iceberg should recognize it as the same schema and reuse the
DynamicRecord.schema instance, rather than treating each record as carrying a
new schema, which is what triggers the warning.
Question:
Even though we are reusing the same schema instance, could this warning
still appear due to some internal handling by Iceberg?
Apart from how the schema is passed, are there any other factors that might
cause Iceberg to treat it as a different schema and trigger this warning?
I would appreciate any help or suggestions. Thank you!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]