TheR1sing3un commented on PR #12839: URL: https://github.com/apache/hudi/pull/12839#issuecomment-2710624995
> We already have the reference check for the `schema::equals` method, do you mean one of the schema comes from the incoming new record?

Over the lifetime of the JVM, many tasks may run. Whichever task calls `getCachedSchema` first puts the `Schema` instance it created into the cache. Later tasks then call `get()` with their own instances, so the reference check does not match and the comparison falls back to `Schema::equals`.

> Can a local `AvroSchemaCache` like cache solves the problem?(for the schema from input record and schema from the append handle, we always fetch it from the cache).

I think that may be hard to implement well. For example, suppose we create a thread-local cache. Spark's executor uses a thread pool to schedule the tasks it receives, so a per-thread cache still has the same problem: a thread may run many tasks one after another, and each task still creates its own `Schema` instance. If you want a local cache, its scope has to be the task, not the thread and not the JVM.
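The scoping argument above can be illustrated with a minimal Java sketch. Everything in it is hypothetical: `FakeSchema` stands in for Avro's `Schema` and counts calls to its `equals`, `SchemaCache` is a toy cache that tries a reference check before `equals`, and a single-thread executor stands in for Spark's executor thread pool. It is not Hudi's `AvroSchemaCache`, only a way to see why a thread-scoped cache still hits the `equals` fallback while a task-scoped one does not.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class SchemaCacheScopeSketch {

    // Hypothetical stand-in for an Avro Schema; counts every deep equals() call.
    static final class FakeSchema {
        private final String json;
        private final AtomicInteger deepEqualsCalls;

        FakeSchema(String json, AtomicInteger deepEqualsCalls) {
            this.json = json;
            this.deepEqualsCalls = deepEqualsCalls;
        }

        @Override
        public boolean equals(Object other) {
            deepEqualsCalls.incrementAndGet();
            return other instanceof FakeSchema && json.equals(((FakeSchema) other).json);
        }

        @Override
        public int hashCode() {
            return json.hashCode();
        }
    }

    // Toy cache (an assumption, not Hudi code): reference check first, equals() as fallback.
    static final class SchemaCache {
        private final List<FakeSchema> entries = new ArrayList<>();

        FakeSchema intern(FakeSchema schema) {
            for (FakeSchema cached : entries) {
                if (cached == schema || cached.equals(schema)) {
                    return cached;
                }
            }
            entries.add(schema);
            return schema;
        }
    }

    // Runs `tasks` tasks one after another on a single pooled thread and
    // returns how many times the deep equals() fallback was used.
    static int countDeepEquals(boolean taskScoped, int tasks, int recordsPerTask) {
        AtomicInteger deepEqualsCalls = new AtomicInteger();
        ThreadLocal<SchemaCache> perThread = ThreadLocal.withInitial(SchemaCache::new);
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            for (int t = 0; t < tasks; t++) {
                pool.submit(() -> {
                    // Each task creates its own, structurally identical schema instance.
                    FakeSchema own = new FakeSchema("{\"type\":\"string\"}", deepEqualsCalls);
                    SchemaCache cache = taskScoped ? new SchemaCache() : perThread.get();
                    for (int r = 0; r < recordsPerTask; r++) {
                        cache.intern(own);
                    }
                }).get(); // wait, so tasks run strictly one by one
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return deepEqualsCalls.get();
    }

    public static void main(String[] args) {
        // Thread-scoped: tasks 2 and 3 miss the reference check on every record.
        System.out.println("thread-scoped: " + countDeepEquals(false, 3, 1000)); // 2000
        // Task-scoped: every lookup after the first is a reference hit.
        System.out.println("task-scoped:   " + countDeepEquals(true, 3, 1000));  // 0
    }
}
```

In this sketch the first task on the thread never pays for `equals`, but every later task sharing the thread-local cache pays for it on each record, which mirrors the JVM-level behaviour described above.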
