KnightChess commented on code in PR #6020:
URL: https://github.com/apache/hudi/pull/6020#discussion_r927350287
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/command/payload/SqlTypedRecord.scala:
##########
@@ -53,6 +53,11 @@ object SqlTypedRecord {
   private val avroDeserializerCache = CacheBuilder.newBuilder().build[Schema, HoodieAvroDeserializer]()
+  private val avroDeserializerCacheLocal = new ThreadLocal[Cache[Schema, HoodieAvroDeserializer]] {
+    override def initialValue(): Cache[Schema, HoodieAvroDeserializer] =
+      CacheBuilder.newBuilder().maximumSize(16).build[Schema, HoodieAvroDeserializer]()
Review Comment:
> this looks not used at all? `avroDeserializerCache` still used for storing the deserializer
> So, what are we try to fix here ? The schema key in the cache does not work in multi-thread use case ?
Reusing the same avroDeserializer for a given schema across different threads can cause the result record to differ from the input record.
For example:
schema (key is id): id int, name string, age int
Two task threads use SqlTypedRecord to get the SQL row at the same time.
task one record: 1, 'one', 18
task two record: 2, 'two', 19
If both tasks reuse the same avroDeserializer and deserialize at the same time, the results may be:
task one result: 1, 'two', 19
task two result: 2, 'two', 19
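To illustrate the idea, here is a minimal, self-contained sketch. It is not the Hudi code: `ReusingDeserializer` is a hypothetical stand-in that, by assumption, writes every result into one reused mutable buffer (which is what would make a shared instance unsafe), and `ThreadLocalDeserializerSketch`, `convert`, and `runTwoTasks` are illustrative names only. Keeping one instance per thread through a `ThreadLocal` means the two tasks never write into the same buffer.

```scala
import java.util.concurrent.{Callable, Executors}

// Hypothetical stand-in for a deserializer that reuses a single mutable
// result buffer. Sharing one instance between threads would let one
// task overwrite the fields another task is still reading.
final class ReusingDeserializer {
  private val buffer = new Array[Any](3)

  def deserialize(id: Int, name: String, age: Int): Array[Any] = {
    buffer(0) = id
    buffer(1) = name
    buffer(2) = age
    buffer // the same array is returned on every call
  }
}

object ThreadLocalDeserializerSketch {
  // One deserializer per thread, created lazily on first access.
  private val localDeserializer = new ThreadLocal[ReusingDeserializer] {
    override def initialValue(): ReusingDeserializer = new ReusingDeserializer
  }

  // Copy the buffer into an immutable List before leaving the thread.
  def convert(id: Int, name: String, age: Int): List[Any] =
    localDeserializer.get().deserialize(id, name, age).toList

  // Run the two example tasks from the comment on separate pool threads.
  def runTwoTasks(): (List[Any], List[Any]) = {
    val pool = Executors.newFixedThreadPool(2)
    try {
      val one = pool.submit(new Callable[List[Any]] {
        override def call(): List[Any] = convert(1, "one", 18)
      })
      val two = pool.submit(new Callable[List[Any]] {
        override def call(): List[Any] = convert(2, "two", 19)
      })
      (one.get(), two.get())
    } finally {
      pool.shutdown()
    }
  }
}
```

In this sketch each task gets back its own input record. The trade-off is one deserializer instance per thread instead of one shared instance.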