KnightChess commented on code in PR #6020:
URL: https://github.com/apache/hudi/pull/6020#discussion_r927350287


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/command/payload/SqlTypedRecord.scala:
##########
@@ -53,6 +53,11 @@ object SqlTypedRecord {
 
   private val avroDeserializerCache = CacheBuilder.newBuilder().build[Schema, HoodieAvroDeserializer]()
 
+  private val avroDeserializerCacheLocal = new ThreadLocal[Cache[Schema, HoodieAvroDeserializer]] {
+    override def initialValue(): Cache[Schema, HoodieAvroDeserializer] =
+      CacheBuilder.newBuilder().maximumSize(16).build[Schema, HoodieAvroDeserializer]()

Review Comment:
   > this looks not used at all? `avroDeserializerCache` still used for storing 
the deserializer
   > So, what are we try to fix here ? The schema key in the cache does not 
work in multi-thread use case ?
   
   Reusing the same avroDeserializer for a schema across different threads can 
make the result record differ from the input record.
   For example:
   schema (key is id): id int, name string, age int
   Two task threads use SqlTypedRecord to get a sqlRow at the same time.
   task one record:   1,  'one',  18
   task two record:   2,  'two',  19
   If both reuse the same avroDeserializer and deserialize at the same time, 
they may get the following results:
   task one result:    1,  'two', 19
   task two result:    2,  'two', 19
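
   The scenario above can be sketched in isolation. This is only a minimal 
illustration, not the Hudi code: `ReusingDeserializer`, `roundTrip` and 
`PerThreadDeserializerSketch` are hypothetical names, and the sketch assumes 
the deserializer writes into a single reusable buffer, which is one plausible 
way a shared instance could mix fields from two records. Giving each thread 
its own instance through a `ThreadLocal` keeps the buffers separate.

```scala
import java.util.concurrent.{Callable, Executors}

// Hypothetical stand-in for a stateful deserializer (an assumption for this
// sketch): it fills one reusable buffer and returns that same buffer each call.
final class ReusingDeserializer {
  private val buffer = new Array[Any](3)

  def deserialize(id: Int, name: String, age: Int): Array[Any] = {
    buffer(0) = id
    buffer(1) = name
    buffer(2) = age
    buffer
  }
}

object PerThreadDeserializerSketch {
  // One deserializer per thread, so concurrent tasks never share a buffer.
  private val local = new ThreadLocal[ReusingDeserializer] {
    override def initialValue(): ReusingDeserializer = new ReusingDeserializer
  }

  // Deserialize with the calling thread's instance and copy the result out
  // of the reusable buffer before returning it.
  def roundTrip(id: Int, name: String, age: Int): List[Any] =
    local.get().deserialize(id, name, age).toList

  def main(args: Array[String]): Unit = {
    val pool = Executors.newFixedThreadPool(2)
    try {
      val inputs = Seq((1, "one", 18), (2, "two", 19))
      // Submit both records many times so the two threads overlap.
      val futures = for (_ <- 1 to 1000; (id, name, age) <- inputs) yield
        pool.submit(new Callable[Boolean] {
          override def call(): Boolean =
            roundTrip(id, name, age) == List(id, name, age)
        })
      val allMatch = futures.forall(_.get())
      println(s"every result matched its input: $allMatch")
    } finally {
      pool.shutdown()
    }
  }
}
```

   If `local` were replaced by a single shared `ReusingDeserializer`, two 
threads could interleave their writes to `buffer`, and a task could read back 
fields written by the other task, as in the results listed above.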
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.