Caideyipi commented on PR #824:
URL: https://github.com/apache/tsfile/pull/824#issuecomment-4550686336

   I found a functional issue.
   
     `Tablet.serializedSize()` claims to return the exact serialized byte size,
     but it uses `ReadWriteIOUtils.sizeToWrite(insertTargetName)` to calculate
     string sizes. That helper uses `s.getBytes()`, which depends on the
     platform default charset. The actual serialization path uses
     `ReadWriteIOUtils.write(String, ...)`, which encodes strings with
     `TSFileConfig.STRING_CHARSET` (UTF-8).
   
     So when the device/table name, measurement name, or schema properties
     contain non-ASCII characters, `serializedSize()` can differ from the real
     serialized size if the process default charset is not UTF-8.
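
A minimal sketch of the mismatch, not TsFile code: the class and method names are hypothetical, and ISO-8859-1 is only an assumed stand-in for a non-UTF-8 platform default, since the real default cannot be switched reliably at runtime.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of TsFile.
class CharsetSizeMismatchDemo {
    // Stand-in for TSFileConfig.STRING_CHARSET, which is UTF-8 per the comment above.
    static final Charset WRITE_CHARSET = StandardCharsets.UTF_8;

    // Number of bytes the string occupies when encoded with the given charset.
    static int encodedLength(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // "température" written with a Unicode escape so the source file encoding does not matter.
        String name = "temp\u00e9rature";
        // Assumed non-UTF-8 default charset: one byte per character.
        int estimated = encodedLength(name, StandardCharsets.ISO_8859_1);
        // Charset used on the write path: the accented character takes two bytes.
        int actual = encodedLength(name, WRITE_CHARSET);
        System.out.println("estimated=" + estimated + ", actual=" + actual);
    }
}
```

For a pure ASCII name both calls return the same length, which is why the problem only shows up with non-ASCII names.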
   
     This is probably not an issue when TsFile is used through IoTDB, because
     IoTDB startup sets the default charset. But TsFile can also be used
     independently, and in standalone usage this can make the size estimate
     incorrect and break the “exact size” guarantee.
   
     Suggested fix: make `ReadWriteIOUtils.sizeToWrite(String)` use
     `TSFileConfig.STRING_CHARSET`, consistent with the write path, and add a
     non-ASCII name test.
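
A rough sketch of what the fix and the test could look like. Everything here is hypothetical: the class is a simplified stand-in for `ReadWriteIOUtils`, the 4-byte length prefix is an assumed wire format, and null handling is left out.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the relevant parts of ReadWriteIOUtils.
class StringSizeSketch {
    // Stand-in for TSFileConfig.STRING_CHARSET.
    static final Charset STRING_CHARSET = StandardCharsets.UTF_8;

    // Size estimate that uses the same charset as the write path.
    static int sizeToWrite(String s) {
        return Integer.BYTES + s.getBytes(STRING_CHARSET).length;
    }

    // Assumed write format: 4-byte length prefix followed by the encoded bytes.
    static void write(String s, DataOutputStream out) throws IOException {
        byte[] bytes = s.getBytes(STRING_CHARSET);
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    // Serializes the string into memory and returns the number of bytes produced.
    static int actualSize(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            write(s, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.size();
    }

    public static void main(String[] args) {
        String name = "\u6e29\u5ea6"; // two CJK characters, three UTF-8 bytes each
        System.out.println(sizeToWrite(name) == actualSize(name));
    }
}
```

Because both methods share one charset constant, the estimate no longer depends on the process default, and a test on a non-ASCII name can compare the two values directly.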
   
     There is also a CodeQL alert for integer narrowing/overflow in
     `serializedSizeOfTimes()`. Since this method is intended to return an
     exact byte size, that should probably be handled as well.
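
One possible way to handle the narrowing, sketched under assumptions: the real body of `serializedSizeOfTimes()` is not shown in this comment, so the row-count parameter and the `rowCount * Long.BYTES` arithmetic below are hypothetical.

```java
// Hypothetical illustration; the actual Tablet method may compute its size differently.
class ExactSizeSketch {
    // Plain int arithmetic silently wraps around when the product exceeds Integer.MAX_VALUE.
    static int naiveSizeOfTimes(int rowCount) {
        return rowCount * Long.BYTES;
    }

    // Widen to long first, then narrow with an explicit overflow check.
    static int checkedSizeOfTimes(int rowCount) {
        long size = (long) rowCount * Long.BYTES;
        return Math.toIntExact(size); // throws ArithmeticException if size does not fit in an int
    }

    public static void main(String[] args) {
        System.out.println(checkedSizeOfTimes(3));           // 24
        System.out.println(naiveSizeOfTimes(300_000_000));   // negative: the int product wrapped
    }
}
```

Failing loudly with `ArithmeticException` seems preferable to returning a wrapped value from a method that promises an exact size, though changing the return type to `long` would be another option.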


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
