I think this is a reasonable extension to `DataOutputSerializer`. Although 64 KB is not small, it is still possible to have long strings over that limit. There are already precedents of extending the `DataOutputSerializer` API beyond `DataOutput`, e.g.:
public void setPosition(int position) {
    Preconditions.checkArgument(
        position >= 0 && position <= this.position, "Position out of bounds.");
    this.position = position;
}

public void setPositionUnsafe(int position) {
    this.position = position;
}

On Fri, Jan 19, 2024 at 2:51 AM Péter Váry <peter.vary.apa...@gmail.com> wrote:

> Hi Team,
>
> During the root cause analysis of an Iceberg serialization issue [1], we
> have found that *DataOutputSerializer.writeUTF* has a hard limit on the
> length of the string (64k). This is inherited from the *DataOutput.writeUTF*
> method, where the JDK specifically defines this limit [2].
>
> For our use-case we need to enable the possibility to serialize longer UTF
> strings, so we will need to define a *writeLongUTF* method with a similar
> specification to *writeUTF*, but without the length limit.
>
> My question is:
> - Is it something which would be useful for every Flink user? Shall we add
> this method to *DataOutputSerializer*?
> - Is it very specific to Iceberg, and we should keep it in the Iceberg
> connector code?
>
> Thanks,
> Peter
>
> [1] - https://github.com/apache/iceberg/issues/9410
> [2] -
> https://docs.oracle.com/javase/8/docs/api/java/io/DataOutput.html#writeUTF-java.lang.String-
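For illustration, here is a minimal, self-contained sketch of one way a *writeLongUTF* could be encoded (this is not Flink's actual implementation; the `writeLongUTF`/`readLongUTF` names and the choice of plain UTF-8 with a 4-byte length prefix, rather than the modified UTF-8 and 2-byte unsigned-short prefix that `DataOutput.writeUTF` mandates, are assumptions for the sketch):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.nio.charset.StandardCharsets;

public class LongUtfSketch {

    // Hypothetical writeLongUTF: a 4-byte int length prefix followed by the
    // UTF-8 bytes, so the 65535-byte cap of DataOutput.writeUTF does not apply.
    static void writeLongUTF(DataOutputStream out, String s) throws IOException {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length); // int length instead of unsigned short
        out.write(bytes);
    }

    static String readLongUTF(DataInputStream in) throws IOException {
        byte[] bytes = new byte[in.readInt()];
        in.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // A string whose encoded length exceeds the 64k writeUTF limit.
        String big = "x".repeat(70_000);

        // The JDK's writeUTF rejects it, as the DataOutput spec requires.
        boolean threw = false;
        try {
            new DataOutputStream(new ByteArrayOutputStream()).writeUTF(big);
        } catch (UTFDataFormatException e) {
            threw = true;
        }
        assert threw : "writeUTF should reject strings over 64 KB";

        // The int-length variant round-trips without the limit.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        writeLongUTF(new DataOutputStream(buf), big);
        String back = readLongUTF(
            new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        assert big.equals(back);
    }
}
```

One wire-format note worth deciding up front: whichever encoding is chosen (length-prefix size, modified vs. standard UTF-8) becomes part of the serialized format, so it would need to stay stable across Flink versions once added.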