I think this is a reasonable extension to `DataOutputSerializer`. Although
64 KB is not small, strings longer than that limit do occur. There are
already precedents of extending the `DataOutputSerializer` API, e.g.:

public void setPosition(int position) {
    Preconditions.checkArgument(
            position >= 0 && position <= this.position, "Position out of bounds.");
    this.position = position;
}

public void setPositionUnsafe(int position) {
    this.position = position;
}
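
For illustration, here is a minimal sketch of what such a method could look
like, written against the plain `java.io.DataOutput` / `java.io.DataInput`
interfaces (so it would also work with `DataOutputSerializer`, which
implements `DataOutput`). The name `writeLongUTF`, the 4-byte length prefix,
and the use of standard UTF-8 are assumptions for this sketch, not an agreed
API:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public final class LongUtfUtil {

    private LongUtfUtil() {}

    // Writes a 4-byte UTF-8 byte length followed by the raw UTF-8 bytes,
    // instead of writeUTF's unsigned 2-byte length and modified UTF-8.
    public static void writeLongUTF(DataOutput out, String value) throws IOException {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    // Reads a string written by writeLongUTF above.
    public static String readLongUTF(DataInput in) throws IOException {
        byte[] bytes = new byte[in.readInt()];
        in.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}

One design point to settle would be compatibility: `writeUTF` uses modified
UTF-8 with an unsigned 2-byte length prefix, so a method along these lines
would not be wire-compatible with data written by `writeUTF`.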


On Fri, Jan 19, 2024 at 2:51 AM Péter Váry <peter.vary.apa...@gmail.com>
wrote:

> Hi Team,
>
> During the root cause analysis of an Iceberg serialization issue [1], we
> found that *DataOutputSerializer.writeUTF* has a hard limit on the encoded
> length of the string (64 KB). This is inherited from the
> *DataOutput.writeUTF* method, where the JDK specifically defines this
> limit [2].
>
> For our use case we need to be able to serialize longer UTF strings, so we
> will need to define a *writeLongUTF* method with a similar specification to
> *writeUTF*, but without the length limit.
>
> My question is:
> - Is this something that would be useful for every Flink user? Shall we add
> this method to *DataOutputSerializer*?
> - Or is it specific to Iceberg, and we should keep it in the Iceberg
> connector code?
>
> Thanks,
> Peter
>
> [1] - https://github.com/apache/iceberg/issues/9410
> [2] -
> https://docs.oracle.com/javase/8/docs/api/java/io/DataOutput.html#writeUTF-java.lang.String-
>
