A common use case we're running into with Beam rows is something like:
- Read data from source X
- Convert to Row
- Encode row (generally for xlang)

In cases like this, I've noticed that we spend a significant amount of time (30%+) just decoding and re-encoding strings. Avro has a nice solution to this with its Utf8 class [1], which defers decoding the string until it's actually needed. Has there been any thought around optimizing this in Beam as well? It doesn't seem like it would be hard to support in the current RowCoder implementation.

[1] https://avro.apache.org/docs/1.4.1/api/java/org/apache/avro/util/Utf8.html
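
To make the idea concrete, here is a minimal sketch of a lazily decoded string wrapper. The class name LazyUtf8 and its methods are hypothetical, made up for illustration; this is not an existing Beam or Avro class, and it is not how RowCoder works today. It only shows the general technique: keep the UTF-8 bytes as read, hand them straight to an encoder, and decode only if someone asks for a String.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical sketch of a lazily decoded string; not a Beam or Avro class. */
final class LazyUtf8 {
  // UTF-8 bytes exactly as read from the source. Not copied, so the caller
  // must not mutate the array afterwards.
  private final byte[] bytes;
  // Cached decode result; stays null until toString() is first called.
  private String decoded;

  LazyUtf8(byte[] utf8Bytes) {
    this.bytes = utf8Bytes;
  }

  /** Raw bytes; an encoder could write these out without any decoding. */
  byte[] getBytes() {
    return bytes;
  }

  /** True once the String form has been materialized. */
  boolean isDecoded() {
    return decoded != null;
  }

  /** Decodes on first use and caches the result. */
  @Override
  public String toString() {
    if (decoded == null) {
      decoded = new String(bytes, StandardCharsets.UTF_8);
    }
    return decoded;
  }

  // Equality and hashing work on the bytes, so they never force a decode.
  @Override
  public boolean equals(Object o) {
    return o instanceof LazyUtf8 && Arrays.equals(bytes, ((LazyUtf8) o).bytes);
  }

  @Override
  public int hashCode() {
    return Arrays.hashCode(bytes);
  }
}
```

In the read, convert, encode pipeline above, a value like this would pass through without ever calling toString(), so the decode and re-encode cost would only be paid by transforms that actually inspect the string.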
