+1 to option (2)

On Mon, Feb 24, 2025 at 1:32 PM Danny McCormick via dev <dev@beam.apache.org> wrote:
> Thanks for looking into this! I think I like option (2) for the base
> transform since it allows us to normalize across languages and get this
> added with the lowest amount of effort, plus it doesn't stop us from adding
> (1) or (3) in the future (though this may eventually require some more
> complex forking depending on which coders each version of the IO supports).
>
> If we do (2), we could also eventually consider adding something like a
> `DecodeFromBytes` transform which maps the byte string to a known type
> (today this is doable with map, though).
>
> Thanks,
> Danny
>
> On Mon, Feb 24, 2025 at 1:20 PM Derrick Williams via dev <
> dev@beam.apache.org> wrote:
>
>> I am currently working on normalizing TFRecordIO
>> <https://github.com/apache/beam/issues/28692> and ran into a cyclic
>> dependency issue when converting the byte array to Beam Rows via the
>> avroutils library. We have come up with a few options and would like to
>> figure out which is best, or whether there is a better one.
>>
>> Some quick background: the Java version doesn't allow passing in a coder,
>> while the Python one does.
>>
>> Current options:
>> 1. Give the user an option to pass in a coder.
>> Benefits/Cons: More effort for Java users and would require changes in
>> TFRecordIO, but would normalize coder usage across both languages.
>>
>> 2. Just return the byte string as a row with a single field of byte
>> string type.
>> Benefits/Cons: Should be simpler to implement.
>>
>> 3. Pass in a preset of coders.
>> Benefits/Cons: No effort for users, but no flexibility in choosing a
>> coder.
>>
>> 4. Skip Java normalization and only do it in Python.
>> Benefits/Cons: The RunInference library has to be used from Python, so
>> most use cases will be in Python anyway.
>>
>> Slightly leaning toward option (2), but any opinions or other ideas?
>>
>> Thanks,
>> Derrick
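For readers skimming the thread, option (2) plus the suggested decode step can be sketched in plain Python. This is only an illustration of the shape of the data flow, not Beam code: `BytesRow` is a hypothetical stand-in for a single-field Beam Row schema, and `decode_utf8` stands in for the user-supplied map that a `DecodeFromBytes`-style transform would wrap.

```python
# Sketch of option (2): each raw TFRecord payload is wrapped in a "row"
# with a single byte-string field, and decoding to a concrete type is
# left to a downstream map (the hypothetical DecodeFromBytes step).
from typing import NamedTuple


class BytesRow(NamedTuple):
    """Stand-in for a one-field Beam Row schema: {record: bytes}."""
    record: bytes


def wrap_record(payload: bytes) -> BytesRow:
    # What the normalized TFRecord read would emit under option (2).
    return BytesRow(record=payload)


def decode_utf8(row: BytesRow) -> str:
    # Example user-supplied decode step; today this would just be a Map.
    return row.record.decode("utf-8")


raw_records = [b"hello", b"world"]      # raw TFRecord payloads
rows = [wrap_record(r) for r in raw_records]
decoded = [decode_utf8(r) for r in rows]
```

In a real pipeline the two list comprehensions would be `Map` transforms over the PCollection produced by the TFRecord read; the point is that the IO itself stays coder-free and the decode choice moves to the user.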