I am currently working on normalizing TFRecordIO <https://github.com/apache/beam/issues/28692> and ran into a cyclic dependency issue when converting the byte array to Beam Rows via the AvroUtils library. We have come up with a few other options and would like to figure out which is best, or maybe there is a better one.
Some quick background: the Java version doesn't allow passing in a coder, while the Python one does.

Current options:
1. Give the user an option to pass in a coder. Benefits/Cons: more effort for Java users and would require changes in TFRecordIO, but would normalize coder usage across both SDKs.
2. Just return the bytes as a Row with a single byte-array field (see the sketch after this list). Benefits/Cons: should be the simplest to implement.
3. Pass in a preset of coders. Benefits/Cons: no effort for users, but no flexibility in choosing a coder.
4. Don't do the Java normalization and only do it in Python. Benefits/Cons: the RunInference library is Python-only, so most use cases will be in Python anyway.

Slightly leaning toward Option 2, but any opinions or other ideas here?

Thanks,
Derrick
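To make Option 2 concrete, here is a minimal Java sketch of what the single-field Row could look like. This is purely illustrative; the schema, the "record" field name, and the helper class are assumptions on my part, not anything that exists in TFRecordIO today.

    import org.apache.beam.sdk.schemas.Schema;
    import org.apache.beam.sdk.values.Row;

    // Hypothetical sketch for Option 2: wrap the raw TFRecord bytes in a
    // single-field Beam Row so the output has a fixed, trivial schema.
    public class TfRecordRowSketch {
      // One BYTES field holding the raw record payload (field name is illustrative).
      static final Schema TFRECORD_ROW_SCHEMA =
          Schema.builder().addByteArrayField("record").build();

      // Convert the bytes read by TFRecordIO into a Row with that schema.
      static Row toRow(byte[] recordBytes) {
        return Row.withSchema(TFRECORD_ROW_SCHEMA)
            .withFieldValue("record", recordBytes)
            .build();
      }
    }

The upside is that users who want a richer schema can still apply their own parsing transform downstream, so no coder plumbing is needed in TFRecordIO itself.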