I like option (2) as well.

On Mon, Feb 24, 2025 at 11:00 AM Ahmed Abualsaud via dev
<dev@beam.apache.org> wrote:
>
> +1 to option (2)
>
> On Mon, Feb 24, 2025 at 1:32 PM Danny McCormick via dev <dev@beam.apache.org> 
> wrote:
>>
>> Thanks for looking into this! I think I like option (2) for the base 
>> transform since it allows us to normalize across languages and get this 
>> added with the least effort, plus it doesn't stop us from adding 
>> (1) or (3) in the future (though this may eventually require some more 
>> complex forking depending on which coders each version of the IO supports).
>>
>> If we do (2), we could also eventually consider adding something like a 
>> `DecodeFromBytes` transform which maps the byte string to a known type 
>> (though today this is doable with a map).
>>
>> Thanks,
>> Danny
>>
>> On Mon, Feb 24, 2025 at 1:20 PM Derrick Williams via dev 
>> <dev@beam.apache.org> wrote:
>>>
>>> I am currently working on normalizing TFRecordIO and ran into a cyclic 
>>> dependency issue when converting the byte array to Beam Rows via the 
>>> avroutils library. We have come up with a few other options and would like 
>>> to figure out which is best, or whether there is a better one.
>>>
>>> Some quick background: the Java version doesn't allow passing in a coder, 
>>> while the Python one does.
>>>
>>> Current Options:
>>> 1. Give the user an option to pass in a coder.
>>> Benefits/Cons: More effort for Java users and would require changes in 
>>> TFRecordIO, but would normalize coder usage across both languages.
>>>
>>> 2. Just return the byte string as a row with a single field, record, of 
>>> byte string type.
>>> Benefits/Cons: Should be simpler to implement.
>>>
>>> 3. Pass in a preset of coders.
>>> Benefits/Cons: No effort for users, but no flexibility in choosing a coder.
>>>
>>> 4. Just skip Java normalization and only do it in Python.
>>> Benefits/Cons: The RunInference library has to be done in Python, so most 
>>> use cases will be in Python anyway.
>>>
>>> Slightly leaning toward Option 2, but any opinions or other ideas here?
>>>
>>> Thanks
>>> Derrick
>>>
>>>
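
To make option (2) and the possible follow-up decode step concrete, here is a minimal plain-Python sketch. It deliberately does not use Beam's API: `BytesRow`, `to_rows`, `decode_from_bytes`, `decode_json`, and the JSON payloads are all hypothetical names and data chosen for illustration, and the only thing taken from the thread is the idea of a row with a single bytes field that a later step maps to a known type.

```python
import json
from typing import Callable, Iterable, Iterator, NamedTuple, TypeVar

T = TypeVar("T")


class BytesRow(NamedTuple):
    """Hypothetical stand-in for a row with one bytes field named `record`."""
    record: bytes


def to_rows(raw_records: Iterable[bytes]) -> Iterator[BytesRow]:
    """Option (2): wrap each raw record in a single-field row, no coder involved."""
    for raw in raw_records:
        yield BytesRow(record=raw)


def decode_from_bytes(
    rows: Iterable[BytesRow], decoder: Callable[[bytes], T]
) -> Iterator[T]:
    """Hypothetical decode step: apply a caller-chosen decoder to each row's bytes."""
    for row in rows:
        yield decoder(row.record)


def decode_json(raw: bytes) -> dict:
    """Example decoder (an assumption): treat each record as UTF-8 encoded JSON."""
    return json.loads(raw.decode("utf-8"))


# Made-up payloads standing in for records read from a TFRecord file.
raw_records = [b'{"id": 1}', b'{"id": 2}']
rows = list(to_rows(raw_records))
decoded = list(decode_from_bytes(rows, decode_json))
```

The read side stays coder-free and identical across languages, while the choice of decoder moves into a separate step that the user controls. In a real pipeline that step would presumably be an ordinary map over the rows, which matches the note above that this is already doable with a map today.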
