+1 to option (2)

On Mon, Feb 24, 2025 at 1:32 PM Danny McCormick via dev <dev@beam.apache.org> wrote:
> Thanks for looking into this! I think I like option (2) for the base
> transform since it allows us to normalize across languages and get this
> added with the lowest amount of effort, plus it doesn't stop us from adding
> (1) or (3) in the future (though this may eventually require some more
> complex forking depending on which coders each version of the IO supports).
>
> If we do (2), we could also eventually consider adding something like a
> `DecodeFromBytes` transform which maps the byte string to a known type
> (today this is doable with map, though).
>
> Thanks,
> Danny
>
> On Mon, Feb 24, 2025 at 1:20 PM Derrick Williams via dev <
> dev@beam.apache.org> wrote:
>
>> I am currently working on normalizing TFRecordIO
>> <https://github.com/apache/beam/issues/28692> and ran into a cyclic
>> dependency issue when converting the byte array to Beam Rows via the
>> avroutils library. We have come up with a few options and would like to
>> figure out which is best, or whether there is a better one.
>>
>> Some quick background: the Java version doesn't allow passing in a coder,
>> while the Python one does.
>>
>> Current options:
>> 1. Give the user an option to pass in a coder.
>> Benefits/Cons: More effort for Java users and would require changes in
>> TFRecordIO, but would normalize coder usage across both languages.
>>
>> 2. Just return the byte string as a row with a single field of byte
>> string type.
>> Benefits/Cons: Should be simpler to implement.
>>
>> 3. Pass in a preset of coders.
>> Benefits/Cons: No effort for users, but no flexibility in choosing a
>> coder.
>>
>> 4. Skip Java normalization and only do it in Python.
>> Benefits/Cons: The RunInference library has to be used from Python, so
>> most use cases will be in Python anyway.
>>
>> Slightly leaning toward option (2), but any opinions or other ideas?
>>
>> Thanks,
>> Derrick
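For readers skimming the thread, option (2) plus the suggested decode step can be sketched in plain Python. This is only an illustration of the shape of the data flow, not Beam code: `BytesRow` is a hypothetical stand-in for a single-field Beam Row schema, and `decode_utf8` stands in for the user-supplied map that a `DecodeFromBytes`-style transform would wrap.

```python
# Sketch of option (2): each raw TFRecord payload is wrapped in a "row"
# with a single byte-string field, and decoding to a concrete type is
# left to a downstream map (the hypothetical DecodeFromBytes step).
from typing import NamedTuple


class BytesRow(NamedTuple):
    """Stand-in for a one-field Beam Row schema: {record: bytes}."""
    record: bytes


def wrap_record(payload: bytes) -> BytesRow:
    # What the normalized TFRecord read would emit under option (2).
    return BytesRow(record=payload)


def decode_utf8(row: BytesRow) -> str:
    # Example user-supplied decode step; today this would just be a Map.
    return row.record.decode("utf-8")


raw_records = [b"hello", b"world"]      # raw TFRecord payloads
rows = [wrap_record(r) for r in raw_records]
decoded = [decode_utf8(r) for r in rows]
```

In a real pipeline the two list comprehensions would be `Map` transforms over the PCollection produced by the TFRecord read; the point is that the IO itself stays coder-free and the decode choice moves to the user.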