That's great! Many people have asked me about that, and I'm glad to see this happening. Does anyone know if there's something in the works for the Java SDK (assuming I don't want to wait for Fn API support)?
On Fri, Feb 24, 2017 at 8:44 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:

> Fantastic!
>
> That's a great addition and awesome to see that with Beam!
>
> Regards
> JB
>
> On 02/24/2017 02:51 AM, Robert Bradshaw wrote:
> > One thing I'm really excited about in this library is that it allows one to
> > more easily express transforms on columnar data (which is useful beyond
> > just ML). For example, if your input elements have two fields "x" and "y",
> > then you can write functions like
> >
> > def preprocessing_fn(inputs):
> >   x_centered = tft.map(lambda x, mean: x - mean, inputs['x'],
> >                        tft.mean(inputs['x']))
> >   y_normalized = tft.scale_to_0_1(inputs['y'])
> >   return {
> >     'x_centered': x_centered,
> >     'y_normalized': y_normalized,
> >     'x_centered_times_y_normalized': tft.map(operator.mul,
> >                                              x_centered, y_normalized)
> >   }
> >
> > # Read PCollection of dicts with 'x' and 'y' keys and numeric values
> > input = p | Read(...)
> >
> > # output will contain dicts with 'x_centered', 'y_normalized', and
> > # 'x_centered_times_y_normalized' keys with the expected values, and fn
> > # can be used to transform other data using the statistics (mean, mins,
> > # and maxes) without re-analysis.
> > output, fn = ((input, schema) |
> >               beam_impl.AnalyzeAndTransformDataset(preprocessing_fn))
> >
> > This automatically injects the relevant global aggregations (which can be
> > interleaved) and builds up TensorFlow graphs to apply the transformations
> > very efficiently.
> >
> >
> > On Thu, Feb 23, 2017 at 4:55 PM, Davor Bonaci <da...@apache.org> wrote:
> >
> >> Beam and TensorFlow coming together -- a big deal for us!
> >>
> >> On Thu, Feb 23, 2017 at 3:49 PM, Ahmet Altay <al...@google.com.invalid>
> >> wrote:
> >>
> >>> Hi all,
> >>>
> >>> Yesterday, there was an announcement from the TensorFlow community about
> >>> the new tf.Transform library [1]. It is a library that allows users to
> >>> define pre-processing pipelines and run them using large-scale data
> >>> processing frameworks. It is a library specifically designed to work
> >>> with Apache Beam. It is great to see the Python SDK getting a larger
> >>> ecosystem and increased usage.
> >>>
> >>> Also worth mentioning: PMC member Robert Bradshaw was one of the
> >>> contributors to this new library.
> >>>
> >>> Thank you,
> >>> Ahmet
> >>>
> >>> [1] https://research.googleblog.com/2017/02/preprocessing-for-machine-learning-with.html
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
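For anyone who wants the analyze-then-transform idea Robert describes without installing tf.Transform, here is a minimal plain-Python sketch. It is not the tf.Transform or Beam API: `Stats`, `analyze`, `make_transform_fn` and `analyze_and_transform` are hypothetical names, and both phases are ordinary Python loops rather than Beam global aggregations and TensorFlow graphs. One full pass computes the statistics (mean of `x`, min and max of `y`), and a second, per-element pass applies them, so the returned function can be reused on new data without re-analysis.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, float]


@dataclass(frozen=True)
class Stats:
    """Full-pass statistics from the (hypothetical) 'analyze' phase."""
    x_mean: float
    y_min: float
    y_max: float


def analyze(rows: Iterable[Row]) -> Stats:
    # One pass over the whole dataset. tf.Transform would run these as
    # distributed global aggregations; here it is just a Python loop.
    materialized = list(rows)
    xs = [r["x"] for r in materialized]
    ys = [r["y"] for r in materialized]
    return Stats(x_mean=sum(xs) / len(xs), y_min=min(ys), y_max=max(ys))


def make_transform_fn(stats: Stats) -> Callable[[Row], Row]:
    """Build a per-element function that reuses the frozen statistics."""
    y_range = stats.y_max - stats.y_min

    def transform(row: Row) -> Row:
        x_centered = row["x"] - stats.x_mean
        # Scale y to [0, 1] over the analyzed data; guard a constant column.
        y_normalized = (row["y"] - stats.y_min) / y_range if y_range else 0.0
        return {
            "x_centered": x_centered,
            "y_normalized": y_normalized,
            "x_centered_times_y_normalized": x_centered * y_normalized,
        }

    return transform


def analyze_and_transform(rows: List[Row]) -> Tuple[List[Row], Callable[[Row], Row]]:
    # Analyze once, then transform the same data and return the reusable fn.
    fn = make_transform_fn(analyze(rows))
    return [fn(r) for r in rows], fn


if __name__ == "__main__":
    data = [{"x": 1.0, "y": 10.0}, {"x": 3.0, "y": 20.0}, {"x": 5.0, "y": 30.0}]
    output, fn = analyze_and_transform(data)
    print(output)
    # New data reuses the statistics from `data`, with no second analysis pass.
    print(fn({"x": 7.0, "y": 40.0}))
```

Because the statistics are frozen after analysis, new rows outside the analyzed range are not re-scaled: in the usage above, `y = 40.0` maps to `y_normalized = 1.5` rather than being clipped to 1.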