Hi all,

Cham and I have been trying to get HDF5 support started with the HDF5 team. It seems their forum may be able to provide the help we need. I have created a ticket on their system (https://forum.hdfgroup.org/) and will follow up to make sure it is not forgotten. Please let me know if you have any comments.

Best,
Eila

On Fri, Mar 23, 2018 at 3:07 AM, Jean-Baptiste Onofré <[email protected]> wrote:

> Hi all,
>
> Sorry for the delay, but I had issues with my e-mail provider (I was not
> able to send e-mails :( ).
>
> Last week during the Beam Summit, I had the chance to participate in the IO
> brainstorming session.
>
> Here are the minutes:
>
> 1. IOs set
> We now have a decent number of IOs in Beam, and new ones are coming
> (ParquetIO, RabbitMQIO). Users mentioned a new file format we could
> support: HDF5. It would be a Python IO.
> I will create the Jira about HDF5.
> Other IOs are also in preparation, coming along with SDF support.
>
> 2. IOs and SDKs
> This point was related to the portability layer: how can I use a Java IO in
> Python, or the opposite? Today, most of the IOs are tied to the Java SDK,
> and it's a bit frustrating for Python SDK users. Users are looking forward
> to the portability layer; however, they also raised some questions about
> Docker requirements. I think we should prepare a clear answer on this point.
>
> 3. PCollection Headers
> Users want more "dynamic" IOs, where an IO's behavior could change
> depending on the element being processed in the PCollection. I introduced
> what we are using in Apache Camel: Message Headers. The Camel component
> endpoints (the equivalent of Beam IOs) can use the headers: for instance,
> the camel-http component can use a Camel.HTTP_URL header. We have already
> discussed PCollection headers/hints/annotations/metadata (whatever name we
> give them), and I still think it would be a great feature for both the IOs
> and even the runners.
> I'm proposing to create a Jira about that; I will be more than happy to
> work on this one.
>
> 4. Schema
> As you might know, we are working on adding schema support to PCollection.
> This feature can be leveraged by IOs. In particular, I think it would
> reduce the "wrapping" done by IOs (like KafkaRecord, JmsRecord, ...) and
> make data conversion easier.
>
> 5. Error Handling
> Users need generic error handling in the IOs. Today, error handling is
> managed by each IO individually. I introduced the error handler we are
> using in Apache Camel (sorry again ;)) and especially the default error
> handler features like: redelivery policy, recoverable/irrecoverable error
> handling, onWhen, onException, whileTrue, ...
> The error handler is not at the component level but at the routing engine
> level. We could imagine something similar at the pipeline level.
> Thoughts?
>
> I hope I didn't forget anything ;)
>
> To summarize:
> - I will create new Jiras for HDF5 and other new IOs
> - We have to work on documentation/explanation about the portability layer
>   & IOs
> - I will start a separate thread for the error handling discussion
> - Nothing to do about schema: it has already started.
>
> Regards
> JB
> --
> Jean-Baptiste Onofré
> [email protected]
> http://blog.nanthrax.net
> Talend - http://www.talend.com

--
Eila
www.orielresearch.org
https://www.meetup.com/Deep-Learning-In-Production/
