Many thanks to Yash and Jason for your answers about this. I will explore two alternatives for now:
- Using Hive as proposed by Jason. I'll read more on elephant-bird to check the way to import protobuf data into Hive.
- I'm also reading about ways to convert protobuf files to parquet, a format Drill is able to use as a datasource. I believe I can do this using parquet-mr (https://github.com/Parquet/parquet-mr).

Cristian

> Hi Christian,
>
> While we do not have a native protobuf reader for Drill, we do support Hive
> Serdes as an input format. This will not be the fastest way to get your
> data into the Drill engine, but it should be less coding than writing a
> record reader for drill.
>
> If you need performance and are up for learning a bit more about Drill, we
> would certainly welcome a contribution of a protobuf reader and would be
> happy to help you get started.
>
> -Jason Altekruse

On Wed, Sep 3, 2014 at 10:58 AM, Yash Sharma <[email protected]> wrote:

> Hey Cristian, currently we do not have protobuf readers in Drill. It would
> however be possible to add new readers in Drill by creating new
> RecordReaders.
>
> Yash.

On Wed, Sep 3, 2014 at 1:09 PM, Cristian Espinoza <[email protected]> wrote:

> Hi,
>
> I'm evaluating Drill and until now it looks great. My idea is to use it to
> directly query some protocol buffers files so they appear to the rest of my
> JEE app as a datasource. But I've been unable to find any information in
> the documentation about the proper way to register the file system,
> specifically the format I have to use. The docs present examples for csv,
> json and parquet formats, but there's none about protobuf.
>
> Is this possible to do? According to Drill's description it may be.
>
> Many thanks in advance,
>
> Cristián Espinoza
