You can augment the Hive writer to record the metadata that ProtoParquetInputFormat needs. That seems less likely to lead to surprises.
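To make that concrete, here is a rough sketch of the idea as a toy model. It is not the parquet-mr or Hive API: plain dicts stand in for file footers, and the key name `PROTO_CLASS_KEY`, the helper functions, and the class name `com.example.EventProtos$Event` are all made up for illustration. The point is only that a reader which looks up the Protobuf class in the footer can work with any writer that records it there.

```python
# Toy model only: plain dicts stand in for Parquet file footers.
# PROTO_CLASS_KEY is an assumed key name, not taken from parquet-mr.
PROTO_CLASS_KEY = "parquet.proto.class"


def write_footer(schema, extra_metadata=None):
    """Build a footer: a schema plus free-form key/value metadata."""
    return {"schema": schema, "key_value_metadata": dict(extra_metadata or {})}


def resolve_proto_class(footer):
    """Mimic a reader that looks up the Protobuf class name in the footer."""
    try:
        return footer["key_value_metadata"][PROTO_CLASS_KEY]
    except KeyError:
        raise ValueError("footer does not name a Protobuf class") from None


# A writer that knows nothing about Protobuf leaves the key out.
plain_footer = write_footer(schema="message Event { ... }")

# An augmented writer records the class name alongside the schema.
augmented_footer = write_footer(
    schema="message Event { ... }",
    extra_metadata={PROTO_CLASS_KEY: "com.example.EventProtos$Event"},
)
```

In this model, `resolve_proto_class(augmented_footer)` returns the recorded class name, while `resolve_proto_class(plain_footer)` raises `ValueError`, which is the failure mode the question below is worried about.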

On Wednesday, October 29, 2014, Chen Song <[email protected]> wrote:

> Hey Lukas
>
> I am not quite following your point. What do you mean by "add option to use
> own compiled class or dynamic message"?
>
> Chen
>
> On Sun, Oct 26, 2014 at 7:31 PM, lukas nalezenec <[email protected]> wrote:
>
> > Hi,
> > You are right, I will add option to use own compiled class or dynamic
> > message.
> >
> > Lukas
> >
> > On Sun, Oct 26, 2014 at 8:27 PM, Chen Song <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > I am new to Parquet and we have a complicated use case in which we want
> > > to adopt Parquet as our storage format.
> > >
> > > Current:
> > >
> > >    - The data is stored in Sequence files as Protobuf.
> > >    - We have map reduce jobs to write the data. Hive tables were created
> > >    with Protobuf Serde using elephant-bird so people can query the data
> > >    via Hive.
> > >    - We enhance elephant-bird to add our own serializer so one can write
> > >    data into table via Hive and data is stored in Sequence files as
> > >    Protobuf.
> > >
> > > Future:
> > > We want to use Parquet as the underlying storage format without losing
> > > Protobuf abstraction at application layer. After a bit of research and
> > > practice, I have a few questions.
> > >
> > >    - Say if Hive table is created as Parquet table, and data is written
> > >    via Hive.
> > >       - If I want to read data in map reduce jobs as Protobuf records,
> > >       can I use ProtoParquetInputFormat in
> > >       https://github.com/Parquet/parquet-mr/blob/master/parquet-protobuf/src/main/java/parquet/proto/ProtoParquetInputFormat.java
> > >       ?
> > >       After looking at the API, it doesn't seem possible that I can
> > >       specify the Protobuf class for the input path. Instead,
> > >       ProtoParquetInputFormat derives the class from the footer of the
> > >       underlying data. Is it fair to say ProtoParquetInputFormat will
> > >       only read data written by ProtoParquetOutputFormat? Is there a way
> > >       to work around this?
> > >       - If not, is there any out of the box Hive output format I can use
> > >       to piggy back ProtoParquetOutputFormat?
> > >    - If data is written by map reduce job with ProtoParquetOutputFormat.
> > >    Will read query in Hive work automatically?
> > >
> > > Thanks a lot in advance. Any suggestions would be appreciated.
> > >
> > > --
> > > Chen Song
>
> --
> Chen Song
