You can augment the Hive writer to record the necessary metadata (the
Protobuf class that ProtoParquetInputFormat derives from the footer). That
seems less likely to lead to surprises.
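
A rough sketch of the idea, modelling the footer's key/value metadata as a
plain Map rather than calling any real parquet-mr or Hive API. The key name
"parquet.proto.class" is my assumption about what the protobuf read support
looks up, and com.example.MyProto is a made-up class name:

```java
import java.util.HashMap;
import java.util.Map;

public class FooterMetadataSketch {
    // Assumed key name for the footer entry holding the Protobuf message class.
    static final String PB_CLASS_KEY = "parquet.proto.class";

    // Writer side: return a copy of the footer metadata with the class name added.
    static Map<String, String> withProtoClass(Map<String, String> footer, String protoClassName) {
        Map<String, String> out = new HashMap<>(footer);
        out.put(PB_CLASS_KEY, protoClassName);
        return out;
    }

    // Reader side: resolve the class name, failing loudly if the writer never recorded it.
    static String resolveProtoClass(Map<String, String> footer) {
        String name = footer.get(PB_CLASS_KEY);
        if (name == null) {
            throw new IllegalStateException("footer has no " + PB_CLASS_KEY + " entry");
        }
        return name;
    }

    public static void main(String[] args) {
        // An empty map stands in for a footer written without any Protobuf metadata.
        Map<String, String> plainFooter = new HashMap<>();
        // Hypothetical class name, for illustration only.
        Map<String, String> augmented = withProtoClass(plainFooter, "com.example.MyProto");
        System.out.println(resolveProtoClass(augmented));
    }
}
```

With the entry recorded at write time, the reader does not have to guess the
class; without it, the lookup fails up front instead of surprising you later.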

On Wednesday, October 29, 2014, Chen Song <[email protected]> wrote:

> Hey Lukas
>
> I am not quite following your point. What do you mean by "add option to use
> own compiled class or dynamic message"?
>
> Chen
>
> On Sun, Oct 26, 2014 at 7:31 PM, lukas nalezenec <[email protected]> wrote:
>
> > Hi,
> > You are right, I will add option to use own compiled class or dynamic
> > message.
> >
> > Lukas
> >
> > On Sun, Oct 26, 2014 at 8:27 PM, Chen Song <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > I am new to Parquet and we have a complicated use case in which we
> > > want to adopt Parquet as our storage format.
> > >
> > > Current:
> > >
> > >    - The data is stored in Sequence files as Protobuf.
> > >    - We have map reduce jobs to write the data. Hive tables were
> > >    created with the Protobuf SerDe from elephant-bird so people can
> > >    query the data via Hive.
> > >    - We enhanced elephant-bird with our own serializer so one can
> > >    write data into a table via Hive, with the data stored in Sequence
> > >    files as Protobuf.
> > >
> > >
> > > Future:
> > > We want to use Parquet as the underlying storage format without
> > > losing the Protobuf abstraction at the application layer. After a bit
> > > of research and practice, I have a few questions.
> > >
> > >    - Say a Hive table is created as a Parquet table, and data is
> > >    written via Hive.
> > >       - If I want to read the data in map reduce jobs as Protobuf
> > >       records, can I use ProtoParquetInputFormat?
> > >       https://github.com/Parquet/parquet-mr/blob/master/parquet-protobuf/src/main/java/parquet/proto/ProtoParquetInputFormat.java
> > >       After looking at the API, it doesn't seem possible to specify
> > >       the Protobuf class for the input path. Instead,
> > >       ProtoParquetInputFormat derives the class from the footer of
> > >       the underlying data. Is it fair to say ProtoParquetInputFormat
> > >       will only read data written by ProtoParquetOutputFormat? Is
> > >       there a way to work around this?
> > >       - If not, is there any out-of-the-box Hive output format I can
> > >       use to piggyback on ProtoParquetOutputFormat?
> > >    - If data is written by a map reduce job with
> > >    ProtoParquetOutputFormat, will read queries in Hive work
> > >    automatically?
> > >
> > > Thanks a lot in advance. Any suggestions would be appreciated.
> > >
> > > --
> > > Chen Song
> > >
> >
>
>
>
> --
> Chen Song
