Having looked at this more carefully the other day with Jacques, it seems that

a) as Owen says, ORC has a more elaborate type structure.  The data stored
is equivalent (cf. the Protobuf versus Avro versus Thrift discussions),
subject to the possible (null, null) difference that Owen mentions.

b) as the Dremel paper points out, access to a very rare repeated structure
inside a common repeated structure requires traversing only the rare
element when Dremel-style structures are used, whereas repeat counts require
an additional traversal of a much denser column.  How much difference
this makes in practice is unknown, but one can clearly imagine cases where
it causes an orders-of-magnitude difference in favor of Parquet.  Those
cases may, however, be vanishingly rare.
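
To make (b) concrete, here is a rough toy sketch in Python.  It is only a
model under my own assumptions, not the actual Parquet or ORC layout: the
schema (record { repeated outer { repeated int rare } }), the function
names and the sample data are all made up for illustration, and compression
is ignored.  It shows that the Dremel-style column can assign each rare
value to its record from its own level stream, while the count-based layout
also has to walk the length column of the common parent.

```python
# Hypothetical toy schema: record { repeated outer { repeated int rare } }
# Definition level 2 means an actual `rare` value is present.
MAX_DEF = 2


def records_from_levels(levels, values):
    """Dremel-style: use only the leaf column's (repetition, definition) levels."""
    out = []
    rec = -1
    vals = iter(values)
    for rep, definition in levels:
        if rep == 0:  # repetition level 0 starts a new record
            rec += 1
        if definition == MAX_DEF:  # a real value, not a placeholder
            out.append((rec, next(vals)))
    return out


def records_from_counts(outer_lengths, inner_lengths, values):
    """Count-based: must also walk the parent list's (dense) length column."""
    out = []
    inner = iter(inner_lengths)
    vals = iter(values)
    for rec, n_outer in enumerate(outer_lengths):
        for _ in range(n_outer):
            for _ in range(next(inner)):
                out.append((rec, next(vals)))
    return out


# Four made-up records:
#   r0: outer = [{rare: []}, {rare: []}, {rare: [7]}]
#   r1: outer = [{rare: []}]
#   r2: outer = []
#   r3: outer = [{rare: [8, 9]}, {rare: []}]
values = [7, 8, 9]

# Dremel-style encoding of the single column outer.rare
levels = [(0, 1), (1, 1), (1, 2), (0, 1), (0, 0), (0, 2), (2, 2), (1, 1)]

# Count-based encoding: one length per record, one length per outer element
outer_lengths = [3, 1, 0, 2]
inner_lengths = [0, 0, 1, 0, 2, 0]

by_levels = records_from_levels(levels, values)
by_counts = records_from_counts(outer_lengths, inner_lengths, values)
assert by_levels == by_counts
```

Both functions recover the same (record, value) pairs.  The difference is
in what they read: the first touches one column, the second touches the
rare column plus the length columns of its ancestors.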




On Mon, Apr 15, 2013 at 4:06 PM, Owen O'Malley <[email protected]> wrote:

> Just a bit saying whether the record was present or null. Note that this is
> strictly more expressive than the Parquet's format in that it can encode
> structures with all null values. I believe the Parquet encoder would
> discard a row of the form (null, null) since it wouldn't have any leaves to
> make it materialize.
>
> -- Owen
>
>
> On Wed, Apr 10, 2013 at 3:48 PM, Ted Dunning <[email protected]>
> wrote:
>
> > On Wed, Apr 10, 2013 at 10:17 AM, Owen O'Malley <[email protected]>
> > wrote:
> >
> > > Ted,
> > >    ORC does support nested structures and splits them into primitive
> > > columns.
> >
> >
> > Good to hear.
> >
> >
> > > ...
> > > create table Foo (
> > >   complex: struct<field1: int, field2: map<string, int>>
> > >   simple: timestamp
> > > );
> > >
> > > will end up with a prefix-order flattening of the columns:
> > >
> > > columns:
> > > 0 - top level record (struct, children: 1, 6)
> > > 1 - complex (struct, children: 2, 3)
> > >
> >
> > What is stored in column 1?
> >
>
