Is there any public information suggesting that Google moved away from
supporting nested data? Clearly BigQuery doesn't yet allow nested data, but
I'm not sure that applies to Dremel.

There are challenges with one file per column. How do you ensure that a
single record is located on a single machine to avoid costly record
reconstruction?
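
To make the trade-off concrete, here is a minimal sketch of a PAX-style
layout of the kind David suggests below: rows are split into groups, each
group stores its values column by column, and a header records where each
column chunk lives. A whole record then stays in one file, yet a reader can
seek to a single column's bytes. Everything here is a hypothetical
illustration (the names write_pax and read_column, the JSON encoding, the
4-byte length prefix); it is not the Dremel format or a proposed Drill
format.

```python
import io
import json
import struct


def write_pax(records, columns, rows_per_group):
    """Serialize flat records into a PAX-like layout.

    Layout (an assumption for this sketch): 4-byte big-endian header length,
    JSON header, then the column chunks of every row group back to back.
    """
    body = io.BytesIO()
    groups = []
    for start in range(0, len(records), rows_per_group):
        rows = records[start:start + rows_per_group]
        chunks = {}
        for col in columns:
            # Within a row group, all values of one column are contiguous.
            data = json.dumps([r[col] for r in rows]).encode("utf-8")
            chunks[col] = {"offset": body.tell(), "length": len(data)}
            body.write(data)
        groups.append({"rows": len(rows), "chunks": chunks})
    # The header carries the schema and chunk offsets, so the file can be
    # read without any external metadata.
    header = json.dumps({"columns": columns, "groups": groups}).encode("utf-8")
    return struct.pack(">I", len(header)) + header + body.getvalue()


def read_column(f, column):
    """Read one column by seeking only to that column's chunks.

    Returns the values and the number of body bytes actually read.
    """
    f.seek(0)
    (header_len,) = struct.unpack(">I", f.read(4))
    header = json.loads(f.read(header_len))
    body_start = 4 + header_len
    values, touched = [], 0
    for group in header["groups"]:
        chunk = group["chunks"][column]
        f.seek(body_start + chunk["offset"])
        values.extend(json.loads(f.read(chunk["length"])))
        touched += chunk["length"]
    return values, touched


# Small usage example with made-up data.
records = [{"id": i, "url": f"http://example.com/{i}", "hits": i * 10}
           for i in range(5)]
blob = write_pax(records, ["id", "url", "hits"], rows_per_group=2)
ids, touched = read_column(io.BytesIO(blob), "id")
```

With one file per column, the same read would touch a single file, but
rebuilding a record would require fetching from as many files as there are
columns, possibly on different machines.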

On Fri, Sep 14, 2012 at 1:05 PM, David Gruzman <[email protected]> wrote:

> Hi All,
> I would like to discuss what the native format for Drill should be. The
> original Google Dremel paper defined a hierarchical columnar data format.
> Since then Google has shifted away from the hierarchical data format... So
> the question is whether it makes sense to stick with it.
> If we also move to a simple flat format, we need our own format that we
> support as "native". In the case of Drill, I would define native support as
> "high performance".
> I think we could go with some kind of PAX format with comprehensive metadata
> in the header, so that each file is completely self-contained and can be
> understood and processed without any external data.
> The alternative is a single file per column. As far as I remember from our
> OpenDremel work, the main decision point is whether we can read one column
> from the file without loading unnecessary data from other columns into node
> memory.
> With best regards,
> David
>



-- 
Tomer Shiran
Director of Product Management | MapR Technologies | 650-804-8657
