"Nested data is not yet implemented" in BigQuery (if I recall the exact words correctly). I am quoting the speaker at the BigQuery presentation at the Google Technology User Group last week at the Googleplex (intentionally not citing the speaker's name).
-ay

On Sep 14, 2012, at 1:28 PM, David Gruzman <[email protected]> wrote:

> I assume that evolution of BigQuery reflects resolution of Dremel... If
> somebody have information on it it would be great.
> Storage system should understand that all file comprising the horizontal
> partition of the table are one logical entity, and should store them
> together / in some proximity. I agree that PAX will be much more
> convinient. The question is - is there performance penalty of PAX vs file
> per column?
> David
>
> On Fri, Sep 14, 2012 at 11:21 PM, Tomer Shiran <[email protected]> wrote:
>
>> Is there any public information suggesting that Google moved away from
>> supporting nested data? Clearly BigQuery doesn't yet allow nested data, but
>> not sure that applies to Dremel.
>>
>> There are challenges with one file per column. How do you ensure that a
>> single record is located on a single machine to avoid costly record
>> reconstruction?
>>
>> On Fri, Sep 14, 2012 at 1:05 PM, David Gruzman <[email protected]> wrote:
>>
>>> Hi All,
>>> I would like to discuss the question of what will be native format for
>>> drill. Original Google dremel paper defined their hierarchical columnar
>>> data format. Since then google shifted from hierarchical data format...
>>> So it is a question if it makes sense to stick with it?
>>> If we are also moving to simple flat format we need our own format we have
>>> to support "native". In case of Drill I would define that native support as
>>> "high performance".
>>> I think we can go to some kind of PAX format with comprehensive metadata in
>>> the header, so each file is completely self contained and can be understood
>>> and processed without any external data.
>>> Alternative is to have single file per column. As far as I remember from
>>> our OpenDremel work the main decision point is - if we can read one column
>>> from the file without loading into node memory unnecessary data from other
>>> columns.
>>> With best regards,
>>> David
>>>
>>
>> --
>> Tomer Shiran
>> Director of Product Management | MapR Technologies | 650-804-8657
>>
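
To make the PAX option from the quoted thread concrete, here is a minimal sketch of a self-contained file whose header describes every column, so that one column can be read without loading the others. Everything in it is an assumption for illustration: the layout (a length-prefixed JSON header followed by one contiguous chunk per column), the int64-only column type, and the function names `write_pax` and `read_column`. It is not the Dremel format and not a proposed Drill format.

```python
import io
import json
import struct


def write_pax(buf, columns):
    """Write columns (dict of name -> list of ints) as one self-contained block.

    Hypothetical layout: 4-byte header length, JSON header, then one
    contiguous chunk of little-endian int64 values per column.
    """
    chunks = {
        name: struct.pack(f"<{len(values)}q", *values)
        for name, values in columns.items()
    }
    meta = {"columns": []}
    offset = 0
    for name, data in chunks.items():
        # Offsets are relative to the first byte after the header.
        meta["columns"].append(
            {"name": name, "type": "int64", "offset": offset, "length": len(data)}
        )
        offset += len(data)
    header = json.dumps(meta).encode("utf-8")
    buf.write(struct.pack("<I", len(header)))
    buf.write(header)
    for data in chunks.values():
        buf.write(data)


def read_column(buf, name):
    """Read a single column by seeking to its chunk; other chunks are not read."""
    buf.seek(0)
    (header_len,) = struct.unpack("<I", buf.read(4))
    meta = json.loads(buf.read(header_len))
    for col in meta["columns"]:
        if col["name"] == name:
            buf.seek(4 + header_len + col["offset"])
            data = buf.read(col["length"])
            return list(struct.unpack(f"<{col['length'] // 8}q", data))
    raise KeyError(name)


buf = io.BytesIO()
write_pax(buf, {"user_id": [1, 2, 3], "clicks": [10, 20, 30]})
clicks = read_column(buf, "clicks")
```

Because the header carries names, types, and byte ranges, the file can be interpreted without external metadata, and all columns of a record stay in the same file. Whether seeking inside one file costs more than opening a separate file per column is exactly the open performance question raised above; this sketch does not answer it.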
