Re: parquet optimal file structure - flat vs nested

Zeming Yu Sun, 30 Apr 2017 14:08:42 -0700

I thought relational databases with 6 TB of data can be quite expensive?

On 1 May 2017 12:56 am, "Muthu Jayakumar" <bablo...@gmail.com> wrote:


> I am not sure parquet is a good fit for this. This seems more like a
> filtered lookup than an aggregate-like query. I am curious to see what
> others have to say.
> Would a relational database with the right index (on the code field in
> the above case) be more efficient, with Spark using predicate push-down
> against it?
> Hope this helps.
>
> Thanks,
> Muthu
>
> On Sun, Apr 30, 2017 at 1:45 AM, Zeming Yu <zemin...@gmail.com> wrote:
>
>> Another question: I need to store airport info in a parquet file and
>> present it when a user makes a query.
>>
>> For example:
>>
>> "airport": {
>>     "code": "TPE",
>>     "name": "Taipei (Taoyuan Intl.)",
>>     "longName": "Taipei, Taiwan (TPE-Taoyuan Intl.)",
>>     "city": "Taipei",
>>     "localName": "Taoyuan Intl.",
>>     "airportCityState": "Taipei, Taiwan"
>> }
>>
>>
>> Is it best practice to store just the code "TPE" and then look up the
>> name "Taipei (Taoyuan Intl.)" from a relational database? Any alternatives?
>>
>> On Sun, Apr 30, 2017 at 6:34 PM, Jörn Franke <jornfra...@gmail.com>
>> wrote:
>>
>>> It depends on your queries, the data structure, etc. Generally flat is
>>> better, but if your query filter is on the highest level of the
>>> structure then you may get better performance with a nested structure.
>>> It really depends.
>>>
>>> > On 30. Apr 2017, at 10:19, Zeming Yu <zemin...@gmail.com> wrote:
>>> >
>>> > Hi,
>>> >
>>> > We're building a parquet based data lake. I was under the impression
>>> > that flat files are more efficient than deeply nested files (say 3 or 4
>>> > levels down). Is that correct?
>>> >
>>> > Thanks,
>>> > Zeming
>>>
>>
>>
>
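
To make the lookup idea in the thread concrete, here is a minimal sketch of
keeping only the airport code in the main records and resolving the display
name from a small indexed table. It uses Python's built-in sqlite3 as a
stand-in for "a relational database"; the table layout, the airport_name
helper and the sample flight record are all assumptions for illustration,
not anything the posters described.

```python
import sqlite3

# Hypothetical dimension table: one row per airport, keyed by its code.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE airport ("
    " code TEXT PRIMARY KEY,"  # the primary key gives an index on code
    " name TEXT,"
    " city TEXT)"
)
conn.execute(
    "INSERT INTO airport VALUES (?, ?, ?)",
    ("TPE", "Taipei (Taoyuan Intl.)", "Taipei"),
)


def airport_name(code):
    """Return the display name for an airport code, or None if unknown."""
    row = conn.execute(
        "SELECT name FROM airport WHERE code = ?", (code,)
    ).fetchone()
    return row[0] if row else None


# Made-up fact records (what would live in Parquet) carry only the code.
flights = [{"flight": "X1", "airport_code": "TPE"}]

# Attach the name at presentation time via the indexed lookup.
enriched = [
    dict(f, airport_name=airport_name(f["airport_code"])) for f in flights
]
```

The airport table is tiny compared with the main data set, so an alternative
is to keep it as its own small Parquet file and join it in Spark. Which is
better would depend on the query pattern.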

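For the original flat vs nested question, the sketch below shows what
"flattening" a nested record could look like before writing it out. The
flatten helper and the underscore naming are assumptions made for this
example. Note that Parquet already stores each nested leaf field as its own
column, so whether flattening helps in practice depends on the queries, as
Jörn says above.

```python
def flatten(record, prefix=""):
    """Flatten nested dicts into one level, joining keys with underscores."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into the nested struct, carrying the parent name along.
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


# Nested form, following the airport example from the thread.
nested = {"airport": {"code": "TPE", "city": "Taipei"}}

# Flat form: one top-level column per leaf field.
flat = flatten(nested)
```

Here flat ends up with the keys airport_code and airport_city at the top
level, which is the shape a flat table schema would use.
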