I thought a relational database holding 6 TB of data could be quite expensive?

On 1 May 2017 12:56 am, "Muthu Jayakumar" <bablo...@gmail.com> wrote:
> I am not sure if parquet is a good fit for this? This seems more like a
> filter lookup than an aggregate-like query. I am curious to see what others
> have to say.
> Would a relational database with the right index (the code field in the
> above case) perform more efficiently (with Spark, which uses predicate
> push-down)?
> Hope this helps.
>
> Thanks,
> Muthu
>
> On Sun, Apr 30, 2017 at 1:45 AM, Zeming Yu <zemin...@gmail.com> wrote:
>
>> Another question: I need to store airport info in a parquet file and
>> present it when a user makes a query.
>>
>> For example:
>>
>>     "airport": {
>>         "code": "TPE",
>>         "name": "Taipei (Taoyuan Intl.)",
>>         "longName": "Taipei, Taiwan (TPE-Taoyuan Intl.)",
>>         "city": "Taipei",
>>         "localName": "Taoyuan Intl.",
>>         "airportCityState": "Taipei, Taiwan"
>>
>> Is it best practice to store just the code "TPE" and then look up the
>> name "Taipei (Taoyuan Intl.)" from a relational database? Any alternatives?
>>
>> On Sun, Apr 30, 2017 at 6:34 PM, Jörn Franke <jornfra...@gmail.com>
>> wrote:
>>
>>> Depends on your queries, the data structure, etc. Generally flat is
>>> better, but if your query filter is on the highest level then you may
>>> have better performance with a nested structure. It really depends.
>>>
>>> > On 30. Apr 2017, at 10:19, Zeming Yu <zemin...@gmail.com> wrote:
>>> >
>>> > Hi,
>>> >
>>> > We're building a parquet based data lake. I was under the impression
>>> > that flat files are more efficient than deeply nested files (say 3 or
>>> > 4 levels down). Is that correct?
>>> >
>>> > Thanks,
>>> > Zeming
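
For what it's worth, here is a minimal sketch of the "store only the code, look up the rest" idea discussed in the thread. It uses Python's built-in sqlite3 purely as a stand-in for whatever relational database you would actually pick. The table name, column names, the `airport_name` helper, and the sample flight record are all made up for illustration and are not from anyone's real setup.

```python
import sqlite3

# Hypothetical schema: a small airport lookup table keyed by airport code.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE airport ("
    " code TEXT PRIMARY KEY,"  # the primary key gives an index on code
    " name TEXT,"
    " city TEXT)"
)
conn.execute(
    "INSERT INTO airport VALUES (?, ?, ?)",
    ("TPE", "Taipei (Taoyuan Intl.)", "Taipei"),
)


def airport_name(code):
    """Return the display name for an airport code, or None if unknown."""
    row = conn.execute(
        "SELECT name FROM airport WHERE code = ?", (code,)
    ).fetchone()
    return row[0] if row else None


# The large records (the parquet side) would carry only the code.
flight = {"origin": "TPE"}  # made-up record for illustration
print(airport_name(flight["origin"]))
```

Only the small airport table lives in the database here; the bulk data would stay in parquet and carry just the code, so the 6 TB would not need to move into the relational store.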