Ok, the data is a bit sensitive. I'll submit this when I have created a meaningful test set that I can distribute.
- Stefán

On Sun, Jun 4, 2017 at 6:54 AM, rahul challapalli <[email protected]> wrote:

> Jira is always the preferable approach. Thank You.
>
> On Sat, Jun 3, 2017 at 1:38 PM, Stefán Baxter <[email protected]> wrote:
>
> > Hi Rahul,
> >
> > Sure, but can I perhaps get the files to you directly?
> >
> > Regards,
> > -Stefán
> >
> > On Sat, Jun 3, 2017 at 8:13 PM, rahul challapalli <[email protected]> wrote:
> >
> > > Can you please raise a jira and attach the required files? I can try to
> > > reproduce it.
> > >
> > > Rahul
> > >
> > > On Jun 3, 2017 6:19 AM, "Stefán Baxter" <[email protected]> wrote:
> > >
> > > > Hi,
> > > >
> > > > I have a sample data set (a few million records) that is saved to
> > > > parquet in 2 ways: a simple file structure with primitive types to
> > > > store dimensions and metrics (String, Double), and one using nested
> > > > maps (String,String and String,Double respectively).
> > > >
> > > > Querying the data set with the simple types only:
> > > >
> > > >     select roundTimeStamp(s.occurred_at,'PT1H') as `at`,
> > > >       sum(metrics_price) as price, sum(metrics_kwh) as kwh
> > > >     from dfs.asa.`/processed/etactica-dev-p1/entitysamples/metrics/D2017*` as s
> > > >     group by roundTimeStamp(s.occurred_at,'PT1H')
> > > >
> > > > takes: *28.442* sec. (dev. laptop x 1)
> > > >
> > > > Same query against the nested structure:
> > > >
> > > >     select roundTimeStamp(s.occurred_at,'PT1H') as `at`,
> > > >       sum(s.metrics.price) as price, sum(s.metricss.kwh) as kwh
> > > >     from dfs.asa.`/processed/etactica-dev-p1/entitysamples/metrics/D2017*` as s
> > > >     group by roundTimeStamp(s.occurred_at,'PT1H')
> > > >
> > > > takes: *719.810* sec.
> > > >
> > > > Even counting the number of records (select count(*) from) takes
> > > > very, very long if there is a nested structure involved.
> > > >
> > > > It does not behave like this on our production servers (1.8), but I
> > > > have not run this particular test on them (their performance has
> > > > never been an issue).
> > > >
> > > > I have these sample files available if anyone wishes to reproduce
> > > > this consistently.
> > > >
> > > > Regards,
> > > > -Stefán
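
For scale, the two timings reported in the original message (28.442 sec. for the flat columns, 719.810 sec. for the nested maps) can be put side by side. This is only a minimal sketch of the arithmetic in Python; the helper name `slowdown_factor` is hypothetical and not part of Drill, and the numbers are the single-laptop measurements quoted in the thread, not a benchmark.

```python
def slowdown_factor(baseline_sec: float, candidate_sec: float) -> float:
    """Return how many times slower the candidate run is than the baseline."""
    if baseline_sec <= 0:
        raise ValueError("baseline_sec must be positive")
    return candidate_sec / baseline_sec


# Timings as reported in the thread (one dev laptop, one run each).
flat_columns_sec = 28.442   # query over metrics_price / metrics_kwh
nested_maps_sec = 719.810   # same query over the nested metrics map

factor = slowdown_factor(flat_columns_sec, nested_maps_sec)
print(f"nested query is about {factor:.1f}x slower than the flat query")
```

A single run per layout says nothing about variance, so the factor is best read as an order-of-magnitude indication.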
