Jira is always the preferable approach. Thank you.

On Sat, Jun 3, 2017 at 1:38 PM, Stefán Baxter <[email protected]>
wrote:

> Hi Rahul,
>
> Sure, but can I perhaps get the files to you directly?
>
> Regards,
>  -Stefán
>
> On Sat, Jun 3, 2017 at 8:13 PM, rahul challapalli <
> [email protected]> wrote:
>
> > Can you please raise a Jira issue and attach the required files? I can
> > try to reproduce it.
> >
> > Rahul
> >
> > On Jun 3, 2017 6:19 AM, "Stefán Baxter" <[email protected]>
> > wrote:
> >
> > > Hi,
> > >
> > > I have a sample data set (a few million records) that is saved to
> > > Parquet in 2 ways: a simple file structure with primitive types to
> > > store dimensions and metrics (String, Double), and one using nested
> > > maps (String,String and String,Double), respectively.
> > >
> > > Querying the data set with the simple types only:
> > >
> > > select roundTimeStamp(s.occurred_at,'PT1H') as `at`,
> > > sum(metrics_price) as price, sum(metrics_kwh) as kwh from
> > > dfs.asa.`/processed/etactica-dev-p1/entitysamples/metrics/D2017*` as s
> > > group by roundTimeStamp(s.occurred_at,'PT1H')
> > >
> > >
> > > takes: *28.442* sec. (dev. laptop x 1)
> > >
> > >
> > > Same query against the nested structure:
> > >
> > > select roundTimeStamp(s.occurred_at,'PT1H') as `at`,
> > > sum(s.metrics.price) as price, sum(s.metrics.kwh) as kwh from
> > > dfs.asa.`/processed/etactica-dev-p1/entitysamples/metrics/D2017*` as s
> > > group by roundTimeStamp(s.occurred_at,'PT1H')
> > >
> > > takes: *719.810* sec.
> > >
> > > Even counting the number of records takes very, very long if there is
> > > a nested structure involved. (select count(*) from)
> > > It does not behave like this on our production servers (1.8), but I
> > > have not run this particular test on them (their performance has never
> > > been an issue).
> > > I have these sample files available if anyone wishes to reproduce this
> > > consistently.
> > > Regards,
> > >  -Stefán
> > >
> >
>
