What we really need is a list aggregator. That would make this a snap.
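To make the idea concrete, here is a minimal Python sketch of what such a list aggregator might do with the region/nation rows from the quoted query. It assumes the rows arrive already sorted by the parent columns, which the ORDER BY in that query would guarantee. The function name `nest_sorted_rows` and the `nations` field are hypothetical, chosen only for illustration; this is not existing Drill functionality.

```python
import json
from itertools import groupby


def nest_sorted_rows(rows, parent_keys, child_keys, list_name):
    """Yield one nested document per run of rows sharing the same parent values.

    Assumes `rows` is already sorted by `parent_keys`, so each parent is
    emitted exactly once while streaming through the input.
    """
    def parent_of(row):
        return tuple(row[k] for k in parent_keys)

    for parent_values, group in groupby(rows, key=parent_of):
        doc = dict(zip(parent_keys, parent_values))
        # Collect the child columns of every row in this run into a list.
        doc[list_name] = [{k: row[k] for k in child_keys} for row in group]
        yield doc


# A few flat rows taken from the quoted query output (hypothetical subset).
rows = [
    {"r_NAME": "AFRICA", "r_REGIONKEY": 0, "n_NAME": "ALGERIA", "n_NATIONKEY": 0},
    {"r_NAME": "AFRICA", "r_REGIONKEY": 0, "n_NAME": "ETHIOPIA", "n_NATIONKEY": 5},
    {"r_NAME": "AMERICA", "r_REGIONKEY": 1, "n_NAME": "ARGENTINA", "n_NATIONKEY": 1},
]

docs = list(
    nest_sorted_rows(
        rows,
        parent_keys=["r_NAME", "r_REGIONKEY"],
        child_keys=["n_NAME", "n_NATIONKEY"],
        list_name="nations",
    )
)

for doc in docs:
    print(json.dumps(doc))
```

Because the grouping relies only on adjacent rows, a writer could emit each document as soon as the parent key changes, without buffering the whole result set.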

On Sep 18, 2017 4:38 PM, "Kunal Khatua" <[email protected]> wrote:

> I've been looking at a way to use existing benchmarks converted into a
> complex json document.
>
> Take for example TPCH benchmark, which has PKey-FKey relations.
>
> So for a JSON output for a query like this:
> 0: jdbc:drill:schme=dfs.tpchDri1000> select r.r_NAME, n.n_NAME
> . . . . . . . . . . . . . . . . . .> , r.r_REGIONKEY
> . . . . . . . . . . . . . . . . . .> , n.n_NATIONKEY
> . . . . . . . . . . . . . . . . . .> , n.n_REGIONKEY
> . . . . . . . . . . . . . . . . . .> from nation n,region r
> . . . . . . . . . . . . . . . . . .> where n.n_regionkey = r.r_regionkey
> . . . . . . . . . . . . . . . . . .> order by r.r_NAME, n.n_NAME;
> +--------------+-----------------+--------------+--------------+--------------+
> | r_NAME       | n_NAME          | r_REGIONKEY  | n_NATIONKEY  | n_REGIONKEY  |
> +--------------+-----------------+--------------+--------------+--------------+
> | AFRICA       | ALGERIA         | 0            | 0            | 0            |
> | AFRICA       | ETHIOPIA        | 0            | 5            | 0            |
> | AFRICA       | KENYA           | 0            | 14           | 0            |
> | AFRICA       | MOROCCO         | 0            | 15           | 0            |
> | AFRICA       | MOZAMBIQUE      | 0            | 16           | 0            |
> | AMERICA      | ARGENTINA       | 1            | 1            | 1            |
> | AMERICA      | BRAZIL          | 1            | 2            | 1            |
> | AMERICA      | CANADA          | 1            | 3            | 1            |
> | AMERICA      | PERU            | 1            | 17           | 1            |
> | AMERICA      | UNITED STATES   | 1            | 24           | 1            |
> | ASIA         | CHINA           | 2            | 18           | 2            |
> | ASIA         | INDIA           | 2            | 8            | 2            |
> | ASIA         | INDONESIA       | 2            | 9            | 2            |
> | ASIA         | JAPAN           | 2            | 12           | 2            |
> | ASIA         | VIETNAM         | 2            | 21           | 2            |
> | EUROPE       | FRANCE          | 3            | 6            | 3            |
> | EUROPE       | GERMANY         | 3            | 7            | 3            |
> | EUROPE       | ROMANIA         | 3            | 19           | 3            |
> | EUROPE       | RUSSIA          | 3            | 22           | 3            |
> | EUROPE       | UNITED KINGDOM  | 3            | 23           | 3            |
> | MIDDLE EAST  | EGYPT           | 4            | 4            | 4            |
> | MIDDLE EAST  | IRAN            | 4            | 10           | 4            |
> | MIDDLE EAST  | IRAQ            | 4            | 11           | 4            |
> | MIDDLE EAST  | JORDAN          | 4            | 13           | 4            |
> | MIDDLE EAST  | SAUDI ARABIA    | 4            | 20           | 4            |
> +--------------+-----------------+--------------+--------------+--------------+
> 25 rows selected (0.519 seconds)
>
> I'm wondering if I could get, say, 5 documents representing the 5 regions
> and the nested structure within that representing the countries.
>
> Not the best use case, I agree... but to distil it down to a simple
> question, what I'm asking is whether there is value in having some series
> of simple steps that would reverse the way a JSON doc can be "flattened"
> to a CSV format.
>
> It can't be as simple as just using an un-flatten operator, but close
> enough. E.g., I could define the nesting based on the ORDER BY operator,
> so that the final writer can stream through the output and create the
> nested document accordingly.
>
> Just wondering about the value of something like this.
>
>
> -----Original Message-----
> From: rahul challapalli [mailto:[email protected]]
> Sent: Monday, September 18, 2017 4:02 PM
> To: dev <[email protected]>
> Subject: Re: Convert CSV to nested JSON
>
> Can you give an example? Converting CSV into nested JSON does not make
> sense to me.
>
> On Mon, Sep 18, 2017 at 3:54 PM, Ted Dunning <[email protected]>
> wrote:
>
> > What is the ultimate purpose here?
> >
> > On Mon, Sep 18, 2017 at 3:21 PM, Kunal Khatua <[email protected]> wrote:
> >
> > > I'm curious about whether there are any implementations of
> > > converting CSV to a nested JSON format "automagically".
> > >
> > > Within Drill, I know that the CTAS route will basically convert each
> > > row into a JSON document with depth=1, which is pretty much an obese
> > > CSV data format.
> > >
> > > Is it worth having something like this, or is it too hard a problem
> > > that it's best that users explicitly define and write the documents?
> > >
> > > ~ Kunal
