On 11/08/14 02:52, Timothy Chen wrote:
Hi Luca,
Currently Drill reads JSON records the same way MongoDB imports them,
that is, each JSON object is separated by a newline:
{ "type": "Feature", "geometry".....}
{ "type": "Feature", "geometry".....}
I see: this is one of the ways mongoimport works (the other being with an array of
Objects, hence proper JSON).
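
To make the two layouts concrete, here is a minimal Python sketch that accepts
either one. It is only an illustration of the idea, not Drill or mongoimport code;
the function name read_records is hypothetical, and it assumes that in the
newline-delimited layout each object fits on a single line.

```python
import json


def read_records(text):
    """Parse either newline-delimited JSON objects or one top-level JSON array.

    Hypothetical helper for illustration only.
    """
    stripped = text.strip()
    if stripped.startswith("["):
        # Layout 2: a single JSON array of objects (a proper JSON document).
        return json.loads(stripped)
    # Layout 1: one JSON object per line; blank lines are skipped.
    return [json.loads(line) for line in stripped.splitlines() if line.strip()]


newline_delimited = '{"type": "Feature", "id": 1}\n{"type": "Feature", "id": 2}\n'
as_array = '[{"type": "Feature", "id": 1}, {"type": "Feature", "id": 2}]'

# Both layouts yield the same list of records.
assert read_records(newline_delimited) == read_records(as_array)
```
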
It's possible we can extend our options like MongoDB does
(http://zaiste.net/2012/08/importing_json_into_mongodb/),
either expecting the records in an array or comma-separated, and read the JSON
files through some added options.
Indeed.
I see Yash already filed a JIRA, feel free to contribute as well.
I think it is worth considering the wider picture here.
A JSON document, at its top level, is either an Array or an Object
(well, according to http://tools.ietf.org/html/rfc7158 it can be any other value as
well, but this is beside the point I suppose); I think it would be safe to assume
Drill equates an Object (not an Array) with a tuple, and an Array with a vector of
elements having the same type.
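
As a rough sketch of that assumption (my reading only, not how Drill is
implemented; to_tuples is a hypothetical name), a top-level Object becomes one
tuple and a top-level Array becomes a vector of tuples:

```python
import json


def to_tuples(document_text):
    """Map a JSON document to a list of tuples under the assumption above.

    Hypothetical helper: an Object is one tuple, an Array is a vector of them.
    """
    value = json.loads(document_text)
    if isinstance(value, dict):
        # A top-level Object is treated as a single tuple.
        return [value]
    if isinstance(value, list):
        # A top-level Array is treated as a vector of elements.
        return list(value)
    # RFC 7158 also allows scalars at the top level; this sketch ignores them.
    raise ValueError("top-level scalar values are not handled in this sketch")
```
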
The problem with this is defining what a tuple is:
1) Shall {"total_rows":2,"offset":0,"rows":[{"id":1}, {"id":2}]} be considered a
tuple, or a table-like structure containing 2 tuples (incidentally, this is what
a query to CouchDB would return)?
2) Can Arrays be heterogeneous (nothing in JSON prevents that)?
To simplify things, Drill may adopt a subset of JSON, with homogeneous Arrays, and
accept a finite number of input file formats... but this has to be explicitly stated.
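
Both questions can be illustrated with a short Python sketch. The names
is_homogeneous, as_single_tuple and as_table are hypothetical, and comparing
Python types is just one possible definition of "same type" (it treats 1 and 1.0
as different, which a JSON-level definition might not):

```python
import json


def is_homogeneous(array):
    """Return True if all elements share one Python type (empty arrays pass)."""
    return len({type(item) for item in array}) <= 1


doc = json.loads('{"total_rows":2,"offset":0,"rows":[{"id":1}, {"id":2}]}')

# Reading 1: the whole document is one tuple with three fields.
as_single_tuple = [doc]

# Reading 2: the document is a table-like wrapper and "rows" holds the tuples.
as_table = doc["rows"]

assert len(as_single_tuple) == 1
assert len(as_table) == 2
assert is_homogeneous(as_table)

# Nothing in JSON forbids mixing types inside an Array.
mixed = json.loads('[1, "two", {"id": 3}]')
assert not is_homogeneous(mixed)
```

Which reading applies cannot be inferred from the document alone, which is why it
would have to be stated explicitly.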
Regards,
Luca Morandini
Data Architect - AURIN project
Melbourne eResearch Group
Department of Computing and Information Systems
University of Melbourne
Tel. +61 03 903 58 380
Skype: lmorandini