Hi Pradeeban, Can you post the detailed error message?
First set the option: > SET `exec.errors.verbose` = true; And then run the query. The detailed output will point us to where the error occurred. Thank you, Sudheesh > On Sep 12, 2016, at 10:47 AM, Pradeeban Kathiravelu <kk.pradee...@gmail.com> > wrote: > > Hi, > I have a complex json data stored in Mongo. > > The data has nested arrays as shown below ****. > > here when I try to access the geometry field in select queries, the Drill > query fails. > > select camic._id, camic.type, camic.parent_id, camic.randval, > camic.creation_date, camic.object_type, camic.x, camic.y, camic.normalized, > camic.bbox, camic.geometry, camic.footprint, camic.properties, > camic.provenance, camic.submit_date from mongo.CAMICROSCOPE.`testUAIM2` as > camic WHERE ((camic.provenance.image.case_id = 'TCGA-02-0001-01Z-00-DX1') > AND (camic.provenance.analysis.execution_id = 'tammy-test:7') AND > (camic.footprint >= 800) AND (camic.x >= 0) AND (camic.x <=1) AND (camic.y >> = 0) AND (camic.y <= 1)); > > would fail with a below error: > > > > > > *Error: SYSTEM ERROR: NullPointerExceptionFragment 0:0[Error Id: > 8cd950af-91fa-4cf8-865b-265f227c8e87 on llovizna:31010] (state=,code=0)* > > However, after removing the geometry from the select query, it would work > just fine. > > select camic._id, camic.type, camic.parent_id, camic.randval, > camic.creation_date, camic.object_type, camic.x, camic.y, camic.normalized, > camic.bbox, camic.footprint, camic.properties, camic.provenance, > camic.submit_date from mongo.CAMICROSCOPE.`testUAIM2` as camic WHERE > ((camic.provenance.image.case_id = 'TCGA-02-0001-01Z-00-DX1') AND > (camic.provenance.analysis.execution_id = 'tammy-test:7') AND > (camic.footprint >= 800) AND (camic.x >= 0) AND (camic.x <=1) AND (camic.y >> = 0) AND (camic.y <= 1)); > > > I know this is due to the array in the output. I am also aware of the > commonly suggested options. > 1. Using the array indexes in the select query: This is *impractical*. I do > not know how many elements I would have in this geojson - the coordinates. > It may be millions or as low as 3. > > 2. Flatten keyword: I am using Drill on top of Mongo - and finding an > interesting case where Drill outperforms certain queries in a distributed > execution than just using Mongo. Using Flatten basically kills all the > performance benefits I have with Drill otherwise. Flatten is just plain > *expensive* operation for the scale of my data (around 48 GB. But I can > split them into a few GB each). > > Now given that I have explained why I cannot use the commonly suggested > options to deal with the complex Drill queries involving arrays, what are > the other alternatives? > > I am thinking of using Protobufs with Drill to serialize the nested arrays. > Has anyone else tried it before, and any pointers, or anyother suggestions > on this overall requirement of querying the nested arrays? > > Thank you. > Regards, > Pradeeban. > > > **** The sample data in Mongo (data anonymized). >> db.testUAIM2.findOne() > { > "_id" : ObjectId("22"), > "bbox" : [ > 0.11, > 0.33, > 0.44, > 0.44 > ], > "*geometry*" : { > "type" : "Polygon", > "coordinates" : [ > [ > [ > 0.04272915795445442, > 0.9368849396705627 > ], > [ > 0.04272915795445442, > 0.9369083046913147 > ], > [ > 0.042739588767290115, > 0.9369083046913147 > ], > [ > 0.042739588767290115, > 0.9369550347328186 > ], > [ > 0.04275001958012581, > 0.9369550347328186 > ], > [ > 0.04275001958012581, > 0.9370251893997192 > ], > [ > 0.042760446667671204, > 0.9370251893997192 > ], > [ > 0.042760446667671204, > 0.9371420741081238 > ], > [ > 0.04275001958012581, > 0.9371420741081238 > ], > [ > 0.04275001958012581, > 0.9372121691703796 > ], > [ > 0.042739588767290115, > 0.9372121691703796 > ], > [ > 0.042739588767290115, > 0.9372823238372803 > ], > [ > 0.04272915795445442, > 0.9372823238372803 > ], > [ > 0.04272915795445442, > 0.9373056888580322 > ], > [ > 0.04272915795445442, > 0.9368849396705627 > ] > ] > ] > }, > "footprint" : 5, > "properties" : { > "scalar_features" : [ > { > "nv" : [ > { > "name" : "f", > "value" : 16.7036895751953 > } > ] > } > ] > }} > > > -- > Pradeeban Kathiravelu. > PhD Researcher, Erasmus Mundus Joint Doctorate in Distributed Computing, > INESC-ID Lisboa / Instituto Superior Técnico, Universidade de Lisboa, > Portugal. > Biomedical Informatics Software Engineer, Emory University School of > Medicine. > > Blog: [Llovizna] http://kkpradeeban.blogspot.com/ > LinkedIn: www.linkedin.com/pub/kathiravelu-pradeeban/12/b6a/b03